[Student-projects] GSoC Varnam

Kevin Martin youcancallmekevin at gmail.com
Thu Feb 27 12:18:26 PST 2014


Hi,

I'm Kevin, a third-year computer science student at the College of
Engineering, Trivandrum. I accidentally posted my initial inquiries to the
SMC discussion list, which delayed me. I'm interested in improving the
machine learning capabilities of Varnam. I'm a native speaker of
Malayalam, which I believe is an added advantage. I have built Varnam on
my machine using the instructions provided at Gitorious and have started
reading the source. As directed by Navaneeth K N in a previous thread, I
began with learn.c.


However, I quickly ran into doubts. Though I think spending a few more
hours reading the code carefully would reveal the answers, I'd feel more
comfortable if someone could confirm or correct my understanding:

1) Token: A token is an indivisible word, the basic building block.
'tokens' is an object (an instance, or rather the non-OOP equivalent of
one) of the type varray. Does 'tokens' contain all the possible patterns
of a token? For example, would മലയാളം, മലയാളത്തിന്റെ, മലയാളത്തിൽ and മലയാള
all go under the same varray instance 'tokens'? And I suppose each word
(e.g. മലയാളം) would occupy a slot at tokens->memory. Am I right in this
regard?
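
To make question 1 concrete, here is a minimal C sketch of how I picture
it. This is only my mental model, not Varnam's actual code: apart from the
field name `memory`, which I saw used as tokens->memory, every name here
(sketch_array, sketch_array_push, sketch_demo and so on) is hypothetical,
and the real varray may well be laid out differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of pointers. Only the field name `memory`
   comes from the source I read; everything else is an assumption. */
typedef struct {
    void **memory;    /* slots, each pointing at one stored item */
    size_t used;      /* number of occupied slots */
    size_t capacity;  /* number of allocated slots */
} sketch_array;

static sketch_array *sketch_array_new(void)
{
    sketch_array *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->memory = NULL;
    a->used = 0;
    a->capacity = 0;
    return a;
}

/* Append an item; returns 0 on success, -1 if allocation fails. */
static int sketch_array_push(sketch_array *a, void *item)
{
    if (a->used == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
        void **grown = realloc(a->memory, new_capacity * sizeof *grown);
        if (grown == NULL) return -1;
        a->memory = grown;
        a->capacity = new_capacity;
    }
    a->memory[a->used++] = item;
    return 0;
}

/* Return the item in a slot, or NULL if the index is out of range. */
static void *sketch_array_get(const sketch_array *a, size_t index)
{
    return index < a->used ? a->memory[index] : NULL;
}

/* Frees the array itself, not the items the slots point at. */
static void sketch_array_free(sketch_array *a)
{
    if (a != NULL) {
        free(a->memory);
        free(a);
    }
}

/* Stores the four word forms from my question, one per slot, and
   returns how many slots ended up occupied. */
static size_t sketch_demo(void)
{
    static const char *words[] = {
        "മലയാളം", "മലയാളത്തിന്റെ", "മലയാളത്തിൽ", "മലയാള"
    };
    const size_t count = sizeof words / sizeof words[0];
    sketch_array *tokens = sketch_array_new();
    size_t i, stored;

    if (tokens == NULL) return 0;
    for (i = 0; i < count; i++) {
        if (sketch_array_push(tokens, (void *)words[i]) != 0) break;
    }
    stored = tokens->used;
    if (stored == count) {
        /* Slot 0 holds the first word; an out-of-range index gives NULL. */
        assert(strcmp((const char *)sketch_array_get(tokens, 0), words[0]) == 0);
        assert(sketch_array_get(tokens, count) == NULL);
    }
    sketch_array_free(tokens);
    return stored;
}
```

In this picture, calling sketch_demo() leaves each of the four words in its
own slot of tokens->memory. What I would like to know is whether the real
'tokens' varray is used in this way.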

2) I see the data type 'v_' used frequently. However, I could not find its
definition; I must have missed it. Running ctrl+f on a few source files
did not turn it up, so I thought I would simply ask here. I would be
really grateful if you could tell me where it is defined and why it is
defined (what it does).

Regards,

Kevin Martin Jose