[Student-projects] GSoC Varnam
Kevin Martin
youcancallmekevin at gmail.com
Thu Feb 27 12:18:26 PST 2014
Hi,
I'm Kevin, a 3rd-year computer science student at the College of
Engineering, Trivandrum. I accidentally posted my initial inquiries to the
discussion list of SMC, which delayed me. I'm interested in improving
the machine learning capabilities of Varnam. I'm a native speaker of
Malayalam, which I believe is an added advantage. I
have built Varnam on my machine using the instructions provided on
Gitorious and have started reading the source. As directed by Navaneeth K
N in a previous thread, I began with learn.c.
However, I quickly ran into questions. Spending a few more hours reading
the code carefully would probably answer them, but I'd feel more
comfortable if someone could confirm my understanding:
1) Token: A token is an indivisible word, the basic building
block. 'tokens' is an object (instance? I mean the non-OOP equivalent of an
object) of the type varray. Does 'tokens' contain all the possible forms of
a token? For example, would മലയാളം മലയാളത്തിന്റെ മലയാളത്തിൽ മലയാള all go
under the same varray instance 'tokens'? And each word (e.g. മലയാളം)
would occupy a slot at tokens->memory, I suppose. Am I right in this regard?
2) I see the data type 'v_' used frequently, but I could not find its
definition. Searching (Ctrl+F) a few source files did not turn it up, so I
thought I would simply ask here. I would be really grateful if you could
tell me where it is defined and what it does.
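To make question 1 concrete, this is roughly how I picture 'tokens': a
growable array of pointers in which each word takes one slot. It is only a
sketch of my mental model. The type sketch_array, the functions sketch_init,
sketch_push and sketch_free, and the fields used and allocated are names I
made up for illustration; only the field name memory comes from what I saw
in learn.c, and the real varray may well be laid out differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for varray. Only the field name `memory` is taken
 * from learn.c; everything else here is an assumption. */
typedef struct {
    void **memory;     /* one pointer per stored item (one slot per word) */
    size_t used;       /* number of occupied slots */
    size_t allocated;  /* number of slots currently allocated */
} sketch_array;

/* Allocate an empty array; returns NULL if allocation fails. */
static sketch_array *sketch_init(void)
{
    return calloc(1, sizeof(sketch_array));
}

/* Append one item, growing the slot table when it is full.
 * Returns 0 on success and -1 if memory could not be allocated. */
static int sketch_push(sketch_array *a, void *item)
{
    assert(a != NULL);
    if (a->used == a->allocated) {
        size_t n = a->allocated ? a->allocated * 2 : 4;
        void **m = realloc(a->memory, n * sizeof *m);
        if (m == NULL)
            return -1;
        a->memory = m;
        a->allocated = n;
    }
    a->memory[a->used++] = item;
    return 0;
}

/* Release the slot table and the array itself (not the stored items). */
static void sketch_free(sketch_array *a)
{
    if (a != NULL) {
        free(a->memory);
        free(a);
    }
}
```

In this picture, pushing മലയാളം and then മലയാളത്തിൽ would leave them at
tokens->memory[0] and tokens->memory[1]. Is that close to what varray does?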
Regards,
Kevin Martin Jose