<div dir="ltr"><div><div>Hi,<br><br></div>I'm Kevin, a third-year computer science student at College of Engineering Trivandrum. I accidentally posted my initial inquiries to the SMC discussion list, which delayed me. I'm interested in improving the machine learning capabilities of Varnam. I'm a native speaker of Malayalam, which I believe is an added advantage. I have built Varnam on my machine using the instructions provided at Gitorious and have started reading the source. As directed by Navaneeth K N in a previous thread, I began with learn.c.<br>

<br><br></div><div>However, I quickly ran into some questions. A few more hours of careful reading would probably answer them, but I'd feel more comfortable if someone could confirm my understanding:<br>

<br></div><div>1) Token: As I understand it, a token is an indivisible word, the basic building block. 'tokens' is an instance (the non-OOP equivalent of an object) of the type varray. Does 'tokens' contain all the possible forms of a token? For example, would മലയാളം, മലയാളത്തിന്റെ, മലയാളത്തിൽ and മലയാള all go into the same varray instance 'tokens'? And I suppose each word (e.g. മലയാളം) would occupy one slot in tokens->memory. Am I right about this?<br>
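<br>To make my mental model concrete, here is a minimal sketch of what I imagine a varray-like structure to be: a growable array of pointers where each stored item occupies one slot. Only the field name memory comes from my reading of the code. The type name sketch_array, the other fields, and all the function names are my own assumptions for illustration, not the actual libvarnam definitions.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a growable pointer array. Only the field name
   `memory` is taken from the source I read; everything else is assumed. */
typedef struct {
    void **memory;   /* slots; each holds a pointer to one stored item */
    size_t used;     /* number of occupied slots */
    size_t capacity; /* number of allocated slots */
} sketch_array;

/* Puts the array into a valid empty state. */
void sketch_init(sketch_array *a) {
    a->memory = NULL;
    a->used = 0;
    a->capacity = 0;
}

/* Appends one item, doubling the capacity when the array is full.
   Returns 0 on success and -1 if allocation fails. */
int sketch_push(sketch_array *a, void *item) {
    if (a->used == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
        void **grown = realloc(a->memory, new_capacity * sizeof *grown);
        if (grown == NULL) {
            return -1; /* the old block is still valid and unchanged */
        }
        a->memory = grown;
        a->capacity = new_capacity;
    }
    a->memory[a->used++] = item;
    return 0;
}

/* Returns the item stored in slot i; i must refer to an occupied slot. */
void *sketch_get(const sketch_array *a, size_t i) {
    assert(i < a->used);
    return a->memory[i];
}

/* Releases the slot storage (not the items) and resets the array. */
void sketch_free(sketch_array *a) {
    free(a->memory);
    sketch_init(a);
}
```

If this picture is right, pushing the four words above would leave four occupied slots, and sketch_get(&tokens, 0) would return the pointer to the first word.<br>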

<br></div><div>2) I see the data type 'v_' used frequently, but I could not find its definition. Searching a few source files with Ctrl+F did not turn it up, so I must have missed it. I would be really grateful if you could tell me where it is defined and what it does.<br>

<br>Regards,<br><br></div><div>Kevin Martin Jose<br></div></div>