[Student-projects] Varnam can now stem

aboobacker sidheeque mk aboobackervyd at gmail.com
Sun Jun 29 11:10:24 PDT 2014


On Sun, Jun 29, 2014 at 8:52 PM, Kevin Martin <youcancallmekevin at gmail.com>
wrote:

>
>
>
> On Sun, Jun 29, 2014 at 6:53 PM, Vasudev Kamath <kamathvasudev at gmail.com>
> wrote:
>
>>
>> Off topic, not related to this discussion.
>>
>> aboobacker sidheeque mk <aboobackervyd at gmail.com> writes:
>>
>> > ഇന്നലെ നമ്മള്‍ ചാറ്റില്‍ ഡിസ്കസ് ചെയ്തതതാണ് , മെയ്ലിങ്ങ് ലിസ്റ്റില്‍
>> കൂടി കൊടുക്കാം എന്നു വച്ചു
>> > (In English: this is what we discussed in chat yesterday; I thought I
>> > would post it to the mailing list as well.)
>> > :-)
>>
>> Can you please translate this? I would suggest you refrain from
>> writing comments or replies in Malayalam; there are mentors on this list
>> who don't understand Malayalam.
>>
>
>> >
>> > Take two similar words, ചിരിക്കുക and ഇരിക്കുക. If you stem these, the
>> > output will be ചിര and ഇര, but the past tenses of these words are ചിരിച്ചു
>> > and ഇരുന്നു respectively. Then how do you use this stem for prediction?
>> > ുന്നു is not suitable for ചിര and ിച്ചു is not suitable for ഇര. In
>> > Malayalam, verbs alone have ~30 different suffix patterns (or paradigms).
>> >
>>
> I thought about what you said yesterday. Strictly speaking, the goal of
> the stemmer is not to find the past tense. But it is true that if
> ചിരിക്കുക stems to ചിര then it wouldn't benefit varnam at all.
>
>> > A similar case with nouns:
>> > തിരുവനന്തപുരം -> തിരുവനന്തപുരത്ത്
>> > മരം->മരത്തില്‍ (not മരത്തില്‍ )
>>
>>
> I did not understand the example about മരം->മരത്തില്‍ . I do not think any
> stemmer can stem nouns properly, as the nouns can have foreign roots.
> When tested on this article [1], the stemmer reaches an accuracy of 89%.
> However, that figure mostly reflects words left untouched because they
> needed no stemming, rather than words that needed stemming being stemmed
> correctly. But I noticed that nouns of Malayalam origin are usually (not
> always) stemmed correctly.
> e.g. കോഴിക്കോട്ടെ : കോഴിക്കോട്
>
My question was not limited to the stemmer :-). You have to use this stem in
the prediction list by adding suffixes, and at that point you can't add ില്‍ to
തിരുവനന്തപുരം or ത്ത് to മരം. My question was how you are going to handle this
diversity :-).
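
To make the question concrete, here is a minimal sketch in Python of one way
the diversity could be handled: store a paradigm for every stem and take the
suffixes from that paradigm instead of from one global suffix list. The
paradigm names, the PARADIGMS and LEXICON tables, and the inflect() and
predictions() helpers are all hypothetical and are not part of varnam; only
the word forms come from this thread.

```python
# Hypothetical paradigm table: paradigm name -> {form name: suffix added to the stem}.
# Only four paradigms are shown; the thread estimates ~30 for verbs alone.
PARADIGMS = {
    "verb_past_ichu": {"infinitive": "ിക്കുക", "past": "ിച്ചു"},
    "verb_past_unnu": {"infinitive": "ിക്കുക", "past": "ുന്നു"},
    "noun_place": {"base": "ം", "locative": "ത്ത്"},
    "noun_common": {"base": "ം", "locative": "ത്തില്‍"},
}

# Hypothetical lexicon: stem (as the stemmer would output it) -> paradigm name.
LEXICON = {
    "ചിര": "verb_past_ichu",
    "ഇര": "verb_past_unnu",
    "തിരുവനന്തപുര": "noun_place",
    "മര": "noun_common",
}


def inflect(stem, form):
    """Return one inflected form of a known stem, using the stem's own paradigm."""
    paradigm = PARADIGMS[LEXICON[stem]]
    return stem + paradigm[form]


def predictions(stem):
    """Return every form the stem's paradigm allows, in table order."""
    paradigm = PARADIGMS[LEXICON[stem]]
    return [stem + suffix for suffix in paradigm.values()]


if __name__ == "__main__":
    for stem in LEXICON:
        print(stem, "->", ", ".join(predictions(stem)))
```

With a table like this, ചിര only ever gets ിച്ചു and ഇര only ever gets ുന്നു, so
the wrong combinations never reach the prediction list. The open problem is
still how each stem gets assigned to its paradigm in the first place.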

>
> [1] ml.wikipedia.org/wiki/തച്ചോളി_ഒതേനൻo
>
>>
>> --l
>>
>> Vasudev Kamath
>> http://copyninja.info
>> Connect on ~friendica: copyninja at samsargika.copyninja.info
>> IRC nick: copyninja | vasudev {irc.oftc.net | irc.freenode.net}
>> GPG Key: C517 C25D E408 759D 98A4  C96B 6C8F 74AE 8770 0B7E
>>
>> _______________________________________________
>> Student-projects mailing list
>> Student-projects at lists.smc.org.in
>> http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in
>>
>>
>


-- 
Aboobacker MK
GSoC Student
twitter.com/abvayad