[Student-projects] GSOC'14: Spell checker

Shafeeq K shafeeq94 at gmail.com
Thu Mar 6 08:34:22 PST 2014


Hi,
I'm Shafeeq, a second year CSE student from NSS College of
Engineering, Palakkad.
I am interested in this year's GSOC project "A spell checker for Indic
language that understands inflections". I've been reading and doing a
little homework for this project, as suggested by the mentor.

I couldn't find any affix rules for Malayalam in the corresponding
affix file. Does that mean we currently rely only on the word list
for spell checking?

The Hunspell manual describes only two-fold suffix stripping. Since it
was mentioned that Indic languages might require as many as 5 levels
of stripping, is Hunspell the way forward? I saw an experimental
indic-stemmer in SILPA. Couldn't we extend it to handle multilevel
suffix stripping?
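
To make the question concrete, here is a rough sketch of what I mean by
multilevel stripping. It is not taken from SILPA or Hunspell: the
function name, the parameters, and the transliterated suffix list are
all made up for illustration, and it ignores the sound changes at
morpheme boundaries that a real Malayalam stemmer would have to undo.

```python
# Hypothetical, transliterated suffixes used only for illustration.
# A real list would come from a linguistically reviewed rule set.
SUFFIXES = ["kal", "ude", "il", "um"]


def strip_suffixes(word, suffixes=SUFFIXES, max_levels=5, min_stem=2):
    """Strip up to max_levels suffixes from the end of word.

    At each level the longest matching suffix is removed, provided the
    remaining stem keeps at least min_stem characters. Returns the stem
    and the suffixes removed, in the order they were stripped.
    """
    stripped = []
    ordered = sorted(suffixes, key=len, reverse=True)
    for _ in range(max_levels):
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[:-len(suffix)]
                stripped.append(suffix)
                break
        else:
            # No suffix matched at this level, so stop early.
            break
    return word, stripped
```

With these placeholder suffixes, strip_suffixes("kuttikaludeilum")
peels off four levels and returns ("kutti", ["um", "il", "ude", "kal"]),
while the same call with max_levels=2 stops at "kuttikalude". That is
the gap I am worried about with a two-level limit.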

On the agglutination of words and suffixes, I came across a paper
while reading about it [1]. Could you please suggest some other
documents as well?

Thanks.

[1]: http://aclweb.org/anthology//O/O12/O12-1028.pdf


Shafeeq


