[smc-discuss] malayalam compound word splitter

Merin Francis merinfrancis at yahoo.com
Wed Jan 8 23:40:29 PST 2014



Hi all,

I am trying to build a compound word splitter using sandhi rules. The base paper (attached) says to split the word into two parts and apply rules based on the last letter of the first word and the first letter of the second word. I am stuck on the word splitting itself.

I tried to split the word by checking it against the dictionary, working both from the front backward and from the back forward, but I am not able to split it the way I need. I have attached that analysis: the first and last words obtained by checking from the two sides. Please help me split this.
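
For illustration only, the two-sided dictionary check described above might look roughly like the sketch below. This is not the attached sandhi.py. The function names and the toy English lexicon are assumptions made for the example, and it assumes the two parts are joined without any letter change. Where a sandhi rule alters letters at the join, the rule would have to be reversed at each split point before the lookup.

```python
def candidate_splits(word, lexicon):
    """Return every (left, right) pair whose two halves are both in the lexicon."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in lexicon and word[i:] in lexicon
    ]


def longest_prefix(word, lexicon):
    """Scan from the front: longest leading part found in the lexicon, or None."""
    for i in range(len(word), 0, -1):
        if word[:i] in lexicon:
            return word[:i]
    return None


def longest_suffix(word, lexicon):
    """Scan from the back: longest trailing part found in the lexicon, or None."""
    for i in range(len(word)):
        if word[i:] in lexicon:
            return word[i:]
    return None


# Hypothetical toy lexicon; a real one would come from the dictionary database.
lexicon = {"sun", "flower", "low"}

print(candidate_splits("sunflower", lexicon))
print(longest_prefix("sunflower", lexicon))
print(longest_suffix("sunflower", lexicon))
```

Checking every split point, rather than stopping at the first match from either side, keeps all valid candidates available so a later rule step can choose among them.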

My code is also attached.


Thanks and regards,
Merin Francis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sandhi.py
Type: text/x-python
Size: 1602 bytes
Desc: not available
URL: <http://lists.smc.org.in/pipermail/discuss-smc.org.in/attachments/20140108/a0f45535/sandhi-0002.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: analysis
Type: application/octet-stream
Size: 910 bytes
Desc: not available
URL: <http://lists.smc.org.in/pipermail/discuss-smc.org.in/attachments/20140108/a0f45535/analysis-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dbConnect.py
Type: text/x-python
Size: 768 bytes
Desc: not available
URL: <http://lists.smc.org.in/pipermail/discuss-smc.org.in/attachments/20140108/a0f45535/dbConnect-0002.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Splitting Compound Words in Malayalam Language.pdf
Type: application/pdf
Size: 750489 bytes
Desc: not available
URL: <http://lists.smc.org.in/pipermail/discuss-smc.org.in/attachments/20140108/a0f45535/SplittingCompoundWordsinMalayalamLanguage-0002.pdf>