[smc-discuss] GSoC Community Bonding - Blog and Contribution

Balasankar C balasankarc at autistici.org
Thu May 5 20:36:42 PDT 2016


During the community bonding period of the GSoC program, participants
are expected to interact with the community to understand its basic
workflow, make some small contributions to the projects the community
maintains (such as bugfix patches and documentation), understand the
coding standards, deployment models, version control systems etc. that
the community uses, and become familiar with its communication methods.

*0. Blog*
I will be maintaining a blog for documenting and summarizing the work I
do for the GSoC Program, which may be found as a category in my personal

*1. Community Contribution*
I have been contributing to the community since before GSoC, in the form
of localization, packaging etc. Still, as a contribution to the
community during the GSoC Community Bonding period, I have written a
small library in Python that can be used to generate different
Vibhakthi (grammatical case) forms of Malayalam words (like രാമൻ ->
രാമന്റെ, രാമനെ, രാമനോട്, രാമനാൽ etc). The project was inspired by
Santhoshettan's similar project[1] using the jQuery.i18n library. The
code is available in my
personal repo[2] now and I will be pushing it to the organization's repo

The library currently uses a rule-based approach to generate the
Vibhakthi forms, which is not 100% accurate. It fails on words ending
with letters (usually Chillu characters) whose base forms are ambiguous.
For example, words ending with ർ can have either ര or റ as their base
form. Different words ending with ർ that have a similar structure give
different results when the same vibhakthi is applied.

Example (സംബന്ധിക is the genitive case):
അനിവർ + സംബന്ധിക = അനിവറിന്റെ
മലർ + സംബന്ധിക = മലരിന്റെ
കൗരവർ + സംബന്ധിക = കൗരവരുടെ
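
The limitation can be seen in a minimal sketch, assuming a table with
exactly one replacement per word ending. The names GENITIVE_RULES,
genitive and EXPECTED are hypothetical and are not taken from the
library; the only data used are the example words above. Whichever base
form the single rule for ർ picks, it cannot produce all three expected
results.

```python
# Hypothetical one-rule-per-ending table for the genitive (സംബന്ധിക) form.
# Each chillu ending maps to the text that replaces it. This is only an
# illustration, not the rule set of the actual library.
GENITIVE_RULES = {
    "ൻ": "ന്റെ",    # രാമൻ -> രാമന്റെ
    "ർ": "രിന്റെ",  # assumption: the base form of ർ is always ര
}


def genitive(word):
    """Apply the single rule that matches the final character, if any."""
    replacement = GENITIVE_RULES.get(word[-1:])
    if replacement is None:
        return word  # no rule for this ending; leave the word unchanged
    return word[:-1] + replacement


# Expected forms, taken from the examples above.
EXPECTED = {
    "രാമൻ": "രാമന്റെ",
    "അനിവർ": "അനിവറിന്റെ",
    "മലർ": "മലരിന്റെ",
    "കൗരവർ": "കൗരവരുടെ",
}

# Words for which the single rule gives the wrong form.
mismatches = [w for w, want in EXPECTED.items() if genitive(w) != want]

for word, want in EXPECTED.items():
    got = genitive(word)
    print(word, "->", got, "(ok)" if got == want else "(expected " + want + ")")
```

With this table, only മലർ among the three ർ words comes out right.
Switching the rule to use റ would fix അനിവർ and break മലർ, and neither
choice yields കൗരവരുടെ.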

(Thanks to Santhoshettan for the following info.) This shows a drawback
of the rule-based method: we need to develop a method that also takes
the word's etymology into account. That can be expected once Machine
Learning techniques become more mature and usable for Malayalam. For
now, since no such library exists, I follow the principle that "99% is
better than 0%" and think the library is worth using until we can find
something better. I plan to use this library during the development of
the spell checker (I will post its proposal in detail soon), which I
still have to look into further. For now, you can try out the library
at this online demo[3]. I would be happy if you could test it and report
issues, suggestions etc.

[0] http://balasankarc.in/tech/category/gsoc/
[1] http://thottingal.in/projects/js/jquery.i18n/demo/mlgrammar.html
[2] http://gitlab.com/balasankarc/vibhakthi-generator
[3] http://vibhakthi.balasankarc.in/

Balasankar C
