[smc-discuss] Free and open Malayalam dictionary dataset

Balasankar Chelamattath c.balasankar at gmail.com
Tue May 21 23:06:14 PDT 2013


Thanks, Kailash. That is great news.

Regards,
Balasankar C



2013/5/22 Kailash Nadh <kailash.nadh at gmail.com>

> Hello all,
> I've just been able to publish the semanticised version of Datuk's original
> ASCII Malayalam-Malayalam dictionary digitisation work.
> => http://olam.in/open/datuk
>
> "The Datuk Corpus" is a human readable, parse-ready, Unicode dictionary
> dataset with over 83,000 Malayalam words and over 106,000 definitions. It's
> been in development for over two years. The dataset is an evolution of
> Datuk's original work, and has undergone extensive refinement, corrections,
> and structuring, amounting to tens of thousands of changes. The GitHub
> repository for the project contains the full text corpus, an SQL dump, and
> a couple of Python scripts for parsing and conversion.
>
> This is the same dataset that powers Olam's Malayalam-Malayalam dictionary
> that went live two days ago. Also, Datuk's original work constitutes a
> substantial portion of the Malayalam Wiktionary.
>
>
> Sample entries from the dataset:
>
> ച	ചക്രാംഗി	സം. -അംഗീ	_	36953
> 	നാ.	അരയന്നപ്പിട
> 	നാ.	ചക്രവാകപ്പിട
> 	നാ.	മഞ്ചട്ടി
> 	നാ.	കക്കടകശൃംഗി
>
> പ	പരോക്ഷം	_	_	57697
> 	നാ.	മറവ്
> 	നാ.	പരോക്ഷജ്ഞാനം
> 	നാ.	പ്രത്യക്ഷമല്ലാത്തത്
>
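The sample entries above suggest a simple layout: a tab-separated header line followed by tab-indented definition lines. A minimal sketch of how such entries might be read in Python is given here. It is not the parser shipped in the project's repository. The field names (`letter`, `word`, `origin`, `extra`, `entry_id`) and the reading of `_` as an empty value are assumptions inferred only from the two samples shown, and the input is assumed to be raw corpus text without the `> ` email quoting.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    # All field names are hypothetical; they are guessed from the samples.
    letter: str
    word: str
    origin: Optional[str]
    extra: Optional[str]
    entry_id: int
    # Each definition is a (label, text) pair, e.g. ("നാ.", "മറവ്").
    definitions: List[Tuple[str, str]] = field(default_factory=list)


def _blank_to_none(value: str) -> Optional[str]:
    # Assumption: a lone underscore marks an empty field.
    return None if value == "_" else value


def parse_entries(text: str) -> List[Entry]:
    entries: List[Entry] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line[0] in "\t ":
            # Indented line: a definition belonging to the latest header.
            if not entries:
                raise ValueError("definition line before any header line")
            parts = line.strip().split("\t", 1)
            label, definition = parts if len(parts) == 2 else ("", parts[0])
            entries[-1].definitions.append((label, definition))
        else:
            # Header line: assumed to hold exactly five tab-separated fields.
            fields = line.split("\t")
            if len(fields) != 5:
                raise ValueError("unexpected header line: %r" % line)
            letter, word, origin, extra, entry_id = fields
            entries.append(
                Entry(letter, word, _blank_to_none(origin),
                      _blank_to_none(extra), int(entry_id))
            )
    return entries


SAMPLE = (
    "പ\tപരോക്ഷം\t_\t_\t57697\n"
    "\tനാ.\tമറവ്\n"
    "\tനാ.\tപരോക്ഷജ്ഞാനം\n"
    "\tനാ.\tപ്രത്യക്ഷമല്ലാത്തത്\n"
)

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry.entry_id, entry.word, len(entry.definitions))
```

Raising on malformed header lines rather than skipping them is a deliberate choice in this sketch, so that any mismatch between the assumed layout and the real corpus shows up early.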
>
> The dataset is licensed under the ODbL (http://opendatacommons.org/licenses/odbl/),
> inspired by the OpenStreetMap project.
>
> Hope this is all useful.
>
> Thanks
>
> Kailash