[smc-discuss] Free and open Malayalam dictionary dataset

Rajeesh K Nambiar rajeeshknambiar at gmail.com
Wed May 22 01:19:45 PDT 2013


On Wed, May 22, 2013 at 7:43 AM, Manilal K M <libregeek at gmail.com> wrote:

>
> ---------- Forwarded message ----------
> From: Kailash Nadh <kailash.nadh at gmail.com>
> To: smc-discuss at googlegroups.com
> Cc:
> Date: Wed, 22 May 2013 10:18:16 +0530
> Subject: Free and open Malayalam dictionary dataset
>  Hello all,
> I've just been able to publish the semanticised version of Datuk's original
> ASCII Malayalam-Malayalam dictionary digitisation work.
> => http://olam.in/open/datuk
>
>
Great work!


> "The Datuk Corpus" is a human readable, parse-ready, Unicode dictionary
> dataset with over 83,000 Malayalam words and over 106,000 definitions. It's
> been in development for over two years. The dataset is an evolution of
> Datuk's original work, and has undergone extensive refinement, corrections,
> and structuring, amounting to tens of thousands of changes. The GitHub
> repository for the project contains the full text corpus, an SQL dump, and
> a couple of Python scripts for parsing and conversion.
>
> This is the same dataset that powers Olam's Malayalam-Malayalam dictionary
> that went live two days ago. Also, Datuk's original work constitutes a
> substantial portion of the Malayalam Wiktionary.
>
>
> Sample entries from the dataset:
>
> ച	ചക്രാംഗി	സം. -അംഗീ	_	36953
> 	നാ.	അരയന്നപ്പിട
> 	നാ.	ചക്രവാകപ്പിട
> 	നാ.	മഞ്ചട്ടി
> 	നാ.	കക്കടകശൃംഗി
>
> പ	പരോക്ഷം	_	_	57697
> 	നാ.	മറവ്
> 	നാ.	പരോക്ഷജ്ഞാനം
> 	നാ.	പ്രത്യക്ഷമല്ലാത്തത്
>
>
>
The dataset looks easy to convert to the RFC 2229 format used by
dictd. If anyone is interested in creating an offline version of it, see
http://wiki.smc.org.in/Dictionary and contact me for help.
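
As a starting point for such a conversion, here is a rough Python sketch
that reads entries laid out like the samples quoted above. It is not one of
the scripts from the project's repository. The layout is only inferred from
the two samples: a headword line with five columns (initial letter, headword,
origin, one further column whose meaning is not stated here, numeric id),
followed by indented lines holding a part-of-speech tag and a definition,
with "_" standing for an empty column. The names parse_entries, "origin" and
"extra" are made up for illustration.

```python
import re

# Assumption: columns are separated by a tab. A run of two or more spaces is
# also accepted, in case tabs were converted to spaces somewhere in transit.
_SEP = re.compile(r"\t| {2,}")


def _blank(value):
    """Map the "_" placeholder (assumed to mean 'empty') to None."""
    return None if value == "_" else value


def parse_entries(lines):
    """Parse an iterable of text lines into a list of entry dicts."""
    entries = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # skip blank separator lines
        if line[0].isspace():
            # Indented line: a definition for the most recent headword.
            if not entries:
                continue
            parts = _SEP.split(line.strip(), maxsplit=1)
            if len(parts) == 2:
                entries[-1]["definitions"].append((parts[0], parts[1]))
        else:
            cols = _SEP.split(line)
            if len(cols) != 5:
                continue  # not a headword line in the assumed layout
            entries.append({
                "letter": cols[0],
                "headword": cols[1],
                "origin": _blank(cols[2]),
                "extra": _blank(cols[3]),  # meaning of this column unknown
                "id": int(cols[4]),
                "definitions": [],
            })
    return entries
```

Each returned dict carries the headword plus a list of (tag, definition)
pairs, which should be enough to write out whatever input format the dictd
tools expect. Check the column layout against the real corpus first.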


> The dataset is licensed under the ODbL <http://opendatacommons.org/licenses/odbl/>,
> inspired by the OpenStreetMap project.
>
> Hope this is all useful.
>
> Thanks
>
> Kailash
>
>
>
>


-- 
Cheers,
Rajeesh
http://rajeeshknambiar.wordpress.com