[smc-discuss] Free and open Malayalam dictionary dataset
Rajeesh K Nambiar
rajeeshknambiar at gmail.com
Wed May 22 01:19:45 PDT 2013
On Wed, May 22, 2013 at 7:43 AM, Manilal K M <libregeek at gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: Kailash Nadh <kailash.nadh at gmail.com>
> To: smc-discuss at googlegroups.com
> Cc:
> Date: Wed, 22 May 2013 10:18:16 +0530
> Subject: Free and open Malayalam dictionary dataset
> Hello all,
> I've just been able to publish the semanticised version of Datuk's original
> ASCII Malayalam-Malayalam dictionary digitisation work.
> => http://olam.in/open/datuk
>
>
Great work!
> "The Datuk Corpus" is a human readable, parse-ready, Unicode dictionary
> dataset with over 83,000 Malayalam words and over 106,000 definitions. It's
> been in development for over two years. The dataset is an evolution of
> Datuk's original work, and has undergone extensive refinement, corrections,
> and structuring, amounting to tens of thousands of changes. The GitHub
> repository for the project contains the full text corpus, an SQL dump, and
> a couple of Python scripts for parsing and conversion.
>
> This is the same dataset that powers Olam's Malayalam-Malayalam dictionary
> that went live two days ago. Also, Datuk's original work constitutes a
> substantial portion of the Malayalam Wiktionary.
>
>
> Sample entries from the dataset:
>
> ച ചക്രാംഗി സം. -അംഗീ _ 36953
> നാ. അരയന്നപ്പിട
> നാ. ചക്രവാകപ്പിട
> നാ. മഞ്ചട്ടി
> നാ. കക്കടകശൃംഗി
>
> പ പരോക്ഷം _ _ 57697
> നാ. മറവ്
> നാ. പരോക്ഷജ്ഞാനം
> നാ. പ്രത്യക്ഷമല്ലാത്തത്
>
>
>
The dataset looks easy to convert to the database format used by dictd
(the RFC 2229 DICT server). If anyone is interested in creating an offline
version of it, see http://wiki.smc.org.in/Dictionary and contact me for help.
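
As a possible starting point for such a conversion, here is a rough
Python sketch that reads entries shaped like the two samples quoted
above. It is not one of the scripts from the project repository. It
assumes that entries are separated by blank lines, that the first line
of an entry holds the initial letter, the headword and a trailing
numeric id, and that each following line is a part-of-speech tag
followed by a definition. The meaning of the fields between the
headword and the id is not stated in the announcement, so they are kept
as raw tokens. The names Entry and parse_entries are made up for this
example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    letter: str          # initial letter the entry is filed under
    headword: str
    meta: list           # unidentified fields between headword and id, kept raw
    entry_id: int
    senses: list = field(default_factory=list)  # (tag, definition) pairs


def parse_entries(text):
    """Parse blank-line-separated entries (assumed layout, see above)."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line closes the current entry
            continue
        if current is None:
            # First line of an entry: letter, headword, ..., numeric id.
            tokens = line.split()
            if len(tokens) < 3 or not tokens[-1].isdigit():
                raise ValueError("unexpected header line: %r" % line)
            current = Entry(tokens[0], tokens[1], tokens[2:-1], int(tokens[-1]))
            entries.append(current)
        else:
            # Sense line: tag (for example "നാ."), then the definition text.
            parts = line.split(None, 1)
            definition = parts[1] if len(parts) > 1 else ""
            current.senses.append((parts[0], definition))
    return entries


SAMPLE = """\
ച ചക്രാംഗി സം. -അംഗീ _ 36953
നാ. അരയന്നപ്പിട
നാ. ചക്രവാകപ്പിട
നാ. മഞ്ചട്ടി
നാ. കക്കടകശൃംഗി

പ പരോക്ഷം _ _ 57697
നാ. മറവ്
നാ. പരോക്ഷജ്ഞാനം
നാ. പ്രത്യക്ഷമല്ലാത്തത്
"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry.entry_id, entry.headword, len(entry.senses))
```

The real corpus may use tabs or a different field layout than what
survives in this mail, so the header handling should be checked against
the actual files before relying on it.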
> The dataset is licensed under the ODbL<http://opendatacommons.org/licenses/odbl/>,
> inspired by the Open Street Map project.
>
> Hope this is all useful.
>
> Thanks
>
> Kailash
>
>
>
>
--
Cheers,
Rajeesh
http://rajeeshknambiar.wordpress.com