<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Thanks @Anivar :)<br>
<br>
Yes, the dataset will be updated periodically. Since it's on
GitHub, anyone can contribute to the main branch at any time. If
anyone's interested in contributor/moderator access to the
repository, please let me know.<br>
<br>
In addition, I'm also working on making Olam's English-Malayalam
dataset public.<br>
<br>
Kailash<br>
<br>
On 22/05/2013 7:11 PM, Anivar Aravind wrote:<br>
</div>
<blockquote
cite="mid:CA+nuCJbuV6z+=AsW0+RQjDwboZQt-0AKDFqVC8-p2XfYUMC1Tg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Wed, May 22, 2013 at 11:29 AM,
Kailash Nadh <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:kailash.nadh@gmail.com" target="_blank">kailash.nadh@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div lang="x-western"> Hello all,<br>
I've just been able to publish the semanticised version
of Datuk's original ASCII Malayalam-Malayalam
dictionary digitisation work.<br>
=&gt; <a moz-do-not-send="true"
href="http://olam.in/open/datuk" target="_blank">http://olam.in/open/datuk</a><br>
<br>
"The Datuk Corpus" is a human readable, parse-ready,
Unicode dictionary dataset with over 83,000 Malayalam
words and over 106,000 definitions. It's been in
development for over two years. The dataset is an
evolution of Datuk's original work, and has undergone
extensive refinement, corrections, and structuring,
amounting to tens of thousands of changes. The GitHub
repository for the project contains the full text
corpus, an SQL dump, and a couple of Python scripts for
parsing and conversion.<br>
<br>
This is the same dataset that powers Olam's
Malayalam-Malayalam dictionary that went live two days
ago. Also, Datuk's original work constitutes a
substantial portion of the Malayalam Wiktionary.<br>
<br>
<br>
Sample entries from the dataset:<br>
<pre>ച ചക്രാംഗി സം. -അംഗീ _ 36953
നാ. അരയന്നപ്പിട
നാ. ചക്രവാകപ്പിട
നാ. മഞ്ചട്ടി
നാ. കക്കടകശൃംഗി</pre>
<pre>പ പരോക്ഷം _ _ 57697
നാ. മറവ്
നാ. പരോക്ഷജ്ഞാനം
നാ. പ്രത്യക്ഷമല്ലാത്തത്</pre>
<br>
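As a minimal sketch of how entries in this layout might be parsed in Python (this is not one of the repository's own scripts): it assumes fields are separated by whitespace, that a head line has at least five fields and ends with a numeric ID, that "_" marks an empty field, and that every other line is a type tag followed by a definition. The names parse_entries and annotations are hypothetical, chosen only for illustration.<br>
<pre>
```python
SAMPLE = """ച ചക്രാംഗി സം. -അംഗീ _ 36953
നാ. അരയന്നപ്പിട
നാ. ചക്രവാകപ്പിട
നാ. മഞ്ചട്ടി
നാ. കക്കടകശൃംഗി
പ പരോക്ഷം _ _ 57697
നാ. മറവ്
നാ. പരോക്ഷജ്ഞാനം
നാ. പ്രത്യക്ഷമല്ലാത്തത്
"""


def parse_entries(text):
    """Group definition lines under the head line that precedes them."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        # Assumption: a head line has at least five fields and ends with a numeric ID.
        if len(fields) >= 5 and fields[-1].isdigit():
            entries.append({
                "letter": fields[0],
                "word": fields[1],
                # Assumption: "_" is a placeholder for an empty field, so it is dropped.
                # What the remaining tokens mean is not stated in the message.
                "annotations": [f for f in fields[2:-1] if f != "_"],
                "id": int(fields[-1]),
                "definitions": [],
            })
        elif entries:
            # Assumption: the first token is a type tag, the rest is the definition text.
            parts = line.split(None, 1)
            tag = parts[0]
            definition = parts[1] if len(parts) == 2 else ""
            entries[-1]["definitions"].append((tag, definition))
    return entries


entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["id"], entry["word"], len(entry["definitions"]))
```
</pre>
The numeric-ID check is only a heuristic for telling head lines from definition lines. If the corpus file separates fields with tabs or indents definitions, splitting on those would be more reliable.<br>
<br>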
The dataset is licensed under the <a
moz-do-not-send="true"
href="http://opendatacommons.org/licenses/odbl/"
target="_blank">ODbL</a>, inspired by the
OpenStreetMap project.<br>
<br>
Hope this is all useful.<br>
<br>
Thanks</div>
</div>
</blockquote>
<div><br>
</div>
<div>Great work, Kailash :-) This is indeed a great release.
When publicly funded projects are wasting money on
creating unreleased datasets (like this <a
moz-do-not-send="true"
href="http://tools.malayalam.kerala.gov.in/">http://tools.malayalam.kerala.gov.in/</a>),
it is very heartening to see this structured dataset
release. I hope you will periodically update the release
with new contributions.<br>
<br>
</div>
<div>Now we need people for dictd packaging and for integrating
this with Silpa's Jabberbot.<br>
</div>
<div><br>
</div>
<div>BTW, just thinking about another project. Can anybody
extend Artha (<a moz-do-not-send="true"
href="http://artha.sourceforge.net/wiki/index.php/Artha:About">http://artha.sourceforge.net/wiki/index.php/Artha:About</a>),
the best GTK thesaurus application, to support the dictd format?
As of now it only supports WordNet, and there is no
WordNet for Malayalam.<br>
<br>
</div>
<div><br>
</div>
<div> ~ Regards<br>
</div>
<div>Anivar<br>
</div>
</div>
</div>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Swathanthra Malayalam Computing discuss Mailing List
Project: <a class="moz-txt-link-freetext" href="https://savannah.nongnu.org/projects/smc">https://savannah.nongnu.org/projects/smc</a>
Web: <a class="moz-txt-link-freetext" href="http://smc.org.in">http://smc.org.in</a> | IRC : #smc-project @ freenode
<a class="moz-txt-link-abbreviated" href="mailto:discuss@lists.smc.org.in">discuss@lists.smc.org.in</a>
<a class="moz-txt-link-freetext" href="http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in">http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in</a>
</pre>
</blockquote>
<br>
</body>
</html>