<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Ah, looks like you beat me to it,
Manilal :) Thanks anyway.<br>
<br>
Kailash<br>
<br>
On 22/05/2013 11:13 AM, Manilal K M wrote:<br>
</div>
<blockquote
cite="mid:CAFa-k1hcw32=gJTp=ZMEUGSx1Rhoz-Hj7WRUzPxwzqJGXrNFPw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<br>
---------- Forwarded message ----------<br>
From: Kailash Nadh &lt;<a moz-do-not-send="true"
href="mailto:kailash.nadh@gmail.com">kailash.nadh@gmail.com</a>&gt;<br>
To: <a moz-do-not-send="true"
href="mailto:smc-discuss@googlegroups.com">smc-discuss@googlegroups.com</a><br>
Cc: <br>
Date: Wed, 22 May 2013 10:18:16 +0530<br>
Subject: Free and open Malayalam dictionary dataset<br>
<div text="#000000" bgcolor="#FFFFFF"> Hello all,<br>
I've just been able to publish the semanticised version of
Datuk's original ASCII Malayalam-Malayalam dictionary
digitisation work.<br>
=> <a moz-do-not-send="true"
href="http://olam.in/open/datuk" target="_blank">http://olam.in/open/datuk</a><br>
<br>
"The Datuk Corpus" is a human-readable, parse-ready, Unicode
dictionary dataset with over 83,000 Malayalam words and over
106,000 definitions. It's been in development for over two
years. The dataset is an evolution of Datuk's original work,
and has undergone extensive refinement, corrections, and
structuring, amounting to tens of thousands of changes. The
GitHub repository for the project contains the full text
corpus, an SQL dump, and a couple of Python scripts for parsing
and conversion.<br>
<br>
This is the same dataset that powers Olam's
Malayalam-Malayalam dictionary that went live two days ago.
Also, Datuk's original work constitutes a substantial
portion of the Malayalam Wiktionary.<br>
<br>
<br>
Sample entries from the dataset:<br>
<pre>ച ചക്രാംഗി സം. -അംഗീ _ 36953
നാ. അരയന്നപ്പിട
നാ. ചക്രവാകപ്പിട
നാ. മഞ്ചട്ടി
നാ. കക്കടകശൃംഗി</pre>
<pre>പ പരോക്ഷം _ _ 57697
നാ. മറവ്
നാ. പരോക്ഷജ്ഞാനം
നാ. പ്രത്യക്ഷമല്ലാത്തത്</pre>
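<br>
A minimal sketch of how entries like the two samples above might be read in Python. It assumes, from these samples alone and not from the corpus documentation, that each entry starts with a header line of whitespace-separated fields ending in a numeric ID, and that every following line is an abbreviation followed by a definition. The name <code>parse_entries</code> and the dictionary keys are hypothetical and are not taken from the scripts in the repository.<br>
<pre>
```python
# Sketch only: the field layout is inferred from the two sample entries.
SAMPLE = """ച ചക്രാംഗി സം. -അംഗീ _ 36953
നാ. അരയന്നപ്പിട
നാ. ചക്രവാകപ്പിട
നാ. മഞ്ചട്ടി
നാ. കക്കടകശൃംഗി
പ പരോക്ഷം _ _ 57697
നാ. മറവ്
നാ. പരോക്ഷജ്ഞാനം
നാ. പ്രത്യക്ഷമല്ലാത്തത്
"""


def parse_entries(text):
    """Group definition lines under the header line that precedes them."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: a header has at least three fields and ends in a
        # numeric ID; every other line is "abbreviation definition".
        if fields[2:] and fields[-1].isdigit():
            entries.append({
                "letter": fields[0],        # index letter
                "word": fields[1],          # headword
                "extra": fields[2:-1],      # remaining header fields, kept as-is
                "id": int(fields[-1]),
                "definitions": [],
            })
        elif entries:
            parts = line.split(None, 1)
            abbreviation = parts[0]
            definition = parts[1].strip() if parts[1:] else ""
            entries[-1]["definitions"].append((abbreviation, definition))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["id"], entry["word"], len(entry["definitions"]))
```
</pre>
Headwords that contain spaces, or definitions that happen to end in a number, would need a stricter rule than this one, so the actual file layout should be checked before relying on it.<br>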
<br>
The dataset is licensed under the <a moz-do-not-send="true"
href="http://opendatacommons.org/licenses/odbl/"
target="_blank">ODbL</a>, a choice inspired by the OpenStreetMap
project.<br>
<br>
Hope this is all useful.<br>
<br>
Thanks<br>
<br>
Kailash<br>
</div>
<br>
</div>
<br>
<br clear="all">
<br>
-- <br>
Manilal K M | മണിലാല് കെ എം.<br>
<a moz-do-not-send="true" href="http://libregeek.blogspot.com"
target="_blank">http://libregeek.blogspot.com</a>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Swathanthra Malayalam Computing discuss Mailing List
Project: <a class="moz-txt-link-freetext" href="https://savannah.nongnu.org/projects/smc">https://savannah.nongnu.org/projects/smc</a>
Web: <a class="moz-txt-link-freetext" href="http://smc.org.in">http://smc.org.in</a> | IRC : #smc-project @ freenode
<a class="moz-txt-link-abbreviated" href="mailto:discuss@lists.smc.org.in">discuss@lists.smc.org.in</a>
<a class="moz-txt-link-freetext" href="http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in">http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in</a>
</pre>
</blockquote>
<br>
</body>
</html>