[smc-discuss] Speech to text tool for Malayalam

Kavya Manohar sakhi.kavya at gmail.com
Sun Jun 21 22:39:22 PDT 2020


Hi Julin,

We do not have a ready-to-use, open-source speech-to-text tool for
Malayalam as of now.

But there is work going on in this domain. I am working on a Kaldi
<https://kaldi-asr.org/> based solution as an academic researcher. This repo
<https://gitlab.com/kavyamanohar/malayalam-spoken-digit-recognizer> demonstrates
the development of a spoken digit recognizer for Malayalam. A past attempt
at Malayalam ASR using CMUSphinx by Sreenadh is available here
<https://github.com/sreecodeslayer/ml-am-lm-cmusphinx>.

SMC has taken the initiative to collect a Malayalam speech corpus
<https://msc.smc.org.in> for training general-purpose automatic speech
recognition (ASR) systems. The audio recorded so far is available here
<https://gitlab.com/smc/msc>.
Another project is mlphon <https://gitlab.com/smc/mlphon>, a grapheme-to-phoneme
converter that creates the phonetic lexicon (a dictionary of Malayalam words
and their pronunciations) needed for ASR training. Mlphon is available as a
PyPI <https://pypi.org/project/mlphon/> library.
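
As a rough illustration of what such a phonetic lexicon looks like, here is a
minimal Python sketch that writes entries in the plain-text layout Kaldi
recipes commonly use: one word per line, followed by its space-separated
phones. The two pronunciations and the helper name format_lexicon are
hand-written assumptions for illustration only. They are not mlphon output
and do not use the mlphon API or its phone set.

```python
# Toy pronunciations, hand-written for illustration only (NOT mlphon output).
# A real lexicon would be generated by a grapheme-to-phoneme tool.
TOY_PRONUNCIATIONS = {
    "മല": ["m", "a", "l", "a"],
    "അമ്മ": ["a", "m", "m", "a"],
}


def format_lexicon(pronunciations):
    """Return lexicon lines of the form '<word> <phone> <phone> ...'.

    Lines are sorted by word so the output is deterministic.
    format_lexicon is a hypothetical helper, not part of Kaldi or mlphon.
    """
    lines = []
    for word in sorted(pronunciations):
        phones = pronunciations[word]
        lines.append(word + " " + " ".join(phones))
    return lines


if __name__ == "__main__":
    for line in format_lexicon(TOY_PRONUNCIATIONS):
        print(line)
```

In practice the dictionary would hold many thousands of words, and the
pronunciations would come from the grapheme-to-phoneme converter rather than
being typed by hand.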

Regards

-- 
Kavya Manohar
Research Scholar
College of Engineering Trivandrum
https://kavyamanohar.com