[Student-projects] Language model and Acoustic model for Malayalam (Deepa P.Gopinath) as GSOC Project

Deepa P.Gopinath deepapgopinath at gmail.com
Mon Mar 10 02:33:31 PDT 2014


Hello Karan,

I have listened to the IIIT speech database. It contains 1000 selected
sentences, each spoken as a separate utterance. That is good for a TTS
system, but I think it may not be very good for ASR, for two reasons:
1) Since the recordings are isolated sentences, a model trained on them may
only recognize speech delivered as isolated sentences; in other words, the
input speech would need enough pause between sentences.
2) The articulation is very slow and the pronunciation is very clear and
careful, which makes it slightly different from the normal Malayalam
reading style.
For an ASR system we need a speech database that resembles typical
Malayalam speech.

For ASR, the training database is very important; the recognition results
depend on it.

As you said, Malayalam is similar to Telugu, so a phonetic dictionary
available for Telugu can be adapted for Malayalam.
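
To make the idea concrete, here is a rough sketch in Python of how
pronunciation entries could be generated for the words in the transcription
files once a grapheme-to-phone table has been adapted. Everything in it is
an assumption for illustration: the table is a tiny hypothetical one, the
phone labels are placeholders rather than entries from the Telugu
dictionary or any Sphinx phone set, and the function names are made up.

```python
# Hypothetical, deliberately tiny grapheme-to-phone table. The phone labels
# are illustrative placeholders, not taken from a real dictionary.
CONSONANTS = {
    "\u0d15": "K",   # MALAYALAM LETTER KA
    "\u0d2e": "M",   # MALAYALAM LETTER MA
    "\u0d2f": "Y",   # MALAYALAM LETTER YA
    "\u0d32": "L",   # MALAYALAM LETTER LA
    "\u0d33": "LL",  # MALAYALAM LETTER LLA
}
VOWEL_SIGNS = {
    "\u0d3e": "AA",  # MALAYALAM VOWEL SIGN AA
    "\u0d3f": "I",   # MALAYALAM VOWEL SIGN I
}
INDEPENDENT_VOWELS = {
    "\u0d05": "A",   # MALAYALAM LETTER A
    "\u0d06": "AA",  # MALAYALAM LETTER AA
}
VIRAMA = "\u0d4d"    # suppresses the inherent vowel of a consonant
ANUSVARA = "\u0d02"  # treated here as a final M (a simplification)
INHERENT_VOWEL = "A"


def word_to_phones(word):
    """Map one word to a list of phone labels using the table above."""
    phones = []
    pending = False  # True while the last consonant still lacks a vowel
    for ch in word:
        if ch in VOWEL_SIGNS:
            # A vowel sign replaces the inherent vowel.
            phones.append(VOWEL_SIGNS[ch])
            pending = False
        elif ch == VIRAMA:
            # The consonant stays bare (part of a cluster).
            pending = False
        else:
            # Any other symbol first closes an open consonant.
            if pending:
                phones.append(INHERENT_VOWEL)
                pending = False
            if ch in CONSONANTS:
                phones.append(CONSONANTS[ch])
                pending = True
            elif ch in INDEPENDENT_VOWELS:
                phones.append(INDEPENDENT_VOWELS[ch])
            elif ch == ANUSVARA:
                phones.append("M")
            else:
                raise KeyError("no mapping for U+%04X" % ord(ch))
    if pending:
        phones.append(INHERENT_VOWEL)
    return phones


def build_dictionary(transcript_lines):
    """Return {word: 'PH1 PH2 ...'} for every word in the transcripts."""
    entries = {}
    for line in transcript_lines:
        for word in line.split():
            if word not in entries:
                entries[word] = " ".join(word_to_phones(word))
    return entries
```

Calling build_dictionary on the lines of a transcription file would give
one "word, phone sequence" entry per distinct word. A real table would have
to cover the full Malayalam script (chillu letters, conjuncts and so on)
and would need checking against how the words are actually spoken.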

As far as I know, a standard text corpus is not readily available for
Malayalam.

Regards,


On Mon, Mar 10, 2014 at 4:30 AM, karan singla <ksingla025 at gmail.com> wrote:

> Hello Deepa,
>
> I am Karan, working at LTRC, IIIT-Hyderabad. I have also worked on a
> project co-funded by AT&T to build an ASR system for Hindi, and have tried
> adaptive acoustic modelling for Kannada and Malayalam (the results were
> not great).
>
>
> As you suggested, we can begin with a small speech corpus that is freely
> available for Malayalam:
>
> http://festvox.org/databases/iiit_voices/
>
> This is not sufficient, but it is enough to begin with. We will need to
> record more data in the future.
>
> For Acoustic Modelling:
>
> There is a freely available phonetic dictionary for Hindi in which Hindi
> graphemes have been mapped to the American English phone set, since Sphinx
> is built for the English phone set and we don't have enough speech data to
> create a new model. So at first only adaptation is possible.
>
> As Malayalam is a Dravidian language, adapting from Telugu would be a
> better option than Hindi, since Telugu can be called "closer" to
> Malayalam. I believe there is a phonetic dictionary for Telugu in the
> speech lab at my university, but I need to check whether they can share it.
>
> After building a model with this dictionary, one needs to generate the
> phonetic mapping for all the words in the transcription files of the
> speech corpus.
>
> For Language Modelling :
> The transcriptions will be included for sure. I am not aware of any raw
> text available in Malayalam. Is any raw data available?
>
> Am I thinking right?
>
> Hoping for a reply soon,
> Karan Singla
> LTRC, IIIT-Hyderabad
>
> _______________________________________________
> Student-projects mailing list
> Student-projects at lists.smc.org.in
> http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in
>
>


-- 
Dr. Deepa P.Gopinath
Lecturer in Electronics and Communication
Department of  Electronics Engg.
College of Engineering Thiruvananthapuram
Kerala, India
Mobile- +919446583466