[Student-projects] GSoC CMU Sphinx

Kevin Martin youcancallmekevin at gmail.com
Thu Feb 27 01:21:47 PST 2014


[Possible repost. My previous post bounced]

Hi,

I'm Kevin from the College of Engineering Trivandrum, and I would like to
implement [3.3], a language model and acoustic model for Malayalam speech
recognition using CMU Sphinx, as a GSoC project.

I went through the provided material and, correct me if I'm wrong, it seems
that little programming is involved. Basically, we have to provide a text
corpus for the language model, and the acoustic model needs a lot of
recordings. As far as acoustic modeling is concerned, there are two options:

1) Adaptive acoustic modeling - An existing acoustic model (for example, the
English one) is adapted instead of being built anew, so its existing phonemes
are reused. Basically, we recognize Malayalam by describing its sounds in
terms of the English phonemes.

2) Acoustic modeling from scratch - This needs a lot of data and is
considerably more difficult to implement.
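To make option 1 a bit more concrete: when an English model is adapted, every Malayalam word in the pronunciation dictionary has to be spelled with that model's phone set. Below is a minimal Python sketch of the idea, assuming romanized (transliterated) Malayalam words as input. The table `PHONE_MAP` and the helpers `to_english_phones` and `dictionary_line` are hypothetical and purely illustrative; a real mapping would need careful linguistic work.

```python
# Hypothetical, illustrative mapping from romanized Malayalam units to
# ARPAbet-style phones as used by English acoustic models. Not a real,
# linguistically validated mapping.
PHONE_MAP = {
    "aa": ["AA"], "ee": ["IY"], "oo": ["UW"], "ch": ["CH"],
    "a": ["AH"], "i": ["IH"], "u": ["UH"], "e": ["EH"], "o": ["OW"],
    "k": ["K"], "m": ["M"], "n": ["N"], "l": ["L"], "r": ["R"],
    "p": ["P"], "t": ["T"], "v": ["V"], "y": ["Y"], "s": ["S"],
}
MAX_UNIT = max(len(unit) for unit in PHONE_MAP)


def to_english_phones(word: str) -> list[str]:
    """Greedy longest-match conversion of a romanized word to phones."""
    phones = []
    i = 0
    while i < len(word):
        # Try the longest unit first so "aa" wins over "a", "ch" over "c".
        for size in range(min(MAX_UNIT, len(word) - i), 0, -1):
            unit = word[i:i + size]
            if unit in PHONE_MAP:
                phones.extend(PHONE_MAP[unit])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return phones


def dictionary_line(word: str) -> str:
    """One dictionary entry: the word followed by its phones."""
    return f"{word} {' '.join(to_english_phones(word))}"
```

For example, `dictionary_line("amma")` gives `amma AH M M AH`, i.e. a word followed by its phones, which is the layout of a Sphinx pronunciation dictionary entry.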

I would like to know which path the mentor is planning to follow. In any
case, I think it is largely a matter of choosing the right data to feed in. My
reading is not thorough yet (I have only skimmed some of the docs and will
study them in detail soon), so I suppose a rough road map would be something
like this:

1. Set up Sphinx and the associated packages
2. Find a text corpus
3. Create a vocabulary from the text corpus
4. Find a sufficient amount of recordings
5. Feed both into Sphinx and let it train the models
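Steps 2, 3 and 5 mostly come down to preparing text files. Here is a minimal sketch of that preparation, assuming whitespace-separated words and that punctuation can simply be dropped: it builds a frequency-sorted vocabulary and formats transcription lines in the `<s> ... </s> (fileid)` layout shown in the SphinxTrain training tutorial (worth checking against the version we install). The function names and the example file ID are hypothetical. Punctuation is removed by Unicode category rather than with a `\w` regex, because Malayalam vowel signs and the virama are combining marks, which Python's `\w` does not treat as word characters.

```python
import unicodedata
from collections import Counter


def normalize(line: str) -> list[str]:
    """Split a line into lowercase words, dropping punctuation (categories P*).

    Letters and combining marks are kept, so Malayalam words stay intact.
    """
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in line
    )
    return cleaned.lower().split()


def build_vocabulary(lines, min_count: int = 1) -> list[str]:
    """Words seen at least min_count times, most frequent first, ties alphabetical."""
    counts = Counter()
    for line in lines:
        counts.update(normalize(line))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ranked if count >= min_count]


def transcription_line(words: list[str], fileid: str) -> str:
    """Format one utterance as '<s> words </s> (fileid)'."""
    return f"<s> {' '.join(words)} </s> ({fileid})"


if __name__ == "__main__":
    corpus = ["hello, world!", "Hello again."]
    print(build_vocabulary(corpus))  # ['hello', 'again', 'world']
    print(transcription_line(normalize(corpus[0]), "speaker1/rec_001"))
```

Sorting by frequency makes it easy to cut the vocabulary down to the most common words later if it grows too large.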

For the text corpus, [1] suggests using Wikipedia. However, I think movies and
their subtitles would be a better option. It should be possible to strip the
background music and other ambient sounds from the movie audio. Subtitles are
usually in English, which would make them more suitable if we follow the
adaptive acoustic modeling method.
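If subtitles are used, their timing is as useful as their text, since it roughly tells us which stretch of audio each line belongs to. A minimal sketch of reading a subtitle file, assuming the common SubRip (.srt) layout: an index line, a `start --> end` timestamp line, then one or more text lines per cue, with blank lines between cues. `parse_srt` and `to_seconds` are hypothetical helper names, not part of CMU Sphinx.

```python
import re

# Matches timestamps such as 00:01:02,250 (some files use '.' before the milliseconds).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds."""
    match = TIMESTAMP.search(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, caption) for every cue."""
    cues = []
    # Cues are separated by blank lines; drop a leading byte-order mark if present.
    for block in re.split(r"\n\s*\n", text.lstrip("\ufeff").strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start_text, end_text = lines[1].split("-->", 1)
        # Join multi-line captions and drop simple formatting tags like <i>...</i>.
        caption = re.sub(r"<[^>]+>", "", " ".join(lines[2:]))
        cues.append((to_seconds(start_text), to_seconds(end_text), caption))
    return cues
```

The resulting `(start, end, text)` tuples could then be used to cut the movie audio into utterance-sized clips, each paired with its text.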

Regards,

Kevin Martin Jose

[1] http://www.cs.cmu.edu/~gopalakr/publications/spdatabases_specom05.pdf