[Student-projects] GSoC project on Language model and Acoustic model for Malayalam language for speech recognition system in CMU Sphinx

Khyathi Chandu khyathiraghavi at gmail.com
Sat Mar 22 03:55:51 PDT 2014


Hi,

I want to work on "Language and Acoustic Model for Malayalam in CMU Sphinx".
Where should I look for bugs to try fixing? I have submitted my application.
What should I do next?

Thank you


On Fri, Mar 21, 2014 at 12:16 PM, Khyathi Chandu
<khyathiraghavi at gmail.com> wrote:

> Hi sir,
> Thank you for the suggestions. I really appreciate point b) about mapping
> the phoneme set to graphemes, since not much annotated audio data is
> available.
>
> As for the remaining points:
> a) Only a limited audio database is available. To start with, we can use
> the sample set from LDC-IL (Linguistic Data Consortium for Indian Languages)
> (http://www.ldcil.org/resourcesSampleSpeechCorp.aspx) and the annotated
> speech data available from the Speech and Vision Lab of IIIT-H
> (http://speech.iiit.ac.in/index.php/research-svl/69.html). I also think
> that some additional speech recordings with manual transcription would
> strengthen the project.
>
> c) Another challenge is the lack of large text corpora in Malayalam that
> could be used for language modeling. My idea for compiling data is to use
> text from Wikipedia pages and reliable e-newspapers such as Manorama
> (http://www.manoramaonline.com/cgi-bin/MMOnline.dll/portal/ep/home.do?tabId=0)
> and Deshabhimani (http://www.deshabhimani.com/home.php), and also the LDC-IL
> dataset (http://www.ldcil.org/Corpora/text/Malayalam/MAL1.pdf).
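
Text scraped from web pages and newspapers usually mixes Malayalam with
English and markup residue, so some filtering is needed before language
modeling. The following is only a rough sketch of one possible filter, not a
method taken from the proposal. The function names, the sentence-splitting
rule and the 0.8 threshold are all assumptions made for illustration. The
only fixed fact it relies on is that the Malayalam Unicode block is
U+0D00 to U+0D7F.

```python
import re

# Any character in the Malayalam Unicode block (U+0D00 to U+0D7F).
MALAYALAM_CHAR = re.compile(r"[\u0d00-\u0d7f]")
# Crude sentence boundary: runs of '.', '!', '?' or newlines (an assumption).
SENTENCE_END = re.compile(r"[.!?\n]+")


def malayalam_ratio(text):
    """Fraction of non-space characters that are in the Malayalam block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if MALAYALAM_CHAR.match(c))
    return hits / len(chars)


def extract_sentences(raw, min_ratio=0.8):
    """Split raw text and keep chunks that are mostly Malayalam script.

    min_ratio is a hypothetical threshold and would need tuning on real data.
    """
    sentences = []
    for chunk in SENTENCE_END.split(raw):
        chunk = " ".join(chunk.split())  # normalise whitespace
        if chunk and malayalam_ratio(chunk) >= min_ratio:
            sentences.append(chunk)
    return sentences
```

Calling extract_sentences() on the text of a downloaded page would return a
list of cleaned sentences, one per element, which could then be written out
one per line as input for language model training.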
>
>
> The link to the updated proposal is:
>
> http://wiki.smc.org.in/User:Ragha
>
> Feedback and suggestions are highly valued and appreciated.
>
> Thank you
>
>
> On Fri, Mar 21, 2014 at 4:09 AM, Kartik A <kartik.a9111 at gmail.com> wrote:
>
>> Hi Khyathi,
>>
>> A few queries about your plan of action. Please correct me if I am wrong.
>>
>> a) Data compilation: For an acoustic model, audio data is a very
>> significant requirement. Do you have a plan for which databases you will
>> focus on? You mentioned transcribing from the audio data. If you plan to
>> use, say, 4 hours of audio, will it all be transcribed manually? I think
>> the resources need to be set up before one can even think of training
>> the Sphinx model.
>>
>> b) I guess a large amount of annotated audio data cannot be gathered for
>> Malayalam, so one has to look into adaptive acoustic modelling. For that
>> you have to build a grapheme-to-phoneme mapping, which should look like this:
>>     മ ല യാ ളം   :  ma la ya La aM
>> and then map the result to the phone set that Sphinx supports.
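
A rule-based mapping of this kind can be sketched in a few lines. This is a
minimal illustration only, assuming that each consonant carries an inherent
"a" unless a vowel sign or a virama follows it. The symbol tables cover just
a handful of letters, and the phone names ("m", "aa", "L", "M" and so on) are
hypothetical placeholders, not an actual Sphinx phone set. It also emits one
phone per sound rather than the syllable-sized units in the example above.

```python
# Hypothetical, deliberately incomplete tables (code points from the
# Malayalam Unicode block); the phone names are illustrative assumptions.
CONSONANTS = {
    "\u0d2e": "m",  # MA
    "\u0d32": "l",  # LA
    "\u0d2f": "y",  # YA
    "\u0d33": "L",  # LLA (retroflex)
}
VOWEL_SIGNS = {
    "\u0d3e": "aa",  # vowel sign AA
    "\u0d3f": "i",   # vowel sign I
    "\u0d41": "u",   # vowel sign U
}
INDEPENDENT_VOWELS = {
    "\u0d05": "a",   # letter A
    "\u0d06": "aa",  # letter AA
    "\u0d07": "i",   # letter I
}
ANUSVARA = "\u0d02"  # nasal sign written after a syllable
VIRAMA = "\u0d4d"    # suppresses the inherent vowel
INHERENT_VOWEL = "a"


def graphemes_to_phones(word):
    """Convert a Malayalam word to a list of phone symbols."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])  # sign replaces inherent vowel
                i += 1
            elif nxt == VIRAMA:
                i += 1                           # no vowel at all
            else:
                phones.append(INHERENT_VOWEL)
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
        elif ch == ANUSVARA:
            phones.append("M")
        else:
            raise ValueError(f"no mapping for {ch!r}")
        i += 1
    return phones


print(" ".join(graphemes_to_phones("\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02")))
```

A pronunciation dictionary entry could then be produced by writing the word
followed by its phone list on one line. A full table would also have to
handle conjuncts, chillu letters and the remaining vowels, which this sketch
does not attempt.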
>>
>> c) Language model: There are various straightforward approaches, and
>> yes, I agree that N-gram is still the best among them. But what about
>> compiling data for language modelling, such as a large raw text dataset
>> for Malayalam? Is any such dataset available apart from the
>> transcriptions of the audio data?
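
To make the N-gram idea concrete, here is a minimal bigram sketch using
unsmoothed maximum-likelihood estimates. It only illustrates the counting
involved. The function names and the "<s>" and "</s>" boundary markers are
assumptions for this example, and a usable model would need smoothing and a
much larger corpus, typically built with a dedicated language modeling
toolkit rather than code like this.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count histories and bigrams over whitespace-tokenised sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the last one serves as a bigram history.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts


def bigram_prob(history_counts, bigram_counts, prev, word):
    """Unsmoothed estimate of P(word | prev); 0.0 for an unseen history."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]
```

With the toy corpus ["a b", "a c"], bigram_prob() gives 0.5 for "b" after
"a", since "a" is followed by "b" in one of its two occurrences. Unseen
bigrams get probability zero here, which is exactly the problem smoothing
addresses and the reason a large text corpus matters.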
>>
>>
>>
>> On Fri, Mar 21, 2014 at 12:06 AM, Deepa P.Gopinath <
>> deepapgopinath at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> The timeline is better now. I feel the end deliverable can be the
>>> 'Language and acoustic model' itself. A speech recognition system can be
>>> developed within the constraints of time.
>>>
>>> regards
>>>
>>>
>>> On Thu, Mar 20, 2014 at 8:04 PM, Khyathi Chandu <
>>> khyathiraghavi at gmail.com> wrote:
>>>
>>>> Ma'am,
>>>>
>>>> I have updated the project proposal based on your suggestions. I have
>>>> mentioned the details of data compilation and modified the time frame. Here
>>>> is the link:
>>>>
>>>> http://wiki.smc.org.in/User:Ragha
>>>>
>>>> I am ready to dedicate any amount of time and to cover the intricacies
>>>> as best I can. I look forward to your feedback.
>>>>
>>>> Thank you
>>>>
>>>>
>>>>
>>>> On Thu, Mar 20, 2014 at 1:16 PM, Deepa P.Gopinath <
>>>> deepapgopinath at gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> To develop a language and acoustic model, we need to compile a
>>>>> sufficient database. You haven't considered this in your proposal.
>>>>> I feel you have to reframe your timeline; it seems a bit ambitious.
>>>>> After the project we should be able to contribute a good database and
>>>>> a language and acoustic model.
>>>>>
>>>>> Do contact me after modifying your proposal.
>>>>>
>>>>> regards
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 1:09 PM, Khyathi Chandu <
>>>>> khyathiraghavi at gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I want to work on the project "Language model and Acoustic model for
>>>>>> Malayalam language for speech recognition system in CMU Sphinx". The
>>>>>> link below describes how I would like to proceed with the project.
>>>>>>
>>>>>> http://wiki.smc.org.in/User:Ragha
>>>>>>
>>>>>> It would be very helpful if someone could give feedback and
>>>>>> suggestions.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> _______________________________________________
>>>>>> Student-projects mailing list
>>>>>> Student-projects at lists.smc.org.in
>>>>>> http://lists.smc.org.in/listinfo.cgi/student-projects-smc.org.in
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. Deepa P.Gopinath
>>>>> Lecturer in Electronics and Communication
>>>>> Department of  Electronics Engg.
>>>>> College of Engineering Thiruvananthapuram
>>>>> Kerala, India
>>>>> Mobile- +919446583466
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Kartik A.
>>
>>
>>
>