<div dir="ltr"><p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" id="docs-internal-guid-3458cd39-7dfa-e243-a2d6-ec8ba78141ad" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">I
 am sorry that I couldn't edit the proposal in time, since I had my 
university exam on the 6th of May. Kindly consider this comment as my 
response to the details requested above.</span></p>
<p> </p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">>"During
 this period i will be collecting all the voice data and text 
>corpora required for the acoustic model and language model 
>respectively. Graphemes to phoneme conversion and optimal </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>text selection algorithms will be used to optimize the text </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>corpora . Choosing appropriate speakers based on data </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>statistics will also be done during this period." </span></p>

<p> </p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>Can this be a bit more detailed? As per the given timeline, this </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>activity is planned for 15 days. Is it that simple? I can see </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>technical and logistical overhead that can make your time </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>estimate wrong.</span></p>

<p> </p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">I have experience working on a project that involved modelling a closed-vocabulary acoustic system.</span></p>

<p> </p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">Organisations
 such as CDAC and LDC-IL have already collected text and speech corpora 
for continuous speech recognition in Malayalam. I am planning to obtain
 data from them. I have already contacted CDAC and collected voice data (7 GB)
 from them, but I found that there is scope for further improvement in 
their data. Judging from the sample data available on the LDC-IL website,
 the data they have collected seems more refined.</span></p>
<p><br><br></p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">But
 if either organisation is unwilling to release the data under a 
free license, I plan to conduct a data collection drive. I have 
experience collecting data for a closed-vocabulary acoustic system, and I
 believe I can collect sufficient data in 15 days.</span></p>
<p><br><br></p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">>What do you mean by graphemes to phoneme conversion? is it </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>TTS functionality? What kind of text selection algorithm is </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>planned and what exactly is the purpose of that. 
Giving some </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>more detail in to that will help us in understanding the </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>complexity and reliability of time planned.</span></p>

<p><br><br><br></p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">After
 collecting text data (news text), I will perform grapheme-to-phoneme 
conversion to split the text into phonemes. We need the text corpus to
 contain as many utterances of the less frequent phones as possible,
 to achieve maximum utterance variation. For that we use an optimal text 
selection algorithm. The algorithm we plan to implement is 
based on a paper submitted to the 2011 International Conference on Asian 
Language Processing (Link: <a href="http://cse.iitkgp.ac.in/~pabitra/paper/ialp.pdf">cse.iitkgp.ac.in/~pabitra/paper/ialp.pdf</a>
).</span></p>
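<p>The selection step described above could be sketched roughly as follows. This is only a generic greedy heuristic written for illustration, not the algorithm from the linked paper. The function name <code>select_sentences</code>, the <code>budget</code> parameter and the inverse-frequency weighting are all assumptions, and the input is assumed to be sentences that the grapheme-to-phoneme step has already turned into phone sequences.</p>

```python
from collections import Counter


def select_sentences(phone_seqs, budget):
    """Greedily pick up to `budget` sentences, favouring rare phones.

    phone_seqs: list of phone lists, one per candidate sentence
    (assumed output of an earlier grapheme-to-phoneme step).
    Returns the indices of the chosen sentences, in selection order.
    """
    # Corpus-wide phone frequencies: a rarer phone gets a larger weight.
    corpus_counts = Counter(p for seq in phone_seqs for p in seq)
    weight = {p: 1.0 / c for p, c in corpus_counts.items()}

    selected_counts = Counter()  # phones already covered by chosen sentences
    chosen = []
    remaining = set(range(len(phone_seqs)))

    def gain(i):
        # Each occurrence of a phone is worth less the more often that
        # phone has already been counted (in the selection or earlier in
        # this sentence). Dividing by length avoids favouring long sentences.
        seq = phone_seqs[i]
        if not seq:
            return 0.0
        seen = Counter()
        total = 0.0
        for p in seq:
            total += weight[p] / (1 + selected_counts[p] + seen[p])
            seen[p] += 1
        return total / len(seq)

    while remaining and len(chosen) != budget:
        # sorted() makes ties resolve to the lowest sentence index.
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
        selected_counts.update(phone_seqs[best])
    return chosen


# Hypothetical toy data: the phone "k" is rare, "a" is very common.
toy_corpus = [["a", "a", "a"], ["a", "b"], ["a", "a", "k"], ["a", "b", "a"]]
picked = select_sentences(toy_corpus, 2)
covered = {p for i in picked for p in toy_corpus[i]}
print(picked, sorted(covered))
```

<p>The sentences picked first are the ones that contribute phones the selection is still short of, which is the property the paragraph above asks for. A real run would use the phone inventory and sentence pool from the collected news text.</p>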
<p> </p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">>"Once a working acoustic and language model has been </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>formed further language specific improvements can be </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>performed . 
Consulting linguists for incorporating Malayalam </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>grammar rules  to improve the recognition accuracy of the </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>speech recognition system is one such method ."</span></p>

<p><br><br></p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>This activity is planned for 12 days.  It is clearly ambitious. Or </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>may be your deliverables for this period is that simple. So can </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>you give details on what is the deliverable from this 12 days </span><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">></span>and how it helps in next timeline?</span></p>

<p><br><br><br></p>
<p style="line-height:1.15;margin-top:0pt;margin-bottom:0pt" dir="ltr"><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">As
 planned, I will use August to address and solve as many as possible of the language-specific 
problems mentioned in the project proposal. For that we need 
the help of linguists, and I have set aside 12 days (July 16 - July 28) 
for this. This time will be used for two things.</span></p>
<p> </p>
<ol><li><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">Learn to modify the Sphinx engine in order to make language-specific improvements to it.</span></li>
<li><span style="font-size:16px;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">Understand and identify the important linguistic improvements that can be made.</span></li>
</ol>
<p><br><br></p>
NB: I have posted the same as a comment on google-melange too.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 2, 2013 at 4:23 PM, Rahul A.R <span dir="ltr"><<a href="mailto:2ar.rahul@gmail.com" target="_blank">2ar.rahul@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Thanks for the feedback. I have updated my proposal, pointing out some of the linguistic and language-specific challenges we will face during the course of the project. I have made the timeline as clear as possible. Kindly go through it.<br>

<br></div><div>Link : <a href="http://wiki.smc.org.in/User:Ar_rahul/GSoC2013/" target="_blank">http://wiki.smc.org.in/User:Ar_rahul/GSoC2013/</a><br><br></div>Regards,<br></div>A.R.Rahul<br></div><div class="HOEnZb"><div class="h5">
<div class="gmail_extra"><br><br><div class="gmail_quote">
On Wed, May 1, 2013 at 6:11 PM, Anivar Aravind <span dir="ltr"><<a href="mailto:anivar.aravind@gmail.com" target="_blank">anivar.aravind@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>As Deepa pointed out, your proposal still lacks an understanding of the issues of<br>
Language Modeling and Acoustic Modeling, their complexities,<br>
linguistic challenges, and possible issues which need to be<br>
addressed. It only talks about your awareness of/familiarity with the tools.<br>
<br>
Since there is not much understanding/domain study, the timeline also<br>
lacks clarity.<br>
</div><div>You have only limited time to do a background study and improve your proposal.<br>
<br>
</div><span><font color="#888888">Anivar<br>
</font></span><div><div><br>
<br>
<br>
<br>
--<br>
"[It is not] possible to distinguish between 'numerical' and 'nonnumerical'<br>
algorithms, as if numbers were somehow different from other kinds of precise<br>
information." - Donald Knuth<br>
_______________________________________________<br>
Swathanthra Malayalam Computing discuss Mailing List<br>
Project: <a href="https://savannah.nongnu.org/projects/smc" target="_blank">https://savannah.nongnu.org/projects/smc</a><br>
Web: <a href="http://smc.org.in" target="_blank">http://smc.org.in</a> | IRC : #smc-project @ freenode<br>
<a href="mailto:discuss@lists.smc.org.in" target="_blank">discuss@lists.smc.org.in</a><br>
<a href="http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in" target="_blank">http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in</a><br>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>