<div dir="ltr"><div><div><div><br></div>First, please read through these very useful instructions. The document is lengthy, but it is detailed and worth reading.<br><a href="http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3">http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3</a><br>
<br><br>There are a few tools listed on this page that will help you create box files from image files and handle related tasks:<br><br><a href="http://code.google.com/p/tesseract-ocr/wiki/AddOns#Tesseract_box_editors_and_traning_tools">http://code.google.com/p/tesseract-ocr/wiki/AddOns#Tesseract_box_editors_and_traning_tools</a><br>
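<br>As a rough illustration of what those tools produce: a Tesseract 3 box file has one line per character, in the form "character left bottom right top page", with coordinates measured from the bottom-left corner of the image. The sketch below only assumes you already have glyph rectangles from some renderer; the Glyph type and the box_lines function are hypothetical names used for illustration, not part of Tesseract or of any tool listed above.<br>
<pre>
```python
# Sketch only: format Tesseract 3 box-file lines from glyph rectangles.
# Glyph and box_lines are hypothetical names, not part of Tesseract.
from typing import List, NamedTuple


class Glyph(NamedTuple):
    text: str    # the character (or cluster) the box encloses
    x: int       # left edge in pixels, measured from the image's left
    y: int       # top edge in pixels, measured from the image's top
    width: int
    height: int


def box_lines(glyphs: List[Glyph], image_height: int, page: int = 0) -> List[str]:
    """Return one 'char left bottom right top page' line per glyph.

    Box files measure from the bottom-left corner of the image, so the
    top-down y values used by most rendering libraries are flipped here.
    """
    lines = []
    for g in glyphs:
        left = g.x
        right = g.x + g.width
        top = image_height - g.y
        bottom = image_height - (g.y + g.height)
        lines.append(f"{g.text} {left} {bottom} {right} {top} {page}")
    return lines


if __name__ == "__main__":
    # Made-up rectangles, purely to show the output format.
    sample = [Glyph("ക", 10, 20, 30, 40), Glyph("ല", 45, 22, 28, 38)]
    print("\n".join(box_lines(sample, image_height=100)))
```
</pre>
Passing the pixel height of the TIF as image_height gives lines that can be saved as the .box file next to the image. How the glyph rectangles are obtained (LibreOffice, Pango, or something else) is left open here.<br>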
<br><br></div>Thanks,<br></div>Viswam<br><br><div><div><br><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Oct 20, 2013 at 8:55 PM, Anivar Aravind <span dir="ltr"><<a href="mailto:anivar.aravind@gmail.com" target="_blank">anivar.aravind@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Baiju,<br>
<br>
Have you checked this?<br>
<a href="https://code.google.com/p/tesseractindic/downloads/detail?name=tesseract_trainer.beta.tar.gz&can=2&q=" target="_blank">https://code.google.com/p/tesseractindic/downloads/detail?name=tesseract_trainer.beta.tar.gz&can=2&q=</a><br>
<br>
This is a tool to automatically generate the files required by<br>
tesseract-ocr for adding support to a new script. This tool takes as<br>
input a file containing all characters of the alphabet, and a<br>
directory of all different fonts. It then generates several tif images<br>
and corresponding box files, and then proceeds to generate the 5<br>
training files:<br>
<br>
inttemp<br>
normproto<br>
unicharset<br>
Microfeat<br>
pffmtable<br>
<br>
<br>
I don't know whether all of them are needed for the current version.<br>
But I think it is worth going through Debayan's previous work at<br>
<a href="https://sites.google.com/site/debayanin/hackingtesseract" target="_blank">https://sites.google.com/site/debayanin/hackingtesseract</a><br>
<div class="HOEnZb"><div class="h5"><br>
On Sun, Oct 20, 2013 at 8:43 PM, Baiju M <<a href="mailto:baiju.m.mail@gmail.com">baiju.m.mail@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
> I am trying to create a boxfile for tesseract. My current target is<br>
> to recognize Rachana typeface. I am experimenting with LibreOffice to<br>
> create a sample TIF file using some Malayalam text.<br>
><br>
> In LibreOffice, what happens when we use<br>
> Format->Character->Position->Spacing->Expanded for Malayalam<br>
> characters? What logic is used to identify a character?<br>
><br>
> Can I get something similar using Pango or any other tool that I can<br>
> use as a library (C/Python) or from the command line, and that behaves<br>
> like LibreOffice?<br>
><br>
> So far I am fine with the results from LibreOffice, but I would like to use<br>
> something which I can automate.<br>
><br>
> Regards,<br>
> Baiju M<br>
> _______________________________________________<br>
> Swathanthra Malayalam Computing discuss Mailing List<br>
> Project: <a href="https://savannah.nongnu.org/projects/smc" target="_blank">https://savannah.nongnu.org/projects/smc</a><br>
> Web: <a href="http://smc.org.in" target="_blank">http://smc.org.in</a> | IRC : #smc-project @ freenode<br>
> <a href="mailto:discuss@lists.smc.org.in">discuss@lists.smc.org.in</a><br>
> <a href="http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in" target="_blank">http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in</a><br>
><br>
<br>
</div></div></blockquote></div><br></div>