[smc-discuss] Boxfile for tesseract and Malayalam letters with expanded spacing

ViswaPrabha (വിശ്വപ്രഭ) viswaprabha at gmail.com
Sun Oct 20 08:37:58 PDT 2013


First, please read through these very useful instructions. The document
is lengthy, but it is thorough and worth reading.
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3


There are a few tools available on this page that will help you make box
files from image files, among other jobs:

http://code.google.com/p/tesseract-ocr/wiki/AddOns#Tesseract_box_editors_and_traning_tools
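
If you end up generating box files from your own script rather than with
one of those editors, the sketch below may help. As far as I understand
the Tesseract 3 training document, each line of a box file is
"symbol left bottom right top page", with the origin at the bottom-left
corner of the image, whereas most rendering tools report boxes from the
top-left corner. The function names (box_line, write_box_file) and the
(symbol, left, top, width, height) input tuples are only my assumptions
for illustration, not part of any of the tools above.

```python
def box_line(symbol, left, top, width, height, image_height, page=0):
    """Format one box-file line from a bounding box given in
    top-left-origin pixel coordinates (hypothetical helper)."""
    right = left + width
    # Box files measure y from the bottom edge of the image,
    # so both vertical edges have to be flipped.
    box_bottom = image_height - (top + height)
    box_top = image_height - top
    return f"{symbol} {left} {box_bottom} {right} {box_top} {page}"


def write_box_file(path, glyphs, image_height, page=0):
    """Write one line per glyph; glyphs is an iterable of
    (symbol, left, top, width, height) tuples (assumed layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        for symbol, left, top, width, height in glyphs:
            line = box_line(symbol, left, top, width, height,
                            image_height, page)
            fh.write(line + "\n")
```

For a 100 pixel tall image, box_line("മ", 10, 20, 30, 40, 100) gives
"മ 10 40 40 80 0". Where the glyph boxes come from (LibreOffice, Pango
or anything else) is a separate problem that this does not solve.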


Thanks,
Viswam





On Sun, Oct 20, 2013 at 8:55 PM, Anivar Aravind <anivar.aravind at gmail.com> wrote:

> Dear Baiju,
>
> Have you checked this?
>
> https://code.google.com/p/tesseractindic/downloads/detail?name=tesseract_trainer.beta.tar.gz&can=2&q=
>
> This is a tool to automatically generate the files required by
> tesseract-ocr for adding support to a new script. This tool takes as
> input a file containing all characters of the alphabet, and a
> directory of all different fonts. It then generates several tif images
> and corresponding box files, and then proceeds to generate the 5
> training files:
>
> inttemp
> normproto
> unicharset
> Microfeat
> pffmtable
>
>
> I don't know whether all of them are needed for the current version.
> But I think it is worth going through Debayan's previous work at
> https://sites.google.com/site/debayanin/hackingtesseract
>
> On Sun, Oct 20, 2013 at 8:43 PM, Baiju M <baiju.m.mail at gmail.com> wrote:
> > Hi,
> >
> > I am trying to create a boxfile for tesseract.  My current target is
> > to recognize Rachana typeface. I am experimenting with LibreOffice to
> > create a sample TIF file using some Malayalam text.
> >
> > In LibreOffice, what happens when we use
> > Format->Character->Position->Spacing->Expanded for Malayalam
> > characters? What is the logic used to identify a character?
> >
> > Can I get something similar using Pango, or any other tool that I can
> > use as a library (C/Python) or from the command line, that behaves
> > like LibreOffice?
> >
> > So far I am fine with the result from LibreOffice, but I would like
> > to use something that I can automate.
> >
> > Regards,
> > Baiju M
> > _______________________________________________
> > Swathanthra Malayalam Computing discuss Mailing List
> > Project: https://savannah.nongnu.org/projects/smc
> > Web: http://smc.org.in | IRC : #smc-project @ freenode
> > discuss at lists.smc.org.in
> > http://lists.smc.org.in/listinfo.cgi/discuss-smc.org.in
> >