[smc-discuss] Malayalam LaTeX pdf - copying text from

Nandakumar Edamana nandakumar at nandakumar.co.in
Mon Nov 4 20:55:41 PST 2019


On 11/5/19 8:49 AM, Sasi Kumar wrote:
> Just wondering if something can be done about this:
> https://tex.stackexchange.com/questions/464160/copying-text-from-pdf-created-using-xelatex-containing-malayalam-text?fbclid=IwAR14eWxPDdiXEZ91ZfCv2yytA-yT9yzoNQ2AqdF_aVTqq-eonJS92qFapa8

This same list contains some related older threads:
1. Asian and other complex text language copy/conversion issue in PDF - 2017
2. PDF -Whether it is an input file or an output file? - 2018

I don't remember (and haven't re-read) the content, but it's an issue
with PDF generation. The generator preserves the appearance, but drops
the actual text.

LibreOffice-produced PDFs have a similar problem, but the result is
better than with XeLaTeX.

I think the answer posted by nobert is not entirely correct. He is
talking about ligatures and visual replacement (like ക + ് + ക -> ക്ക),
if I understand correctly. But we cannot say that it's totally unrelated
to the issue. There is a pattern: based on my quick experiments, it is
the ligatures that come out broken when copied from the PDF.
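
To make the ക + ് + ക -> ക്ക example concrete, here is a small Python
sketch (the helper name `codepoints` is my own, purely for illustration)
that lists the code points behind the conjunct. The logical text is
three code points, while the font draws one glyph; presumably the copy
breaks when the PDF keeps the glyph without a usable mapping back to
those code points.

```python
import unicodedata

def codepoints(text):
    """Return (U+XXXX, Unicode name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# ka + virama (chandrakkala) + ka, which a Malayalam font shapes as one glyph
conjunct = "\u0D15\u0D4D\u0D15"

for code, name in codepoints(conjunct):
    print(code, name)
```

Running the same helper on text pasted from a PDF shows which code
points actually survived the copy.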

One workaround is including annotations in your PDF, which can surely be
copied. This way the PDF can contain both print-ready and copy-ready
content, although as two different things.

BTW, I notice the parameter 'fbclid' in the URL shared. I know it's a
Facebook tracking parameter, but does anybody know whether it is harmful
to the visitors' privacy? If so, how? A quick search didn't yield any
useful info.
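
This doesn't answer the privacy question, but the parameter can at
least be removed before a link is shared. A minimal sketch using only
Python's standard library (the function name `strip_params` and its
default list are my own choices):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, drop=("fbclid",)):
    """Return url with the query parameters named in drop removed."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))

shared = ("https://tex.stackexchange.com/questions/464160/"
          "copying-text-from-pdf-created-using-xelatex-containing-malayalam-text"
          "?fbclid=IwAR14eWxPDdiXEZ91ZfCv2yytA-yT9yzoNQ2AqdF_aVTqq-eonJS92qFapa8")
print(strip_params(shared))
```

Other query parameters are kept, so the link should still lead to the
same page.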

Nandakumar Edamana

