[smc-discuss] The politics of chillu letters

Santhosh Thottingal santhosh.thottingal at gmail.com
Tue Jan 18 09:11:47 PST 2011


2011/1/18 Jayadevan Raja <jayadevanraja at gmail.com>:
> @Santhosh: Your reply was really informative, detailed and excellent,
> showing the major concerns of people who use unicode, around the world.
>
> But the implementation of Unicode string searches and comparisons in text
> processing software must take into account the presence of equivalent code
> points. In the absence of this feature, users searching for a particular
> code point sequence would be unable to find other visually indistinguishable
> glyphs that have a different, but canonically equivalent, code point
> representation.

Yes, Malayalam already has canonically equivalent code point sequences,
and they are handled properly on GNU/Linux (I wrote that support for the
GNU C library). Search and collation for Malayalam work on GNU/Linux
based on these rules.
You can try this on any recent GNU/Linux system.
eg:  കോ   ===  കേ + ാ  === ക + ോ
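
For anyone who wants to check the equivalence above without a collation
setup, here is a minimal sketch. It uses Python's standard unicodedata
module rather than the GNU C library locale data mentioned above, so it
only illustrates the normalization side, not glibc's search or collation.
The helper name canonically_equal is made up for this example.

```python
import unicodedata

KA = "\u0D15"       # MALAYALAM LETTER KA
SIGN_EE = "\u0D47"  # MALAYALAM VOWEL SIGN EE
SIGN_AA = "\u0D3E"  # MALAYALAM VOWEL SIGN AA
SIGN_OO = "\u0D4B"  # MALAYALAM VOWEL SIGN OO (decomposes to EE + AA)

precomposed = KA + SIGN_OO           # one vowel-sign code point
decomposed = KA + SIGN_EE + SIGN_AA  # two vowel-sign code points


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw code point sequences differ, so a naive comparison fails.
raw_equal = precomposed == decomposed

# After normalization the two spellings compare as equal.
normalized_equal = canonically_equal(precomposed, decomposed)
```

A search routine that normalizes both the text and the query in this way
would find either spelling regardless of how it was typed.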

Btw, AFAIK, MS Windows does not implement this.

>
> So, aren't atomic chills and non-atomic (composite) chills in Malayalam
> canonically equivalent?

Canonical equivalence is not defined between the two types of chillu.
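
A small sketch of what that means in practice, again with Python's
unicodedata module as a stand-in for a real text processing stack. It
takes chillu N as the sample letter and assumes the non-atomic form is
the sequence NA + VIRAMA + ZWJ. None of the four normalization forms
maps one representation to the other, so normalization alone cannot
unify them.

```python
import unicodedata

# Atomic chillu: a single code point.
ATOMIC_CHILLU_N = "\u0D7B"  # MALAYALAM LETTER CHILLU N

# Non-atomic chillu (assumed here): base consonant + virama + ZWJ.
SEQUENCE_CHILLU_N = "\u0D28\u0D4D\u200D"  # NA + VIRAMA + ZERO WIDTH JOINER

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# For each normalization form, record whether the two spellings
# become identical after normalization.
unified = {
    form: unicodedata.normalize(form, ATOMIC_CHILLU_N)
    == unicodedata.normalize(form, SEQUENCE_CHILLU_N)
    for form in FORMS
}

# True only if at least one form maps both spellings to the same string.
any_form_unifies = any(unified.values())
```

So using NFC everywhere leaves both spellings in the text as they are,
and software that wants to match them has to do so by its own rules.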


> If we use "Normalization Form Canonical Composition"
> everywhere, and consider the chills to be composite characters, isn't the
> problem solved? Isn't the same issue there in almost all major scripts other
> than basic Latin?

Please read  http://thottingal.in/blog/2008/06/02/canonical-equivalence-in-unicode-some-notes/


Thanks
Santhosh Thottingal


