
bengalinux-core@lists.sourceforge.net, 6 January 2005, 8:08 PM -0500

[Ankur-core] [Fwd: [indic] Indic sorting ordering]
by Sayamindu Dasgupta

-------- Forwarded Message --------
From: Mark Davis <mark.davis@jtcs...>
To: indic@unic...
Subject: [indic] Indic sorting ordering
Date: Tue, 4 Jan 2005 10:53:31 -0800
We have received several bug reports on Indic collation order (used for
sorting, searching, and matching), and would like more information. Most of
the reports appear to be based on the sorting order of a bilingual dictionary
between the target language and English. We would like confirmation that
these are indeed the correct orderings for these languages. If you have more
information about the sorting order in any of these languages, please add a
reply to the relevant bug listed below. See also the additional notes at the
end of this message.

Tamil:
http://www.jtcsv.com/cgibin/locale-bugs?findid=414
http://www.jtcsv.com/cgibin/locale-bugs?findid=457
(We would especially appreciate feedback from INFITT and Tamil Nadu on these
bugs.)

Punjabi:
http://www.jtcsv.com/cgibin/locale-bugs?findid=413

Assamese:
http://www.jtcsv.com/cgibin/locale-bugs?findid=420

Bengali:
http://www.jtcsv.com/cgibin/locale-bugs?findid=421

Gujarati:
http://www.jtcsv.com/cgibin/locale-bugs?findid=422

Hindi:
http://www.jtcsv.com/cgibin/locale-bugs?findid=423

Kannada:
http://www.jtcsv.com/cgibin/locale-bugs?findid=424

Konkani:
http://www.jtcsv.com/cgibin/locale-bugs?findid=425

Malayalam:
http://www.jtcsv.com/cgibin/locale-bugs?findid=426

Marathi:
http://www.jtcsv.com/cgibin/locale-bugs?findid=427

Oriya:
http://www.jtcsv.com/cgibin/locale-bugs?findid=428

Sanskrit:
http://www.jtcsv.com/cgibin/locale-bugs?findid=429

Telugu:
http://www.jtcsv.com/cgibin/locale-bugs?findid=430

---

Notes:

1. The author filed the bugs using hex notation for the characters. To make
them readable, copy the contents of the bug, go to
http://oss.software.ibm.com/cgi-bin/icu/tr, paste it into the "Input" box,
select "Any" in Source1 and "NFC" in Target1, and click the "Transform"
button.
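Roughly the same conversion can also be done locally. The following is a minimal Python sketch, not a reproduction of the ICU web tool; it assumes the hex escapes in the bugs are written as \uXXXX or U+XXXX (the exact format is an assumption), and hex_to_text is a hypothetical helper name.

```python
import re
import unicodedata

# Assumption: characters appear as \uXXXX or U+XXXX (4-6 hex digits) escapes.
_HEX_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|U\+([0-9A-Fa-f]{4,6})")


def hex_to_text(s):
    """Replace hex escapes with the characters they name, then NFC-normalize."""
    def repl(m):
        # Exactly one of the two groups matched; use whichever is set.
        return chr(int(m.group(1) or m.group(2), 16))
    return unicodedata.normalize("NFC", _HEX_ESCAPE.sub(repl, s))


print(hex_to_text(r"<reset>\u0AD0</reset><p>U+0A82</p>"))
```

NFC is applied after unescaping so that sequences such as a base letter plus a combining mark are composed where Unicode defines a precomposed form.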

2. The XML notation used in the bugs, for those of you unfamiliar with it,
looks like the following (this is in the bug against Gujarati):

<reset>ૐ</reset>
<p>ં</p>
<s>ઁ</s>
<p>ઃ</p>

This means:

<reset>ૐ</reset> // after U+0AD0 (ૐ) GUJARATI OM,

<p>ં</p> // put U+0A82 ( ં ) GUJARATI SIGN ANUSVARA as a primary (letter) difference,

<s>ઁ</s> // then U+0A81 ( ઁ ) GUJARATI SIGN CANDRABINDU as a secondary (accent) difference,

<p>ઃ</p> // then U+0A83 ( ઃ ) GUJARATI SIGN VISARGA as a primary difference.
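The same rules can also be written in ICU's compact rule syntax, where "&" marks a reset, "<" a primary difference, "<<" a secondary difference, "<<<" a tertiary difference and "=" no difference. A minimal Python sketch of that translation follows; it assumes the bug contents are a flat sequence of such elements, also maps <t> and <i> elements (which do not appear in the example above), and ldml_to_icu_rules is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# ICU rule operator for each element name used in the bugs' XML notation.
_OPERATORS = {"reset": "&", "p": "<", "s": "<<", "t": "<<<", "i": "="}


def ldml_to_icu_rules(fragment):
    """Translate a flat run of <reset>/<p>/<s>/<t>/<i> elements into an ICU rule string."""
    # The fragment has no single root element, so wrap it before parsing.
    root = ET.fromstring("<rules>" + fragment + "</rules>")
    return " ".join(_OPERATORS[elem.tag] + " " + elem.text for elem in root)


# The Gujarati example from the bug, with the characters written as escapes.
gujarati = """
<reset>\u0AD0</reset>
<p>\u0A82</p>
<s>\u0A81</s>
<p>\u0A83</p>
"""
print(ldml_to_icu_rules(gujarati))
```

For the Gujarati example this yields the rule string "& ૐ < ં << ઁ < ઃ" (with the marks standing alone).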

These rules would be applied on top of the current sorting order. For
Gujarati, that is the default Unicode collation order given at
http://www.unicode.org/charts/collation/, as further customized by
http://unicode.org/cldr/data/common/collation/gu.xml
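To see how such a tailoring changes the order, here is a deliberately simplified two-level model in Python: a base list in which every character differs at the primary level, with each rule inserting a character after the previous one, starting from the reset point. This is a sketch only, not ICU's implementation; the base list uses code-point order as a hypothetical stand-in for the real default collation order, and apply_tailoring, weights_from and sort_key are hypothetical names.

```python
def apply_tailoring(base, rules):
    """base: characters in untailored order (assumed: only primary differences).
    rules: (strength, char) pairs, strength in {"reset", "p", "s"}.
    Returns a flat list of (char, strength relative to the previous entry)."""
    entries = [(ch, "p") for ch in base]
    anchor = None
    for strength, ch in rules:
        if strength == "reset":
            anchor = ch  # assumed to be present in the list
            continue
        # Take the character out of its old position before re-inserting it.
        entries = [e for e in entries if e[0] != ch]
        i = next(k for k, e in enumerate(entries) if e[0] == anchor) + 1
        if strength == "p":
            # A primary difference lands after any secondary variants of the anchor.
            while i < len(entries) and entries[i][1] == "s":
                i += 1
        entries.insert(i, (ch, strength))
        anchor = ch  # the next rule is relative to this character
    return entries


def weights_from(entries):
    """Number the flat list into (primary, secondary) weights."""
    weights, primary, secondary = {}, 0, 0
    for ch, strength in entries:
        if strength == "p":
            primary, secondary = primary + 1, 0
        else:
            secondary += 1
        weights[ch] = (primary, secondary)
    return weights


def sort_key(word, weights):
    """Compare all primary weights first; secondary weights only break ties."""
    keys = [weights[ch] for ch in word]
    return (tuple(k[0] for k in keys), tuple(k[1] for k in keys))


# Hypothetical base order: plain code-point order, NOT the real default order.
base = ["\u0A81", "\u0A82", "\u0A83", "\u0A85", "\u0A86", "\u0AD0"]
rules = [("reset", "\u0AD0"), ("p", "\u0A82"), ("s", "\u0A81"), ("p", "\u0A83")]
weights = weights_from(apply_tailoring(base, rules))
words = ["\u0A85\u0A83", "\u0A85\u0A81", "\u0A85", "\u0A85\u0A82"]
for w in sorted(words, key=lambda w: sort_key(w, weights)):
    print(ascii(w))
```

In this model ANUSVARA (U+0A82) sorts before CANDRABINDU (U+0A81), the reverse of their code-point order, and the two differ only at the secondary level, so "અં" and "અઁ" fall next to each other, ahead of "અઃ".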

Mark




--

All men dream: but not equally. Those who dream by night in the dusty
recesses of their minds wake in the day to find that it was vanity: but
the dreamers of the day are dangerous men, for they may act their dream
with open eyes to make it possible.

Seven Pillars of Wisdom


