Unicode collation algorithm

{{Short description|String collation algorithm}}

__NOTOC__

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10. It is a customizable method for producing binary sort keys from strings of text in any writing system and language that can be represented in Unicode. These keys can then be compared efficiently byte by byte in order to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, and other distinctions.{{Cite web |last1=Whistler |first1=Ken |last2=Scherer |first2=Markus |last3=Davis |first3=Mark |author-link3=Mark Davis (Unicode) |date=2022-08-26 |title=UTS #10: Unicode Collation Algorithm |url=https://www.unicode.org/reports/tr10/ |access-date=2023-08-16 |website=Unicode}}
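
The way such sort keys are built can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical collation element table (<code>TOY_TABLE</code>) whose weights are invented for illustration and are not DUCET values, and it leaves out contractions, expansions, and variable weighting. Each character maps to primary, secondary, and tertiary weights; the key lists all nonzero primary weights, then a zero separator, then the secondary weights, and so on, so that a plain byte comparison respects the level hierarchy.

```python
import struct
import unicodedata

# Hypothetical miniature collation element table: (primary, secondary, tertiary).
# The weights are invented for this sketch and are NOT taken from the DUCET.
TOY_TABLE = {
    "a": [(0x0100, 0x0020, 0x0002)],
    "A": [(0x0100, 0x0020, 0x0008)],
    "b": [(0x0101, 0x0020, 0x0002)],
    "B": [(0x0101, 0x0020, 0x0008)],
    "c": [(0x0102, 0x0020, 0x0002)],
    "C": [(0x0102, 0x0020, 0x0008)],
    # Combining acute accent: ignorable at the primary level (weight 0).
    "\u0301": [(0x0000, 0x0024, 0x0002)],
}


def sort_key(text, table=TOY_TABLE, strength=3):
    """Build a binary sort key using the first `strength` weight levels."""
    # Normalize to NFD so that "á" becomes "a" + combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Map every character to its collation elements.
    elements = [ce for ch in decomposed for ce in table[ch]]
    # Append the nonzero weights level by level, with 0x0000 between levels.
    key = bytearray()
    for level in range(strength):
        if level > 0:
            key += struct.pack(">H", 0)
        for ce in elements:
            if ce[level] != 0:
                key += struct.pack(">H", ce[level])
    return bytes(key)


words = ["Cab", "c\u00e1b", "cab", "b", "ca"]
# Python compares bytes objects byte by byte, which is all the keys require.
print(sorted(words, key=sort_key))
```

With the default strength, an accent difference (secondary level) outranks a case difference (tertiary level). Passing <code>strength=1</code> produces keys that ignore both, which corresponds to the options for ignoring case and accents mentioned above.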

Unicode Technical Standard #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation ordering. The DUCET can be customized (tailored) for different languages,{{Cite book |last=Hosken |first=Martin |url=https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf |title=Unicode Sort Tailoring: Tutorial |date=2021-09-23 |publisher=SIL Writing Systems Technology |edition=1.3 |pages=2–3 |access-date=2023-08-16}} and some such tailorings can be found in the Unicode Common Locale Data Repository (CLDR).{{Cite web |title=CLDR Releases/Downloads |url=https://cldr.unicode.org/index/downloads |access-date=2023-08-16 |website=Unicode CLDR |language=}}
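
The idea of tailoring a default table can be sketched in Python as follows. The table contents, the weights, and the helper names (<code>tailor</code>, <code>collation_elements</code>, <code>make_key</code>) are all hypothetical and chosen for illustration; they do not reproduce DUCET or CLDR data, and tuples stand in for binary keys. The tailoring adds two contractions so that "a" and "o" followed by a combining diaeresis are treated as separate letters after "z", roughly as in Swedish.

```python
import unicodedata

# Hypothetical default table with (primary, secondary) weights.
# Invented values for this sketch, not DUCET data.
DEFAULT_TABLE = {
    "a": (0x0100, 0x0020),
    "o": (0x0110, 0x0020),
    "z": (0x0120, 0x0020),
    "\u0308": (0x0000, 0x002B),  # combining diaeresis, primary-ignorable
}

# Hypothetical Swedish-style tailoring: base letter + diaeresis becomes a
# contraction with its own primary weight, ordered after "z".
SWEDISH_LIKE = {
    "a\u0308": (0x0121, 0x0020),
    "o\u0308": (0x0122, 0x0020),
}


def tailor(base, overrides):
    """Return a copy of the base table with the tailoring applied."""
    table = dict(base)
    table.update(overrides)
    return table


def collation_elements(text, table):
    """Map NFD-normalized text to collation elements by longest match."""
    decomposed = unicodedata.normalize("NFD", text)
    longest = max(len(entry) for entry in table)
    elements = []
    i = 0
    while i < len(decomposed):
        for size in range(min(longest, len(decomposed) - i), 0, -1):
            chunk = decomposed[i:i + size]
            if chunk in table:
                elements.append(table[chunk])
                i += size
                break
        else:
            raise KeyError(decomposed[i])
    return elements


def make_key(text, table):
    """Two-level key: all primary weights first, then all secondary weights."""
    elements = collation_elements(text, table)
    primary = tuple(p for p, _ in elements if p)
    secondary = tuple(s for _, s in elements if s)
    return (primary, secondary)


words = ["z", "\u00e4", "a", "o", "\u00f6"]
tailored = tailor(DEFAULT_TABLE, SWEDISH_LIKE)
print(sorted(words, key=lambda w: make_key(w, DEFAULT_TABLE)))
print(sorted(words, key=lambda w: make_key(w, tailored)))
```

Under the default table, the accented letters sort directly after their base letters because they differ only at the secondary level. Under the tailored table, they sort after "z" because the tailoring gives them distinct primary weights.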

An open-source implementation of the UCA is included in International Components for Unicode (ICU).{{Cite web |title=ICU - International Components for Unicode |url=https://icu.unicode.org/home |access-date=2023-08-16 |website=Unicode}}{{Cite web |title=Collations |url=https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html |access-date=2023-08-16 |website=SyBooks Online}} ICU supports tailoring and includes the collation tailorings from CLDR.{{Cite web |title=Customization |url=https://unicode-org.github.io/icu/userguide/collation/customization/ |access-date=2023-08-16 |website=ICU Documentation |language=}}


=Tools=

  • [https://icu4c-demos.unicode.org/icu-bin/locexp?_=en_US&x=col ICU Locale Explorer]: an online demonstration of the Unicode Collation Algorithm using International Components for Unicode
  • [https://icu4c-demos.unicode.org/icu-bin/collation.html An ICU collation demo]
  • [http://billposer.org/Software/msort.html msort]: a sort program that provides an unusual level of flexibility in defining collations and extracting keys

{{Unicode navigation}}

Category:String collation algorithms


Category:Collation

{{algorithm-stub}}

{{standard-stub}}