 
PerlMonks  


Arguably, the body with the most experience of dealing with different character sets, diacriticals and their influence on collation order is the European Union.

Here are the official EU collation sequences for its member countries:

  • Bulgarian: АБВ Г Д ЕЖЗИЙ К Л МН ОП РС Т УФХЦЧШЩЪЬЮЯ
  • Serbia & Montenegro: AÁ BCČ DĎ EÉEFG H IÍ JK L MNŇ OÓ P RŘSŠ TŤ UÚUŮVWXY ZŽ
  • Denmark: A BC D E FG H I JK L MN O PQR S T U VWXY ZÆØÅ
  • Germany: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Estonia: A BC D E FG H I JK L MN O PQR SŠZŽT U VWÕÄÖÜXY
  • Greece: Α ΒΓ Δ ΕΖ ΗΘ Ι ΚΛ ΜΝΞ ΟΠ ΡΣ Τ ΥΦΧΨCEGIPRSU
  • United Kingdom: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Spain: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • France: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Ireland: AÁ BC D E FG H IÍ L MN OÓ PQR S T UÚ VWXY Z
  • Italy: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Lithuania: AĄ BCČ D EĘĖFG H IĮYJK L MN O P R SŠ T UŲŪ V ZŽ
  • Latvia: AĀ BCČ D EĒ FGĢ H IĪ JKĶLĻ MNŅ OO P R SŠ T UŪ V ZŽ
  • Hungary: AÁ BCCsDDzDzsEÉ FG H IÍ JK L MNNyOÓÖŐPQR SSz TTyUÚÜŰV ZZs
  • Malta: A BĊ D E FĠGGħHĦ IIEJK L MN O PQR S T U VWX ŻZ
  • Holland: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Poland: AA BCĊ D EĘ FG H I JK LŁ MNŃ OÓ P R SŚ T U W Y ZŻŹ
  • Portugal: A BC D E FG H I JK L MN O PQR S T U VWXY Z
  • Romania: AĂÂBC D E FG H IÎ JK L MN O P R SŞ TŢ U V XY Z
  • Slovak Republic: AÁÄBCČ DĎDzDžEÉ FG HChIÍ JK LĹĽMNŇ OÓÔ PQRŔSŠ TŤ UÚ VWXYÝZŽ
  • Slovenia: A BCČ D E FG H I JK L MN O P R SŠ T U V ZŽ
  • Sweden: A BC D E FG H I JK L MN O PQR S T U VWXY Z ÅÄÖ
  • Finland: A BC D E FG H I JK L MN O PQR S T U VWXY Z ÄÖ

As best I can tell, your invented ordering would be incorrect for every country except possibly the UK. In particular, note how Denmark, Sweden and Finland place their characters with diacriticals at the end of the alphabet.
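To make the point concrete, here is a minimal sketch (in Python, purely for illustration) that sorts a few place names by the Swedish sequence listed above and compares the result with plain code point order. The names `SWEDISH`, `RANK` and `swedish_key` are hypothetical, and sending characters outside the table to the end is my assumption, not part of any official rule.

```python
import unicodedata

# Swedish sequence as listed above: A-Z followed by Å, Ä, Ö.
SWEDISH = unicodedata.normalize("NFC", "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ")
RANK = {ch: i for i, ch in enumerate(SWEDISH)}

def swedish_key(word):
    """Map each character to its position in the Swedish sequence."""
    upper = unicodedata.normalize("NFC", word).upper()
    # Assumption: characters not in the table sort after all listed letters.
    return [RANK.get(ch, len(SWEDISH)) for ch in upper]

words = ["Örebro", "Zorn", "Ängelholm", "Åre", "Arvika"]

# Code point order puts Ä before Å, which is wrong for Swedish.
print(sorted(words))
# ['Arvika', 'Zorn', 'Ängelholm', 'Åre', 'Örebro']

# Table-driven order follows the national sequence: Å, Ä, Ö.
print(sorted(words, key=swedish_key))
# ['Arvika', 'Zorn', 'Åre', 'Ängelholm', 'Örebro']
```

A one-character-per-letter table like this would not be enough for sequences such as the Hungarian or Slovak ones above, where digraphs (Cs, Dz, Ch) count as single letters.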

Over thirty years ago (or more; it's not clear), realising that there is no way to resolve the disparate expectations of all the member countries, the EU took a pragmatic approach to solving this problem:

Using Accented and Other Special Characters in Searching

The EU Inventories contain data in all Community languages except Greek and many of these languages contain accented characters in their alphabet.

All words containing accented characters are displayed as such in both WinSPIRS and WebSPIRS. For the former, you may need to choose a font other than the default font if it does not support the ISO 8859-1 (Latin alphabet No. 1) character set (known elsewhere in this database compendium as ISO Latin-1) for display/printing. All words containing accented or foreign characters (as well as a to z and A to Z) are converted to their upper case equivalents and then indexed as such. The collating sequence chosen for all indices in all languages is that for ISO Latin-1 except that all terms beginning with a numeric character appear at end. This has been done to provide ease and consistency in a multi-lingual and multi-database (i.e. when two or more databases from different languages are selected for retrieval) environment.

The actual collating sequence or character order in all indices is:

-, ., A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, À, Á, Â, Ã, Ä, Å, Æ, Ç, È, É, Ê, Ë, Ì, Í, Î, Ï, Ñ, Ò, Ó, Ô, Õ, Ö, Ø, Ù, Ú, Û, Ü, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0.
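The quoted scheme (upper-case everything, then compare character by character against one fixed sequence with the digits last) can be sketched as a sort key. This is only an illustration in Python under stated assumptions: `ORDER`, `RANK` and `index_key` are hypothetical names, the repeated trailing 0 in the listing is ignored, and characters that do not appear in the listing are placed after everything else in code point order.

```python
import unicodedata

# The sequence from the quoted text: '-', '.', A-Z, accented letters, digits.
ORDER = unicodedata.normalize(
    "NFC",
    "-.ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜ"
    "0123456789",
)
RANK = {ch: i for i, ch in enumerate(ORDER)}

def index_key(term):
    """Upper-case the term, then rank each character by the fixed sequence."""
    upper = unicodedata.normalize("NFC", term).upper()
    # Assumption: unlisted characters come last, ordered by code point.
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in upper]

terms = ["zebra", "Ärger", "3M", "apple", "-ism", "Éclair"]
print(sorted(terms, key=index_key))
# ['-ism', 'apple', 'zebra', 'Ärger', 'Éclair', '3M']
```

Because the key is built from the upper-cased term, "apple" and "APPLE" compare equal, which matches the statement that all terms are converted to upper case before indexing.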


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: best sort by BrowserUk
in thread best sort by ag4ve
