PerlMonks  

Re^4: Perl & Unicode: state of the art?

by Jenda (Abbot)
on Oct 08, 2013 at 14:53 UTC (#1057421)


in reply to Re^3: Perl & Unicode: state of the art?
in thread Perl & Unicode: state of the art?

Let's assume Czech. Dots are used after numbers to denote ordinals (1. = 1st, 8. = 8th, etc.). Dots are used between numbers in dates (3.9.1975 or 3. zř 1975 = September 3rd, 1975). Dots are used at the end of abbreviations, though to make things harder, some abbreviations are so common that they do not need the dot, and if an abbreviation comes at the end of a sentence, you do not double the dot. And of course some sentences end with a question mark or an exclamation mark.

Anything that does not take the language into account will fail on any nontrivial text. Even if it does take the language into account, there will be "false positives" and missed sentence boundaries.
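To make that concrete, here is a rough sketch, not a recommended splitter. The sample text and the two helper subs (`split_naive`, `split_skip_numbers`) are made up for illustration. The first rule splits after every dot, question mark or exclamation mark followed by whitespace. The second also refuses to split after a digit followed by a dot, which is one obvious "fix" for ordinals and dates.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical sample: two Czech sentences, roughly
# "He was born on 3 September 1975. He was their 1st son."
my $text = 'Narodil se 3. září 1975. Byl to jejich 1. syn.';

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
sub split_naive {
    my ($s) = @_;
    return split /(?<=[.!?])\s+/, $s;
}

# "Improved" rule: as above, but never split right after a digit plus dot,
# so ordinals and day numbers are left alone.
sub split_skip_numbers {
    my ($s) = @_;
    return split /(?<=[.!?])(?<!\d\.)\s+/, $s;
}

my @naive = split_naive($text);
my @skip  = split_skip_numbers($text);

binmode STDOUT, ':encoding(UTF-8)';
print scalar(@naive), " pieces with the naive rule:\n";
print "  [$_]\n" for @naive;
print scalar(@skip), " piece(s) when skipping digit+dot:\n";
print "  [$_]\n" for @skip;
```

The naive rule cuts the two sentences into four pieces, because it breaks after "3." and after "1.". The second rule returns a single piece, because the real boundary after "1975." also looks like a number followed by a dot. So you trade false positives for a missed sentence, and neither rule knows anything about abbreviations yet.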

Jenda
Enoch was right!
Enjoy the last years of Rome.

