Re^2: getting Unicode character names from string

by Jim (Curate)
on Oct 11, 2012 at 18:23 UTC


in reply to Re: getting Unicode character names from string
in thread getting Unicode character names from string

If the goal is to just do a comparison of the base code-points, you should probably be using Unicode::Collate, at level 1: "alphabetic ordering". The next higher level provides "diacritic ordering", followed by "case ordering" (which combines the previous levels), and finally "tie-breaking".

Are there clear examples of how to do this in the documentation? Or elsewhere on the interwebs? Nothing's jumping out at me when I look/search. Thanks!


Re^3: getting Unicode character names from string
by davido (Archbishop) on Oct 11, 2012 at 20:51 UTC

    The synopsis for Unicode::Collate does a reasonable job of setting the stage, but there is a nice discussion in chapter 6 of Programming Perl (the camel book), 4th edition as well. You might also look at the Unicode Technical Standard #10: Unicode Collation Algorithm.

    Here's a brief example of doing comparisons at a lower (more relaxed) level using Unicode::Collate.

    use strict;
    use warnings FATAL => 'utf8';
    use utf8;
    
    use Unicode::Collate;
    binmode STDOUT, ':encoding(UTF-8)';
    
    my( $x, $y, $z ) = qw( α ά ὰ );
    
    my $c = Unicode::Collate->new;
    
    print "\nStrict collation rules: Level 4 (default)\n";
    print "\t cmp('α','ά'): ", $c->cmp( $x, $y ), "\n";
    print "\t cmp('ά','ὰ'): ", $c->cmp( $y, $z ), "\n";
    print "\t cmp('α','ὰ'): ", $c->cmp( $x, $z ), "\n";
    
    my $rc = Unicode::Collate->new( level => 1 );
    
    print "\nRelaxed collation rules: Level 1\n";
    print "\t cmp('α','ά'): ", $rc->cmp( $x, $y ), "\n";
    print "\t cmp('ά','ὰ'): ", $rc->cmp( $y, $z ), "\n";
    print "\t cmp('α','ὰ'): ", $rc->cmp( $x, $z ), "\n\n";
    

    And the output...

    
    Strict collation rules: Level 4 (default)
    	 cmp('α','ά'): -1
    	 cmp('ά','ὰ'): -1
    	 cmp('α','ὰ'): -1
    
    Relaxed collation rules: Level 1
    	 cmp('α','ά'): 0
    	 cmp('ά','ὰ'): 0
    	 cmp('α','ὰ'): 0
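
    The intermediate levels can be probed the same way. The following is only a sketch, not part of the run above: it uses the eq method of Unicode::Collate, and the variable names $l1 and $l2 and the sample characters are my own choices for illustration. The idea is that level 1 ignores accents and case, while level 2 still ignores case but starts distinguishing accents.

```perl
use strict;
use warnings;
use utf8;

use Unicode::Collate;

# Two collators: base letters only (level 1), base letters plus accents (level 2).
my $l1 = Unicode::Collate->new( level => 1 );
my $l2 = Unicode::Collate->new( level => 2 );

my @results = (
    $l1->eq( 'α', 'ά' ) ? 1 : 0,    # level 1: accent ignored
    $l2->eq( 'α', 'ά' ) ? 1 : 0,    # level 2: accent is significant
    $l2->eq( 'α', 'Α' ) ? 1 : 0,    # level 2: case still ignored
);

print "@results\n";
```

    If all you need is "same base letters or not", the level 1 collator's eq is probably the most direct fit.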
    
    

    And if the reason for doing comparisons is to handle sorting, Unicode::Collate does that too: the collator object has its own sort method, so you don't need to write a comparison block for Perl's core sort.
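
    A minimal sketch of sorting through the collator, assuming a made-up word list (the strings in @words are invented for illustration). With level => 1 the three alpha variants count as the same letter, so the order is decided by the second character; a plain sort, which compares code points, is printed alongside for contrast.

```perl
use strict;
use warnings;
use utf8;

use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample data: each word starts with a different form of alpha.
my @words = qw( ὰβ αγ άα );

my $c = Unicode::Collate->new( level => 1 );

# The collator's own sort method; accents are ignored at level 1.
my @sorted = $c->sort(@words);

# Core sort compares code points, so accented forms are not grouped by base letter.
my @by_codepoint = sort @words;

print "collated:   @sorted\n";
print "code point: @by_codepoint\n";
```

    If you would rather keep using core sort, passing the collator's cmp as the comparison, as in sort { $c->cmp( $a, $b ) } @words, should give the same ordering.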


    Dave
