The synopsis for Unicode::Collate sets the stage reasonably well, and there is also a nice discussion in chapter 6 of Programming Perl (the Camel Book), 4th edition. You might also look at Unicode Technical Standard #10: Unicode Collation Algorithm.

Here's a brief example of doing comparisons at a lower (more relaxed) level using Unicode::Collate. At level 1 only the base letters are compared, so differences in accents and case are ignored.

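use strict;
use warnings;                    # enable the usual warnings
use warnings FATAL => 'utf8';    # and make encoding problems fatal
use utf8;                        # the source contains literal Greek letters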

use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

my( $x, $y, $z ) = qw( α ά ὰ );

my $c = Unicode::Collate->new;

print "\nStrict collation rules: Level 4 (default)\n";
print "\t cmp('α','ά'): ", $c->cmp( $x, $y ), "\n";
print "\t cmp('ά','ὰ'): ", $c->cmp( $y, $z ), "\n";
print "\t cmp('α','ὰ'): ", $c->cmp( $x, $z ), "\n";

my $rc = Unicode::Collate->new( level => 1 );

print "\nRelaxed collation rules: Level 1\n";
print "\t cmp('α','ά'): ", $rc->cmp( $x, $y ), "\n";
print "\t cmp('ά','ὰ'): ", $rc->cmp( $y, $z ), "\n";
print "\t cmp('α','ὰ'): ", $rc->cmp( $x, $z ), "\n\n";

And the output...


Strict collation rules: Level 4 (default)
	 cmp('α','ά'): -1
	 cmp('ά','ὰ'): -1
	 cmp('α','ὰ'): -1

Relaxed collation rules: Level 1
	 cmp('α','ά'): 0
	 cmp('ά','ὰ'): 0
	 cmp('α','ὰ'): 0
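
The levels in between behave as you would expect. As a minimal sketch (the variable names here are just for illustration), a level 2 collator still tells accents apart but no longer cares about case, so the first comparison should print 0 and the second -1:

```perl
use strict;
use warnings;
use utf8;

use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# Level 2 compares base letters and accents, but not case.
my $c2 = Unicode::Collate->new( level => 2 );

my $case   = $c2->cmp( 'a', 'A' );    # strings differ only in case
my $accent = $c2->cmp( 'α', 'ά' );    # strings differ only in accent

print "Level 2, cmp('a','A'): $case\n";
print "Level 2, cmp('α','ά'): $accent\n";
```

Level 3 would then bring case back into the comparison.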

And if the reason for doing comparisons is to handle sorting, Unicode::Collate does that too: its sort method orders a list by the collator's rules, so you don't need to call Perl's core sort yourself.
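
A minimal sketch of that, using the same three letters plus a beta thrown in for contrast (the list itself is made up for the example). With the default collator the result should agree with the level 4 comparisons above, with β last:

```perl
use strict;
use warnings;
use utf8;

use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

my $c = Unicode::Collate->new;

# sort() takes a list and returns it ordered by the collation rules.
my @sorted = $c->sort( qw( β ὰ α ά ) );

print "@sorted\n";
```

If you do want the core sort for some reason, the same thing can be written as sort { $c->cmp( $a, $b ) } @list.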

Dave

In reply to Re^3: getting Unicode character names from string by davido
in thread getting Unicode character names from string by csthflk
