Re^3: Sorting Vietnamese text

by pdenisowski (Acolyte)
on Dec 22, 2013 at 20:11 UTC


in reply to Re^2: Sorting Vietnamese text
in thread Sorting Vietnamese text

Correction: should be

ám số : password, code

ãm : to carry in the arms

ấm chè : teapot

ầm : loud, noisy


Re^4: Sorting Vietnamese text
by moritz (Cardinal) on Dec 22, 2013 at 21:04 UTC

    I think that getting Unicode::Collate to work would be the best approach, but here's a hand-rolled one that seems to work the way you want it:

    use utf8;
    use 5.014;
    use warnings;
    use List::Util qw/min/;
    binmode STDOUT, ':encoding(UTF-8)';
    
    # Rank of each lowercase letter; anything not listed (spaces, capitals) ranks 0.
    my %order;
    {
        my $source = join '', 'aáàảãạăắ',
                     'ằẳẵặâấầẩẫậbcdđeéèẻẽẹêếềểễ',
                     'ệfghiíìỉĩịjklmnoóòỏõọôốồổ',
                     'ỗộơớờởỡợpqrstuúùủũụưứừửữự',
                     'vwxyýỳỷỹỵz';
        my $cnt = 0;
        $order{$_} = ++$cnt for split //, $source;
        # Compare character by character; if one string is a prefix of the
        # other, the shorter one sorts first.
        sub vcmp($$) {
            my ($x, $y) = @_;   # not $a/$b, which belong to sort
            for (0 .. min(length($x), length($y)) - 1) {
                my $cmp = ($order{ substr $x, $_, 1 } // 0)
                          <=> ($order{ substr $y, $_, 1 } // 0);
                return $cmp if $cmp != 0;
            }
            return length($x) <=> length($y);
        }
    }
    
    say for sort { vcmp($a, $b) } ('ầm', 'ãm', 'ấm chè', 'ám số');
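
For comparison, here is a minimal sketch of the Unicode::Collate route mentioned above. It assumes the Unicode::Collate::Locale module and its 'vi' tailoring are installed. As far as I understand, that tailoring follows the CLDR rules for Vietnamese, which weigh base letters before tone marks, so its output may well differ from the per-character order that vcmp produces.

```perl
use utf8;
use 5.014;
use warnings;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# 'vi' asks for the Vietnamese tailoring shipped with Unicode::Collate.
my $collator = Unicode::Collate::Locale->new(locale => 'vi');

my @words  = ('ầm', 'ãm', 'ấm chè', 'ám số');
my @sorted = $collator->sort(@words);

say for @sorted;
```

If the resulting order is not the one you need, the hand-rolled table gives full control at the cost of maintaining it yourself.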
    
    
      Thanks! That looks like it might be working correctly.

      Now to ask a dumb question - how do I replace the hard-coded list here

      say for sort { vcmp($a, $b) } ('ầm', 'ãm', 'ấm chè', 'ám số');
      
      with an array like this:

      
      ỷ : (1) to be fat (said of a pig); (2) to depend on
      ỳ : inertia, state of inactivity, stay out, inert, sluggish
      ỳ ạch : to toil, labor with difficulty
      ỷ eo : reproach someone with something
      ỷ lại : to depend, rely on others
      ỷ thế : count on ones power, ones position, ones influence
      yêu nhau : to love each other, be in love
      yêu quý : precious, valuable
      

      read in from a text file using something like

      open(IN,"test.txt");
      @unsorted=<IN>;
      

      Sorry, I know that's a dumb question, but I keep getting errors saying "Global symbol XYZ requires explicit package name ..." when I try to do this.

      Thanks again!
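
One possible way to do this, offered as a sketch rather than a definitive answer. The "Global symbol ... requires explicit package name" error most likely comes from strict mode, which `use 5.014` turns on, because `@unsorted` is never declared with `my`. The helper name `read_entries` is hypothetical, and `test.txt` is assumed to be UTF-8 encoded with one entry per line. The comparator is repeated here only so the script stands alone.

```perl
use utf8;
use 5.014;      # also enables "use strict"
use warnings;
use List::Util qw/min/;

binmode STDOUT, ':encoding(UTF-8)';

# Per-character rank table, as in the reply above.
my %order;
{
    my $source = join '', 'aáàảãạăắ',
                 'ằẳẵặâấầẩẫậbcdđeéèẻẽẹêếềểễ',
                 'ệfghiíìỉĩịjklmnoóòỏõọôốồổ',
                 'ỗộơớờởỡợpqrstuúùủũụưứừửữự',
                 'vwxyýỳỷỹỵz';
    my $cnt = 0;
    $order{$_} = ++$cnt for split //, $source;
}

# Compare two strings character by character using %order.
sub vcmp {
    my ($x, $y) = @_;
    for my $i (0 .. min(length $x, length $y) - 1) {
        my $cmp = ($order{ substr $x, $i, 1 } // 0)
                  <=> ($order{ substr $y, $i, 1 } // 0);
        return $cmp if $cmp != 0;
    }
    return length($x) <=> length($y);
}

# Hypothetical helper: read all lines of a UTF-8 file into a list.
sub read_entries {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    chomp(my @lines = <$in>);   # "my" is what strict mode was asking for
    close $in;
    return @lines;
}

# Assumed input file name, taken from the question.
my $file = 'test.txt';
if (-e $file) {
    my @unsorted = read_entries($file);
    say for sort { vcmp($a, $b) } @unsorted;
}
```

The three-argument `open` with a lexical filehandle also decodes the file as UTF-8, which the table lookup needs in order to see whole characters instead of bytes.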
