PerlMonks  

Sorting help

by raj123 (Initiate)
on Jun 20, 2009 at 07:35 UTC ( #773196=perlquestion )

raj123 has asked for the wisdom of the Perl Monks concerning the following question:

I want to sort the following text file:

<tag id="12">125</tag>
<tag id="17">15</tag>
<tag id="6">179</tag>
<tag id="7">2</tag>

into

<tag id="7">2</tag>
<tag id="17">15</tag>
<tag id="12">125</tag>
<tag id="6">179</tag>

Please help me ASAP

Thanks in advance

raj

Replies are listed 'Best First'.
Re: Sorting help
by shmem (Chancellor) on Jun 20, 2009 at 08:07 UTC
Re: Sorting help
by Marshall (Abbot) on Jun 20, 2009 at 15:18 UTC
    The reply by shmem is right, but perhaps a bit advanced for what you need. I would suggest mastering basic sorting before moving into advanced techniques.

    I'll back up and explain sort a bit for you. If we just had @data = sort @data, we would get a plain line-by-line alphabetic (string) sort of the @data list. You don't want that; you need a custom order.
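
    To see why the plain sort is not enough, here is a minimal sketch that uses only the four tag values from the question. The variable names are made up for illustration. The default sort compares the values as strings, while an explicit numeric comparison with <=> gives the order the question asks for.

```perl
use strict;
use warnings;

# The four tag values from the question, in file order.
my @values = (125, 15, 179, 2);

# Default sort: elements are compared as strings (the same as $a cmp $b).
my @as_strings = sort @values;

# Explicit numeric comparison.
my @as_numbers = sort { $a <=> $b } @values;

print "default: @as_strings\n";    # default: 125 15 179 2
print "numeric: @as_numbers\n";    # numeric: 2 15 125 179
```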

    sort allows you to specify a subroutine that does the comparison. It must return a negative number (conventionally -1) if a < b, 0 if a = b, or a positive number (conventionally 1) if a > b. You supply one when you want something other than the default, which is $a cmp $b. Note the difference between $a <=> $b (numeric) and $a cmp $b (alphabetic). Perl "automagically" sets these $a and $b values for you. The job of the comparison subroutine is to figure out what to do with them. See the code below...

    #!/usr/bin/perl -w
    use strict;

    my @data = (<DATA>);
    print "Unsorted Data\n";
    print @data;

    @data = sort {
        my ($tag_id_A,$tag_A) = $a =~ m/(\d+)/g;
        my ($tag_id_B,$tag_B) = $b =~ m/(\d+)/g;

        $tag_A <=> $tag_B
            or
        $tag_id_A <=> $tag_id_B
    } @data;

    print "Sorted Data\n";
    print @data;

    #prints:
    #Unsorted Data
    #<tag id="12">125</tag>
    #<tag id="9">125</tag>
    #<tag id="17">15</tag>
    #<tag id="6">179</tag>
    #<tag id="7">2</tag>
    #Sorted Data
    #<tag id="7">2</tag>
    #<tag id="17">15</tag>
    #<tag id="9">125</tag>
    #<tag id="12">125</tag>
    #<tag id="6">179</tag>

    __DATA__
    <tag id="12">125</tag>
    <tag id="9">125</tag>
    <tag id="17">15</tag>
    <tag id="6">179</tag>
    <tag id="7">2</tag>

    What happens above is that Perl hands the sort comparison function pairs of lines, which it calls $a and $b. I use a global match (m//g) in list context to get the 2 numbers on each of the $a and $b lines. Then comes the comparison of those values, which is just one big logical expression nicely formatted over several lines. It uses the 2nd number (the tag value) as the primary sort key. If the two values are equal (the numeric compare returns 0, which is false), the second comparison is executed. This means that "ties" are broken by the first number on the line (the id). I added a case for that to your data: the id="9" line.

    You might go, Hey! This is a subroutine, where is the "return()" statement? By default, Perl returns the value of the last expression evaluated in a sub. Normally you would have an explicit return, but this is an exception to the "rule". Here that would "look messy", so from a style point of view it is not done.
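
    The two pieces described above can be tried in isolation. The following is a small sketch, not part of the original reply: the [id, value] pairs are hypothetical stand-ins for already-parsed lines, and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# A global match in list context returns every captured number on the line.
my ($id, $value) = '<tag id="12">125</tag>' =~ m/(\d+)/g;
print "id=$id value=$value\n";    # id=12 value=125

# Hypothetical [id, value] pairs standing in for parsed lines.
my @pairs = ( [12, 125], [9, 125], [17, 15] );

my @sorted = sort {
    $a->[1] <=> $b->[1]    # primary key: the value
        or
    $a->[0] <=> $b->[0]    # tie-break: the id, evaluated only when the values are equal
} @pairs;

print join(' ', map { "$_->[0]:$_->[1]" } @sorted), "\n";    # 17:15 9:125 12:125
```

    Because <=> returns 0 (false) for equal values, "or" falls through to the second comparison only on a tie.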

    The post by shmem uses a technique called a "Schwartzian transform". The idea is to extract the numbers from each line once, in advance, so that we don't have to do it every time that sort wants to compare a couple of lines. Once a list gets past some size, maybe around a couple of dozen items, this speeds things up. However, the above type of code is functionally equivalent and, I hope, easier for you to understand. For small inputs the performance difference typically won't matter (Perl is very good at regular expressions).
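
    For reference, here is a generic sketch of a Schwartzian transform applied to the same data. It is not shmem's code, which is not reproduced in this thread; it only illustrates the general map / sort / map shape, assuming the same "value first, id as tie-break" ordering used above.

```perl
use strict;
use warnings;

my @data = map { "$_\n" } (
    '<tag id="12">125</tag>',
    '<tag id="9">125</tag>',
    '<tag id="17">15</tag>',
    '<tag id="6">179</tag>',
    '<tag id="7">2</tag>',
);

my @sorted =
    map  { $_->[0] }                      # 3. keep only the original line
    sort {
        $a->[2] <=> $b->[2]               # 2. compare the precomputed value...
            or
        $a->[1] <=> $b->[1]               #    ...then the precomputed id
    }
    map  { [ $_, m/(\d+)/g ] }            # 1. build [line, id, value] once per line
    @data;

print @sorted;
# prints:
# <tag id="7">2</tag>
# <tag id="17">15</tag>
# <tag id="9">125</tag>
# <tag id="12">125</tag>
# <tag id="6">179</tag>
```

    The regular expression runs once per line here, instead of twice per comparison.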

    Good luck and happy sorting.

    Update:
    To make it clearer that we are supplying a sub to sort, you can write it as below. This is the way to do it when you have to sort a bunch of different lists by the same criteria. The code above uses an anonymous subroutine (a sub which has no name). This version gives the comparison sub a name.

    @data = sort by_tags @data;

    sub by_tags
    {
        my ($tag_id_A,$tag_A) = $a =~ m/(\d+)/g;
        my ($tag_id_B,$tag_B) = $b =~ m/(\d+)/g;

        $tag_A <=> $tag_B
            or
        $tag_id_A <=> $tag_id_B
    }
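
    As a sketch of the "several lists, same criteria" case, the named sub can be reused like this. The second list and the descending order are made-up examples, not part of the original question.

```perl
use strict;
use warnings;

# Same comparison as above: value first, id as the tie-break.
sub by_tags {
    my ($tag_id_A, $tag_A) = $a =~ m/(\d+)/g;
    my ($tag_id_B, $tag_B) = $b =~ m/(\d+)/g;

    $tag_A <=> $tag_B
        or
    $tag_id_A <=> $tag_id_B
}

my @first  = ('<tag id="12">125</tag>', '<tag id="17">15</tag>',
              '<tag id="6">179</tag>',  '<tag id="7">2</tag>');
my @second = ('<tag id="3">40</tag>', '<tag id="1">40</tag>',
              '<tag id="2">5</tag>');    # hypothetical second list

my @first_sorted = sort by_tags @first;             # ascending
my @second_desc  = reverse sort by_tags @second;    # same criteria, descending

print "$_\n" for @first_sorted;
print "$_\n" for @second_desc;
```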

Node Type: perlquestion [id://773196]
Approved by herveus