PerlMonks  

how do I sort numerically on sections of data that is alphanumeric

by shaezi (Acolyte)
on Jun 26, 2001 at 22:05 UTC ( [id://91689]=perlquestion )

shaezi has asked for the wisdom of the Perl Monks concerning the following question:

Here are a couple of lines in the array that I'm trying to sort:

A 00006360 A001002002 000000000.00
A 00006360 A001012012 000000000.00
A 00006360 A001054054 000000000.00
A 00005460 A001102002 000000000.00
A 00007360 A003015015 000000000.00
There are 4 columns in each line, and I'm trying to sort (in ascending order) first by the numbers in column two and then by the numbers in column three. The A at the beginning of the numbers in column three needs to be disregarded. I'm new to Perl, so I'm having a hard time figuring out how to do this. Any help would be greatly appreciated!
Thanks!
shaezi


Replies are listed 'Best First'.
(Ovid) Re: how do I sort numerically on sections of data that is alphanumeric
by Ovid (Cardinal) on Jun 26, 2001 at 22:26 UTC

    Does this work for you?

    use strict;
    use Data::Dumper;

    my @data;
    while (<DATA>) {
        chomp;
        push @data, [ split '\s', $_ ];
    }

    # Note: column three starts with 'A', so <=> treats it as 0 here
    @data = sort { $a->[1] <=> $b->[1]
                           or
                   $a->[2] <=> $b->[2] } @data;
    print Dumper \@data;

    __DATA__
    A 00006360 A001002002 000000000.00
    A 00006360 A001012012 000000000.00
    A 00006360 A001054054 000000000.00
    A 00005460 A001102002 000000000.00
    A 00007360 A003015015 000000000.00

    I'm not sure what you meant by having 'A' in the third column disregarded. If you meant you didn't want it in the data, change the while loop to the following:

    while (<DATA>) {
        chomp;
        push @data, [ split '\s', $_ ];
        $data[ -1 ][ 2 ] = substr( $data[ -1 ][ 2 ], 1 );
    }

    If you want it included but disregarded in the sort, change the sort to the following:

    @data = sort { $a->[1] <=> $b->[1]
                           or
                   substr( $a->[2], 1) <=> substr( $b->[2], 1) } @data;
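To see why the substr matters, here is a minimal sketch (not from the thread). It assumes column three always has exactly one leading letter; `col3_number` is a hypothetical helper name. A string that does not begin with a digit is treated as 0 in a numeric comparison, so comparing the raw fields with <=> cannot tell them apart.

```perl
use strict;
use warnings;

# Hypothetical helper: numeric value of a column-three field
# with its single leading letter dropped.
sub col3_number {
    my ($field) = @_;
    return 0 + substr($field, 1);
}

my ($x, $y) = ('A001054054', 'A001002002');

my $with_letter;
{
    # Both strings numify to 0; the warning is silenced for the demo only.
    no warnings 'numeric';
    $with_letter = $x <=> $y;
}
my $without_letter = col3_number($x) <=> col3_number($y);

print "with the A:    $with_letter\n";
print "without the A: $without_letter\n";
```

The first comparison comes out as a tie, the second correctly reports that the first field is larger.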

    Cheers,
    Ovid

    Vote for paco!

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: how do I sort numerically on sections of data that is alphanumeric
by MZSanford (Curate) on Jun 26, 2001 at 22:43 UTC
    Based on the prior reply, I was thinking about ways to work through this. The prior example works great, but the last time I was doing something with data sorted and sub-sorted by fields, I went with the following hash-of-a-hash approach. If you do not need the other data, the hash assignment can be changed to equal "1" as opposed to $_. Once again, this is only really necessary if you are doing complex aggregate reporting and sorting on the data ... otherwise I would do the above, as it should be faster.

    my %HASH;
    while (<DATA>) {
        chomp;
        # key on column two (8 digits), then column three without its 'A' (9 digits)
        $HASH{substr($_,2,8)}{substr($_,12,9)} = $_;
    }

    foreach my $col_2_num (sort keys %HASH) {
        foreach my $col_3_num (sort keys %{ $HASH{$col_2_num} }) {
            print "$col_2_num : $col_3_num : '$HASH{$col_2_num}{$col_3_num}'\n";
        }
    }

    - Matt-Fu (MZSanford)
Re: how do I sort numerically on sections of data that is alphanumeric
by nysus (Parson) on Jun 27, 2001 at 00:51 UTC

    Warning: Extreme Newbie Code Ahead!

    #!/usr/bin/perl -w
    use strict;

    my @array1;
    while (<DATA>) {
        push @array1, $_;
    }

    my @multiarray;
    for my $line (@array1) {
        push @multiarray, [split /\s/, $line];
    }

    # indices of @multiarray, listed in sorted order
    my @sort_ord = sort {
        $multiarray[$a][1] <=> $multiarray[$b][1]
            ||
        $multiarray[$a][2] cmp $multiarray[$b][2]
    } (0..$#array1);

    # copy each row into its sorted position
    my @sorted_multiarray;
    my $count = 0;
    for my $order (@sort_ord) {
        $sorted_multiarray[$count] = $multiarray[$order];
        $count++;
    }
    Sorry guys, I worked too hard on this not to post it. :-)

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot";
    $nysus = $PM . $MCF;

Re: how do I sort numerically on sections of data that is alphanumeric
by cLive ;-) (Prior) on Jun 26, 2001 at 23:48 UTC
    I don't think map or external modules are apt for a newbie!

    Sorts can be placed in sub-routines to make your code simpler to read.

    $a and $b are special sort vars. Read this for more info. <=> compares strings numerically. If they are the same, it returns '0'.

    || (or) only goes to the second comparison if the first is zero.

    So, if you're sure the data is consistent:

    my @sorted_array = sort { sort_me(); } @array;

    sub sort_me {
        # get 2nd and 3rd nums
        my ($a_2,$a_3) = ($a =~ /A (\d+) A(\d+)/);
        my ($b_2,$b_3) = ($b =~ /A (\d+) A(\d+)/);

        # sort by second nums. If same, sort by 3rd
        $a_2 <=> $b_2 || $a_3 <=> $b_3;
    }

    cLive ;-)

      Even more appropriate for a newbie (because it's less complex and doesn't use regexes) might be:

      my @sorted_array = sort {
          # compare the second-column numbers (offset 2, 8 digits)
          substr($a, 2, 8) <=> substr($b, 2, 8)
          # compare the third-column numbers (offset 12, 9 digits, skipping the 'A')
          or substr($a, 12, 9) <=> substr($b, 12, 9)
      } @array;

      That is, for each pair of lines that's being compared, use substr to get the fixed fields, and compare them numerically. The first <=> will return non-0 if the numbers in the second field are different, and the third field won't be compared in that case (because of the or).
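The way <=> and or/|| chain together can be seen in isolation. This is only an illustrative sketch: `by_two_keys` is a hypothetical helper, and the keys are passed in as plain numbers rather than extracted from the lines.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two [key1, key2] pairs, first by key1,
# then by key2 only when key1 ties (the first <=> returned 0).
sub by_two_keys {
    my ($x, $y) = @_;
    return $x->[0] <=> $y->[0]
        || $x->[1] <=> $y->[1];
}

print by_two_keys([5460, 1102002], [6360, 1002002]), "\n";  # first keys differ
print by_two_keys([6360, 1012012], [6360, 1002002]), "\n";  # tie, second key decides
print by_two_keys([6360, 1002002], [6360, 1002002]), "\n";  # identical pairs
```

This prints -1, 1 and 0: in the first call the second key is never looked at, even though it would have pointed the other way.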

Re: how do I sort numerically on sections of data that is alphanumeric
by bikeNomad (Priest) on Jun 26, 2001 at 23:29 UTC
    If you really want the original lines (i.e. if that's your output), and you don't want to separate them into columns later, then you can use this:

    #!/usr/bin/perl -w
    use strict;

    my @original = <>;    # or whatever.

    # x2 skips "A ", a8 is column two, x2 skips " A", a9 is column three
    my @lines = map  { $_->[0] }
                sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
                map  { [ $_, unpack('x2a8x2a9', $_) ] } @original;

    This is the Schwartzian Transform, with the use of unpack to undo the fixed fields. unpack() is more efficient than multiple substr() calls. Of course, if you want also to unpack the columns, you can simplify it:

    #!/usr/bin/perl -w
    use strict;

    my @original = <>;    # or whatever.

    # x2 skips "A ", a8 is column two, x2 skips " A", a9 is column three
    my @lines = sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] }
                map  { [ unpack('x2a8x2a9', $_) ] } @original;

    This leaves you with an array of array references, with the fields already separated. Of course, you'd change the unpack string and array indices to include the data you were interested in.

    Anyway, if you're dealing with fixed-field data, I'd recommend looking at unpack to separate the fields.
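As a rough illustration of unpack on this layout, the sketch below splits one sample line into all of its fields. The template and the `parse_line` name are assumptions based on the five sample lines in the question (1 letter, 8 digits, 1 letter, 9 digits, 12 characters, single spaces between columns), not something taken from a spec.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width line into its fields.
# 'x' skips one byte, 'aN' takes N bytes as-is.
sub parse_line {
    my ($line) = @_;
    return unpack('a x a8 x a a9 x a12', $line);
}

my ($tag, $col2, $letter, $col3, $col4)
    = parse_line('A 00006360 A001002002 000000000.00');

print "column two:   $col2\n";
print "column three: $letter + $col3\n";
print "column four:  $col4\n";
```

Keeping the letter and the digits of column three as separate fields means the sort block can compare `$col3` numerically without any substr.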

Re: how do I sort numerically on sections of data that is alphanumeric
by dragonchild (Archbishop) on Jun 26, 2001 at 23:54 UTC
    First off, an important thing to note is whether or not you're using a single-dimensional array where each element is a string of the form "A ######## A######### #########.##" or a two-dimensional array where $array[3][1] would be "00005460" (in your above example).

    Now, you could sort this as a single-dimensional array (and, in fact, some elegant sorting algorithms do so), but I would recommend converting this to a two-dimensional array.

    There are a number of ways to do this. The best way is to use unpack. So, you'd have something like:

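    my @new_array = ();
    foreach my $element (@array) {
        # 'x' skips the single space between columns; column three is
        # split into its letter (A1) and its 9 digits (A9)
        my @temp = unpack("A x A8 x A1 A9 x A12", $element);
        push @new_array, \@temp;
    }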

    Once that's done, I suggest doing a sort like:

    my @sorted_array = sort {
        $a->[1] <=> $b->[1]
            ||
        $a->[3] <=> $b->[3]
    } @new_array;

    (Yes, I know I'm 2 hrs behind the curve ... I started typing right away, but meetings suck!)

Re: how do I sort... the KISS principle
by dvergin (Monsignor) on Jun 27, 2001 at 01:56 UTC
    Short answer:        @sorted = sort @unsorted
    Discussion:
    There are some marvelously creative solutions proposed here. But I must be missing something.

    IF
    The 'A' characters always appear in the two places illustrated...
    and the data is always zero-padded as shown...
    and you want to sort first by the second column...
    and then by the third column...
                                Update: shaezi has confirmed these assumptions.

    Won't a standard sort do the job simply and efficiently?

    my @unsorted = (
        'A 00006360 A001054054 000000000.00',
        'A 00006360 A001002002 000000000.00',
        'A 00006360 A001012012 000000000.00',
        'A 00005460 A001102002 000000000.00',
        'A 00007360 A003015015 000000000.00');

    my @sorted = sort @unsorted;
    print "$_\n" for @sorted;
    dragonchild mentioned the single-dimensioned array possibility but then moved away from it. And no one else seems to have considered doing it this way. So it is with some trepidation that I offer this solution. But...
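The zero-padding assumption is what makes the plain sort safe. A small sketch (sample values only, not from the original data) of how string order and numeric order agree on padded numbers and diverge on unpadded ones:

```perl
use strict;
use warnings;

# Zero-padded, equal-width numbers: string order matches numeric order.
my @padded_in = ('00007360', '00005460', '00006360');
my @padded    = sort @padded_in;

# Without padding the two orders diverge: "10" sorts before "9" as a string.
my @plain_in   = ('9', '10', '100');
my @as_strings = sort @plain_in;
my @as_numbers = sort { $a <=> $b } @plain_in;

print "padded:     @padded\n";
print "as strings: @as_strings\n";
print "as numbers: @as_numbers\n";
```

If the widths ever vary or the padding is dropped, one of the numeric approaches elsewhere in the thread is the safer choice.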
Re: how do I sort numerically on sections of data that is alphanumeric
by Brovnik (Hermit) on Jun 26, 2001 at 23:44 UTC
    Or, there is the Schwartzian Transform.
    # input is in @unsorted; the result ends up in @sorted
    my @sorted = map  { $_->[0] }
                 sort { $a->[2] <=> $b->[2]
                            or
                        substr($a->[3],1) <=> substr($b->[3],1) }
                 map  { [ $_, split ' ', $_ ] } @unsorted;
    Note: If the 'A' in column 3 is always 'A' and the columns are zero-padded, then you don't need the substr and can use
    sort { $a->[2] <=> $b->[2] or $a->[3] cmp $b->[3] }
    Note the switch to cmp, since this is now a string comparison.
    --
    Brovnik
Re: how do I sort numerically on sections of data that is alphanumeric
by shaezi (Acolyte) on Jun 27, 2001 at 02:16 UTC
    Folks, first of all I would like to thank everyone who replied. It's brought me a long way in understanding how sorting can work in different ways. Some of them were a little advanced for a newbie :) I thought I'd explain the whole thing:
    The data is being read in from a file. I thought the easiest way to sort it this way would be to read it into an array first and then figure out a way to sort it like I had explained earlier.
    The layout of the data remains consistent and the numbers are always zero-padded.
    I want to sort using the numbers in the second column and then the numbers in the third column. The A in the third column needs to stay there but that column needs to be sorted using the numbers.
    Once the array is sorted I write it back to a file.
    I don't know if this was the simplest way of doing it, but that's how I wanted to go about it.

    Once again thanks all for the help!
    shaezi
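For the read, sort, write-back workflow described above, a minimal sketch might look like the following. `sort_file` and the file names are hypothetical, and the plain string sort relies on the consistent, zero-padded layout confirmed in this post; it also assumes every line, including the last, ends with a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of $in, sort them, write them to $out.
# Returns the number of lines written.
sub sort_file {
    my ($in, $out) = @_;

    open my $in_fh, '<', $in or die "Cannot read $in: $!";
    my @lines = <$in_fh>;
    close $in_fh;

    # Plain string sort; valid only for the zero-padded, fixed-width layout.
    my @sorted = sort @lines;

    open my $out_fh, '>', $out or die "Cannot write $out: $!";
    print {$out_fh} @sorted;
    close $out_fh or die "Cannot close $out: $!";

    return scalar @sorted;
}
```

It would be called as, say, `sort_file('unsorted.txt', 'sorted.txt');` with whatever paths apply.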
