venki has asked for the wisdom of the Perl Monks concerning the following question:

I have records which are delimited by semicolons. Example:
area1:place1:name1 area2:place2:name2
I know that I can sort them with the sort function, which orders the records by their leading characters. But I would like to sort by the second column (place1) of each record. Is this possible with an existing function, or do I need to write code to do it?

Thanks for helping me.

Edit by dws to add code tags and expand title

(RhetTbull) Re: Sorting comma-delimited records
by RhetTbull (Curate) on May 31, 2002 at 18:02 UTC
    First of all, those are colons, not semicolons. ;-) Sounds like a good place for a Schwartzian Transform by our own merlyn:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data = <DATA>;
    chomp @data;

    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { [ $_, (split /:/)[1] ] } @data;

    print "data = \n@data\n";
    print "sorted = \n@sorted\n";

    __DATA__
    area1:place1:name1
    area1:place4:name2
    area3:place3:name3
    area5:place2:name2

    data =
    area1:place1:name1 area1:place4:name2 area3:place3:name3 area5:place2:name2
    sorted =
    area1:place1:name1 area5:place2:name2 area3:place3:name3 area1:place4:name2

    Update: For more information on the Schwartzian Transform, read Tom Christiansen's "Far More Than Everything You've Ever Wanted To Know About Sorting" paper.
    Update 2: Changed example data to make it more obvious what was going on.
      thanks. Great help! I really appreciate your timely help guys
Re: Sorting
by Beatnik (Parson) on May 31, 2002 at 17:38 UTC
    Try something like...
    @a = qw(foo:baz foo:bar);
    print sort { (split(/:/,$a))[1] cmp (split(/:/,$b))[1] } @a;
    altho there are faster ways, like storing each second field in a hash as key :)

    ... Quidquid perl dictum sit, altum viditur.
      Hmm, comparing (split($a))[1] with (split($b))[1] was the first thing that popped into my mind, as well. But isn't that a waste of CPU cycles, splitting an element over and over again each time you wanna compare it to another? The other idea that popped into my head was extracting each "sortable" element once and storing them somewhere, (a few people had suggested a hash), so I guess it's a matter of speed or memory usage, no? For small data sets, this probably wouldn't be an issue, but maybe for larger data sets, it would. Unless the sort routine is more efficient than that, and it optimizes away rather nicely to avoid having to split the same string over and over.

      Just babbling some random thoughts. Anybody have any random answers?


      There are 10 kinds of people -- those that understand binary, and those that don't.
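The trade-off raised above can be sketched in a few lines. This is a minimal example, not a benchmark: the sample records and the names `@records` and `%place_of` are assumptions made up for illustration. Each second field is extracted once and cached in a hash, so every comparison becomes a lookup and `split` runs once per record instead of twice per comparison.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original question.
my @records = qw(
    area1:place1:name1
    area1:place4:name2
    area3:place3:name3
    area5:place2:name2
);

# Extract the second field once per record and cache it, keyed by record text.
my %place_of;
$place_of{$_} = (split /:/)[1] for @records;

# Each comparison is now a pair of hash lookups rather than two splits.
my @sorted = sort { $place_of{$a} cmp $place_of{$b} } @records;

print "$_\n" for @sorted;
```

The cost is memory: one hash entry per distinct record, which is the speed-versus-memory question asked above.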

        Of course it's slow... that's why I'm saying a faster way would be using hashes, or complex data structures for that matter... TIMTOWTDI :)

        ... Quidquid perl dictum sit, altum viditur.
        What you are describing is the basic idea behind the Schwartzian Transform. See my write-up elsewhere in this thread for some links with more information. The idea is that you do the expensive operation (in this case, it's split) once and use a data structure to store the result. You then sort on the results and extract the original information when done. Our very own merlyn was the first (AFAIK) to apply his twisted mind to this problem and come up with a very perlish (or lispish depending on your mother tongue) method of doing this in one fell swoop using map.
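For readers who find the chained map/sort/map form dense, the three stages described above can be written out one at a time. This is only a sketch with made-up sample data; the intermediate names `@pairs` and `@ordered` are hypothetical and exist only to label each step.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @data = qw(
    area1:place1:name1
    area1:place4:name2
    area3:place3:name3
    area5:place2:name2
);

# Step 1: pair each record with its sort key (split runs once per record).
my @pairs = map { [ $_, (split /:/)[1] ] } @data;

# Step 2: sort the pairs on the precomputed key.
my @ordered = sort { $a->[1] cmp $b->[1] } @pairs;

# Step 3: discard the keys and keep only the original records.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

Chaining the three expressions together, without the intermediate arrays, gives the usual one-statement form of the transform.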
Re: Sorting
by mfriedman (Monk) on May 31, 2002 at 17:41 UTC
    I would recommend using an array of arrays and sorting the references to the arrays based on the value of the second element. For the sake of argument, I am going to assume that you have colon-delimited fields, one record per line, and that all the data has been loaded into $data.

    #!/usr/bin/perl -w
    use strict;

    my $data = get_data_from_somewhere();

    # First split the data up into a 2D structure
    my @struct;
    for (split /\n/, $data) {
        push @struct, [ split /:/ ];
    }

    # Now we sort the struct on the second element of the nested arrays
    @struct = sort { $a->[1] cmp $b->[1] } @struct;
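With the array-of-arrays approach each sorted element is a reference to a list of fields, so one more step is needed to get colon-delimited lines back. A minimal sketch, assuming a literal string stands in for the data source (the sample data and the name `@lines` are invented here):

```perl
use strict;
use warnings;

# Hypothetical stand-in for data loaded from somewhere: one record per line.
my $data = join "\n", qw(
    area1:place1:name1
    area1:place4:name2
    area3:place3:name3
    area5:place2:name2
);

# Split into a 2D structure: one array reference of fields per record.
my @struct = map { [ split /:/ ] } split /\n/, $data;

# Sort the array references on their second element.
@struct = sort { $a->[1] cmp $b->[1] } @struct;

# Rebuild each record as a colon-delimited string.
my @lines = map { join ':', @$_ } @struct;

print "$_\n" for @lines;
```

Keeping the fields split apart is convenient if later code needs individual columns; if only the sorted lines are wanted, the join step puts them back together.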
Re: Sorting comma-delimited records
by vladb (Vicar) on May 31, 2002 at 17:49 UTC
    You can store your records in a hash (just as Beatnik pointed out :) using each record's second field for the key.
    use strict;
    use Data::Dumper;

    my @a = qw(foo:baz:faz foo:bar:fuss);
    my %h = map { (split(/\:/,$_))[1] => $_ } @a;
    print Dumper(\%h);

    # to force a '\n' printed after each array element.
    $, = "\n";
    print @h{sort keys %h};

    Getting them inside a hash lets you look each record up by its second field. The hash itself is unordered, so sort the keys to get the records in alphabetical order by that field. Note also that records sharing a second field would overwrite one another. Here's the output:

    $VAR1 = {
              'bar' => 'foo:bar:fuss',
              'baz' => 'foo:baz:faz'
            };
    foo:bar:fuss
    foo:baz:faz
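One caveat with keying a hash on the second field is that two records with the same field value collide. A possible variation, sketched here with invented sample data and a hypothetical `%by_field` hash, stores an array of records under each key and sorts the keys explicitly, since hash keys come back in no particular order:

```perl
use strict;
use warnings;

# Hypothetical sample data; two records share the second field 'bar'.
my @records = qw(foo:baz:faz foo:bar:fuss qux:bar:zip);

# Group records by second field so duplicates are kept, not overwritten.
my %by_field;
for my $record (@records) {
    my $key = (split /:/, $record)[1];
    push @{ $by_field{$key} }, $record;
}

# Sort the keys explicitly, then flatten each group back into one list.
my @sorted = map { @{ $by_field{$_} } } sort keys %by_field;

print "$_\n" for @sorted;
```

Records that share a key stay in their original relative order within the group.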

    $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+) +-.*)$/; $_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print" +- $1"}&&`rm $1`; print$\;}
Re: Sorting comma-delimited records
by Ovid (Cardinal) on May 31, 2002 at 18:32 UTC

    Assuming each item is a record in an array, a Schwartzian will do the trick:

    my @new_array = map  { $_->[0] }
                    sort { $a->[1] cmp $b->[1] }
                    map  { [ $_, get_sortable_item($_) ] } @old_array;

    sub get_sortable_item {
        my $data = shift;
        return (split /:/, $data, 3)[1];
    }


    Update: Whoa! According to timestamps, I'm half an hour late with this node, but I swear that reply wasn't there when I just posted. Hmm... Oh well.

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

(jeffa) Re: Sorting comma-delimited records
by jeffa (Bishop) on May 31, 2002 at 23:53 UTC
    use DBI;
    use Data::Dumper;
    use strict;

    my $dir  = '.';
    my $file = 'simple_csv';
    my $cols = [qw(one two three)];

    # connect() takes (dsn, user, password, \%attr); no credentials are needed here
    my $dbh = DBI->connect(
        "DBI:CSV:f_dir=$dir;csv_eol=\n;csv_sep_char=:;",
        undef, undef,
        {RaiseError=>1},
    );
    $dbh->{csv_tables}->{$file} = { col_names => $cols };

    my $sth = $dbh->selectall_arrayref("
        select one, two, three
        from simple_csv
        order by two
    ");
    print Dumper $sth;

    This assumes that you are in the same directory as the CSV file and the CSV file is named 'simple_csv' - note there is no extension in the file name. Read the docs for more info. Here is the sample CSV file I used:




    (the triplet paradiddle with high-hat)