http://www.perlmonks.org?node_id=83509

Stamp_Guy has asked for the wisdom of the Perl Monks concerning the following question:

my @sorted=sort{my $one=substr($a,rindex($a,'|')); my $two=substr($b,rindex($b,'|')); ($one <=> $two) } @database;

This block of code takes my tiny pipe-delimited database and sorts it by the number in the last field. When it gets to 100, it puts it right in front of the 10. This data comes from a flat file database (don't flame me. I've been over all that about flat files enough.) The number is the last field, so it is possible it has a newline character attached. That's the only thing I can think of that would cause that. I don't know where to put a chomp in this block though.

Anyone else got any ideas? I am totally stumped and could use any help I can get on this.

-Stamp_Guy

Replies are listed 'Best First'.
Re (tilly) 1: Sorting issues
by tilly (Archbishop) on May 26, 2001 at 21:06 UTC
    Your rindex is giving you the index of the last |.

    You then take a substring starting there.

    You are then taking strings that look like "|10" and "|100" and doing a numerical comparison. Neither looks like a number, so that turns into 0 being compared with 0. Add 1 to the rindex and that problem would go away.

    BTW if you turned warnings on, you would have been told about this up front. Secondly if you had used an existing database then the problem would never have arisen. And finally the moral is that when debugging it is really bad to wildly guess at what you think the problem could be. Instead methodically work through what you think should happen and what is happening until you find the discrepancy (which could be anywhere).
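
The off-by-one described in this reply can be checked in isolation. The following is a minimal sketch, not part of the original post; the record string and the variable names are made up for illustration. It shows what substr() returns with and without adding 1 to the rindex() result, and what each value becomes in numeric context.

```perl
use strict;
use warnings;

# Hypothetical record in the same pipe-delimited shape as the question.
my $record = "a|b|c|100";

# Starting AT the last pipe keeps the pipe in the extracted field.
my $with_pipe = substr($record, rindex($record, '|'));

# Starting one character later yields only the digits.
my $digits = substr($record, rindex($record, '|') + 1);

{
    # Silence the "isn't numeric" warning for this demonstration only.
    no warnings 'numeric';
    print "with pipe: '$with_pipe' numifies to ", $with_pipe + 0, "\n";
}
print "digits:    '$digits' numifies to ", $digits + 0, "\n";
```

Every field that starts with a pipe becomes 0 under <=>, so all records compare as equal and the sort has nothing to order them by.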

Re: Sorting issues
by chipmunk (Parson) on May 26, 2001 at 21:18 UTC
    I built this test script around your code:
    #!/usr/local/bin/perl -w

    use strict;

    my @database = <DATA>;

    my @sorted=sort{my $one=substr($a,rindex($a,'|'));
                    my $two=substr($b,rindex($b,'|'));
                    ($one <=> $two) } @database;

    print @sorted;
    __DATA__
    a|b|c|10
    d|e|f|100
    g|h|i|2
    which produced the following output:
    Argument "|10\n" isn't numeric in ncmp at tmp.pl line 9, <DATA> chunk 3.
    Argument "|100\n" isn't numeric in ncmp at tmp.pl line 9, <DATA> chunk 3.
    Argument "|100\n" isn't numeric in ncmp at tmp.pl line 9, <DATA> chunk 3.
    Argument "|2\n" isn't numeric in ncmp at tmp.pl line 9, <DATA> chunk 3.
    a|b|c|10
    d|e|f|100
    g|h|i|2
    The warnings reveal the problem quite conveniently; the substr() is including the pipe character, so all the values being compared are numerically equal to zero and thus to each other.

    You can easily fix this error by adding 1 to each rindex(). However, I would suggest making another change as well. You're doing a fair amount of work in the sort subroutine, which makes it inefficient. A Schwartzian Transform moves the work out of the sort sub:

    my @sorted =
      map  $_->[1],
      sort { $a->[0] <=> $b->[0] }
      map  [ substr($_,rindex($_,'|')+1), $_ ], @database;

    But, there's another option that would work well here, and should be even more efficient because it uses an optimized sort sub. This one is often called the Guttman-Rosler Transform:

    my $width = 10; # must be at least as big as the longest field being compared
    my @sorted =
      map substr($_, index($_, '|')+1),
      sort
      map sprintf("%0${width}d|%s", substr($_,rindex($_,'|')+1), $_), @database;
    As you can see, this is very similar to the Schwartzian Transform, but the intermediate value is a string instead of an anonymous array.
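
To see the two transforms side by side, here is a self-contained sketch that runs both on the same input. It is an illustration rather than chipmunk's own test: the sample records are hypothetical, they are assumed to be already chomped, and the block form of map is used purely for readability.

```perl
use strict;
use warnings;

# Hypothetical, already-chomped records in the same shape as the test data.
my @database = ('a|b|c|10', 'd|e|f|100', 'g|h|i|2');

# Schwartzian Transform: pair each record with its numeric key,
# sort the pairs by key, then unwrap the original record.
my @st = map  { $_->[1] }
         sort { $a->[0] <=> $b->[0] }
         map  { [ substr($_, rindex($_, '|') + 1), $_ ] } @database;

# Guttman-Rosler Transform: prepend a zero-padded key so that the default
# string sort orders the records numerically, then strip the key again.
my $width = 10;   # assumed to be at least as wide as the longest key
my @grt = map  { substr($_, index($_, '|') + 1) }
          sort
          map  { sprintf("%0${width}d|%s", substr($_, rindex($_, '|') + 1), $_) } @database;

print "$_\n" for @st;
print "GRT gives the same order\n" if "@st" eq "@grt";
```

Both variants extract the key once per record instead of once per comparison, which is where the savings come from.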
Re: Sorting issues
by epoptai (Curate) on May 26, 2001 at 20:59 UTC
    Chomp just before the comparison:
    my @sorted = sort{
        my $one=substr($a,rindex($a,'|'));
        my $two=substr($b,rindex($b,'|'));
        chomp($one,$two);
        ($one <=> $two)
    } @database;

    Update: I only answered the question of where to put a chomp in that block. See the following replies for a more in-depth analysis of the sorting problem.

    --
    Check out my Perlmonks Related Scripts like framechat, reputer, and xNN.

    Edit: chipmunk 2001-05-26

Re: Sorting issues
by the_slycer (Chaplain) on May 26, 2001 at 21:14 UTC
    Your $one and $two are being compared as strings instead of numbers. This is because they both contain a '|' in front of the numbers.
    my @sorted=sort{
        my $one=substr($a,rindex($a,'|'));
        my $two=substr($b,rindex($b,'|'));
        $one =~ s/\D//g;
        $two =~ s/\D//g;
        ($one <=> $two)
    } @database;
    This sorts it properly. I'm sure there's a better way to get at that last number, but I'm just stripping out anything that's not a digit.

    HTH
      You really don't want to do this much work in the sort block, if the data size is significant. That's a good use for the Schwartzian Transform (or the GRT equivalent).

      -- Randal L. Schwartz, Perl hacker

Re: Sorting issues
by larryk (Friar) on May 27, 2001 at 00:36 UTC
    Forgive my simple brain but am I missing the point?
    #!/usr/bin/perl -w
    use strict;
    my @database = <DATA>;
    sub g($) {(split/\|/,shift)[$#_]}
    my @sorted = sort {g$a<=>g$b} @database;
    print @sorted;
    __DATA__
    a|b|c|10
    d|e|f|g|h|9
    i|100
    j|k|2
    l|m|n|o|11

    "Argument is futile - you will be ignorralated!"

      You forgot something: You need to do a

      chomp @database;
      before the sort. And as merlyn points out this is a good application of the Schwartzian Transform.


      Peter L. Berghold, Schooner Technology Consulting, Inc.
      Peter@Berghold.Net, www.berghold.net

        i purposely left out the chomp because it works without.

        ++ for you if you can tell me when the \n will screw up the sort.

        The fix to dump the newlines is: sub g($){(split/\||\n/,shift)[-1]};

        "Argument is futile - you will be ignorralated!"

Re: Sorting issues
by toma (Vicar) on May 27, 2001 at 07:01 UTC
    I've enjoyed good results with the Data::Table module, which provides functions for reading tables from CSV or a database. It can output HTML tables or CSV. It also provides functions for sorting, mapping, reordering, slicing, etc. I like to use a header on the CSV file, so that I can refer to columns by name.
    use Data::Table;
    my $t= Data::Table::fromCSV("data.csv");
    $t->sort('exp', 0, 1); #Sort table by col 'exp', numeric, descending
    print $t->csv;

    For my test file, this program prints:

    name,exp,level
    merlyn,14272,10
    tilly,13067,10
    dkubb,1076,7
    LD2,807,6
    toma,17,1

    I don't plan on spending any more time debugging CSV-type parsers, and I have an easy migration path for upgrading to using a relational database.

    It should work perfectly the first time! - toma

Re: Sorting issues
by zeidrik (Scribe) on May 28, 2001 at 15:08 UTC
    If you insist on "sort", here is my way to do it:
    for the database as
    1 2 3 |10
    a b c |20
    e f g |100
    x n f |1
    the code
    #!/usr/bin/perl -w
    use strict;
    my %H;
    open(R,"database");
    map {/\|(\d+)/; $H{$1}=$_ if $1}<R>;
    close(R);
    foreach (sort {$a<=>$b} keys %H){print $H{$_}}
    prints exactly:
    x n f |1
    1 2 3 |10
    a b c |20
    e f g |100
    Enjoy...
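
One caveat with keying a hash on the sort field: records that share the same number overwrite each other, and a key of 0 is skipped by the `if $1` test. The sketch below is one possible way around both, not zeidrik's code. It uses hypothetical in-memory records instead of the file and stores a list of records per key; the names %by_key and @records are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical records: two share the key 10 and one has the key 0.
my @records = ('1 2 3 |10', 'a b c |20', 'x n f |10', 'q r s |0');

my %by_key;
for my $rec (@records) {
    # Capture the digits after the last pipe; skip lines that have none.
    next unless $rec =~ /\|(\d+)\s*$/;
    push @{ $by_key{$1} }, $rec;   # keep every record for a given key
}

# Sort the keys numerically and flatten the per-key lists back out.
my @sorted = map { @{ $by_key{$_} } } sort { $a <=> $b } keys %by_key;
print "$_\n" for @sorted;
```

Records with equal keys come out in their input order here, because each per-key list is filled in reading order.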
Re: Sorting issues
by Stamp_Guy (Monk) on May 29, 2001 at 04:12 UTC
    Thanks for the help, guys. Chipmunk, thanks for taking the time to explain it. That made a lot more sense.