PerlMonks  

Build Sort dynamically

by libvenus (Sexton)
on Aug 24, 2008 at 02:48 UTC ( [id://706475]=perlquestion )

libvenus has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm trying to use an ST on a big file with many records whose fields are separated by '|'. I need to build the sort key dynamically, as there are many queries, though the keys (up to 30 at most) for a given query would be the same. The data to be sorted looks like this:

12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>

24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>

123123124.123|Tina|tina2007@gmail.com||London|<100030>

123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>

Since I want to generalize the sort subroutine, how do I construct a sort sub that first finds the type of each key and then constructs the sort expression to run on a list?

Thanks

Replies are listed 'Best First'.
Re: Build Sort dynamically
by tilly (Archbishop) on Aug 24, 2008 at 03:35 UTC
    There is an old hack from tye at Re: Sorting on Section Numbers that is worth considering. You just pad the numbers to fixed width, and then sort alphabetically. That will sort both numbers and text correctly in most cases. Plus as a bonus it will sort "foo2" before "foo12", which is usually what people want to do.
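A minimal sketch of the padding idea, assuming non-negative integers of at most ten digits embedded in the strings. The helper name `pad_key` is hypothetical and is not taken from tye's node:

```perl
use strict;
use warnings;

# Hypothetical helper: left-pad every run of digits to a fixed width so
# that a plain string comparison orders embedded numbers numerically.
# Assumes non-negative integers of at most 10 digits.
sub pad_key {
    my ($text) = @_;
    (my $key = $text) =~ s/(\d+)/sprintf '%010d', $1/ge;
    return $key;
}

my @names = qw(foo12 foo2 bar10 bar9);

# Sort on the padded key, but return the original strings.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ pad_key($_), $_ ] } @names;

print "@sorted\n";    # bar9 bar10 foo2 foo12
```

Decimals and negative numbers would need a more careful key than this.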

    But if you're building the sort sub dynamically the simplest thing to do is to build an array of sort closures, then combine them into a sort sub. Re: Fun with complex sorting on arbitrary criteria list. can show you how to combine the closures into a single sort subroutine. (Note that there I don't use the ($$) prototype and here I do. That is a good habit because it is very useful to be able to set up the sort routine in one package and call it in the other. It doesn't work on Perl 5.005, but that is not widely used any more.) As for the individual sort subs, they might look something like this (all code is untested):

    my $field_index = $position{$field};
    if ("text" eq $type{$field}) {
      if ("asc" eq $order{$field}) {
        push @sort_sub, sub ($$) {
          $_[0][0][$field_index] cmp $_[1][0][$field_index];
        };
      }
      else {
        push @sort_sub, sub ($$) {
          $_[1][0][$field_index] cmp $_[0][0][$field_index];
        };
      }
    }
    else {
      if ("asc" eq $order{$field}) {
        push @sort_sub, sub ($$) {
          $_[0][0][$field_index] <=> $_[1][0][$field_index];
        };
      }
      else {
        push @sort_sub, sub ($$) {
          $_[1][0][$field_index] <=> $_[0][0][$field_index];
        };
      }
    }
    And then the actual sort would use a Schwartzian transform like this:
    my @sorted_list = map { $_->[1] } sort $sort_routine map { [ [ split /\|/, $_ ], $_ ] } @list;
    Update: I had a trailing comma after sort $sort_routine. Now fixed.
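For readers who want to see the pieces together, here is a rough, self-contained sketch of the approach described above. The field names, the sample records, and the helpers `make_cmp` and `combine` are made up for illustration; the linked node may combine the closures differently.

```perl
use strict;
use warnings;

# Assumed sort spec (illustrative only): column index, type and order per field.
my %position = ( amount => 0,        name => 1 );
my %type     = ( amount => 'number', name => 'text' );
my %order    = ( amount => 'desc',   name => 'asc' );

# Build one comparison closure for a single field.
sub make_cmp {
    my ($field) = @_;
    my $i    = $position{$field};
    my $text = $type{$field} eq 'text';
    my $sign = $order{$field} eq 'asc' ? 1 : -1;
    return sub ($$) {
        my ( $x, $y ) = ( $_[0][0][$i], $_[1][0][$i] );
        return $sign * ( $text ? $x cmp $y : $x <=> $y );
    };
}

# Combine the closures: the first non-zero comparison decides.
sub combine {
    my @subs = @_;
    return sub ($$) {
        for my $cmp (@subs) {
            my $result = $cmp->( $_[0], $_[1] );
            return $result if $result;
        }
        return 0;
    };
}

# Sort by name ascending, then by amount descending.
my $sort_routine = combine( map { make_cmp($_) } qw(name amount) );

my @list = ( '24.23|Sam|x', '5|Sam|y', '12312.123|John|z' );
my @sorted_list = map  { $_->[1] }
                  sort $sort_routine
                  map  { [ [ split /\|/, $_ ], $_ ] } @list;

print "$_\n" for @sorted_list;
```

Each element handed to the comparison subs is the `[ [fields], original_line ]` pair built by the inner map, which is why the closures index through `[0]` first.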
      You are prototyping your anonymous subroutine definitions. The perlsub documentation seems to say that prototypes have no effect with such subs. Can you say what is happening here? Is this something peculiar to the sort builtin?

      Also, sort doesn't seem to work properly with a code reference (which I take $sort_routine to be) as its first argument (tested on 5.8.2). Is this something new in 5.10?

        The reason it did not work correctly was an extra comma. Sorry for the typo, but as I said, all code is untested. I've removed the comma and it should work better now.

        The prototype is explained in the documentation to sort. If you provide the prototype, then the arguments are passed in @_. If you don't, then they are passed in $a and $b. Passing in $a and $b is marginally more efficient, but the problem comes when you create a sort routine in one package and call it in another. The package in which $a and $b are set is the one you call the sort routine in, and not the one you created the subs in. Which can become tricky. With the prototype, those cross-package problems go away.

        This is potentially important in this case because depending on your overall code organization you may want to have each field know how to sort its own datatype. (Add a date-time field which could be in another language and it will quickly become apparent why you might want a more sophisticated organization.) Which would result in sort routines being spread out across packages.
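A small sketch of the cross-package case, with a made-up package name `Comparators`. Because of the `($$)` prototype, sort hands the two elements over in `@_`, so it does not matter which package's `$a` and `$b` are set:

```perl
use strict;
use warnings;

package Comparators;

# With the ($$) prototype, sort passes the two elements in @_,
# so this sub does not depend on any package's $a and $b.
sub by_number ($$) {
    my ( $left, $right ) = @_;
    return $left <=> $right;
}

package main;

# The comparator lives in Comparators, but sort is called from main.
my @numbers = ( 10, 9, 100, 1 );
my @sorted  = sort Comparators::by_number @numbers;

print "@sorted\n";    # 1 9 10 100
```

Without the prototype, a version of `by_number` written with `$a <=> $b` would read `$Comparators::a` and `$Comparators::b`, while a sort called from `main` sets `$main::a` and `$main::b`.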

Re: Build Sort dynamically
by jethro (Monsignor) on Aug 24, 2008 at 03:35 UTC

    Is ST an abbreviation for "sort" ?

    "keys can go max to 30", 30 what? Are you talking about string length?

    "keys for a query would be same", are you saying that there can be duplicate keys?

    Which field is the key?

    If the key is the first field, what do these numbers mean?

    "Find the type of key", what different types of keys do you expect ?

    Maybe an example of what you want to achieve would be helpful

    UPDATE: Strangely Tilly seems to understand what you want. How do you do that, Tilly? ;-)

      'ST' would be Schwartzian Transform in this particular case.

      Quite how that percolated into my brain having only used an ST maybe twice ever, I really don't know... ;-)

      ST is Schwartzian Transform. Knowing that of course you then know everything you need to understand the question. ;)

      If you need a little more information about the ST a Super Search will turn up plenty of stuff around Perlmonks. The canonical node for ST is Schwartzian Transform. There are many good replies to my question about ST that may also be of interest.


      Perl reduces RSI - it saves typing
        Yes, it all wondrously falls into place. Did I mention that I hate abbreviations ;-).
        GF, ST never means Schwartzian Transform, CUL8R
      How I do that is context. Rather than starting with everything I don't understand about the question I start with whatever I do. In this case I understood the data structure, and understood that he wants to do complex sorts on it, sometimes numerically and sometimes asciibetically.

      So I began to talk about how to set up complex sorts on this data structure, with different comparisons on different fields. It was not until after I finished that I realized that he meant Schwartzian transform when he said ST.

Re: Build Sort dynamically
by BrowserUk (Patriarch) on Aug 24, 2008 at 15:13 UTC

    I don't think you'd need the scalar reverse on a big-endian platform:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my $reFloat = qr[([+-]?(?:\d+\.)?\d+(?:[Ee][+-]?\d+)?)];

    sub sortEm ($$) {
        my( $a, $b ) = @_;
        my( $i, $comp ) = ( -1, 0 );
        $comp = $a->[ $ARGV[ $i ] ] cmp $b->[ $ARGV[ $i ] ]
            until ++$i > $#ARGV or $comp;
        $comp;
    }

    print $_->[ 0 ] for sort sortEm map {
        chomp;
        [ $_, map{
            s[$reFloat][ scalar reverse pack 'd', $1]ge;
            $_
        } split '\|' ]
    } <DATA>;

    __DATA__
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    123123124.123|Tina|tina2007@gmail.com||London|<100030>
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>
    c:\test>706475 1
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>
    123123124.123|Tina|tina2007@gmail.com||London|<100030>

    c:\test>706475 3
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>
    123123124.123|Tina|tina2007@gmail.com||London|<100030>

    c:\test>706475 5
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    123123124.123|Tina|tina2007@gmail.com||London|<100030>
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>

    c:\test>706475 6
    123123124.123|Tina|tina2007@gmail.com||London|<100030>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>

    c:\test>706475 6 1
    123123124.123|Tina|tina2007@gmail.com||London|<100030>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200000>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200010>
    12312.123|John|johon07@yahoo.com||Sunnyvalle California US|<200019>
    24.23|Sam|Sam04@yahoo.com||Hong Kong|<200020>
    123124.123|Lucas|Lucas_fighter2025@yahoo.com||New York US|<200040>
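The packing trick can also be looked at in isolation. The sketch below is not the code above: it assumes non-negative, finite numbers, IEEE 754 doubles, and Perl 5.10 or later, where the `>` modifier asks pack for big-endian byte order so that no `reverse` is needed on any platform. The helper name `float_key` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a number as a big-endian double.
# For non-negative, finite values the resulting byte strings compare
# (with cmp) in the same order as the numbers themselves.
sub float_key {
    my ($number) = @_;
    return pack 'd>', $number;
}

my @numbers = ( 123123124.123, 24.23, 12312.123, 123124.123 );

# Compare the packed keys as strings, then recover the original numbers.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ float_key($_), $_ ] } @numbers;

print "@sorted\n";    # 24.23 12312.123 123124.123 123123124.123
```

Negative numbers would not sort correctly with this key, since the sign bit makes their packed form compare greater than that of any positive value.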

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Build Sort dynamically
by salva (Canon) on Aug 25, 2008 at 07:34 UTC
    You can use Sort::Maker or Sort::Key to generate efficient sorters on the fly:
    # untested!
    use Sort::Key qw(multikeysorter);

    # One type per field; field 1 holds decimals, so 'num' rather than 'int'.
    my @types = qw(num str str str str int);

    sub gen_sorter {
        # Arguments are 1-based field numbers; a negative number means descending.
        my @ix = map { abs($_) - 1 } @_;
        my @t;
        for (@_) {
            my $t = $types[abs($_) - 1];
            $t = "-$t" if $_ < 0;
            push @t, $t;
        }
        multikeysorter(sub {
                           my @k = split /\|/;
                           $k[5] =~ tr/<>//d;   # strip the angle brackets
                           @k[@ix]
                       }, @t);
    }

    my $sorter = gen_sorter(1, 2, -3);  # order by the first key,
                                        # then the second and then
                                        # the third desc
    my @sorted = $sorter->(@data);
