PerlMonks  

Sorting a "tuple" by the second value

by thmsdrew (Scribe)
on Apr 09, 2012 at 03:28 UTC (#964070=perlquestion)
thmsdrew has asked for the wisdom of the Perl Monks concerning the following question:

I have a huge list of tuples in a text file, one tuple per line, in the following format:

blahblah	blahblahblah
blohblohbloh	blohblohbloh
hithere	byethere
foobarfoo	barfoobar

... and so on. I need to sort these tuples by the second value, which is separated from the first value by a tab character. I don't want to swap the two values in each tuple, sort, and then swap them back, because that seems inefficient and dumb, but I can't think of another way to do it.

Any tips on accomplishing this task?
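For reference, Perl's sort accepts a comparison block, so no swapping is needed: the block can pull the second tab-separated field out of each line while it compares. This is a minimal sketch, assuming each line holds exactly two tab-separated values; the sub name sort_by_second_field and the sample lines are made up for illustration. It re-splits both lines on every comparison, which the replies below avoid by splitting each line only once.

```perl
use strict;
use warnings;

# Hypothetical helper: sort "first<TAB>second" lines by the second field.
# The comparison block extracts field 2 of $a and $b on the fly,
# so the values never have to be swapped and swapped back.
sub sort_by_second_field {
    my @lines = @_;
    return sort {
        ( split /\t/, $a )[1] cmp ( split /\t/, $b )[1]
    } @lines;
}

my @lines = ( "hithere\tbyethere", "foobarfoo\tbarfoobar", "blahblah\tblahblahblah" );
print "$_\n" for sort_by_second_field(@lines);
```

With the sample lines this prints the barfoobar line first, then blahblahblah, then byethere.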

Re: Sorting a "tuple" by the second value
by Anonymous Monk on Apr 09, 2012 at 04:17 UTC

      Thanks for that link; it was definitely helpful. Here's how I did it. The first subroutine reads the tuples from the file, splits each line into an array (held by reference), and collects them all into one array.

      sub get_links {
          my @links;
          open(my $fh, "<", "links.alpha.sorted.25sample")
              or die "cannot open < links.alpha.sorted.25sample: $!";
          while (<$fh>) {
              chomp;
              my $tuple;
              @$tuple = split(/\s+/, $_);   # $tuple autovivifies into an array reference
              push(@links, $tuple);
          }
          close($fh);
          return @links;
      }
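The `my $tuple; @$tuple = split(...)` step above autovivifies an anonymous array; `[ split ... ]` builds the same kind of array reference in one step. Because the values are tab-separated, splitting on /\t/ rather than /\s+/ would also keep a value intact if it ever contained a space. A minimal sketch, assuming tab-separated input; parse_links is a hypothetical variant of get_links() that takes a filehandle, demonstrated on an in-memory file with made-up data.

```perl
use strict;
use warnings;

# Hypothetical variant of get_links() that reads from any open filehandle.
# [ split /\t/, $line ] creates one anonymous array reference per tuple,
# the same structure that "my $tuple; @$tuple = split(...)" produces.
sub parse_links {
    my ($fh) = @_;
    my @links;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @links, [ split /\t/, $line ];   # split on the tab only
    }
    return @links;
}

# Demo with an in-memory filehandle instead of a real file.
my $data = "blahblah\tblahblahblah\nhithere\tbye there\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my @links = parse_links($in);
close $in;
print scalar(@links), " tuples; second value of the last one: $links[-1][1]\n";
```

Here the second value of the last tuple comes back as "bye there" in one piece; splitting on /\s+/ would have broken it into two fields.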

      Then I sort by the second value and write it to another file:

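      sub sort_and_store {
          my @links = get_links();
          my @sorted_links = sort { $a->[1] cmp $b->[1] } @links;
          open(my $fh, ">", "sorted.by.destination")
              or die "cannot open > sorted.by.destination: $!";
          foreach my $tuple (@sorted_links) {
              # "@$tuple" would join with spaces; join with "\t" keeps the tab format
              print $fh join("\t", @$tuple), "\n";
          }
          close($fh) or die "cannot close sorted.by.destination: $!";
      }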

      I've never used the @$blah variable type before, and I couldn't find any information about it. I'm guessing that's how you refer to the arrays in an array of arrays?


        Yes, that is one way to do it. It is called dereferencing, because $blah is a reference; see references quick reference.
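A short illustration of the usual ways to dereference the array references stored in an array of arrays; the data is made up for the example.

```perl
use strict;
use warnings;

# An array of arrays: each element of @links is a reference to an array.
my @links = ( [ 'abc', 'peter' ], [ 'def', 'jack' ] );

my $tuple        = $links[0];     # a scalar holding an array reference
my @copy         = @$tuple;       # @$tuple: the whole array (copied here)
my @same         = @{ $tuple };   # the same thing with explicit braces
my $second       = $tuple->[1];   # arrow syntax for a single element
my $second_again = $$tuple[1];    # equivalent, without the arrow
my $direct       = $links[1][1];  # the arrow is optional between subscripts

print "@copy | $second $second_again | $direct\n";
```

This prints `abc peter | peter peter | jack`.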

        It's nice that you picked up a new trick, but I would still use the command-line sort utility :)

        $ cat fafafile
        3 6 1
        1 2 0
        9 1 9
        4 4 5
        7 3 2
        0 5 4
        6 7 6
        2 9 8
        $ sort --dictionary-order fafafile
        0 5 4
        1 2 0
        2 9 8
        3 6 1
        4 4 5
        6 7 6
        7 3 2
        9 1 9
        $ sort --dictionary-order --key=2,2 fafafile
        9 1 9
        1 2 0
        7 3 2
        4 4 5
        0 5 4
        3 6 1
        6 7 6
        2 9 8

        You could combine the steps in your get_links() and sort_and_store() subroutines into one process.

        knoppix@Microknoppix:~$ cat spw964070.in
        abc peter
        def jack
        ghi zak
        jkl ben
        mno mick
        pqr alan
        knoppix@Microknoppix:~$ perl -Mstrict -wE '
        > open my $inFH, q{<}, q{spw964070.in}
        >     or die qq{open: < spw964070.in: $!\n};
        > open my $outFH, q{>}, q{spw964070.out}
        >     or die qq{open: > spw964070.out: $!\n};
        >
        > print $outFH
        >     map  { $_->[ 0 ] }
        >     sort { $a->[ 2 ] cmp $b->[ 2 ] }
        >     map  { [ $_, split ] }
        >     <$inFH>;
        >
        > close $inFH
        >     or die qq{close: < spw964070.in: $!\n};
        > close $outFH
        >     or die qq{close: > spw964070.out: $!\n};'
        knoppix@Microknoppix:~$ cat spw964070.out
        pqr alan
        jkl ben
        def jack
        mno mick
        abc peter
        ghi zak
        knoppix@Microknoppix:~$
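The map/sort/map chain in that one-liner is the idiom usually called the Schwartzian transform: each line is split once, the sort compares the cached field, and the final map unwraps the original line. Here is the same pipeline broken out into a sub with a comment per stage; the name sort_lines_by_second is hypothetical, and it takes a list of lines instead of reading files so it is easy to try.

```perl
use strict;
use warnings;

# Hypothetical sub: sort lines by their second whitespace-separated field.
# Read the chain bottom-up: build, sort, then unwrap.
sub sort_lines_by_second {
    my @lines = @_;
    return
        map  { $_->[0] }                  # 3. keep only the original line
        sort { $a->[2] cmp $b->[2] }      # 2. compare the cached 2nd field
        map  { [ $_, split ] }            # 1. [ line, field1, field2, ... ]
        @lines;
}

my @in  = ( "abc peter\n", "def jack\n", "pqr alan\n" );
my @out = sort_lines_by_second(@in);
print @out;
```

Since each line keeps its newline, the sorted lines can be printed as-is: `pqr alan`, `def jack`, `abc peter`.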

        Sticking to your subroutines, it would be more efficient for get_links() to return a reference to @links rather than the huge array itself, like so:

        ...
            return \ @links;
        }

        sub sort_and_store {
            my $refToLinks = get_links();
            my @sorted_links = sort { $a->[1] cmp $b->[1] } @$refToLinks;
            ...
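Returning `\@links` hands the caller one scalar (a reference to the array built inside the sub) instead of copying every element onto the return list. A minimal sketch of that calling pattern; make_links is a hypothetical stand-in for get_links() with made-up data, and the output is joined with a tab to keep the original delimiter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_links(): returns a reference, not a list.
sub make_links {
    my @links = ( [ 'abc', 'peter' ], [ 'pqr', 'alan' ], [ 'def', 'jack' ] );
    return \@links;    # a single scalar; the array itself is not copied
}

my $ref_to_links = make_links();
printf "%d tuples\n", scalar @$ref_to_links;

# Dereference once to hand the tuples to sort.
my @sorted_links = sort { $a->[1] cmp $b->[1] } @$ref_to_links;
print join( "\t", @$_ ), "\n" for @sorted_links;
```

The array created inside make_links() stays alive after the sub returns because the reference still points to it.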

        I hope these points are helpful.

        Cheers,

        JohnGG

Node Type: perlquestion [id://964070]
Approved by planetscape