PerlMonks  

Sorting a "tuple" by the second value

by thmsdrew (Scribe)
on Apr 09, 2012 at 03:28 UTC ( [id://964070]=perlquestion )

thmsdrew has asked for the wisdom of the Perl Monks concerning the following question:

I have a huge list of tuples in a text file, one per line, in the following format:

blahblah blahblahblah
blohblohbloh blohblohbloh
hithere byethere
foobarfoo barfoobar

... and so on. I need to sort these tuples by the second value, which is separated from the first value by a tab character. I don't want to have to swap the tuples, sort them, then swap them back, because that seems inefficient and dumb, but I can't think of a good way to do it otherwise.

Any tips on accomplishing this task?

Replies are listed 'Best First'.
Re: Sorting a "tuple" by the second value
by Anonymous Monk on Apr 09, 2012 at 04:17 UTC

      Thanks for that link, it was helpful for sure. Here's how I did it. The first subroutine gets the tuples from the file, splits them into an array, and writes them all into one array.

      sub get_links {
          my @links;
          open(my $fh, "<", "links.alpha.sorted.25sample")
              or die "cannot open < links.alpha.sorted.25sample: $!";
          while(<$fh>) {
              chomp;
              my $tuple;
              @$tuple = split(/\s+/, $_);
              push(@links, $tuple);
          }
          return @links;
      }

      Then I sort by the second value and write it to another file:

      sub sort_and_store {
          my @links = get_links();
          my @sorted_links = sort { $a->[1] cmp $b->[1] } @links;
          open(my $fh, ">", "sorted.by.destination")
              or die "cannot open > sorted.by.destination: $!";
          foreach my $tuple (@sorted_links) {
              print $fh "@$tuple\n";
          }
      }

      I've never used the @$blah variable type before, and I couldn't find any information about it. I'm guessing that's how you refer to the arrays in an array of arrays?

        You could combine the steps in your get_links() and sort_and_store() subroutines into one process.

        knoppix@Microknoppix:~$ cat spw964070.in
        abc peter
        def jack
        ghi zak
        jkl ben
        mno mick
        pqr alan
        knoppix@Microknoppix:~$ perl -Mstrict -wE '
        > open my $inFH, q{<}, q{spw964070.in}
        >     or die qq{open: < spw964070.in: $!\n};
        > open my $outFH, q{>}, q{spw964070.out}
        >     or die qq{open: > spw964070.out: $!\n};
        >
        > print $outFH
        >     map { $_->[ 0 ] }
        >     sort { $a->[ 2 ] cmp $b->[ 2 ] }
        >     map { [ $_, split ] }
        >     <$inFH>;
        >
        > close $inFH
        >     or die qq{close: < spw964070.in: $!\n};
        > close $outFH
        >     or die qq{close: > spw964070.out: $!\n};'
        knoppix@Microknoppix:~$ cat spw964070.out
        pqr alan
        jkl ben
        def jack
        mno mick
        abc peter
        ghi zak
        knoppix@Microknoppix:~$
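
The pipeline above is a map/sort/map chain, often called a Schwartzian transform: each line is paired with its sort key once, the pairs are sorted on the key, and the original lines are then recovered. Since the original question says the two values are separated by a tab, here is a minimal in-memory sketch that splits on the tab only, so that values containing spaces would stay intact. The sample lines and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample lines; each is "first<TAB>second".
my @lines = ( "abc\tpeter pan\n", "def\tjack\n", "pqr\talan\n" );

my @sorted =
    map  { $_->[0] }                  # 3. recover the original line
    sort { $a->[1] cmp $b->[1] }      # 2. compare the cached keys
    map  {                            # 1. pair each line with its key
        my $line = $_;
        chomp( my $copy = $line );    # strip the newline from a copy only
        my ( undef, $second ) = split /\t/, $copy, 2;
        [ $line, $second ]
    } @lines;

print @sorted;
```

Caching the key matters most when extracting it is expensive; for a plain split the gain may be small, but the shape stays the same.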

        Sticking to your subroutines, it would be more efficient for get_links() to return a reference to @links rather than the huge array itself, like so:

        ...
            return \ @links;
        }

        sub sort_and_store {
            my $refToLinks = get_links();
            my @sorted_links = sort { $a->[1] cmp $b->[1] } @$refToLinks;
            ...
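
As a self-contained illustration of the suggestion above, the following sketch returns an array reference from a subroutine and dereferences it at the call site. The name get_links_ref and the in-memory data are hypothetical stand-ins for the file-reading get_links() in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_links(): builds the tuples in memory
# instead of reading a file, then returns a reference to the array.
sub get_links_ref {
    my @links = ( [ 'abc', 'peter' ], [ 'def', 'jack' ], [ 'pqr', 'alan' ] );
    return \@links;    # a single scalar is returned, not a list of every element
}

my $links_ref = get_links_ref();

# Dereference with @$links_ref to hand the full list to sort.
my @sorted = sort { $a->[1] cmp $b->[1] } @$links_ref;

print join( "\t", @$_ ), "\n" for @sorted;
```

Returning the list itself copies each element on return; returning the reference avoids that copy, which should only be noticeable for very large lists.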

        I hope these points are helpful.

        Cheers,

        JohnGG

        I've never used the @$blah variable type before, and I couldn't find any information about it. I'm guessing that's how you refer to the arrays in an array of arrays?

        Yes, that is one way to do it. It is known as dereferencing, because $blah is a reference; see References quick reference.
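
A short sketch of the dereferencing syntax mentioned above; the variable names and data are illustrative only.

```perl
use strict;
use warnings;

# $tuple holds a reference to an anonymous array (sample data).
my $tuple = [ 'hithere', 'byethere' ];

# @$tuple (or @{ $tuple }) dereferences it back to the whole array.
my @copy = @$tuple;

# $tuple->[1] (or $$tuple[1]) reaches one element through the reference.
my $second = $tuple->[1];

# An array of arrays is an ordinary array whose elements are such references.
my @links = ( $tuple, [ 'foobarfoo', 'barfoobar' ] );
my $first_of_last = $links[1][0];    # the arrow is optional between subscripts

print "@copy | $second | $first_of_last\n";
```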

        It's nice you picked up a new trick, but I would still use sort :)

        $ cat fafafile
        3 6 1
        1 2 0
        9 1 9
        4 4 5
        7 3 2
        0 5 4
        6 7 6
        2 9 8
        $ sort --dictionary-order fafafile
        0 5 4
        1 2 0
        2 9 8
        3 6 1
        4 4 5
        6 7 6
        7 3 2
        9 1 9
        $ sort --dictionary-order --key=2,2 fafafile
        9 1 9
        1 2 0
        7 3 2
        4 4 5
        0 5 4
        3 6 1
        6 7 6
        2 9 8
