PerlMonks  

Re: String sorting in Perl

by Laurent_R (Parson)
on Jun 04, 2014 at 17:34 UTC (node #1088694)


in reply to String sorting in Perl

Hi markdavis87,

This is not really a sorting problem; it is really a matter of counting how many times each distinct line occurs and then sorting on those counts. The easiest way is to store your lines in a hash, with the full line as the key and its count as the value. Assuming your lines are stored in the @data array, you could do this:

    my %count_hash;
    for my $line (@data) {
        $count_hash{$line} ++;
    }
Then you only need to sort on line length and count, and you're done.
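A minimal, self-contained sketch of the counting step, assuming the lines are already in @data. The sample values below are made up and, for readability, have no trailing newlines:

```perl
use strict;
use warnings;

# Hypothetical input; in the thread the lines are read from a file.
my @data = qw(foo bar foo baz foo bar);

my %count_hash;
for my $line (@data) {
    # A missing key reads as undef, which ++ treats as 0, so the first sighting stores 1.
    $count_hash{$line}++;
}

# One entry per distinct line, listed alphabetically here just to show the counts.
for my $line (sort keys %count_hash) {
    print "$line: $count_hash{$line}\n";
}
```

Incrementing a key that does not exist yet does not trigger a warning, even under use warnings, which is why no initialisation is needed.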

Edit 17:39: To do the sort, something like this should probably work (untested):

    my @sorted_data = sort { length $a <=> length $b || $count_hash{b} <=> $count_hash{$a} } keys %count_hash;
This should sort the hash keys in ascending order of line length and, among lines of the same length, in descending order of count.

Edit 2, 18:30: there was a small typo in the sort statement above. It should be:

    my @sorted_data = sort { length $a <=> length $b || $count_hash{$b} <=> $count_hash{$a} } keys %count_hash;
(I had $count_hash{b} instead of $count_hash{$b}.)
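Here is a runnable sketch of the corrected sort on a small, made-up %count_hash. The third key, $a cmp $b, is my own addition and not part of the answer above: without it, lines with the same length and the same count come out in whatever order keys happens to return them.

```perl
use strict;
use warnings;

# Hypothetical counts; the keys stand in for distinct lines.
my %count_hash = (
    AAAA => 2,
    BB   => 1,
    CCCC => 5,
    DD   => 3,
    EE   => 3,
);

my @sorted_data = sort {
    length($a) <=> length($b)                   # shorter lines first
        || $count_hash{$b} <=> $count_hash{$a}  # then higher counts first
        || $a cmp $b                            # assumed tie-breaker: alphabetical
} keys %count_hash;

print "$_\t$count_hash{$_}\n" for @sorted_data;
```

Each || only falls through to the next comparison when the previous one returns 0, i.e. when the two keys tie on that criterion.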


Re^2: String sorting in Perl
by Limbic~Region (Chancellor) on Jun 04, 2014 at 17:38 UTC
    Laurent_R,
    And you're done.

    It appears the OP is interested in getting the values out in descending order (sorted).

    for my $line (sort { $count_hash{$b} <=> $count_hash{$a} } keys %count_hash) {
        print "$line $count_hash{$line}\n";
    }

    Cheers - L~R

      Yes, Limbic~Region, you're right (++); that's why I updated my post right after posting it, but you saw it before the change went in. That said, my understanding of how the sort should be done is not quite the same as yours (I explained it in my update).
        Laurent_R,
        The code should get the counts, order them with the highest counts first, then move to the next value.

        I didn't see anything that implied shorter strings should come first but I agree that "move to the next value" is fairly ambiguous.

        Cheers - L~R
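The two readings only disagree when a shorter line has a lower count than a longer one. A small sketch with made-up counts shows both orderings side by side:

```perl
use strict;
use warnings;

# Hypothetical counts chosen so that the two interpretations disagree.
my %count_hash = (
    SHORT      => 1,
    MUCHLONGER => 5,
);

# Limbic~Region's reading: highest count first, regardless of length.
my @by_count = sort { $count_hash{$b} <=> $count_hash{$a} } keys %count_hash;

# Laurent_R's reading: shortest line first, then highest count within a length.
my @by_length_then_count = sort {
    length($a) <=> length($b) || $count_hash{$b} <=> $count_hash{$a}
} keys %count_hash;

print "@by_count\n";              # MUCHLONGER SHORT
print "@by_length_then_count\n";  # SHORT MUCHLONGER
```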

Re^2: String sorting in Perl
by Anonymous Monk on Jun 04, 2014 at 18:02 UTC

    When I try to run your code, I get the following output:

    Use of uninitialized value in numeric comparison (<=>) at ./sorttest.pl line 12.
    Use of uninitialized value in numeric comparison (<=>) at ./sorttest.pl line 12.
    Use of uninitialized value in numeric comparison (<=>) at ./sorttest.pl line 12.
    Use of uninitialized value in numeric comparison (<=>) at ./sorttest.pl line 12.
    ALPHA:D 20 letters ABCCEDFFGAACDDEEEEFG
    ALPHA:D 20 letters ABCCEDFGGAACDDDEEEFG
    ALPHA:D 20 letters ABCCEDFFGAACDDDEEEFG
    ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAD
    ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAE

    How can I resolve this, and where are the count values for these?
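The warnings come from the first version of the sort: inside hash braces Perl treats a bareword as a quoted string, so $count_hash{b} looks up the literal key 'b' rather than the sort variable $b. That key does not exist, the lookup returns undef, and using undef in <=> produces the "uninitialized value" warning. (The counts are missing from the output simply because the script prints only the keys stored in @sorted_data.) A small sketch with made-up data:

```perl
use strict;
use warnings;

# Hypothetical counts.
my %count_hash = (apple => 3, pear => 1);

# {b} is auto-quoted to the string 'b', which is not a key here, so this is undef.
my $bareword_lookup = $count_hash{b};

# Inside a sort block, $b holds the current key, so {$b} finds the real count.
my @by_count_desc = sort { $count_hash{$b} <=> $count_hash{$a} } keys %count_hash;

print defined($bareword_lookup) ? "defined\n" : "undef\n";   # undef
print "@by_count_desc\n";                                     # apple pear
```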

Re^2: String sorting in Perl
by markdavis87 (Novice) on Jun 04, 2014 at 18:14 UTC

    That last anonymous post was me... Sorry! I figured I should give you the full code I'm using to test this out. Here it is:

    #!/usr/bin/perl -w
    use strict;

    my $file = "my path to the file name";
    open (FH, "< $file") or die "Can't open $file for read: $!";
    my @data = <FH>;
    close FH or die "Cannot close $file: $!";

    my %count_hash;
    for my $line (@data) {
        $count_hash{$line} ++;
    }

    my @sorted_data = sort { length $a <=> length $b || $count_hash{b} <=> $count_hash{$a} } keys %count_hash;
    print @sorted_data; # see if it worked
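For reference, here is a sketch of the same script using a lexical filehandle, three-argument open and use warnings instead of the -w switch, with the $count_hash{$b} typo fixed. Reading the file name from @ARGV and the count_lines helper are my own assumptions, not part of the original code. Lines keep their trailing newline, so print @sorted_data behaves as before.

```perl
#!/usr/bin/perl
use strict;
use warnings;   # lexically scoped replacement for the -w switch

# Hypothetical helper: count each distinct line read from an open filehandle.
sub count_lines {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$line}++;    # key keeps its trailing newline, as in the original
    }
    return %count;
}

# Assumed usage: pass the file name on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Can't open $file for read: $!";
    my %count_hash = count_lines($fh);
    close $fh or die "Cannot close $file: $!";

    my @sorted_data = sort {
        length($a) <=> length($b) || $count_hash{$b} <=> $count_hash{$a}
    } keys %count_hash;
    print @sorted_data;    # see if it worked
}
```

A lexical filehandle is closed automatically when it goes out of scope and cannot clash with a bareword handle elsewhere in the program; the three-argument form keeps the mode separate from the file name.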
Re^2: String sorting in Perl
by markdavis87 (Novice) on Jun 04, 2014 at 18:23 UTC

    Ah! I got it.

    my @sorted_data = sort { length $a <=> length $b || $count_hash{b} <=> $count_hash{$a} } keys %count_hash;

    should be

    my @sorted_data = sort { length $a <=> length $b || $count_hash{$b} <=> $count_hash{$a} } keys %count_hash;

    We were just missing a "$" before the "b"...

      Yes, I was going to give the answer, but you figured it out yourself. Sorry for the typo; I'll update my post to get it right.

        Alright, the last thing I have to ask is how I append the count to the end of each distinct line in the final output. As it stands right now, I get:

        ALPHA:D 20 letters ABCCEDFGGAACDDDEEEFG
        ALPHA:D 20 letters ABCCEDFFGAACDDDEEEFG
        ALPHA:D 20 letters ABCCEDFFGAACDDEEEEFG
        ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAE
        ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAD

        which is great! But I need to have the count of each distinct string, spaced by a tab... I'd like it to look like this:

        ALPHA:D 20 letters ABCCEDFGGAACDDDEEEFG 4
        ALPHA:D 20 letters ABCCEDFFGAACDDDEEEFG 3
        ALPHA:D 20 letters ABCCEDFFGAACDDEEEEFG 2
        ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAE 7
        ALPHA:E 24 letters ABCCEDFFGAACDDDEEEFGAGAD 5

        Is there an easy way to do this?

        Please disregard my final question. I just used "chomp" for each line, and got the results I wanted. Thanks again!
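A sketch of the chomp approach with made-up input: chomp @data strips the trailing newline from every element before counting, so the tab and the count can be appended to each line and the newline added back explicitly when printing.

```perl
use strict;
use warnings;

# Hypothetical lines as read from a file: each still ends in "\n".
my @data = ("ABC\n", "ABC\n", "XY\n");

chomp @data;    # remove the trailing newline from every element

my %count_hash;
$count_hash{$_}++ for @data;

my @sorted_data = sort {
    length($a) <=> length($b) || $count_hash{$b} <=> $count_hash{$a}
} keys %count_hash;

# Line, a tab, then its count; the newline is added back explicitly.
my $output = join '', map { "$_\t$count_hash{$_}\n" } @sorted_data;
print $output;    # "XY\t1\nABC\t2\n"
```

Without the chomp, each key would still end in "\n", so the tab and count would be printed at the start of the following line instead of after the text.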
