PerlMonks  

Re: Looping through a file, reading each line, and adding keys/editing values of a hash

by Kenosis (Priest)
on Dec 05, 2013 at 04:19 UTC (node #1065702)


in reply to Looping through a file, reading each line, and adding keys/editing values of a hash

You have a great start on your script! As for your 'struggles': 1) yes, and ++ is a perfectly appropriate and common construct; 2) the value for what? If you mean incrementing the hash value, you're doing it correctly. If you mean $row[18] to get the value of column 19, you're doing that correctly too; and 3) no.

Here are some suggested changes to consider for your script:

use strict;
use warnings;

my $filename = $ARGV[0];
my %gene_count;

open my $fh, '<', $filename or die "Cannot open $filename: $!";

while ( my $line = <$fh> ) {
    chomp $line;    # strip the trailing newline from $line (a bare chomp would act on $_)
    my @row = split( "\t", $line );
    $gene_count{ $row[18] }++ if $row[18];
}

close($fh);

print "$_ => $gene_count{$_}\n" for sort keys %gene_count;
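  • $ARGV{0} -> $ARGV[0] (@ARGV is an array, so index it with square brackets)
  • Made a few changes to your open: it now uses the three-argument form with a lexical filehandle ($fh) and dies with $! so a failure tells you why
  • Added chomp $line because you're splitting on the tab character. Without it, the array's last element would end with a newline (except possibly on the file's last line, if that line has no trailing newline).
  • = ++ -> ++ (no assignment is needed; ++ on its own increments the hash value in place)
  • Added if $row[18] to check for a 'good' key candidate. This check could be stronger (for example, it also skips a value of 0), but is likely sufficient in this case.
  • Just FYI: the parentheses around the arguments to split and close are optional.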
  • Added printing the sorted key/value pairs. (Just assumed you wanted to do that... :)
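Here is a minimal sketch of the loop above with a stronger key check, assuming hypothetical in-memory sample data (made-up gene names in column 19) instead of a real file. It accepts any defined, non-empty value, so a gene named 0 is still counted, and it keeps chomp: without it, a line whose column 19 is empty would still produce a "\n" key.

```perl
use strict;
use warnings;

# Hypothetical sample data: 19 tab-separated columns per line, with the
# gene name in column 19 (index 18). Values are made up for illustration.
my @lines = (
    join( "\t", ('x') x 18, 'BRCA1' ) . "\n",
    join( "\t", ('x') x 18, 'TP53' ) . "\n",
    join( "\t", ('x') x 18, 'BRCA1' ) . "\n",
    join( "\t", ('x') x 18, '0' ) . "\n",    # 'if $row[18]' would skip this key
    join( "\t", ('x') x 18, '' ) . "\n",     # empty column 19: skipped below
);
my $data = join '', @lines;

my %gene_count;

# Read from an in-memory filehandle so the sketch needs no input file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
while ( my $line = <$fh> ) {
    chomp $line;    # otherwise the empty-column line would yield a "\n" key
    my @row = split "\t", $line;

    # Stronger check: any defined, non-empty value counts, including '0'.
    $gene_count{ $row[18] }++ if defined $row[18] && length $row[18];
}
close $fh;

print "$_ => $gene_count{$_}\n" for sort keys %gene_count;
```

With this data it should report BRCA1 twice and TP53 and 0 once each; swap the in-memory open back to your three-argument open on $filename to use it on a real file.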

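Since you're sending your script the filename from the command line, you can let Perl handle the file I/O with the <> operator, which reads the files named in @ARGV. If you split on ' ' (whitespace), you don't need to chomp. Also, you can pass split a LIMIT so that it produces at most that many fields instead of splitting every column; on lines with many columns, this can significantly speed up the splitting. Given this, the following is functionally equivalent, provided no field is empty and no field contains spaces (split ' ' treats any run of whitespace as a single separator):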

use strict;
use warnings;

my %gene_count;

while (<>) {
    my @row = split ' ', $_, 20;    # at most 20 fields; index 18 is column 19
    $gene_count{ $row[18] }++ if $row[18];
}

print "$_ => $gene_count{$_}\n" for sort keys %gene_count;
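To show where the "functionally equivalent" caveat bites, here is a small sketch comparing the two splits on two made-up lines whose gene name sits in the third column (the data is hypothetical). An empty column, or a space inside a field, shifts the column positions when splitting on ' '.

```perl
use strict;
use warnings;

# Two hypothetical tab-separated lines; the gene is always in the 3rd column.
# The first has an empty 2nd column, the second has a space inside a field.
my @lines = ( "chr1\t\tBRCA1\n", "chr1\tgene name\tBRCA1\n" );

my ( @tab_third, @space_third );
for my $line (@lines) {
    chomp( my $copy = $line );
    my @by_tab   = split "\t", $copy;        # exactly one field per tab
    my @by_space = split ' ', $copy, 20;     # any whitespace run is one separator
    push @tab_third,   $by_tab[2];
    push @space_third, $by_space[2] // '(none)';
}

print "tab split, 3rd column:   @tab_third\n";
print "space split, 3rd column: @space_third\n";
```

Splitting on tabs finds BRCA1 on both lines; splitting on ' ' finds nothing in the third slot of the first line and "name" on the second. If your data can contain empty or space-containing fields, keep the tab split (you can still give it a LIMIT).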

Your original script's logic is good; only minor fixes were needed. You've done well...

Hope this helps!


Re^2: Looping through a file, reading each line, and adding keys/editing values of a hash
by Anonymous Monk on Dec 05, 2013 at 05:58 UTC

    So, so helpful! Thank you. (And glad to see that I was on the right track and didn't need any major re-organizing). Thanks again! Cheers, Amelia

      You're most welcome, Amelia!

Re^2: Looping through a file, reading each line, and adding keys/editing values of a hash
by GrandFather (Sage) on Dec 05, 2013 at 08:12 UTC

    Why use a post-increment instead of a pre-increment when the value is not being used? $gene_count{ $row[18] }++ is (imo) better written ++$gene_count{$row[18]} so the increment is obvious.

    True laziness is hard work
      There are different schools of thought there. I strongly favour the postincrement.

        I know it's strongly favoured by many. I don't understand why.

        My strong preference is to use pre-increment where possible so the increment operator is more easily seen (trailing stuff is more easily ignored). But maybe I'm missing something important about the post-increment?

        A very minor consideration may be that the post-increment could be slower than the pre-increment in some implementations. The difference is so slight that it would be exceptionally unusual for that to be a consideration.


      Interesting question.

      One reason is that the OP already attempted a post-increment, and since the two increment types would produce the same outcome, why make the change?

      Another reason is my personal preference for this counting situation. If, for example, I were on a sidewalk, tallying all the red cars that passed by, I wouldn't make a tally mark upon their approach (pre-increment), but rather after they crossed an imaginary line extending across the street from my position (post-increment).

      Perhaps this ultimately boils down to personal preference, in cases like these...
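For anyone following the pre- vs post-increment discussion, here is a tiny sketch (the hash names are hypothetical) showing that both forms leave the hash in the same state and differ only in the value the expression returns, which the counting loop discards anyway.

```perl
use strict;
use warnings;

my ( %post, %pre );    # hypothetical counters, initially empty

# Statement form, return value discarded (as in the counting loop):
# both add 1, and ++ on a missing key starts from 0 without a warning.
$post{BRCA1}++;
++$pre{BRCA1};

# Used inside an expression, they return different values.
my $old = $post{BRCA1}++;    # value before the increment
my $new = ++$pre{BRCA1};     # value after the increment

print "post: returned $old, hash value now $post{BRCA1}\n";
print "pre:  returned $new, hash value now $pre{BRCA1}\n";
```

Both hashes end up at 2; only $old (1) and $new (2) differ, so for a plain tally the choice really is a matter of readability.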
