PerlMonks  

Re: Looping through a file, reading each line, and adding keys/editing values of a hash

by Kenosis (Priest)
on Dec 05, 2013 at 04:19 UTC (#1065702)


in reply to Looping through a file, reading each line, and adding keys/editing values of a hash

You have a great start on your script! As for your 'struggles': 1) yes, and ++ is a perfectly appropriate and common construct; 2) the value for what? If you mean incrementing the hash value, you're doing it correctly. If you mean using $row[18] to get the value of column 19, you're also doing it correctly; and 3) no.

Here are some suggested changes to consider for your script:

    use strict;
    use warnings;

    my $filename = $ARGV[0];
    my %gene_count;

    open my $fh, '<', $filename or die "Cannot open $filename: $!";

    while ( my $line = <$fh> ) {
        chomp $line;    # chomp the line just read, not $_
        my @row = split( "\t", $line );
        $gene_count{ $row[18] }++ if $row[18];
    }

    close($fh);

    print "$_ => $gene_count{$_}\n" for sort keys %gene_count;

  • $ARGV{0} -> $ARGV[0]
  • Made a few changes to your open
  • Added chomp because you're splitting on the tab character. Without chomp, the newline stays on the end of the array's last element (except on the file's last line, if it has no trailing newline).
  •  = ++ -> ++
  • Added if $row[18] to check for 'good' key candidate. This check could be stronger, but is likely sufficient, in this case.
  • Just fyi: The parens of split and close are optional.
  • Added printing the sorted key/value pairs. (Just assumed you wanted to do that... :)
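
On the "this check could be stronger" point above: a plain truth test also rejects a column value of "0". A minimal sketch of a stricter test follows. The helper name is_good_key and the sample values are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a key only if it is defined and non-empty.
# Unlike a plain truth test, this keeps a column value of "0".
sub is_good_key {
    my ($key) = @_;
    my $ok = defined($key) && length($key);
    return $ok ? 1 : 0;
}

my %gene_count;

# Made-up sample values standing in for column 19 of successive rows.
for my $value ( 'BRCA1', '0', '', undef, 'BRCA1' ) {
    $gene_count{$value}++ if is_good_key($value);
}

print "$_ => $gene_count{$_}\n" for sort keys %gene_count;
```

Whether "0" should count as a valid key depends on the data, so treat this as one option rather than a required change.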

Since you're sending your script the filename from the command line, you can let Perl handle the file I/O by reading from <>. If you split on ' ' (whitespace), you don't need to chomp, because the newline is itself whitespace and cannot end up in column 19. You can also pass split a LIMIT so that it stops once it has produced that many fields instead of splitting every column, which can significantly speed up the splitting on wide rows. Given this, the following is functionally equivalent, provided no column is empty or contains spaces:

    use strict;
    use warnings;

    my %gene_count;

    while (<>) {
        my @row = split ' ', $_, 20;
        $gene_count{ $row[18] }++ if $row[18];
    }

    print "$_ => $gene_count{$_}\n" for sort keys %gene_count;
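
To make the difference between the two split styles concrete, here is a small sketch on a made-up four-column line (not from the original data). It shows how an empty column shifts positions when splitting on ' ', and what a LIMIT does to the last element.

```perl
use strict;
use warnings;

# Made-up sample line: four tab-separated columns, the second one empty.
my $line = "a\t\tc\td\n";

# Splitting on a tab keeps the empty column, so positions are preserved.
my @by_tab = split /\t/, $line;        # ("a", "", "c", "d\n")

# Splitting on ' ' treats any run of whitespace as a single separator.
my @by_space = split ' ', $line;       # ("a", "c", "d")

# A LIMIT of 3 stops after two splits; the third element holds the rest.
my @limited = split /\t/, $line, 3;    # ("a", "", "c\td\n")

print scalar(@by_tab), " ", scalar(@by_space), " ", scalar(@limited), "\n";
```

If the input can contain empty columns, splitting on the tab with a LIMIT (plus chomp) keeps column 19 at index 18.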

Your original script's logic is good; only minor fixes were needed. You've done well...

Hope this helps!


Re^2: Looping through a file, reading each line, and adding keys/editing values of a hash
by Anonymous Monk on Dec 05, 2013 at 05:58 UTC

    So, so helpful! Thank you. (And glad to see that I was on the right track and didn't need any major re-organizing). Thanks again! Cheers, Amelia

      You're most welcome, Amelia!

Re^2: Looping through a file, reading each line, and adding keys/editing values of a hash
by GrandFather (Cardinal) on Dec 05, 2013 at 08:12 UTC

    Why use a post-increment instead of a pre-increment when the value is not being used? $gene_count{ $row[18] }++ is (imo) better written ++$gene_count{$row[18]} so the increment is obvious.

    True laziness is hard work
      There are different schools of thought there. I strongly favour the postincrement.

        I know it's strongly favoured by many. I don't understand why.

        My strong preference is to use pre-increment where possible so the increment operator is more easily seen (trailing stuff is more easily ignored). But maybe I'm missing something important about the post-increment?

        A very minor consideration may be that the post-increment could be slower than the pre-increment for some implementations. The difference is so slight that it would be exceptionally unusual for that to be a consideration.

        True laziness is hard work

      Interesting question.

      One reason is that the OP already attempted a post-increment, and since the two increment types would produce the same outcome, why make the change?

      Another reason is my personal preference for this counting situation. If, for example, I were on a sidewalk, tallying all the red cars that passed by, I wouldn't make a tally mark upon their approach (pre-increment), but rather after they crossed an imaginary line extending across the street from my position (post-increment).

      Perhaps this ultimately boils down to personal preference, in cases like these...
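
For anyone following the thread, a minimal sketch of the point that both forms count identically when the result is discarded, and differ only in the value they return. The keys and variable names are made up for illustration.

```perl
use strict;
use warnings;

my ( %post, %pre );

# Made-up keys standing in for gene names.
for my $gene (qw(a b a)) {
    $post{$gene}++;    # post-increment, return value discarded
    ++$pre{$gene};     # pre-increment, return value discarded
}

# The two forms differ only when the returned value is actually used.
my $n   = 5;
my $old = $n++;    # $old is 5, $n is now 6
my $new = ++$n;    # $new is 7, $n is now 7

print "$_: post=$post{$_} pre=$pre{$_}\n" for sort keys %post;
```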
