
Re: grep of readline matching more lines than elements in array

by Kenosis (Priest)
on Dec 03, 2013 at 19:50 UTC (#1065487)

in reply to grep of readline matching more lines than elements in array

If I may, I'd like to offer just a few suggestions which may assist your efforts:

  • If you don't already, always put use strict; use warnings; at the top of your scripts
  • Use the three-argument form of open
  • Use split to get the ID from the file's lines
  • Build a hash from @isos20 and use that to check for an ID match when reading the file

Given the above items, consider the following refactoring:

    use strict;
    use warnings;

    my $posfile = 'posfile.txt';
    my $pos_two = 'pos_two.txt';
    my @isos20  = qw/these are the array elements/;
    my %isos20  = map { $_ => 1 } @isos20;

    #open file to write filtered lines to
    open my $OUT2, '>', $pos_two or die "cannot open $pos_two: $!";

    #open file to filter
    open my $IN2, '<', $posfile or die "cannot open $posfile: $!";

    while (<$IN2>) {
        my $comp = ( split /,/ )[0];
        print $OUT2 $_ if $isos20{$comp};
    }

    close $IN2;
    close $OUT2;

Notice that there are fewer instructions within the while loop. Instead of assigning the value of Perl's default scalar $_ to $line, just operate on $_.

The split gets the ID from the file's string by creating a list of string elements and taking the zeroth element of that list. The $isos20{$comp} notation 'looks' for that ID within the hash (constructed using map above), and prints the line to the output file if it's in the hash.
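To see the split-and-lookup step in isolation, here is a minimal sketch that runs on in-memory lines instead of a file. The IDs, the line layout, and the names %wanted, @lines and @kept are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data: the IDs and the comma-separated layout are assumptions.
my %wanted = map { $_ => 1 } qw/iso1 iso7/;
my @lines  = ( "iso1,12,A\n", "iso3,40,C\n", "iso7,88,G\n" );

my @kept;
for (@lines) {
    # split with no second argument operates on $_; [0] takes the first field
    my $id = ( split /,/ )[0];

    # keep the whole line only if its ID is a key in the hash
    push @kept, $_ if $wanted{$id};
}

print @kept;
```

Because the hash is keyed on the whole ID, only exact matches are kept; an ID that merely contains a wanted ID as a substring is not.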

Why use a hash instead of grepping the array for a match? In this case, a hash is a much faster way of detecting a match. With grep, the entire array is traversed for every line of the file. A hash lookup, by contrast, takes roughly constant time on average no matter how many keys the hash holds, so it will work better, significantly so if the array is quite large.
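The two approaches can be put side by side in a small sketch. The IDs and the names @ids, %is_wanted and $candidate are hypothetical, and the grep shown is just one plausible way to scan the array, not necessarily what the original script did.

```perl
use strict;
use warnings;

my @ids       = qw/iso1 iso7 iso9/;        # hypothetical IDs
my %is_wanted = map { $_ => 1 } @ids;      # built once, before reading any lines

my $candidate = 'iso7';

# Array scan: grep compares the candidate against every element of @ids.
# In scalar context grep returns the number of matching elements.
my $found_by_grep = grep { $_ eq $candidate } @ids;

# Hash lookup: one keyed access, however many IDs there are.
my $found_by_hash = exists $is_wanted{$candidate};

print "grep: ", ( $found_by_grep ? 'match' : 'no match' ), "\n";
print "hash: ", ( $found_by_hash ? 'match' : 'no match' ), "\n";
```

Both tests agree on the answer; the difference is that the grep repeats a full pass over @ids for every input line, while the cost of building the hash is paid once.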

Hope this helps!

Replies are listed 'Best First'.
Re^2: grep of readline matching more lines than elements in array
by Laurent_R (Canon) on Dec 03, 2013 at 21:41 UTC
    I can only agree with the excellent advice offered by Kenosis; you should really follow it, as it will save you a lot of debugging time. There is just one point in the refactored code that could be improved, in my view:
    my %isos20 = map { $_ => 1 } @isos20;
    I would give different names to the hash and the array. Granted, Perl handles this without any problem, and it works fine in the case in point. But giving the same name to two different entities can lead to hard-to-track bugs with more complicated data structures. Sometimes you goof your data dereferencing. If each data structure had its own name, the Perl compiler could tell you about the error, but it does not see the error when two different entities share a name. You then get the error at run time instead of compile time or, worse, you end up using data other than what you thought you were using.
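As a rough sketch of the renaming, the lookup hash below gets its own name. The name %is_iso20 is only a suggestion for illustration and does not appear in the original code.

```perl
use strict;
use warnings;

my @isos20   = qw/these are the array elements/;
my %is_iso20 = map { $_ => 1 } @isos20;    # hypothetical, distinct name for the hash

print "found\n" if $is_iso20{these};

# With distinct names, a slip such as $is_iso20[0] refers to an undeclared
# array @is_iso20, so "use strict" rejects it at compile time.
# When both %isos20 and @isos20 are declared, $isos20[0] and $isos20{0}
# both compile, and the mistake only shows up (if at all) at run time.
```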

      Excellent hash-naming suggestion, Laurent_R. Thank you for adding this.
