http://www.perlmonks.org?node_id=727583


in reply to Make my script faster and more efficient

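I can't help thinking your inner loop is unnecessary. You're scanning through your list of mappings until you find the one for the current line, but that misses the beauty of hashes: direct access to the value you want. A hash lookup costs about the same however many mappings there are, whereas the inner loop gets slower as the mapping table grows.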

Consider this:

while (<INPUT2>) {
    $_ =~ s/[\r\n]+\z//s;    # I don't use chomp, sorry
    my ( $bioC, $contig_id, $pip ) = split( "\t", $_ );
    my $origin = $origins{$contig_id};
    print RESULTS "$bioC\t$origin\t$pip\n";
}
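Here is a self-contained sketch of the same idea: build the lookup table once, then translate each line with a single hash access. It assumes the mapping is tab-separated "contig_id<TAB>origin" pairs; the sample data and the translate_line helper are hypothetical stand-ins, since the original script's mapping file isn't shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping data standing in for the mapping file:
# each line is "contig_id<TAB>origin" (one has a CRLF ending).
my @mapping_lines = ( "ctg1\tchr2\n", "ctg2\tchr7\r\n" );

# Build the lookup table once: contig_id => origin.
my %origins;
for my $line (@mapping_lines) {
    my $clean = $line;
    $clean =~ s/[\r\n]+\z//;    # strip LF or CRLF
    my ( $contig_id, $origin ) = split /\t/, $clean;
    $origins{$contig_id} = $origin;
}

# Translate one data line with a single hash lookup (no inner loop).
sub translate_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    my ( $bioC, $contig_id, $pip ) = split /\t/, $line;
    my $origin = $origins{$contig_id};
    return "$bioC\t$origin\t$pip\n";
}

print translate_line("b01\tctg2\t0.93\n");    # prints b01<TAB>chr7<TAB>0.93
```

The table is built in one pass over the mapping, so the per-line work in the main loop no longer depends on how many mappings there are.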

Update: you could also die with an error message if the file contains a code that isn't in your mapping table:

my $origin = $origins{$contig_id};
if ( ! defined( $origin ) ) {
    die( "No mapping for \"$contig_id\" found in origins table" );
}
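A variant of that check, wrapped in a small function so it can be exercised on its own. It uses exists rather than defined (exists only asks whether the key is present, so a key mapped to undef would pass), and ends the message with "\n" so die omits the "at ... line N" suffix. The origin_for name and the sample table are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapping table; in the real script it comes from the mapping file.
my %origins = ( ctg1 => 'chr2', ctg2 => 'chr7' );

# Return a contig's origin, dying with a descriptive message if it is unmapped.
sub origin_for {
    my ($contig_id) = @_;
    die qq{No mapping for "$contig_id" found in origins table\n}
        unless exists $origins{$contig_id};
    return $origins{$contig_id};
}

print origin_for('ctg1'), "\n";            # prints chr2
my $ok = eval { origin_for('ctg9'); 1 };   # eval traps the die
print "caught: $@" unless $ok;
```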

Re^2: Make my script faster and more efficient
by BrowserUk (Patriarch) on Dec 03, 2008 at 03:51 UTC
      I've gotten gun-shy of chomp as well. In a mixed Unix/Windows environment, or under Cygwin, it's pretty common for $/ not to be set by default to the line ending used in (some of) the input file(s). Since chomp removes only a trailing copy of $/, a CRLF file read with $/ set to "\n" keeps its "\r". Monarch's method is more tolerant of incorrect or inconsistent line endings than chomp.
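A minimal demonstration, assuming a CRLF line as seen by a perl whose $/ is the default "\n" (the string literal below simulates a line read from a Windows-written file on Unix or Cygwin):

```perl
use strict;
use warnings;

# A Windows-style line as read where $/ is "\n".
my $line = "ctg1\tchr2\r\n";

# chomp removes only a trailing copy of $/ ("\n"), so the "\r" survives.
my $chomped = $line;
chomp $chomped;

# The regex removes any trailing run of CR/LF characters.
my $stripped = $line;
$stripped =~ s/[\r\n]+\z//;

printf "chomp left %d chars, regex left %d chars\n",
    length $chomped, length $stripped;    # prints "chomp left 10 chars, regex left 9 chars"
```

The leftover "\r" is easy to miss because it is invisible when printed, but it breaks string comparisons and hash lookups on the last field.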

        Indeed so: CRs can sneak in and ruin your day.

        Seeing $_ =~ s/[\r\n]+\z//s; makes me twitch a bit, since the values of "\r" and "\n" can vary. However, the best-known case of "\n" ne "\x0A" is (old) Mac systems, where "\n" eq "\x0D" and "\r" eq "\x0A"; the two are simply swapped, so [\r\n] still covers both characters and appears safe. [Nevertheless, I haven't found anything that guarantees that "\n" and "\r" are duals. perlport touches on systems where they aren't ASCII at all, but that's a whole other world of pain.]
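A quick check of that reasoning: because the character class holds both code points, the order of CR and LF in the ending doesn't matter, and neither would swapping which escape means which. The strip_eol helper is a hypothetical name for illustration; the last line just reports the values this perl uses for the two escapes.

```perl
use strict;
use warnings;

# Remove any trailing run of CR and/or LF characters, in whatever order.
sub strip_eol {
    my ($s) = @_;
    $s =~ s/[\r\n]+\z//;
    return $s;
}

# Unix LF, Windows CRLF, old-Mac CR, and a reversed LF+CR pair.
my %endings = ( LF => "\n", CRLF => "\r\n", CR => "\r", LFCR => "\n\r" );
for my $name ( sort keys %endings ) {
    printf "%-4s -> '%s'\n", $name, strip_eol( "data" . $endings{$name} );
}

# The code points this perl uses for the two escapes.
printf "\\r = 0x%02X, \\n = 0x%02X\n", ord("\r"), ord("\n");
```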

        Anyway, I favour $_ =~ s/\s+\z//; on the basis that it finesses the issue and also gets rid of any other trailing whitespace -- two birds, one stone.
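To see the "two birds" effect, here is a sketch comparing the two substitutions on a hypothetical line that has stray blanks before its CRLF ending:

```perl
use strict;
use warnings;

# A data line with a trailing space and tab before a CRLF ending.
my $line = "ctg1\tchr2 \t\r\n";

( my $eol_only = $line ) =~ s/[\r\n]+\z//;    # removes only the CR/LF run
( my $all_ws   = $line ) =~ s/\s+\z//;        # removes every trailing whitespace char

# Show each result between brackets, with its length.
printf "eol_only: [%s] (%d chars)\n", $eol_only, length $eol_only;
printf "all_ws:   [%s] (%d chars)\n", $all_ws,   length $all_ws;
```

One trade-off to keep in mind: if the last tab-separated field can legitimately be empty, s/\s+\z// also eats the separating tab. With a plain split that rarely matters, since split drops trailing empty fields by default, but it does if you call split with a negative limit to keep them.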

        Mind you, I have seen "\s" defined to be [ \t\n\r\f] -- but current perlreref says it's "whitespace".
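One concrete difference from the old [ \t\n\r\f] description: since Perl 5.18, \s also matches the vertical tab ("\x0B"), and for Unicode strings it matches further characters such as U+00A0 NO-BREAK SPACE. A small sketch that tests each classic candidate individually on whatever perl runs it:

```perl
use strict;
use warnings;

# Candidate whitespace characters; VT only matches \s on perl 5.18 and later.
my %chars = (
    space => ' ',  tab => "\t", LF => "\n",
    CR    => "\r", FF  => "\f", VT => "\x0B",
);

for my $name ( sort keys %chars ) {
    printf "%-5s %s\n", $name,
        ( $chars{$name} =~ /\A\s\z/ ? 'matches \s' : 'no match' );
}
```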