Re: Make my script faster and more efficient

I can't help thinking your inner loop is unnecessary. You're scanning through your list of mappings until you find the one for the current line, but that misses the beauty of hashes: direct access to the value you want.

Consider this:

  while (<INPUT2>) {
    $_ =~ s/[\r\n]+\z//s; # I don't use chomp, sorry

    my ( $bioC, $contig_id, $pip ) = split( "\t", $_ );
    my $origin = $origins{$contig_id};
    print RESULTS "$bioC\t$origin\t$pip\n";
  }
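
The loop above assumes %origins has already been filled, once, before INPUT2 is read. A minimal sketch of that step follows, assuming the mapping file holds one "contig_id<TAB>origin" pair per line. The helper name load_origins and the sample data are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read "contig_id<TAB>origin" lines into a hash.
# The file layout is an assumption, not taken from the original script.
sub load_origins {
    my ($fh) = @_;
    my %origins;
    while ( my $line = <$fh> ) {
        $line =~ s/[\r\n]+\z//;    # same line-ending cleanup as above
        next if $line eq '';       # skip blank lines
        my ( $contig_id, $origin ) = split /\t/, $line, 2;
        $origins{$contig_id} = $origin;
    }
    return \%origins;
}

# Demo: an in-memory filehandle stands in for the real mapping file.
my $mapping = "c1\thuman\r\nc2\tmouse\n";
open my $fh, '<', \$mapping or die "Cannot open in-memory file: $!";
my $origins = load_origins($fh);
close $fh;

print "$_ => $origins->{$_}\n" for sort keys %$origins;
```

Building the hash is a single pass over the mapping file; after that each line of the second file costs one lookup instead of a scan of the whole list.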

Update: you could also die with an error message if the file contains a code that isn't in your mapping table:

  my $origin = $origins{$contig_id};
  if ( ! defined( $origin ) ) {
    die( "No mapping for \"$contig_id\" found in origins table" );
  }


Replies are listed 'Best First'.
Re^2: Make my script faster and more efficient by BrowserUk (Patriarch) on Dec 03, 2008 at 03:51 UTC
# I don't use chomp, sorry

Why not?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^3: Make my script faster and more efficient by quester (Vicar) on Dec 03, 2008 at 06:07 UTC
I've gotten gun-shy of chomp as well. In a mixed Unix/Windows environment, or running Cygwin, it's pretty common for $/ to not be set by default to the same line ending that is used in (some of) the input file(s). Monarch's method is more tolerant of incorrect or inconsistent line endings than chomp.
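
To make the difference concrete, here is a small sketch, assuming $/ is still at its default of "\n" and the input line came from a Windows-style file. The sample record is invented.

```perl
use strict;
use warnings;

my $crlf_line = "contig42\t0.97\r\n";    # made-up record with a CRLF ending

# chomp removes only what matches $/ ("\n" by default), so the CR survives.
my $chomped = $crlf_line;
chomp $chomped;

# The substitution removes any run of CR/LF characters at the end of the string.
( my $stripped = $crlf_line ) =~ s/[\r\n]+\z//;

printf "chomp: %d chars, trailing CR: %s\n",
    length($chomped), ( $chomped =~ /\r\z/ ? 'yes' : 'no' );
printf "s///:  %d chars, trailing CR: %s\n",
    length($stripped), ( $stripped =~ /\r\z/ ? 'yes' : 'no' );
```

A leftover CR ends up glued to the last field after split, which is the kind of thing that quietly breaks later comparisons or hash lookups on that field.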
Re^4: Make my script faster and more efficient by gone2015 (Deacon) on Dec 03, 2008 at 13:28 UTC
Indeed so: `CR`s can sneak in and ruin your day.

Looking at: `$_ =~ s/[\r\n]+\z//s ;` makes me twitch a bit, since the values of `"\r"` and `"\n"` can vary. However, the best known case of `"\n" ne "\x0A"` is (old) Mac systems where `"\n" eq "\x0D"` and `"\r" eq "\x0A"`, so `[\r\n]` appears safe. [Nevertheless, I haven't found anything that guarantees that `"\n"` and `"\r"` are duals. perlport touches on systems where they aren't ASCII at all, but that's a whole other world of pain.]

Anyway, I favour: `$_ =~ s/\s+\z// ;` on the basis that it finesses the issue and gets rid of any other trailing whitespace -- two birds, one stone. Mind you, I have seen `\s` defined to be `[ \t\n\r\f]` -- but current perlreref says it's "whitespace".
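
For comparison, a quick sketch of the two substitutions on the same line. The record is invented and deliberately ends in a stray space and tab before the CRLF.

```perl
use strict;
use warnings;

my $raw = "contig42\t0.97 \t\r\n";    # made-up record: trailing space, tab, CRLF

( my $eol_only  = $raw ) =~ s/[\r\n]+\z//;    # strips the line ending only
( my $all_space = $raw ) =~ s/\s+\z//;        # strips every trailing whitespace char

printf "original:     %d chars\n", length($raw);
printf "s/[\\r\\n]+\\z//: %d chars\n", length($eol_only);
printf "s/\\s+\\z//:     %d chars\n", length($all_space);
```

Note that `\s+\z` also eats trailing tabs, so it is only the right choice when trailing blanks carry no meaning in the data.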

