PerlMonks  

Re: Make my script faster and more efficient

by monarch (Priest)
on Dec 03, 2008 at 03:35 UTC (id://727583)


in reply to Make my script faster and more efficient

I can't help thinking your inner loop is unnecessary. You're scanning through your list of mappings until you find the one for the current line, but that misses the beauty of hashes: direct access to the value you want.

Consider this:

    while (<INPUT2>) {
        $_ =~ s/[\r\n]+\z//s;    # I don't use chomp, sorry
        my ( $bioC, $contig_id, $pip ) = split( "\t", $_ );
        my $origin = $origins{$contig_id};
        print RESULTS "$bioC\t$origin\t$pip\n";
    }
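For anyone who wants to try the idea end to end, here is a self-contained sketch. The original file formats aren't shown in this node, so it assumes (hypothetically) a mapping file of two tab-separated columns, contig ID then origin, and a data file with the three columns used above. In-memory filehandles and lexical handles stand in for the real files and for INPUT2/RESULTS; the sub names `load_origins` and `remap` are made up for illustration.

```perl
use strict;
use warnings;

# Build the lookup table once: contig_id => origin.
# Hypothetical input format: "contig_id<TAB>origin" per line.
sub load_origins {
    my ($fh) = @_;
    my %origins;
    while ( my $line = <$fh> ) {
        $line =~ s/[\r\n]+\z//;    # tolerate LF or CRLF endings
        next unless length $line;
        my ( $contig_id, $origin ) = split /\t/, $line;
        $origins{$contig_id} = $origin;
    }
    return \%origins;
}

# One pass over the data: each line costs a single hash lookup
# instead of a scan through every mapping.
sub remap {
    my ( $in, $out, $origins ) = @_;
    while ( my $line = <$in> ) {
        $line =~ s/[\r\n]+\z//;
        my ( $bioC, $contig_id, $pip ) = split /\t/, $line;
        my $origin = $origins->{$contig_id};
        print {$out} "$bioC\t$origin\t$pip\n";
    }
}

# Made-up sample data standing in for the two input files.
my $map_text  = "c1\tnorth\r\nc2\tsouth\r\n";
my $data_text = "b1\tc2\t0.5\nb2\tc1\t0.9\n";

open my $map_fh,  '<', \$map_text  or die "Cannot open mapping: $!";
open my $data_fh, '<', \$data_text or die "Cannot open data: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot open output: $!";

my $origins = load_origins($map_fh);
remap( $data_fh, $out_fh, $origins );
close $out_fh;
print $result;
```

Building the hash costs one pass over the mapping file; after that each data line is a single lookup, so the total work grows with the sum of the two file sizes rather than their product.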

Update: you could also die with an error message if the file contains a code that isn't in your mapping table:

    my $origin = $origins{$contig_id};
    if ( !defined($origin) ) {
        die("No mapping for \"$contig_id\" found in origins table");
    }
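A minimal sketch of the same check wrapped in a helper, with eval used to show what the failure looks like. The sub name `origin_for` and the sample table are hypothetical; the `defined` test and the message follow the snippet above.

```perl
use strict;
use warnings;

# Hypothetical mapping table for illustration.
my %origins = ( c1 => 'north', c2 => 'south' );

# Return the origin for a contig ID, or die if the table has no entry.
sub origin_for {
    my ( $table, $contig_id ) = @_;
    my $origin = $table->{$contig_id};
    if ( !defined($origin) ) {
        die("No mapping for \"$contig_id\" found in origins table");
    }
    return $origin;
}

my $found = origin_for( \%origins, 'c1' );
print "$found\n";    # north

# An unknown ID aborts; eval traps the error so it can be inspected.
my $ok = eval { origin_for( \%origins, 'c9' ); 1 };
print "Caught: $@" unless $ok;
```

Because the message has no trailing newline, Perl appends " at SCRIPT line N." to it, which is handy for tracking down where the lookup failed.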

Re^2: Make my script faster and more efficient
by BrowserUk (Patriarch) on Dec 03, 2008 at 03:51 UTC
      I've gotten gun-shy of chomp as well. In a mixed Unix/Windows environment, or when running under Cygwin, it's pretty common for the default $/ not to match the line ending used in (some of) the input files. Monarch's method is more tolerant of incorrect or inconsistent line endings than chomp.
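A small demonstration of the difference, using a literal CRLF-terminated string in place of a line read from a Windows-generated file (the sample data is made up). It assumes $/ is left at its default of "\n".

```perl
use strict;
use warnings;

# Stand-in for a line read from a CRLF (Windows-style) file while $/ is "\n".
my $line = "b1\tc2\t0.5\r\n";

# chomp removes only the current value of $/, so the CR survives.
my $chomped = $line;
chomp $chomped;

# The substitution strips any trailing run of CR and LF characters.
my $stripped = $line;
$stripped =~ s/[\r\n]+\z//;

# The leftover CR ends up glued to the last field after split.
my ( undef, undef, $pip ) = split /\t/, $chomped;
print 'last field after chomp has length ', length($pip), "\n";    # 4: "0.5" plus CR
print 'CR left after chomp: ', ( $chomped  =~ /\r\z/ ? 'yes' : 'no' ), "\n";
print 'CR left after s///:  ', ( $stripped =~ /\r\z/ ? 'yes' : 'no' ), "\n";
```

That stray CR is what then reappears in the output as "0.5\r\n", which is usually how the problem gets noticed.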

        Indeed so: CRs can sneak in and ruin your day.

        Looking at $_ =~ s/[\r\n]+\z//s; makes me twitch a bit, since the values of "\r" and "\n" can vary. However, the best-known case of "\n" ne "\x0A" is (old) Mac systems, where "\n" eq "\x0D" and "\r" eq "\x0A", so [\r\n] appears safe. [Nevertheless, I haven't found anything that guarantees that "\n" and "\r" are duals. perlport touches on systems where they aren't ASCII at all, but that's a whole other world of pain.]
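If you'd rather check than trust, the sketch below reports what "\r" and "\n" are on the perl you're running, and shows an alternative (not from this thread) that names the CR and LF bytes explicitly. The explicit form assumes the data is ASCII-compatible; the sample line is made up.

```perl
use strict;
use warnings;

# Report what "\r" and "\n" are on the running perl.
printf "\\r is 0x%02X, \\n is 0x%02X\n", ord("\r"), ord("\n");

# Alternative: name the bytes explicitly instead of relying on "\r"/"\n".
# 0x0D (CR) and 0x0A (LF) are the ASCII values; this assumes ASCII data.
my $line = "b1\tc2\t0.5\x0D\x0A";
( my $stripped = $line ) =~ s/[\x0D\x0A]+\z//;
print length($stripped), "\n";
```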

        Anyway, I favour $_ =~ s/\s+\z//; on the basis that it finesses the issue and also gets rid of any other trailing whitespace -- two birds, one stone.
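A quick sketch of that substitution on a few made-up lines with assorted trailing junk (CRLF, stray spaces, a trailing tab), parsed with the same three-column split as monarch's loop.

```perl
use strict;
use warnings;

# Hypothetical lines with assorted trailing whitespace.
my @lines = ( "b1\tc2\t0.5\r\n", "b2\tc1\t0.9  \n", "b3\tc3\t0.1\t\r\n" );

for my $line (@lines) {    # $line aliases each element of @lines
    $line =~ s/\s+\z//;    # strips CR, LF, spaces and tabs at the end
    my ( $bioC, $contig_id, $pip ) = split /\t/, $line;
    print "[$pip]\n";
}
```

The brackets make it easy to see that no invisible characters are left on the last field.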

        Mind you, I have seen "\s" defined as [ \t\n\r\f] -- but the current perlreref says it's "whitespace".
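For what it's worth, perlrecharclass documents \s as matching [\t\n\f\r ] and, starting with Perl 5.18, the vertical tab as well; under Unicode rules it matches further whitespace characters too. The sketch below simply reports what your own perl does for the ASCII candidates, so you can check rather than rely on either definition.

```perl
use strict;
use warnings;

# Check which ASCII whitespace candidates \s matches on this perl.
my %names = (
    "\x20" => 'space', "\t" => 'TAB', "\n" => 'LF',
    "\r"   => 'CR',    "\f" => 'FF',  "\x0B" => 'VT',
);
for my $char ( sort keys %names ) {
    printf "%-5s %s\n", $names{$char},
        ( $char =~ /\A\s\z/ ? 'matches' : 'no match' );
}
```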
