Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
First things first - the script is very readable. I'm impressed! :-) Also, there is almost enough commenting. *winks*

To answer your question - yes, there are a number of places you could optimize for speed (and maintainability!).

  1. The biggest one (for both aspects) is to declare your variables when you use them. This improves maintainability because you're scoping variables to only where you need them. This improves speed because you're not having to clear the arrays each time.
    ### FILL ACHROMS TABLE ########################################### # Take in whole records separated by five dashes instead of a # newline character. $/ = "-----"; while($record = <INFILE>) { # Print out a counter to show how far the filling has progressed. # print "$n\n"; # $n++; @newrecord = (); # Send the record to have any blank lines removed. @newrecord = CleanData( $record ); ----- ### FILL ACHROMS TABLE ########################################### # Take in whole records separated by five dashes instead of a # newline character. $/ = "-----"; while(my $record = <INFILE>) { # Print out a counter to show how far the filling has progressed. # print "$n\n"; # $n++; # Send the record to have any blank lines removed. my @newrecord = CleanData( $record );
  2. The second is to pass around references, not the whole array itself. For example:
    @newrecord = CleanData( $record ); ---- my $newrecord = CleanData($record);
    There, $newrecord is a reference to an array. You'd get your stuff by using $newrecord->[5] instead of $newrecord[5]. If your arrays are large, this is a signficant savings.
  3. CleanData() should be using grep. Something like:
    sub CleanData { my ($record, @cleanrecord, $line, @record); $record = $_[0]; ### Separate the record by newlines. @record = split /\n/, $record; ### Remove empty lines if they exist. foreach $line ( @record ) { chomp( $line ); next if ( $line =~ m/^\s*$/ ); push @cleanrecord, $line; } return ( @cleanrecord ); } ---- sub CleanData { my $record = shift; my @cleanrecord = grep { /\S/ } chomp(split($/, $record)); return \@cleanrecord; }
    That is much faster and makes much clearer what you're doing. The $/ is the newline separator for your system. It makes your script portable to Windows or Mac (for example).
  4. Use array slices. For example:
    $sth_update_exon -> execute( @exondata[1,2,3,4], $geneid, $exondata[0] );
    They're faster as well as more readable.
  5. Don't use fetchrow_array(). Use fetch() with bind_columns() instead. This is how the DBI documentation recommends to do things the fastest. In addition, it seems like you're just using fetchrow_array() to find out if anything matched at all. Using fetch() is definitely quicker in that instance. (There's probably an even faster way just to check for existence, but I don't know it.)
  6. In ParseAChroms(), you're doing a regex against the same variable over and over. In addition, most of those regexes look to be identical. This makes me suspicious. Try something like:
    sub ParseAChroms { my $achromref = shift; my $matches = join '|', ( 'BAC:', 'LENGTH:', 'OSOME:', 'CHR_START:', 'CHR_END:', 'BAC_START:', 'BAC_END:', 'ORIENTATION:', ); my @achrom; my $line = join '', @$achromref; push @achrom, $line =~ /\b([BY]AC\b/; push @achrom, $line =~ /(?:$matches)\s*(.*?)\s*/g; return \@achrom; }
    Now, this assumes that all your stuff is going to be in the right order. If you can't, don't use an array - use a hash.
    sub ParseAChroms { my $achromref = shift; my $matches = join '|', ( 'BAC:', 'LENGTH:', 'OSOME:', 'CHR_START:', 'CHR_END:', 'BAC_START:', 'BAC_END:', 'ORIENTATION:', ); my $line = join '', @$achromref; my %achrom = $line =~ /($matches)\s*(.*?)\s*/og; $achrom{TYPE} = $line =~ /\b([BY]AC\b/o; return \%achrom;
    Now, what's even cooler is that you can access stuff by name, not array index. This is important if you ever plan on changing your input file stuff, maybe adding a new thing you want to track?

    You could even make that regex ahead of time by using qr//, but I'll leave that as an exercise to the reader. And, yes, ParseGenes(), ParseExon(), and ParseSubExon() should be treated the same way.

  7. A few style notes:
    • Your variable names could be improved with either capitalization (like your subroutine names) or underscores. Doesn't matter which one, just be consistent.
    • When you override $/ (or any similar variable), use local and do it within a block. That makes for good maintainable code.
    • Instead of making a variable for each sth, make a hash of sths and access the one you need. Even better would be to have a subroutine somewhere that does this for you.
    • Don't use a variable named temp. Ever. You can find a better name. Also, when you do, it looks like your just using it as a boolean test. If so, then just do the thing in the boolean test. (if and while take boolean tests, in case you're wondering.)
    • Don't use the same variable name for both a scalar and an array (such as the aforementioned temp). It's confusing and leads to (sometimes subtle) bugs.
    • Use heredocs with your SQL statements for prepare(). It's more readable. And, put it into a subroutine (preferably in some other module!) Something like:
      my $match_achrom_statement = << STH_MATCH; SELECT ac_id FROM achroms WHERE ac_id = ? STH_MATCH my $sth_match_achrom = $dbh->prepare($match_achrom_statement);

We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

In reply to Re: Up for Critique by dragonchild
in thread Up for Critique by biograd

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others meditating upon the Monastery: (5)
    As of 2018-03-23 23:16 GMT
    Find Nodes?
      Voting Booth?
      When I think of a mole I think of:

      Results (297 votes). Check out past polls.