Hi Monks-

I've been working on a project for my degree, and it's about time to be done with the thing. This script is just one part of it. (See my bio for more info.) Anyway, I feel like I've never had my code looked over by someone who knows what they're doing, so I posted the db-loading program on my scratchpad, along with a sample of the gene data input. If you feel like you have some spare time, I'd appreciate some constructive criticism.

(I didn't use a generic module because I knew that wouldn't be in the spirit of the project as my advisor saw it. If you'd like to advocate your favorite module, I'll try to study it before the next time I have to do something like this. :) )

My main concern is that the program takes too long to run (14-17 hours for 26,000+ records). There is indexing on several of the fields. I've heard that if I didn't index while inserting data, everything might go faster. I've also heard that the indexing makes the multiple db searches needed during the program go faster, so any gain from indexing at the end would be lost to the extra search time within the script. Erg. As for the script, I did pass array references to the subroutines instead of copying arrays, but what are some other practical ways that you would optimize it for speed? It blazes on the first thousand records or so, then gets slower as it goes on. I expect that to some degree, since it has to search ever-increasing tables as it progresses, but is 14-17 hours realistic? I am seeing some stuff on optimizing the many regexes in Programming Perl (pp 594-9), but I'm not sure of the best way to apply it. I have read that tr/// is faster than s///, which I could use in a couple of places.
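To make the tr-versus-s idea concrete, here is a minimal sketch of the kind of swap I have in mind. It is not taken from the actual script: the sub names, the sample line, and the cleanup rule (dropping spaces, tabs, newlines, and digits from a sequence line) are all made up for illustration. It uses the core Benchmark module to compare the two versions on whatever machine runs it.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical cleanup step (an assumption, not from the real script):
# strip spaces, tabs, newlines, and digits from a line of sequence data.

# Regex substitution version.
sub clean_with_s {
    my ($seq) = @_;
    $seq =~ s/[ \t\n0-9]//g;
    return $seq;
}

# Transliteration version: with the /d flag, every listed character
# that has no replacement is deleted.
sub clean_with_tr {
    my ($seq) = @_;
    $seq =~ tr/ \t\n0-9//d;
    return $seq;
}

# Made-up sample line in a numbered, space-separated layout.
my $sample = "  1 atgcc gtatt\n 11 ccgta\n";

print clean_with_s($sample),  "\n";
print clean_with_tr($sample), "\n";

# Time both versions over the same input; the numbers depend on the machine.
cmpthese(200_000, {
    's///' => sub { clean_with_s($sample) },
    'tr///' => sub { clean_with_tr($sample) },
});
```

This swap only works where the pattern is a plain set of single characters; anything that needs anchors, alternation, or captures has to stay a regex. I also suspect the database searches matter far more to the total run time than the string cleanup does, so I would treat this as a small gain at best.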

Thanks for any comments you can make. If you have questions or need more information, I'll oblige.


In reply to Up for Critique by biograd
