The "if/how" is this:
Your first solution had an outer loop that runs 900,000 times. Inside that outer loop, there's an inner loop that runs 60,000 times. 60k * 900k is 54 billion total iterations inside the inner loop.
The proposed solution created a hash of 60000 elements. Then your 900000 line file is read line by line. Inside of that loop that iterates 900000 times, there's a hash lookup, which is almost free. There are a total of 60000 iterations needed to build the hash, and 900000 iterations needed to test each line of the FASTA file. The amount of work being done is, therefore, 960000 iterations.
Think of loops inside of loops as doing n*m amount of work, whereas loops followed by loops (no nesting) do n+m amount of work. Anytime you have the choice of an algorithm where the order of growth is the mathematical product of two large numbers, or an algorithm where the growth rate is the mathematical sum of the same two numbers, your sense of economic utility should be telling you that the latter will scale up better.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.
| & || & |
| < || < |
| > || > |
| [ || [ |
| ] || ] ||