The "if/how" is this:
Your first solution had an outer loop that runs 900,000 times. Inside that outer loop, there's an inner loop that runs 60,000 times. 60k * 900k is 54 billion total iterations inside the inner loop.
The proposed solution created a hash of 60000 elements. Then your 900000 line file is read line by line. Inside of that loop that iterates 900000 times, there's a hash lookup, which is almost free. There are a total of 60000 iterations needed to build the hash, and 900000 iterations needed to test each line of the FASTA file. The amount of work being done is, therefore, 960000 iterations.
Think of loops inside of loops as doing n*m amount of work, whereas loops followed by loops (no nesting) do n+m amount of work. Anytime you have the choice of an algorithm where the order of growth is the mathematical product of two large numbers, or an algorithm where the growth rate is the mathematical sum of the same two numbers, your sense of economic utility should be telling you that the latter will scale up better.
| [reply] |