Reading concurrently two files with different number of lines
by frogsausage (Sexton)
on Apr 10, 2013 at 15:24 UTC
frogsausage has asked for the
wisdom of the Perl Monks concerning the following question:
Hi, I've been reading this site for answers to many questions and have always found what I wanted, until now.
I am trying to read two (very) long files in order to compare them in a smart way: checking if some elements (such as value=3.14 vs value="3.14") are swapped on the same line.
- There are a lot of lines that I will want to discard as soon as I read them. Therefore, I am trying not to store them in memory, as each file can go well beyond 100 000 lines.
- I might append one or more following lines (starting with a +) to the previous line (starting with a letter) if that first line doesn't match the one in the other file, or if one of the following lines doesn't match.
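A minimal sketch of the appending described in the second point, assuming that every line starting with a + belongs to the most recent line that does not. The sub name make_record_reader and the sample data are made up for illustration, and the "only append on a mismatch" condition is not modelled here. Only one pending line is held in memory at a time, and nothing used here should need anything newer than Perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one logical record
# per call, i.e. a line plus any following lines that start with '+'.
sub make_record_reader {
    my ($fh) = @_;
    my $pending = <$fh>;    # one-line lookahead
    return sub {
        return unless defined $pending;    # input exhausted
        my $record = $pending;
        chomp $record;
        # Keep reading while the next line is a continuation line.
        while (defined($pending = <$fh>)) {
            last unless $pending =~ /^\+/;
            chomp(my $cont = $pending);
            $record .= " $cont";
        }
        return $record;
    };
}

# Made-up sample data, read through an in-memory filehandle.
my $data = "R1 a b value=3.14\n+ c d\n+ e\nR2 x y\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $next_record = make_record_reader($fh);
while (defined(my $rec = $next_record->())) {
    print "$rec\n";
}
close $fh;
```

Because the lookahead line is kept inside the closure, a + line can never be mistaken for the start of a new record, no matter which file runs out first.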
Lines can be such as:
Right now, I am reading them in this very simple way:
When running small test cases, it works really well (swap comparison etc.). However, I have some glitches, and I guess that the longer file doesn't have its remaining lines read once the end of the shorter file is reached. The glitch is that a line starting with a + becomes the start of a new line in my printed result, whereas it should always be appended after my first line.
I tried changing && to ||, but it got all messed up. I am thinking of dealing with the remaining part of the longer file after the end of the shorter one is reached; however, that doesn't sound really clean.
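A minimal sketch of reading both files until both are exhausted, under the assumption that the loop condition has the form defined($a = <A>) && defined($b = <B>). With &&, the loop stops as soon as either file ends. With ||, the second read is skipped whenever the first one succeeds, so the second file does not advance. Reading both handles unconditionally inside the loop body avoids both problems. The variable names and sample data below are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data: "file" A has three lines, "file" B has five.
my $text_a = "a1\na2\na3\n";
my $text_b = "b1\nb2\nb3\nb4\nb5\n";
open my $fh_a, '<', \$text_a or die "Cannot open A: $!";
open my $fh_b, '<', \$text_b or die "Cannot open B: $!";

my @pairs;
while (1) {
    # Read from both handles on every pass, so that neither read
    # is skipped by the short-circuiting of && or ||.
    my $line_a = <$fh_a>;
    my $line_b = <$fh_b>;
    last unless defined($line_a) || defined($line_b);

    chomp $line_a if defined $line_a;
    chomp $line_b if defined $line_b;

    # An exhausted file shows up as undef; '(eof)' is only a marker here.
    push @pairs, [ defined $line_a ? $line_a : '(eof)',
                   defined $line_b ? $line_b : '(eof)' ];
}
close $fh_a;
close $fh_b;

print join(' | ', @$_), "\n" for @pairs;
```

With this sample data the loop runs five times, and the last two pairs carry the '(eof)' marker on the A side, so the tail of the longer file is handled in the same loop rather than in a separate pass.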
Looking forward to reading your thoughts and suggestions!
-F
P.S.: running Perl 5.8.8