Re: Reading concurrently two files with different number of lines

by sundialsvc4 (Abbot)
on Apr 11, 2013 at 13:42 UTC (#1028160=note)

in reply to Reading concurrently two files with different number of lines

One idea that comes to mind is to write a short filter script that joins each record's continuation lines into one contiguous string. It reads line by line and checks whether the current line is a continuation. If it is, the line is appended to the stashed record; otherwise the script outputs the stashed record (if any) and stashes the current line as the start of a new one. At end of input, it outputs whatever is still stashed.
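
A minimal sketch of that filter, written in Python rather than Perl (the loop ports over more or less line for line). The thread does not say how a continuation line is marked, so the leading-whitespace test below is purely an assumption, and `is_continuation` and `join_continuations` are made-up names:

```python
from typing import Callable, Iterable, Iterator


def is_continuation(line: str) -> bool:
    # ASSUMPTION: a continuation line starts with a space or a tab.
    # Swap in whatever marker the real files use.
    return line[:1] in (" ", "\t")


def join_continuations(
    lines: Iterable[str],
    is_cont: Callable[[str], bool] = is_continuation,
) -> Iterator[str]:
    """Yield one string per logical record, continuations folded in."""
    stashed = None
    for raw in lines:
        line = raw.rstrip("\n")
        if is_cont(line) and stashed is not None:
            # Continuation: append it to the record being built.
            stashed += " " + line.strip()
        else:
            # New record: emit the finished one (if any), stash this one.
            if stashed is not None:
                yield stashed
            stashed = line
    # Flush the last record at end of input.
    if stashed is not None:
        yield stashed
```

To use it as a command-line filter, loop over `join_continuations(sys.stdin)` and print each record. A continuation that shows up before any record has been stashed is treated as the start of a record, which is a design choice, not something the original post specifies.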

If you apply this filter to both files in turn, you reduce the complexity of the problem considerably, because continuation lines are no longer a concern in the filtered output files. Now maybe you can apply tools such as diff to them, and so on. I'm really warming up to that idea as I describe it.
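
Once each record sits on one line, a diff is easy to get. As a rough illustration, here is a sketch in Python using the standard `difflib` module instead of the command-line diff tool; `diff_records` and the `left`/`right` labels are hypothetical, and the inputs are assumed to be lists of already-joined records:

```python
import difflib
from typing import List


def diff_records(left: List[str], right: List[str]) -> List[str]:
    """Return a unified diff of two lists of one-line records."""
    # lineterm="" because the records carry no trailing newlines.
    return list(
        difflib.unified_diff(
            left, right, fromfile="left", tofile="right", lineterm=""
        )
    )


# Tiny made-up example: the second record differs between the two sides.
for line in diff_records(["a 1 b 2", "c 3"], ["a 1 b 2", "c 4"]):
    print(line)
```

Identical inputs give an empty list, so an empty result means the two filtered files agree record for record.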

Replies are listed 'Best First'.
Re^2: Reading concurrently two files with different number of lines
by frogsausage (Sexton) on Apr 11, 2013 at 15:48 UTC
    Actually, I built each full line for each file and stored them in an array. As I push each line onto the array, I also parse it and store the result in a hash, adding a key that holds the line's index in the array. At the same time, I discard all unwanted lines that don't need to be matched later on.
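
That bookkeeping might look roughly like the sketch below (in Python; a Perl version would use an array and a hash of hashes in the same way). The record format is not given in the thread, so everything about it here is assumed: a record name followed by `key=value` fields, with blank lines and `#` comments standing in for the "unwanted lines". `index_records`, `is_wanted` and the `_line` key are hypothetical names:

```python
from typing import Dict, Iterable, List, Tuple


def is_wanted(record: str) -> bool:
    # ASSUMPTION: blank records and "#" comments are the unwanted lines.
    stripped = record.strip()
    return bool(stripped) and not stripped.startswith("#")


def index_records(
    records: Iterable[str],
) -> Tuple[List[str], Dict[str, dict]]:
    """Store full lines in a list and their parsed fields in a dict."""
    full_lines: List[str] = []
    parsed: Dict[str, dict] = {}
    for record in records:
        if not is_wanted(record):
            continue  # discarded up front, never stored
        # ASSUMPTION: "<name> key=value key=value ..." layout.
        name, *fields = record.split()
        entry = dict(f.split("=", 1) for f in fields if "=" in f)
        # Remember where the full string lives in the list.
        entry["_line"] = len(full_lines)
        full_lines.append(record)
        parsed[name] = entry
    return full_lines, parsed
```

Keeping the index inside each parsed entry means the full string is always one lookup away when the two files are compared later.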

    Then it is really easy to compare. I first compare string to string (just like diff), using the line number stored in the hash to find the full strings. If the strings are equal, I discard the line; otherwise I go through each key/value pair (created while parsing the file) and decide whether they match, either discarding the entry (deleting it from the hash on the fly) or keeping it. I could have compared everything using the key/value pairs alone but, oh well, it is working great as it is.
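
A sketch of that two-stage comparison, again in Python and again under assumptions: each parsed entry is a dict of fields plus a `_line` key giving the index of its full string, and entries in the two files are paired by a shared record name. The names `fields_match` and `prune_matches` are invented for the example:

```python
from typing import Dict, List


def fields_match(a: dict, b: dict) -> bool:
    # Compare parsed fields only; the "_line" bookkeeping key is ignored.
    strip = lambda d: {k: v for k, v in d.items() if k != "_line"}
    return strip(a) == strip(b)


def prune_matches(
    lines_a: List[str], parsed_a: Dict[str, dict],
    lines_b: List[str], parsed_b: Dict[str, dict],
) -> None:
    """Delete matching entries from both dicts, leaving the differences."""
    for name in list(parsed_a):  # copy of the keys: we delete as we go
        if name not in parsed_b:
            continue
        a, b = parsed_a[name], parsed_b[name]
        # Stage 1: cheap string-to-string check via the stored index.
        same_text = lines_a[a["_line"]] == lines_b[b["_line"]]
        # Stage 2: only if the text differs, compare field by field.
        if same_text or fields_match(a, b):
            del parsed_a[name]
            del parsed_b[name]


lines_a = ["r1 a=1 b=2", "r2 a=3", "r3 x=9"]
parsed_a = {
    "r1": {"a": "1", "b": "2", "_line": 0},
    "r2": {"a": "3", "_line": 1},
    "r3": {"x": "9", "_line": 2},
}
lines_b = ["r2 a=3", "r1 b=2 a=1", "r3 x=8"]
parsed_b = {
    "r2": {"a": "3", "_line": 0},
    "r1": {"b": "2", "a": "1", "_line": 1},
    "r3": {"x": "8", "_line": 2},
}
prune_matches(lines_a, parsed_a, lines_b, parsed_b)
print(sorted(parsed_a), sorted(parsed_b))
```

Whatever is left in the two dicts afterwards is what differs between the files. In the made-up data, `r2` goes at the string stage, `r1` goes at the field stage (same fields, different order), and `r3` stays.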

    Everything is working really well. I hooked up today's line reader/concatenation code to my earlier parsing subroutines (with a little adaptation to my new format). In the end, my program is half as long (the data structures are far less complex) and much more maintainable. More importantly, it runs fine.

    Now it is time for QA, just to make sure it works in some weird cases I didn't have in my test cases, and then it is good to go!

    Thank you all for your suggestions and examples, it really helped!


    P.S: is the hand-written format used to push users to create a Perl script to automatically add the right tags? :p
