 
PerlMonks  

reading two files in parallel

by baxy77bax (Chaplain)
on May 02, 2013 at 10:11 UTC (#1031722)
baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

A very basic question: how do I read two files at the same time? Example:

file a, file b
1. read a line from a
2. read a line from b
3. read a line from a
4. read a line from b
...
Since the files are too large and I do not have enough memory to store them, I am forced to do something like this, but I have realized that I don't know how. If I have two nested while loops like this:
while (<F1>) {            # read a line
    while (<F2>) {        # read a line
        last;             # go back to the main loop, but how to continue from this point in the next iteration?
    }
}
How do I continue from where I stopped in the nested loop? Thank you

baxy

UPDATE: Thanks, moritz! That did it :)

Re: reading two files in parallel
by moritz (Cardinal) on May 02, 2013 at 10:17 UTC
      This is a nice way to do it; however, I would always advise against using $a and $b as variable names. They are "magic" names used by sort, and bugs where your variables are shadowed inside a sort can be hard to track down, so it is better to avoid potentially troublesome names.
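      A minimal sketch of a lockstep read that follows this naming advice. It is not moritz's original code (which is not reproduced in this thread); the helper name for_each_pair and the in-memory sample files are hypothetical. It reads one line from each handle per iteration, hands the pair to a callback so nothing accumulates in memory, and stops as soon as either file runs out.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per pair of lines, reading one
# line from each handle per iteration. Descriptive lexicals replace $a/$b.
# Stops as soon as either handle is exhausted; leftovers are ignored here.
sub for_each_pair {
    my ($left_fh, $right_fh, $callback) = @_;
    while (1) {
        my $left  = readline($left_fh);
        my $right = readline($right_fh);
        last unless defined $left && defined $right;
        chomp($left, $right);
        $callback->($left, $right);
    }
}

# In-memory files stand in for the real, large inputs.
my $left_data  = "a1\na2\na3\n";
my $right_data = "b1\nb2\n";
open my $left_fh,  '<', \$left_data  or die "Cannot open left: $!";
open my $right_fh, '<', \$right_data or die "Cannot open right: $!";

for_each_pair($left_fh, $right_fh, sub { print "$_[0] $_[1]\n" });
# prints "a1 b1" and "a2 b2"; "a3" is never paired
```

      With real files you would open them by name instead; the loop itself does not change.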
Re: reading two files in parallel
by Laurent_R (Vicar) on May 02, 2013 at 11:57 UTC

    Nested loops will not give you what you need. Besides Moritz's solution, you could also use one loop on one of the files (but not on the other):

    while (my $a = <F1>) {
        my $b = <F2>;
        # do something with $a and $b
    }
Re: reading two files in parallel
by sundialsvc4 (Monsignor) on May 02, 2013 at 12:36 UTC

    As a slight note, Moritz’s solution as-written does not seem to consider what to do with the leftover records in the longer of the two files.   The necessary additions, to be placed after what is shown, are trivial ... the only trick being to ensure that the first leftover record is processed.

      «As a slight note, Moritz’s solution as-written does not seem to consider what to do with the leftover records...»

      Maybe this is true. But what is your solution?

      Regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      This is easy if you know that the records in the two files match exactly, one for one. It is much less easy if lines can be missing from one file or the other.

      I wrote a program that compares two very large files. Handling all the cases of records that exist in one file but not in the other, with all the special cases (file A finishing before B, or the other way around), is not really trivial.

      I am working on transforming this program into a module as generic as possible, but, unfortunately, it is not ready to be used.
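      For the harder case described here (records present in one file but not the other), one common approach, not necessarily the one used in that program, is a merge-style comparison. It only works if both files are sorted by the same key. The sketch below assumes a hypothetical format: the key is the first whitespace-separated field, both files are sorted by that key with string comparison (cmp, as plain sort does), and keys are unique within each file. The helper names next_line, key_of and compare_sorted are illustrative.

```perl
use strict;
use warnings;

# ASSUMED format (hypothetical): key = first whitespace-separated field,
# both files sorted by key using string comparison (cmp), keys unique
# within each file. Only the current line of each file is held in memory.

# Read the next line from $fh and chomp it; undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = readline($fh);
    chomp $line if defined $line;
    return $line;
}

# Extract the key: the first whitespace-separated field ('' if none).
sub key_of {
    my ($line) = @_;
    my ($key) = split ' ', $line;
    return $key // '';
}

# Walk both sorted files once, reporting matched and unmatched records.
sub compare_sorted {
    my ($fh_a, $fh_b, $on_both, $only_a, $only_b) = @_;
    my $line_a = next_line($fh_a);
    my $line_b = next_line($fh_b);
    while (defined($line_a) || defined($line_b)) {
        # An exhausted file compares as "after everything", so the
        # other file is simply drained.
        my $order = !defined($line_a) ? 1
                  : !defined($line_b) ? -1
                  : key_of($line_a) cmp key_of($line_b);
        if ($order == 0) {
            $on_both->($line_a, $line_b);
            $line_a = next_line($fh_a);
            $line_b = next_line($fh_b);
        }
        elsif ($order < 0) {
            $only_a->($line_a);            # key missing from file B
            $line_a = next_line($fh_a);
        }
        else {
            $only_b->($line_b);            # key missing from file A
            $line_b = next_line($fh_b);
        }
    }
}

my $file_a = "1 alpha\n3 gamma\n4 delta\n";
my $file_b = "1 uno\n2 dos\n4 cuatro\n5 cinco\n";
open my $in_a, '<', \$file_a or die "Cannot open A: $!";
open my $in_b, '<', \$file_b or die "Cannot open B: $!";
compare_sorted(
    $in_a, $in_b,
    sub { print "both:   $_[0] | $_[1]\n" },
    sub { print "only A: $_[0]\n" },
    sub { print "only B: $_[0]\n" },
);
```

      Each file is read strictly front to back, so memory use does not grow with file size. If the inputs are not already sorted this way, they would need to be sorted first.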

      The necessary additions, to be placed after what is shown, are trivial

      No. The trivial additions are most certainly wrong.

      If F2 is exhausted first, the last line read from F1 inside the loop is lost, because $a is scoped to the block, and last leaves that block.
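      A sketch of what correct leftover handling might look like, assuming a lockstep loop that reads one line from each file per iteration (moritz's original code is not reproduced in this thread). It applies the point made above: the line variables are declared outside the loop, so when one file runs out, the line already read from the other survives last and is processed before the rest of that file is drained. The name pair_then_drain and its callbacks are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pair lines while both files have data, then hand
# every remaining line of the longer file to $on_leftover.
sub pair_then_drain {
    my ($fh1, $fh2, $on_pair, $on_leftover) = @_;
    my ($line1, $line2);    # declared outside the loop on purpose
    while (1) {
        $line1 = readline($fh1);
        $line2 = readline($fh2);
        last unless defined $line1 && defined $line2;
        chomp($line1, $line2);
        $on_pair->($line1, $line2);
    }
    # When the loop ends, at most one of $line1/$line2 still holds an
    # unprocessed line. Handle it first, then drain the rest of its file.
    for my $side ([1, $line1, $fh1], [2, $line2, $fh2]) {
        my ($which, $pending, $fh) = @$side;
        while (defined $pending) {
            chomp $pending;
            $on_leftover->($which, $pending);
            $pending = readline($fh);
        }
    }
}

my $short = "a1\na2\n";
my $long  = "b1\nb2\nb3\nb4\n";
open my $fh_short, '<', \$short or die "Cannot open: $!";
open my $fh_long,  '<', \$long  or die "Cannot open: $!";
pair_then_drain(
    $fh_short, $fh_long,
    sub { print "pair: $_[0] $_[1]\n" },
    sub { print "only file $_[0]: $_[1]\n" },
);
# b3, the line read in the iteration that hit last, is reported, not lost
```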

      Seems like this should have been a reply to moritz's post.
Re: reading two files in parallel
by LanX (Canon) on May 02, 2013 at 19:32 UTC
    my ($a, $b);
    while ($a = <F1>, $b = <F2>, $a or $b) {
        ...
    }

    This reads to the end of the longer file. Changing "or" to "and" limits it to the shorter one.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    UPDATE: Please note

    Since $a and $b are not chomped, no normal input line should ever be false, so there is no strict need to test them with defined. The one exception is a last line consisting of just "0" with no trailing newline, which is false.

    Take care if you are using special filehandles that can return a bare 0 or an empty string!
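    One case that slips past the "never false" argument is a final line consisting of just "0" with no trailing newline. The sketch below (helper names count_truthy and count_defined are hypothetical) counts loop iterations for a plain truth test, as in the "$a or $b" loop, and for a defined() test, on in-memory sample files built to trigger that case.

```perl
use strict;
use warnings;

# Truth test, like the "$a or $b" loop: a final bare "0" counts as false.
sub count_truthy {
    my ($fh1, $fh2) = @_;
    my ($x, $y);
    my $n = 0;
    while (($x = readline($fh1)), ($y = readline($fh2)), ($x || $y)) {
        $n++;
    }
    return $n;
}

# Definedness test: only end of file stops the loop.
sub count_defined {
    my ($fh1, $fh2) = @_;
    my $n = 0;
    while (defined(readline($fh1)) + defined(readline($fh2))) {
        $n++;
    }
    return $n;
}

# File 1 ends with a bare "0" and no trailing newline; file 2 is shorter.
my $data1 = "a\n0";
my $data2 = "b\n";
for my $count (\&count_truthy, \&count_defined) {
    open my $fh1, '<', \$data1 or die "Cannot open: $!";
    open my $fh2, '<', \$data2 or die "Cannot open: $!";
    my $n = $count->($fh1, $fh2);
    print "$n\n";
}
# count_truthy stops after 1 iteration and never processes the "0";
# count_defined runs 2 iterations.
```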

    UPDATE

    safer:

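    use strict;
    use warnings;

    # In-memory test files: $f1 has 2 lines, $f2 has 5
    open my $f1, "<", \ join "\n", 1..2;
    open my $f2, "<", \ join "\n", 1..5;

    # '+' evaluates both defined() tests, so a line is read from each
    # handle every time; the loop runs while either read succeeded
    while ( defined(my $a = <$f1>) + defined(my $b = <$f2>) ) {
        $a .= "";    # turn undef from an exhausted file into ""
        $b .= "";
        chomp($a, $b);
        print "$a,$b\n";
    }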
    Output:

    1,1
    2,2
    ,3
    ,4
    ,5

    In boolean context, + acts like or and * acts like and, just without short-circuiting.
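    The remark that * behaves like "and" can be checked with a small sketch (the helper name for_each_pair_shortest is hypothetical; the sample data mirrors the 2-line and 5-line files above). Replacing + with * stops the loop at the shorter file. Because there is no short-circuit, the final, failing iteration still consumes one line from the longer file.

```perl
use strict;
use warnings;

# Hypothetical helper: lockstep read that stops at the shorter file.
# '*' multiplies the two defined() results (1 or ""), so it is 0 as soon
# as either read fails, but unlike 'and' both reads always happen.
sub for_each_pair_shortest {
    my ($fh1, $fh2, $callback) = @_;
    while (defined(my $x = readline($fh1)) * defined(my $y = readline($fh2))) {
        chomp($x, $y);
        $callback->($x, $y);
    }
}

my $short = join "\n", 1 .. 2;
my $long  = join "\n", 1 .. 5;
open my $f1, '<', \$short or die "Cannot open: $!";
open my $f2, '<', \$long  or die "Cannot open: $!";
for_each_pair_shortest($f1, $f2, sub { print "$_[0],$_[1]\n" });

# No short-circuit: the failing iteration still read "3" from $f2,
# so the next line available from the longer file is "4".
my $next = readline($f2);
chomp $next;
print "next from longer file: $next\n";
```

    With a short-circuiting "and", the second file would not be read once the first is exhausted, but a line read from the first file would still be dropped if the second file ran out first.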

      Beautiful idea; I had not thought of doing it this way. Thank you, Rolf. It might make my module simpler... if I finally end up doing it.

Node Type: perlquestion [id://1031722]
Approved by moritz