http://www.perlmonks.org?node_id=1031722

baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

hi

a very basic question. how to read two files at the same time. Example:

file a, file b
1. read a line from a
2. read a line from b
3. read a line from a
4. read a line from b
...
Since the files are too large and I don't have enough memory to hold them, I am forced to do something like this, but I have realized that I don't know how. If I have two nested while loops like this:
    while(<F1>){
        # read a line
        while(<F2>){
            # read a line
            last; # go back to the main loop, but how to continue from this point in the next iteration?
        }
    }
how do i continue from where i stopped in the nested loop? Thank you

baxy

UPDATE: Thnx moritz ! that did it :)

Replies are listed 'Best First'.
Re: reading two files in parallel
by moritz (Cardinal) on May 02, 2013 at 10:17 UTC
      This is a nice way to do it. However, I would always advise against using $a and $b as variable names: they are the "magic" names used by sort, and bugs where your own variables shadow them inside a sort block can be hard to track down. Better to avoid potentially troublesome names.
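To illustrate the naming advice, here is a minimal sketch of a lockstep read that uses descriptive lexical names, which leaves sort's own $a and $b free for use inside the loop. The helper name sorted_pairs, the variable names, and the in-memory sample data are all made up for this example; it is not the code from the post being replied to.

```perl
use strict;
use warnings;

# Hypothetical helper: read two handles in lockstep and return, for each pair
# of lines, the two lines joined in sorted order. Stops at the shorter input.
sub sorted_pairs {
    my ($fh1, $fh2) = @_;
    my @out;
    while (1) {
        my $line1 = <$fh1>;
        my $line2 = <$fh2>;
        last unless defined $line1 && defined $line2;
        chomp($line1, $line2);
        # $a and $b here are sort's own package variables; nothing shadows them.
        push @out, join ',', sort { $a cmp $b } $line1, $line2;
    }
    return @out;
}

# In-memory handles stand in for the two large files.
my ($data1, $data2) = ("pear\napple\n", "fig\nkiwi\nplum\n");
open my $fh1, '<', \$data1 or die "open: $!";
open my $fh2, '<', \$data2 or die "open: $!";
print "$_\n" for sorted_pairs($fh1, $fh2);
```

With real files, the two open calls would take file names instead of scalar references; the loop itself stays the same.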
Re: reading two files in parallel
by Laurent_R (Canon) on May 02, 2013 at 11:57 UTC

    Nested loops will not give you what you need. Besides Moritz's solution, you could also use one loop on one of the files (but not on the other):

    while (my $a = <F1>) {
        my $b = <F2>;
        # do something with $a and $b
    }
Re: reading two files in parallel
by LanX (Saint) on May 02, 2013 at 19:32 UTC
    my ($a,$b);
    while( $a=<F1>, $b= <F2>, $a or $b) {
        ...
    }

    reads till the end of the longer file. Changing "or" to "and" limits it to the shorter one.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    UPDATE Please note

    Since $a and $b are not chomped, no normal input line should ever be false, so testing with defined is not strictly necessary.

    Better take care if you are using special filehandles that may return a bare 0 or an empty string!!!

    UPDATE

    safer:

    use strict;
    use warnings;
    use Data::Dump qw/pp/;

    open my $f1, "<", \ join "\n", 1..2;
    open my $f2, "<", \ join "\n", 1..5;

    while ( defined (my $a=<$f1>) + defined (my $b=<$f2>) ) {
        $a .= "";    # turn undef into an empty string
        $b .= "";
        chomp($a,$b);
        print "$a,$b\n";
    }

    Output:

    1,1
    2,2
    ,3
    ,4
    ,5

    In boolean context: + is like or, * is like and, just w/o short circuit.
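A small sketch of that remark, counting loop iterations with each operator. The helper count_rounds and its arguments are invented for the illustration; the input mirrors the 1..2 and 1..5 data used above.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times the loop body runs when the two
# "defined" results are combined with + (like or) or with * (like and).
sub count_rounds {
    my ($text1, $text2, $combine) = @_;
    open my $fh1, '<', \$text1 or die "open: $!";
    open my $fh2, '<', \$text2 or die "open: $!";
    my $rounds = 0;
    if ($combine eq '+') {
        # Nonzero while at least one handle still delivers a line.
        while ( defined(my $x = <$fh1>) + defined(my $y = <$fh2>) ) { $rounds++ }
    }
    else {
        # Nonzero only while both handles still deliver a line.
        while ( defined(my $x = <$fh1>) * defined(my $y = <$fh2>) ) { $rounds++ }
    }
    return $rounds;
}

my $short = join "\n", 1 .. 2;
my $long  = join "\n", 1 .. 5;
print count_rounds($short, $long, '+'), "\n";   # prints 5
print count_rounds($short, $long, '*'), "\n";   # prints 2
```

Because neither operator short-circuits, both handles are read on every test. With *, the round that detects the end of the shorter file has already consumed one more line from the longer one.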

      Beautiful idea, I did not think of this way of doing it, thank you, Rolf, it might make my module simpler... if I finally end up doing it.
Re: reading two files in parallel
by sundialsvc4 (Abbot) on May 02, 2013 at 12:36 UTC

    As a slight note, Moritz’s solution as-written does not seem to consider what to do with the leftover records in the longer of the two files.   The necessary additions, to be placed after what is shown, are trivial ... the only trick being to ensure that the first leftover record is processed.

      The necessary additions, to be placed after what is shown, are trivial

      No. The trivial additions are most certainly wrong.

      If F2 is exhausted first, the last line read from F1 inside the loop is lost, because $a is scoped to the block, and last leaves that block.
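One way around the lost line, as a sketch only: declare the two line variables outside the loop so they survive the last, then drain whichever handle still has data, starting with the line already read. The helper name each_pair_then_rest and the callback interface are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: pair lines while both handles have data, then pass the
# leftover lines of the longer handle to the same callback, with undef on the
# exhausted side.
sub each_pair_then_rest {
    my ($fh1, $fh2, $callback) = @_;
    my ($line1, $line2);    # declared outside the loop so they survive "last"
    while (1) {
        $line1 = <$fh1>;
        $line2 = <$fh2>;
        last unless defined $line1 && defined $line2;
        $callback->($line1, $line2);
    }
    # At most one of these loops runs. Its first pass handles the line that
    # had already been read when the other handle hit end of file.
    while (defined $line1) {
        $callback->($line1, undef);
        $line1 = <$fh1>;
    }
    while (defined $line2) {
        $callback->(undef, $line2);
        $line2 = <$fh2>;
    }
    return;
}

my ($data1, $data2) = ("a1\na2\na3\n", "b1\n");
open my $fh1, '<', \$data1 or die "open: $!";
open my $fh2, '<', \$data2 or die "open: $!";
each_pair_then_rest($fh1, $fh2, sub {
    my @cells = map { defined $_ ? $_ : '-' } @_;
    chomp @cells;
    print join('|', @cells), "\n";
});
```

For this sample data the callback sees a1 with b1, then a2 and a3 each with undef on the other side, so nothing read from the longer file is dropped.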

      «As a slight note, Moritz’s solution as-written does not seem to consider what to do with the leftover records...»

      Maybe this is true. But what is your solution?

      Regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      This thing is easy if you know that each file will have an exact match of records. Much less easy if you can have missing lines in one of the files or the other.

      I wrote a program that compares two very large files. Handling all the cases of records existing in one file and not in the other, with all the special cases (file A finishing before B, or the other way around), is not really trivial.

      I am working on transforming this program into a module as generic as possible, but, unfortunately, it is not ready to be used.
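For readers wondering what such a comparison can look like, here is a rough sketch under a strong assumption that the post does not state: both inputs are sorted in ascending string order, with one record per line. It is not the program or module mentioned above; compare_sorted and the report labels only_a, only_b and both are names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: walk two handles whose lines are ASSUMED to be sorted
# in ascending string order and report where each record occurs.
sub compare_sorted {
    my ($fh_a, $fh_b, $report) = @_;
    # Read one line and strip its newline; returns undef at end of file.
    my $next = sub {
        my $line = readline $_[0];
        chomp $line if defined $line;
        return $line;
    };
    my $rec_a = $next->($fh_a);
    my $rec_b = $next->($fh_b);
    while (defined $rec_a || defined $rec_b) {
        # An exhausted side sorts after everything left on the other side.
        my $order = !defined $rec_a ?  1
                  : !defined $rec_b ? -1
                  :                   $rec_a cmp $rec_b;
        if ($order < 0) {
            $report->('only_a', $rec_a);
            $rec_a = $next->($fh_a);
        }
        elsif ($order > 0) {
            $report->('only_b', $rec_b);
            $rec_b = $next->($fh_b);
        }
        else {
            $report->('both', $rec_a);
            $rec_a = $next->($fh_a);
            $rec_b = $next->($fh_b);
        }
    }
    return;
}

my ($data_a, $data_b) = ("apple\ncherry\nfig\n", "banana\ncherry\n");
open my $fh_a, '<', \$data_a or die "open: $!";
open my $fh_b, '<', \$data_b or die "open: $!";
compare_sorted($fh_a, $fh_b, sub { print "$_[0]: $_[1]\n" });
```

Only one record per file is held in memory at a time, and either file may run out first. Unsorted input or duplicate keys would need additional handling that this sketch does not attempt.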

      Seems like this should have been a reply to moritz's post