PerlMonks  

Reading two text files parallelly...

by biswanath_c (Beadle)
on Nov 08, 2010 at 19:11 UTC (#870169=perlquestion)
biswanath_c has asked for the wisdom of the Perl Monks concerning the following question:


Hi

Is there a way to read multiple text files in parallel?

I am trying to do this:
open F1, ">$file1" or die "Cannot open file $file1 \n";
open F2, ">$file2" or die "Cannot open file $file2 \n";
while ( <F1>, <F2> ) {
    $f1 = $_;
    $f2 = $_;

Will this work? I did not get any error when I ran the script, but $f1 and $f2 came out blank!

Any ideas/suggestions?

Re: Reading two text files parallelly...
by JediWizard (Deacon) on Nov 08, 2010 at 19:28 UTC

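    You probably want to read each file explicitly and use a "last" (or a breaking variable) to exit the loop when done. Something like:

    my $more_to_read = !( eof(F1) && eof(F2) );   # false if both files start out empty
    while ($more_to_read) {
        my $f1 = <F1>;   # undef once F1 alone is exhausted
        my $f2 = <F2>;   # undef once F2 alone is exhausted
        # some processing here
        $more_to_read = 0 if eof(F1) && eof(F2);  # checked after processing, so no extra pass
    }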

    UPDATE: You should probably use strict and warnings; they will give you better diagnostic info on what actually went wrong.


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

Re: Reading two text files parallelly...
by ww (Bishop) on Nov 08, 2010 at 19:33 UTC
    No errors?

    Really?

    Despite the lack of a closing curly brace at line 10?

    And how -- with the code you've shown us -- do you know that the variables "came out blank"?

    Worse, how did you expect to read from a file which you appear to have tried to open for output?

    -- for a post that misstates your effort (or lack thereof).

Re: Reading two text files parallelly...
by JavaFan (Canon) on Nov 08, 2010 at 19:41 UTC
    Keeping in mind that while (<F1>) is short for while (defined ($_ = <F1>)), I would write that as:
    while (defined(my $f1 = <F1>) && defined(my $f2 = <F2>)) { ... }
    This assumes you want to stop reading if you reach the end of either file.

    Oh, and you need to change how you're opening the file. You now open the files for writing.
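
For anyone who wants to try the idea above end to end, here is a minimal sketch. It assumes lexical filehandles opened for reading, and it uses in-memory strings ($text1 and $text2, made up for the demo) in place of real files so it runs anywhere:

```perl
use strict;
use warnings;

# In-memory "files" stand in for $file1 and $file2 (an assumption for the demo).
my $text1 = "a1\na2\na3\n";
my $text2 = "b1\nb2\n";

open my $fh1, '<', \$text1 or die "Cannot open first input: $!";
open my $fh2, '<', \$text2 or die "Cannot open second input: $!";

my @pairs;
# The loop stops as soon as either handle runs out of lines.
while ( defined( my $f1 = <$fh1> ) && defined( my $f2 = <$fh2> ) ) {
    chomp( $f1, $f2 );
    push @pairs, "$f1,$f2";
}
print "$_\n" for @pairs;
```

Note that when the second file is the shorter one, a line of the first file has already been read (and is dropped) by the time the loop notices the second file is exhausted.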

Re: Reading two text files parallelly...
by Hugmeir (Sexton) on Nov 08, 2010 at 19:56 UTC
    For starters, strictures and warnings.. Also, you are opening the files for writing, but using them for reading.

    open my $fh1, "<", $file1 or die "Cannot open file $file1: $!";
    open my $fh2, "<", $file2 or die "Cannot open file $file2: $!";

    Then, with while ( <F1>, <F2> ), you'd be reading a line from each file, not assigning it to anything, and entering the loop; $_ is only set when using while (<$file>). (I think.)

    Really, that's quite a bad post you have there. Still, I've never tried something like this, so have a freebie from another newbie (and please try to ignore the awful variable names..):

    use strict;
    use warnings;
    use 5.010;
    use autodie;

    open my $fh1, '<', 'file1';
    open my $fh2, '<', 'file2';

    while (my $test1 = <$fh1>, my $test2 = <$fh2>) {
        chomp($test1 //= '');
        chomp($test2 //= '');
        say "$test1, $test2";
    }

      while (my $test1 = <$fh1>, my $test2 = <$fh2>)
      This is "comma operator". The "true" or "false" of the "while" only depends upon the state of $test2. The comma operator is a way to combine what would normally be two completely separate statements. It should be used only with extreme caution.

      A common use for this is in user prompting loops. Putting the prompt together with the ending condition solves the problem of "how to get the loop started" and at the same time puts the condition that is going to stop the loop right up there where it belongs - right in the statement that runs the loop - not buried somewhere in the "guts".

      Here is an example. We are asking about part numbers. The user prompt is clear. What stops the loop is clear (q,Q,QuIT, etc) and it is right there where it is supposed to be, in the controlling loop statement. In the "guts" of the loop there will be many error messages like "hey, dummy a part number needs 10 digits" or whatever, followed by a "next;" statement - that restarts the loop and re-prompts the user.

      while ( (print "Enter Part No and Qty: "), (my $line=<STDIN>) !~ /^\s*q(uit)?\s*$/i ) {...}
      Some would argue that it is better to move the ending condition into the loop and make it the first statement. Well, OK, that's one way. In that case we have:
      last if $line =~ /^\s*q(uit)?\s*$/i;
      I would argue against that as "do forever" loops should be reserved for servers, GUI event loops and such.

      In any event, the OP doesn't appear to be in need of such things.
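
To see the comma-operator point in action, here is a small sketch. It assumes in-memory strings ($short and $long, invented for the demo) in place of real files, with the first "file" deliberately shorter than the second:

```perl
use strict;
use warnings;

# The first "file" has one line, the second has three.
my $short = "s1\n";
my $long  = "l1\nl2\nl3\n";

open my $fh1, '<', \$short or die "Cannot open first input: $!";
open my $fh2, '<', \$long  or die "Cannot open second input: $!";

my $iterations = 0;
my $saw_undef  = 0;
# Only the last operand of the comma operator decides whether the loop continues.
while ( my $test1 = <$fh1>, my $test2 = <$fh2> ) {
    $iterations++;
    $saw_undef++ unless defined $test1;   # first file already exhausted
}
print "$iterations iterations, $saw_undef of them with an undefined first line\n";
```

The loop keeps running for as long as the second handle yields lines, so the body has to cope with an undefined $test1 on its own.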

        This is "comma operator". The "true" or "false" of the "while" only depends upon the state of $test2.

        Oh, completely missed that.. All my tests had the second file longer than the first. ++ for the explanation! If I may take another moment of your time, I tried fixing that code from before, although now I think that I'm using one of those endless loops you advised against:

        while (my ($test1, $test2) = (scalar <$fh1>, scalar <$fh2>)) { last unless defined $test1 or defined $test2; ... }
        I figure that the simplest fixes (for me, that is: Feel free to kick me on the right path) would be to either use the defined-or on the results of the readlines, replacing the undefs with the empty list, or move everything into a sub.

        ..actually, rather than talk about it:

        defined-or'ing

        while ( my ($test1, $test2) = (scalar <$fh1> // (), scalar <$fh2> // ()) ) { ... }
        sub'd
        sub read_double_file {
            my ($fh1, $fh2) = @_;
            my ($line_file_1, $line_file_2) = (scalar <$fh1>, scalar <$fh2>);
            (defined $line_file_1 or defined $line_file_2)
                ? ($line_file_1 //= '', $line_file_2 //= '')
                : ();
        }

        while ( my ($line_from_1, $line_from_2) = read_double_file($fh1, $fh2) ) { ... }
        Is there a better approach? (For whatever reason, trying to solve this has been quite fun -- I'm even thinking about meddling with objects, just so I don't have to pass the filehandles over and over to the subs) Are the attempts so far horribly, terribly wrong?
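
One possible answer to the "don't pass the filehandles over and over" wish, short of writing a class, is a closure that remembers the handles. This is only a sketch: make_pair_reader is a hypothetical helper name, and the in-memory strings stand in for real files.

```perl
use strict;
use warnings;

# make_pair_reader is a hypothetical helper: it captures the handles once
# and returns an iterator that yields one line from each per call.
sub make_pair_reader {
    my @handles = @_;
    return sub {
        my @lines = map { scalar readline($_) } @handles;
        return unless grep { defined $_ } @lines;     # every handle is exhausted
        return map { defined($_) ? $_ : '' } @lines;  # pad the shorter file with ''
    };
}

my ( $text1, $text2 ) = ( "a1\na2\n", "b1\nb2\nb3\n" );
open my $fh1, '<', \$text1 or die "Cannot open first input: $!";
open my $fh2, '<', \$text2 or die "Cannot open second input: $!";

my $next_pair = make_pair_reader( $fh1, $fh2 );
my @seen;
while ( my ( $line1, $line2 ) = $next_pair->() ) {
    chomp( $line1, $line2 );
    push @seen, "[$line1|$line2]";
}
print "@seen\n";
```

The iterator returns an empty list only when all handles are exhausted, which is what makes the list assignment in the while condition false.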
Re: Reading two text files parallelly...
by 7stud (Deacon) on Nov 08, 2010 at 19:59 UTC

    ...also you should be:

    1) Using the three arg form of open().

    2) You should NOT be using a bareword filehandle.

    This is the modern way to open a file:

    open my $F1, '<', $file1 or die "Cannot open file $file1 \n";
    while (my $line = <$F1>) { ... }

    And this:

    while (defined(my $f1 = <F1>) && defined(my $f2 = <F2>)) { ... }

    should be more like:

    while ( defined(my $line1 = <$F1>) and defined(my $line2 = <$F2>) ) { ... }
      ...and including $! in the error message is the postmodern way ;)
      open my $f1, '<', $file1 or die "Cannot open file '$file1': $!";
        Whoops!

        And the postpostmodern way…

        use English qw( -no_match_vars );
        open my $f1, '<', $file1 or die "Cannot open file '$file1': $OS_ERROR\n";

      I'm new to Perl as well, and all of the examples that I've come across instruct you to open a file like this:

      open (FH, "<$file1") or die "blahblah";

      Would you (or anyone for that matter) mind elaborating as to why 1) and 2) should be used?

      Much appreciated in advance.

        See the first four items of Perl Best Practices Chapter 10 (I/O), namely:

        • Don't use bareword filehandles
        • Use indirect filehandles
        • If you have to use a package filehandle, localize it first
        • Use either the IO::File module or the three-argument form of open

        Using lexical file handles is better style because:

        • They are local variables and so avoid the generally evil programming practice of keeping state in global variables.
        • They close themselves automatically when their lexical variable goes out of scope.
        • They avoid the confusion of barewords. IMHO, barewords are an unfortunate Perl feature. Admittedly, they're handy when writing poems or playing golf. Historically, they probably exist only because of the improbable chance circumstance of Larry sitting right next to a brilliant poet (Sharon Hopkins) when designing early versions of Perl at JPL in the early 1990s and being led astray by her beautiful poetry readings.

        The old two-argument form of open is subject to various security exploits as described at Opening files securely in Perl.

        With open (FH, "<$file1") or die "blahblah"; there is a possible security issue because the variable $file1 might contain something that would cause problems. What goes into the open() function is the interpolated string. You learned right: always use the < or > etc. In your example, if $file1 was ">ImportantFile", the open() will fail because the argument would evaluate to "<>ImportantFile" and open() won't like that filename! If you forgot the "<", then the open of ">ImportantFile" might succeed and the file's contents would be deleted, or various other bad things could happen.

        So if you use:  open (FH, '<', $file1) || die; putting the file mode explicitly there is a "small" thing that could save you big problems later.
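
Here is a small, self-contained sketch of that difference. It assumes a scratch directory from File::Temp so nothing real gets clobbered; the file name ImportantFile and the "user input" string are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed when the program exits.
my $dir    = tempdir( CLEANUP => 1 );
my $victim = "$dir/ImportantFile";

# Create a file with some content worth protecting.
open my $out, '>', $victim or die "Cannot create $victim: $!";
print {$out} "precious data\n";
close $out or die "Cannot close $victim: $!";

# Pretend this string arrived from user input.
my $name = ">$victim";

# Three-argument open treats the whole string as a file name, so it just fails.
my $three_arg_ok     = open( my $in3, '<', $name ) ? 1 : 0;
my $size_after_three = ( -s $victim ) || 0;

# Two-argument open with the mode forgotten takes the mode from the string itself.
my $two_arg_ok = open( my $in2, $name ) ? 1 : 0;
close $in2 if $two_arg_ok;
my $size_after_two = ( -s $victim ) || 0;

print "three-arg: opened=$three_arg_ok, file size now $size_after_three\n";
print "two-arg:   opened=$two_arg_ok, file size now $size_after_two\n";
```

The two-argument call "succeeds", but only because it opened the file for writing and truncated it on the way.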

        If you are writing small one or two page programs, using a bareword like FH is no big deal. However, be aware that Perl, like C (where you have to do it this way), can use lexical variables for filehandles. So you can open($infile, '<', $somefile)... and pass $infile to a subroutine just like any other Perl variable.
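
A short sketch of passing a lexical filehandle to a subroutine follows. count_lines is a hypothetical helper, and an in-memory string stands in for $somefile so the example runs without any files on disk:

```perl
use strict;
use warnings;

# count_lines is a hypothetical helper; it accepts any readable filehandle.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        $count++;
    }
    return $count;
}

my $text = "one\ntwo\nthree\n";
my $total;
{
    open my $infile, '<', \$text or die "Cannot open in-memory file: $!";
    $total = count_lines($infile);   # passed like any other scalar
}   # $infile goes out of scope here, so the handle is closed automatically
print "$total lines\n";
```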

        As far as $! in "die" messages, you might or might not want to put that there. Part of this depends upon how descriptive your part of the "die" message is! Ok, try some code:

        #!/usr/bin/perl -w
        use strict;
        open(FH, '<', "bad") || die "your textXXX OS says: $!\n";
        while (<FH>){} # prevents warning FH only used once
        See what $! has to say. My OS prints, "your textXXX OS says: No such file or directory". I figure that "your textXXX" is way more important. Something like "can't open Budget.csv" is way more to the point than "No such file or directory"- most of the time the OS text is meaningless for the average user. Also notice what adding the trailing "\n" to the die message does.
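
To compare the two forms of the message without killing the script, the die can be trapped with eval. This is a sketch only; the file name is made up and assumed not to exist, and the exact wording of $! depends on the OS and locale.

```perl
use strict;
use warnings;

my $missing = 'no-such-file-for-this-demo.csv';   # assumed not to exist

# With a trailing newline, die prints the message exactly as written.
eval { open my $fh, '<', $missing or die "can't open $missing: $!\n"; 1 };
my $with_newline = $@;

# Without it, Perl appends the script name and line number.
eval { open my $fh, '<', $missing or die "can't open $missing: $!"; 1 };
my $without_newline = $@;

print "with newline:    $with_newline";
print "without newline: $without_newline";
```

The "at SCRIPT line N." suffix helps the programmer and is mostly noise for an end user, which is one way to decide whether to keep the "\n".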

        Update: This thread has morphed into something else from the OP's original question. But I figure it is ok to comment on some of the comments to the comments!

Re: Reading two text files parallelly...
by Khen1950fx (Canon) on Nov 09, 2010 at 02:18 UTC
    I used Parallel::Runner.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::Runner;

    my $file1 = '/root/Desktop/files/file1';
    my $file2 = '/root/Desktop/files/file2';

    my $runner = Parallel::Runner->new(2);

    $runner->run( sub {
        open(FILE1, '<', $file1) or die "Cannot open file $file1\n";
        if (<FILE1>) {
            print "File open\n";
        }
        else {
            print "File is not open\n";
        }
    });

    $runner->run( sub {
        open(FILE2, '<', $file2) or die "Cannot open file $file2\n";
        if (<FILE2>) {
            print "File open\n";
        }
        else {
            print "File is not open\n";
        }
    });

    $runner->finish;
Re: Reading two text files parallelly...
by Jim (Curate) on Nov 09, 2010 at 03:59 UTC

    The word is parallelerally.

Re: Reading two text files parallelly...
by Marshall (Prior) on Nov 09, 2010 at 09:42 UTC
    The first problem is that you are opening the filehandles F1 and F2 for writing. That specifies that the files named by $file1 and $file2 should be deleted and new files started with those names, so what you have is basically nonsense code. You can't read a file that you just "zero'ed" with empty contents!

    # Maybe this is what you mean????
    open (F1, '<', $file1) or die "Cannot open file $file1 for reading\n";
    open (F2, '<', $file2) or die "Cannot open file $file2 for reading\n";
    If you are trying to compare or match files on a line by line basis, this can become complex.

    What do you want to happen if say file1 has fewer lines than file2?

    I didn't test this, but it appears that your "while statement" does:

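    while ( <F1>, <F2> ) {
        # throw away a line from file F1 - never to be used
        # end the loop if there are no more lines in F2
        # $_ is NOT set: the implicit assignment to $_ only happens
        #   when a lone <FH> is the entire while condition
        # <F1> plays no role here - it is a No-Operation: NoOp
    }
    So, as far as ending the loop goes, this behaves like:
    while ( <F2> ) {
        # here $_ would be a line from $file2; in your version $_ is never set
    }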
    Which I suspect is not what you want. What do you want?
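
Since the analysis above was untested, here is a small check that anyone can run. It is a sketch under assumptions: lexical handles on in-memory strings replace F1 and F2, and $_ is given a marker value beforehand so any change to it would be visible.

```perl
use strict;
use warnings;

my ( $text1, $text2 ) = ( "a1\na2\na3\n", "b1\nb2\n" );
open my $fh1, '<', \$text1 or die "Cannot open first input: $!";
open my $fh2, '<', \$text2 or die "Cannot open second input: $!";

$_ = 'untouched';        # marker value: shows whether the loop ever assigns to $_
my $iterations = 0;
my $topic_changed = 0;
while ( <$fh1>, <$fh2> ) {
    $iterations++;
    $topic_changed++ if $_ ne 'untouched';
}
print "iterations: $iterations, times \$_ changed: $topic_changed\n";
```

The loop body runs once per line of the second input, and $_ keeps its marker value throughout, which would explain why $f1 and $f2 "came out blank" even with the files opened correctly.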

Node Type: perlquestion [id://870169]
Approved by Corion
Front-paged by Corion