PerlMonks  

How to process multiple input files?

by rnaeye (Pilgrim)
on May 22, 2011 at 18:58 UTC (#906190=perlquestion)
rnaeye has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

The script below inserts a line above the SECOND </div> tag in the text file. The script works fine when I have a single input file, but I would like to process hundreds of files at once. I have tried to put the script within a foreach loop, but it did not work. I was wondering if anyone could help.

Thank you.

My script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    $^I = ".bak";
    undef $/;
    my $count = 0;
    my $line = <>;
    $line =~ s { (<\/div>) }
               { if (++$count == 2){
                     "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n".$1;
                 } else {
                     $1;
                 }
               }gex;
    print $line;

Sample input file:

    <html lang="en">
    <body>
    <!-- a lot of text here -->
    <div id="masthead" >
    <!-- a lot of text here -->
    </div>
    <!-- ############################################### -->
    <div id="wrapper" >
    <!--a lot of text here-->
    </div>
    </body>
    </html>

Re: How to process multiple input files?
by John M. Dlugosz (Monsignor) on May 22, 2011 at 19:30 UTC
    As written, the <> construct will read from each file name given on the command line, in turn. You don't need to do anything else; just list more than one file on the command line.

      It only processes the first file on the command line.

        Ah, you are only reading once. I see what you were asking now.

        Making a while loop out of it like so:

            my $line;
            while (defined ($line = <>)) {

        will repeat until there are no files left.
Re: How to process multiple input files?
by jwkrahn (Monsignor) on May 22, 2011 at 20:42 UTC

    You need to reset $count for each file.    Something like this (UNTESTED):

        #!/usr/bin/perl
        use strict;
        use warnings;

        $^I = ".bak";
        undef $/;
        my $count = 0;

        while ( my $line = <> ) {
            $line =~ s{ (<\/div>) }
                      { ++$count == 2
                        ? "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n$1"
                        : $1
                      }gex;
            print $line;
            $count = 0;
        }

      Thanks so much. Works great.

      I worry that an empty file will stop it prematurely. Or a file might contain just "0" or somesuch, but that's less likely. Since he's slurping whole files rather than reading lines, I think it would be prudent to test for defined. (Hmm, what does the normal line-oriented read do if an empty file is in the list? Maybe it's always an issue.)

      update: never mind. In production code I would have simply written defined to be sure, but looking through the docs I see that this construct is special even in the case of explicit assignment. I know that the quick while(<>) tests for defined, or started to at some specific version of Perl (I remember the classic Camel book explaining how lines are never False because they end in "\n"), but wasn't sure that applied when assignment was being made.

      In general, I rely less on special cases and magical meanings in well-written production code than in a quick one-liner. Declaring variables, and not using $_ much falls into the same category, so I somehow was thinking the magic was not in effect.

        I think it would be prudent to test for defined.

        The code I posted:

        while ( my $line = <> ) {

        does test for defined.
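
        A minimal sketch of that point, assuming two throwaway files created with the core File::Temp module (the file names and contents are made up for the demo). One file holds nothing but "0", which is false as a plain string, yet the loop body still runs for it because Perl tests the assignment for definedness:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demo files: one holds only "0" (no newline), one holds text.
my $dir     = tempdir( CLEANUP => 1 );
my %content = ( 'text.txt' => "hello\n", 'zero.txt' => '0' );
for my $name ( sort keys %content ) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$fh} $content{$name};
    close $fh or die "Cannot close $dir/$name: $!";
}

my @seen;
{
    local $/;                                             # slurp mode: one record per file
    local @ARGV = map { "$dir/$_" } sort keys %content;   # text.txt, then zero.txt
    while ( my $data = <> ) {                             # treated as defined( my $data = <> )
        push @seen, $data;
    }
}
print scalar(@seen), " files read\n";
```

        Writing defined explicitly, as in the earlier reply, does no harm and documents the intent.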

Re: How to process multiple input files?
by jaredor (Deacon) on May 22, 2011 at 20:49 UTC

    Try using the while construct with the <> operator. Something like

    while (my $line = <>) { ... }

    Oops, after submission I saw jwkrahn responded in more detail. That comment should solve (both) your problems, which I now understand to be 1) looping over command line file names, and 2) modifying the second line of each file. One thing you might do instead of maintaining your own counter would be to use the built-in line counter. Update: the special $. line-number variable will not be properly maintained from file to file with the <> operator unless you take special steps, as described in the link given. Thank you again jwkrahn.

      He'll always have a line-count of 1, since he's slurping the files. The counter variable is used to count how many times the replacement is triggered with the /g option, not the number of "lines" read (he only reads one "line" in the original!).

      Putting the declaration of $count inside the loop should do the trick simply. A better solution might be to rewrite the regex to find the second occurrence of </div> rather than finding all of them and only substituting the second, and "inserting" the content directly rather than repeating the found stuff in the replacement.
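
      A small sketch of the first suggestion, working on in-memory strings so it runs without any files. The helper name insert_before_second_div and the sample pages are made up for illustration; the point is only that the counter is declared fresh for every document, so nothing needs resetting by hand:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert $insert just before the second </div> in $html.
sub insert_before_second_div {
    my ( $html, $insert ) = @_;
    my $count = 0;    # declared per call, so every document starts from zero
    $html =~ s{(</div>)}{ ++$count == 2 ? "$insert$1" : $1 }ge;
    return $html;
}

my @pages = (
    "<div>a</div><div>b</div>",
    "<div>c</div><div>d</div><div>e</div>",
);
my @out = map { insert_before_second_div( $_, "<!-- footer -->" ) } @pages;
print "$_\n" for @out;
```

      In the file-processing script the same effect comes from moving "my $count = 0;" to the top of the while loop body.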

        Thanks for pointing out my errors. I simply did not read the code closely enough.

        All your responses in this thread were good. I learned something. Good work.

        He'll always have a line-count of 1, since he's slurping the files.

        $. contains the current record count, and since each file is one record it will be incremented for each file and so will not always be 1.    Unless of course you reset $. or close ARGV at the end of each file.
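
        A rough demonstration of both behaviours, assuming three scratch files made with the core File::Temp module (the names page1.html and so on are invented for the demo). The first loop closes ARGV at the end of each file, which resets $.; the second loop does not, so $. keeps counting records across files:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files, two lines each.
my $dir = tempdir( CLEANUP => 1 );
my @files;
for my $n ( 1 .. 3 ) {
    my $path = "$dir/page$n.html";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "line one\nline two\n";
    close $fh or die "Cannot close $path: $!";
    push @files, $path;
}

my ( @reset, @running );
{
    local $/;                  # slurp mode: each file is one record
    local @ARGV = @files;
    while ( my $data = <> ) {
        push @reset, $.;
        close ARGV if eof;     # explicit close resets $. for the next file
    }
}
{
    local $/;
    local @ARGV = @files;
    while ( my $data = <> ) {
        push @running, $.;     # no close: $. counts records across all files
    }
}
print "with close ARGV:    @reset\n";
print "without close ARGV: @running\n";
```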

Re: How to process multiple input files?
by graff (Chancellor) on May 22, 2011 at 20:51 UTC
    I have tried to put the script within a foreach loop, but it did not work.

    So, I'm guessing that you didn't try it this way:

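    #!/usr/bin/perl
    use strict;
    use warnings;

    for my $f ( @ARGV ) {
        local $/;    # slurp mode: read each file as a single string
        open( I, '<', $f )       or die "Cannot read $f: $!";
        # Note: the modified text is written to $f.bak; $f itself is left unchanged
        open( O, '>', "$f.bak" ) or die "Cannot write $f.bak: $!";
        my $count = 0;
        my $line = <I>;
        $line =~ s{ (<\/div>) }
                  { if (++$count == 2){
                        "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n".$1;
                    } else {
                        $1;
                    }
                  }gex;
        print O $line;
    }
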
    That works for me. (BTW, I'm compulsive about making the indentation look right -- seems silly, but it's really helpful to keep code less illegible.)

    If you have so many files that you can't fit them all as args on a command line, there's the unix "xargs" tool:

    ls | xargs your_prog ## or use "find ... | xargs your_prog"
Re: How to process multiple input files?
by Anonymous Monk on May 22, 2011 at 23:48 UTC
Re: How to process multiple input files?
by John M. Dlugosz (Monsignor) on May 23, 2011 at 00:42 UTC
    Oh, also your technique to find the second occurrence of something and do something to it is a bit strange. You could use the search /g in a loop and have normal code rather than the inside of evaluated replacement. But, you can locate the second occurrence directly and not need that kind of code.
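
    A sketch of the "search with /g in a loop" variant, on an in-memory string. The helper name insert_before_nth and its parameters are made up for illustration; it relies on $-[0], which holds the start offset of the last successful match, and on four-argument substr to insert without replacing anything:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert $insert before the $n-th occurrence of $tag.
sub insert_before_nth {
    my ( $text, $tag, $n, $insert ) = @_;
    my $seen = 0;
    while ( $text =~ /\Q$tag\E/g ) {          # scalar-context /g walks the matches
        next unless ++$seen == $n;
        substr( $text, $-[0], 0, $insert );   # zero-length replacement at the match start
        last;                                 # stop scanning once the insert is done
    }
    return $text;                             # unchanged if there are fewer than $n matches
}

print insert_before_nth( "<div>a</div><div>b</div>", "</div>", 2, "<!-- footer -->" ), "\n";
```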

    You want to insert something just before the second </div>, right? Something like this (untested!):

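    # qq{} is used because the text itself contains single quotes and needs \t and \n expanded
    my $replacement = qq{\t<?php include(\$_SERVER['DOCUMENT_ROOT']."/includes/footer.php"); ?>\n\n};
    # /s lets . match newlines, so .*? can span the lines between the two tags
    s{ </div> .*? \K (?=</div>) }{$replacement}xs;
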
    Note that you don't use /g so don't keep checking all the rest of the divs, and you don't use $1 or anything in the replacement but "insert" it without replacing any of the stuff used to find that spot.

    The \K means that what came before is just context and not included in what gets replaced. The (?=pattern) does the same for what follows. Nothing is "in" the region replaced. See also the use of lazy quantifiers.
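
    A tiny self-contained check of that behaviour, on a made-up three-div string with a placeholder comment standing in for the real footer line. It assumes Perl 5.10 or later (needed for \K) and uses /s so that . can cross the newlines between the tags:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # \K requires Perl 5.10 or later

# Made-up sample with three closing tags, each on its own line.
my $html = "<div>a\n</div>\n<div>b\n</div>\n<div>c\n</div>\n";

# Match from the first </div> up to just before the second one; \K discards
# everything matched so far, and the lookahead consumes nothing, so the
# replacement is inserted into an empty region.
( my $patched = $html ) =~ s{ </div> .*? \K (?=</div>) }{<!-- footer -->\n}xs;

print $patched;
```

    Only the second </div> gets the insert; the first and third are left alone because there is no /g.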

    The whole program becomes:

    #!/usr/bin/perl
    use strict;
    use warnings;

    $^I = ".bak";   # same as -i option
    undef $/;       # slurp whole files!
    # qq{} because the text contains single quotes and needs \t and \n expanded
    my $replacement = qq{\t<?php include(\$_SERVER['DOCUMENT_ROOT']."/includes/footer.php"); ?>\n\n};
    my $filecontents;
    while (defined ($filecontents = <>)) {
        # /s lets . match newlines, so .*? can span the lines between the two tags
        $filecontents =~ s{ </div> .*? \K (?=</div>) }{$replacement}xs;
        print $filecontents;
    }

    I added comments and changed the name of the variable from $line because nobody else noticed that this is not a single line. As written, it was confusing and hard to read because of built-in assumptions people make about idioms and style.
