PerlMonks  

Re: Large file processed line by line

by btrott (Parson)
on Jun 19, 2001 at 05:02 UTC (#89533)


in reply to Large file processed line by line

The general idiom is to use a while loop to iterate over the lines in the file, reading in one line at a time and processing it, then moving on to the next.

Something like this:

open FH, "foo" or die "Can't open foo: $!";
while (<FH>) {
    ## current line is in $_, process it
}
close FH or warn "Error closing foo: $!";

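For what it's worth, here is a rough sketch of the same loop using a lexical filehandle and three-argument open, which is how I'd probably write it these days. The sub name count_matching_lines and the qr/foo/ pattern are made up for illustration; only one line is held in memory at a time, which is the point for large files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count the lines of
# $path that match $pattern, reading one line at a time.
sub count_matching_lines {
    my ($path, $pattern) = @_;

    # Three-argument open with a lexical filehandle.
    open my $fh, '<', $path or die "Can't open $path: $!";

    my $count = 0;
    while (my $line = <$fh>) {      # implicitly tests defined($line)
        chomp $line;
        $count++ if $line =~ $pattern;
    }

    close $fh or warn "Error closing $path: $!";
    return $count;
}

# Usage: perl count.pl somefile
if (@ARGV) {
    my $path = shift @ARGV;
    print count_matching_lines($path, qr/foo/), "\n";
}
```

The bareword FH version above does the same job; the lexical handle just closes itself when it goes out of scope and can't collide with another FH elsewhere in the program.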
Depending on your situation, you might also want to check out the -p and -n command line flags to perl (perlrun).

If these are files specified on the command line, you can use the special construct:

while (<>) {
    ## line is in $_
}

This might be useful in a situation like
$ process.pl foo.txt bar.txt baz.txt
to process each of the files on the command line.
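As a sketch of what that can look like in practice: inside a <> loop, $ARGV holds the name of the file currently being read and $. holds the line number. The helper name grep_files and the qr/foo/ pattern below are assumptions for the example, not anything from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: read every file in @files through the magic <>
# handle and return "file:line: text" strings for the matching lines.
sub grep_files {
    my ($pattern, @files) = @_;
    my @hits;
    return @hits unless @files;     # with an empty @ARGV, <> would read STDIN

    local @ARGV = @files;           # <> reads the files named in @ARGV
    while (my $line = <>) {
        chomp $line;
        push @hits, sprintf("%s:%d: %s", $ARGV, $., $line)
            if $line =~ $pattern;
    }
    continue {
        close ARGV if eof;          # resets $. at the end of each file
    }
    return @hits;
}

# Usage: perl process.pl foo.txt bar.txt baz.txt
if (@ARGV) {
    print "$_\n" for grep_files(qr/foo/, @ARGV);
}
```

Without the "close ARGV if eof" line, $. keeps counting across files instead of restarting at 1 for each one.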


Re: Re: Large file processed line by line
by coolmichael (Deacon) on Jun 19, 2001 at 07:01 UTC
    Don't forget about the ever-faithful -i.bak command line flag. It's one of my favourites. It edits a file "in place" one line at a time. This should delete every line that doesn't contain the string "foo" (but I haven't tested it, sorry).
    perl -epi.bak "print if(m/foo/);" foo.txt bar.txt baz.txt
    Update:
    Read on for the correct answer. I really should have tested it. Thanks Mirod and Btrott. ++ to both of you. -- for me.
      Right, -i is quite cool. But all -i does is open the file for in-place editing; it doesn't "edit the file one line at a time". If you notice, you also have the -p option in the above command line; that's actually the switch that's doing the line-by-line processing.

      You really should have tested it:

      • -e should be immediately followed by the script to run,
      • -p prints the current line, so you don't have to do it yourself; -n is what you want in this case.

      This (tested!) version works:

      perl -i.bak -n -e"print if(m/foo/);" foo.txt bar.txt baz.txt

      From perldoc perlrun:

      -n causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:

        LINE:
          while (<>) {
              ...             # your program goes here
          }

      "BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just as in awk.

      -p causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed:

        LINE:
          while (<>) {
              ...             # your program goes here
          } continue {
              print or die "-p destination: $!\n";
          }

      If a file named by an argument cannot be opened for some reason, Perl warns you about it, and moves on to the next file. Note that the lines are printed automatically. An error occurring during printing is treated as fatal. To suppress printing use the -n switch. A -p overrides a -n switch.

      "BEGIN" and "END" blocks may be used to capture control before or after the implicit loop, just as in awk.
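If you'd rather have this in a script than on the command line, the same effect can be sketched by setting $^I (the variable behind -i) and writing the -n loop out by hand. The sub name keep_matching_lines is made up for this example, and the '.bak' suffix and qr/foo/ pattern simply mirror the one-liner in this thread; treat it as a sketch rather than the one true way.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines matching $pattern in each file,
# saving the original as "<file>.bak", roughly like
#   perl -i.bak -n -e 'print if /foo/' files...
sub keep_matching_lines {
    my ($pattern, @files) = @_;
    return unless @files;           # with an empty @ARGV, <> would read STDIN

    # $^I is what -i sets; a defined value turns on in-place editing for <>.
    local ($^I, @ARGV) = ('.bak', @files);
    local $_;                       # don't clobber the caller's $_

    while (<>) {                    # the loop that -n wraps around the script
        print if $_ =~ $pattern;    # print goes to the file being rewritten
    }
    return;
}

# Usage: perl keep.pl foo.txt bar.txt baz.txt
if (@ARGV) {
    keep_matching_lines(qr/foo/, @ARGV);
}
```

Note that the line-by-line part is still the while (<>) loop; $^I only redirects where print sends its output, which is the same split between -i and -n described above.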
