doing a search and replace on a large file

by Anonymous Monk
on Apr 14, 2004 at 12:26 UTC (#345018, perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file (about 67,000 lines of data).
I would like to delete the lines that do not have a certain pattern in them, and write the remaining lines to a new file.
Any ideas about how I might do this?

Thanks in advance


Replies are listed 'Best First'.
Re: doing a search and replace on a large file
by matija (Priest) on Apr 14, 2004 at 12:30 UTC
    Just about the way you outlined:
        open(INPUT, "<$bigfile") || die "Could not open $bigfile: $!\n";
        open(OUTPUT, ">$newfile") || die "Could not open $newfile: $!\n";
        while (<INPUT>) {
            print OUTPUT $_    # note no semicolon here...
                unless /whatever condition matches the patterns you want/;
        }
        close(INPUT)  || die "Could not close $bigfile: $!\n";
        close(OUTPUT) || die "Could not close $newfile: $!\n";

      Minor tweak: since the OP wants to "delete" lines that do not match a certain pattern, I'd change the unless into an if. That way, you "save" the lines that do match the pattern ;-)

          open IN, "<$bigfile" or die "Can't open $bigfile: $!\n";
          open OUT, ">$newfile" or die "Can't open $newfile: $!\n";
          while (<IN>) {
              print OUT $_ if /the pattern in question/;
          }
          close OUT;
          close IN;

      All code is usually tested, but rarely trusted.
      I thought what was needed was to (A) delete the matching lines from the file and (B) write the matching lines to another file. If instead you can read the file and produce two output files, one for the matching lines and one for the rest (the unmatched lines), then the objective can be achieved with a slight modification to matija's code, like so:

          open(OUTPUT, ">matchingfile.txt") || die "Could not open matchingfile.txt: $!\n";
          open(OUTPUT2, ">nonmatchingfile.txt") || die "Could not open nonmatchingfile.txt: $!\n";
          while (<>) {
              chomp;
              if (/what you want to match/) {
                  print OUTPUT $_ . "\n";     # matching lines
              } else {
                  print OUTPUT2 $_ . "\n";    # everything else
              }
          }
          close(OUTPUT)  || die "Could not close matchingfile.txt: $!\n";
          close(OUTPUT2) || die "Could not close nonmatchingfile.txt: $!\n";
      ...and run the program like this, where yourscript.pl stands for whatever you named the script: perl yourscript.pl <yourinputfile
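
The two-file split described above can also be written with lexical filehandles and three-argument open, which keeps odd characters in a file name from being read as an open mode. This is only a sketch: the sub name split_by_pattern and the file names in the usage comment are made up for illustration and do not come from any post in this thread.

```perl
use strict;
use warnings;

# Copy lines matching $pattern to $match_file and all other lines to $rest_file.
# Returns the number of lines written to each file.
sub split_by_pattern {
    my ($in_file, $match_file, $rest_file, $pattern) = @_;

    open my $in,    '<', $in_file    or die "Can't open $in_file: $!\n";
    open my $match, '>', $match_file or die "Can't open $match_file: $!\n";
    open my $rest,  '>', $rest_file  or die "Can't open $rest_file: $!\n";

    my ($kept, $dropped) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ $pattern) {
            print {$match} $line;    # line still carries its newline
            $kept++;
        }
        else {
            print {$rest} $line;
            $dropped++;
        }
    }

    close $in;
    close $match or die "Can't close $match_file: $!\n";
    close $rest  or die "Can't close $rest_file: $!\n";
    return ($kept, $dropped);
}

# Hypothetical usage; file names and pattern are placeholders:
# my ($kept, $dropped) =
#     split_by_pattern('big.txt', 'match.txt', 'rest.txt', qr/ERROR/);
```

The file is read one line at a time, so memory use stays flat whether the input has 67,000 lines or many more.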
Re: doing a search and replace on a large file
by hardburn (Abbot) on Apr 14, 2004 at 13:12 UTC

    That's a one-liner, provided your pattern is reasonably small. You don't need to open filehandles if you use perl's command line options and the shell to your advantage:

    perl -lne 'print if /pattern here/' old_file.txt > new_file.txt

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      Or, if you want to overwrite the file, use the '-i' command-line option for in-place editing:
      perl -pi -e's/pattern here//g' file
      Update: reread question, you don't want to overwrite the file, so you don't want this.
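
For anyone who does want to overwrite the file: the substitution above removes only the matched text, not whole lines. A sketch of dropping non-matching lines in place from inside a script follows, using the same mechanism as the -i switch (the $^I variable). The sub name keep_matching_in_place and the .bak suffix are choices made for this example, not anything from the posts above.

```perl
use strict;
use warnings;

# Keep only the lines of $file that match $pattern, editing the file in place.
# The original is preserved as "$file.bak" (the suffix is this sketch's choice).
sub keep_matching_in_place {
    my ($file, $pattern) = @_;

    # Localizing $^I and @ARGV turns on in-place editing for this sub only.
    local ($^I, @ARGV) = ('.bak', $file);
    while (my $line = <>) {
        print $line if $line =~ $pattern;    # print goes to the rewritten file
    }
    return;
}

# Hypothetical usage; file name and pattern are placeholders:
# keep_matching_in_place('data.txt', qr/pattern here/);
```

Keeping the backup suffix means the untouched original is still on disk if the pattern turns out to be wrong.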
Re: doing a search and replace on a large file
by pelagic (Priest) on Apr 14, 2004 at 12:48 UTC
    An example of how I skip lines on a couple of conditions:

        use strict;
        my ($inputfile, $outputfile) = @ARGV;    # file names from the command line
        open(OUT, ">$outputfile") || die "could not open $outputfile: $!\n";
        open(IN, "<$inputfile") || die "could not open $inputfile: $!\n";
        while (<IN>) {
            chomp;                # no newline
            s/^--.*//;            # no oracle comments
            s/^prompt.*//;        # no oracle prompt lines
            s/^\s+//;             # no leading white
            s/\s+$//;             # no trailing white
            s/\s+/ /g;            # replace each series of white with one space
            next unless length;   # anything left?
            print OUT $_, "\n";
        }
        close IN;
        close OUT;

Re: doing a search and replace on a large file
by blue_cowdawg (Monsignor) on Apr 14, 2004 at 13:35 UTC

        Any ideas about how I might do this?

    Here's one way...

        #!/usr/bin/perl -w
        use Tie::File;
        use strict;

        my @ry = ();
        tie @ry, "Tie::File", "mybigfile" or die "mybigfile:$!";
        @ry = grep /mypattern/, @ry;
        untie @ry;

    Please note: that was off the top of my head and untested.

Re: doing a search and replace on a large file
by graff (Chancellor) on Apr 15, 2004 at 02:02 UTC
    Most of the solutions above are fine -- especially hardburn's one-liner (that one got my ++!) -- but folks who know about unix command-line tools know that this is usually just a job for the "grep" command:
    grep 'pattern to be kept' old.file > new.file
    Of course, perl offers so much that "grep" can't do: more powerful regexes, support for multiple character encodings, and liberation from the old "every record must be just one line of text" mind-set. How about a Perl version of grep?

    Well, I'm sure I'm not the only one who has done this -- I just couldn't stop myself... Here it is: grepp -- Perl version of grep (I wrote it a year or so ago, have been using it regularly on solaris, linux and macosx -- should work fine on ms-windows -- and finally got around to posting it here).
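
To illustrate the point about records that are not single lines, here is a minimal sketch that filters blank-line-separated paragraphs instead of lines by setting the input record separator $/ to the empty string. It is not the grepp script mentioned above; the sub name grep_paragraphs is invented for this example.

```perl
use strict;
use warnings;

# Return the blank-line-separated records of $file that match $pattern.
sub grep_paragraphs {
    my ($file, $pattern) = @_;

    open my $in, '<', $file or die "Can't open $file: $!\n";
    local $/ = '';    # paragraph mode: a record ends at one or more blank lines
    my @hits;
    while (my $record = <$in>) {
        push @hits, $record if $record =~ $pattern;
    }
    close $in;
    return @hits;
}

# Hypothetical usage; file name and pattern are placeholders:
# print grep_paragraphs('records.txt', qr/pattern to be kept/);
```

Each record can span several lines, so a pattern may match anywhere inside a paragraph and the whole paragraph is kept.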
