PerlMonks  

doing a search and replace on a large file

by Anonymous Monk
on Apr 14, 2004 at 12:26 UTC (#345018=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi.
I have a large file (about 67,000 lines of data).
I would like to delete the lines that do not have a certain pattern in them, and write the remaining lines to a new file.
Any ideas about how I might do this?

Thanks in advance

C

Re: doing a search and replace on a large file
by matija (Priest) on Apr 14, 2004 at 12:30 UTC
    Just about the way you outlined:
    open(INPUT,"<$bigfile") || die "Could not open $bigfile: $!\n";
    open(OUTPUT,">$newfile") || die "Could not open $newfile: $!\n";
    while (<INPUT>) {
        print OUTPUT $_ # note no semicolon here...
            unless /whatever condition matches the patterns you want/;
    }
    close(INPUT) || die "Could not close $bigfile: $!\n";
    close(OUTPUT) || die "Could not close $newfile: $!\n";

      Minor tweak: since the OP wants to "delete" lines that do not match a certain pattern, I'd change the unless into an if. That way, you "save" the lines that do match the pattern ;-)

      open IN, "<$bigfile" or die "Can't open $bigfile: $!\n";
      open OUT, ">$newfile" or die "Can't open $newfile: $!\n";
      while (<IN>) {
          print OUT $_ if /the pattern in question/;
      }
      close OUT;
      close IN;
      --
      b10m

      All code is usually tested, but rarely trusted.
      I thought what was needed was to (A) delete the matching lines from the file and (B) write the matching lines to another file. If you can instead read the file and produce two output files, one for the matching lines and one for the rest (the unmatched lines), then a slight modification to matija's code does it:
      open(OUTPUT,">matchingfile.txt") || die;
      open(OUTPUT2,">nonmatchingfile.txt") || die;
      while (<>) {
          chomp;
          if (/what you want to match/) {
              print OUTPUT $_ . "\n";
          } else {
              print OUTPUT2 $_ . "\n";
          }
      }
      close(OUTPUT) || die "Could not close $!\n";
      close(OUTPUT2) || die "Could not close $!\n";
      ...and run the prog like this: perl prog.pl <yourinputfile
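
      For what it's worth, the loops above can also be written with lexical filehandles and three-argument open, which keeps odd characters in a file name from being read as part of the open mode. This is only a minimal sketch of the "keep the matching lines" case: the sub name keep_matching_lines and the file names in the example call are made up for illustration and appear nowhere else in this thread.

```perl
use strict;
use warnings;

# Copy every line of $in_path that matches $pattern into $out_path.
# Returns the number of lines kept. (Hypothetical helper, not from the thread.)
sub keep_matching_lines {
    my ($in_path, $out_path, $pattern) = @_;

    open my $in,  '<', $in_path  or die "Could not open $in_path: $!\n";
    open my $out, '>', $out_path or die "Could not open $out_path: $!\n";

    my $kept = 0;
    while (my $line = <$in>) {
        next unless $line =~ $pattern;   # skip lines without the pattern
        print {$out} $line;              # line still carries its newline
        $kept++;
    }

    close $in;
    close $out or die "Could not close $out_path: $!\n";
    return $kept;
}

# Example call (placeholder file names and pattern):
# keep_matching_lines('bigfile.txt', 'newfile.txt', qr/the pattern in question/);
```

      The file is read one line at a time, so 67,000 lines (or far more) never have to sit in memory at once.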
Re: doing a search and replace on a large file
by pelagic (Curate) on Apr 14, 2004 at 12:48 UTC
    An example of how I skip lines on a couple of conditions:
    use strict;
    my ($inputfile, $outputfile) = @ARGV;
    open (OUT, ">$outputfile") || die "could not open $outputfile: $!\n";
    open (IN, "<$inputfile") || die "could not open $inputfile: $!\n";
    while (<IN>) {
        chomp;                  # no newline
        s/^--.*//;              # no oracle comments
        s/^prompt.*//;          # no oracle prompt lines
        s/^\s+//;               # no leading white
        s/\s+$//;               # no trailing white
        s/\s+/ /g;              # replace series of white with one space
        next unless length;     # anything left?
        print OUT $_, "\n";
    }
    close IN;
    close OUT;

    pelagic
Re: doing a search and replace on a large file
by hardburn (Abbot) on Apr 14, 2004 at 13:12 UTC

    That's a one-liner, provided your pattern is reasonably small. You don't need to open filehandles if you use perl's command line options and the shell to your advantage:

    perl -lne 'print if /pattern here/' old_file.txt > new_file.txt
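
    In case the switches look like magic: -n wraps the code in a loop over the input lines, and -l chomps each line on input and makes print add the newline back. Written out by hand it is roughly the following. This is a sketch of the idea rather than the exact code perl generates, and the sub name grep_lines is invented for illustration.

```perl
use strict;
use warnings;

# Roughly what `perl -lne 'print if /pattern here/'` does, spelled out.
# $in and $out are filehandles; $pattern is a compiled regex.
sub grep_lines {
    my ($in, $out, $pattern) = @_;
    local $\ = "\n";                 # -l: print appends a newline
    while (my $line = <$in>) {       # -n: loop over every input line
        chomp $line;                 # -l: strip the trailing newline
        print {$out} $line if $line =~ $pattern;
    }
    return;
}

# Example call, reading STDIN and writing STDOUT like the one-liner:
# grep_lines(\*STDIN, \*STDOUT, qr/pattern here/);
```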

    ----
    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      Or, if you want to overwrite the file, use the '-i' command-line option for in-place editing:
      perl -pi -e's/pattern here//g' file
      Update: reread question, you don't want to overwrite the file, so you don't want this.
Re: doing a search and replace on a large file
by blue_cowdawg (Prior) on Apr 14, 2004 at 13:35 UTC

        Any ideas about how I might do this?

    Here's one way...

    #!/usr/bin/perl -w
    use Tie::File;
    use strict;

    my @ry=();
    tie @ry,"Tie::File","mybigfile" or die "mybigfile:$!";
    @ry = grep /mypattern/,@ry;
    untie @ry;

    Please note: that was off the top of my head and untested.
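
    One thing to be aware of: assigning the grep result back to the tied array rewrites mybigfile itself. If the original has to survive and the matching lines should land in a new file, a variant along these lines might do. It is a sketch under a few assumptions: the sub name copy_matching and the file names in the example call are hypothetical, and the file is tied read-only through Tie::File's mode option with O_RDONLY from Fcntl.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Write the records of $in_path that match $pattern to $out_path,
# leaving $in_path untouched. (Hypothetical helper, not from the thread.)
sub copy_matching {
    my ($in_path, $out_path, $pattern) = @_;

    # Tie read-only so the source file cannot be modified by accident.
    tie(my @lines, 'Tie::File', $in_path, mode => O_RDONLY)
        or die "$in_path: $!";

    open my $out, '>', $out_path or die "$out_path: $!";
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];       # records come back without the newline
        print {$out} "$line\n" if $line =~ $pattern;
    }
    close $out or die "$out_path: $!";

    untie @lines;
    return;
}

# Example call (placeholder names):
# copy_matching('mybigfile', 'mynewfile', qr/mypattern/);
```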

Re: doing a search and replace on a large file
by graff (Chancellor) on Apr 15, 2004 at 02:02 UTC
    Most of the solutions above are fine -- especially hardburn's one-liner (that one got my ++!) -- but folks who know about unix command-line tools know that this is usually just a job for the "grep" command:
    grep 'pattern to be kept' old.file > new.file
    Of course, perl offers so much that "grep" can't do: more powerful regexes, support for multiple character encodings, and liberation from the old "every record must be just one line of text" mind-set. How about a Perl version of grep?
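
    As a small illustration of the "more than one line per record" point: setting the input record separator $/ to the empty string puts perl in paragraph mode, where each read returns a whole block of text up to the next blank line. The sketch below keeps the blocks that match a pattern; the sub name grep_records is made up for this example.

```perl
use strict;
use warnings;

# Return the blank-line-separated records from filehandle $in
# that match $pattern. (Hypothetical helper for illustration.)
sub grep_records {
    my ($in, $pattern) = @_;
    local $/ = "";                   # paragraph mode: blank lines end a record
    my @kept;
    while (my $record = <$in>) {
        push @kept, $record if $record =~ $pattern;
    }
    return @kept;
}

# Example call (placeholder file name):
# open my $fh, '<', 'records.txt' or die "records.txt: $!";
# print grep_records($fh, qr/pattern to be kept/);
```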

    Well, I'm sure I'm not the only one who has done this -- I just couldn't stop myself... Here it is: grepp -- Perl version of grep (I wrote it a year or so ago, have been using it regularly on solaris, linux and macosx -- should work fine on ms-windows -- and finally got around to posting it here).
