PerlMonks  

Search and delete lines based on string matching

by Anonymous Monk
on Mar 13, 2007 at 13:44 UTC (id://604526)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file 'A' which contains n lines of single-word strings, like:

hye
bye
bin
...
n

Now I want to look for these words in another file 'B', delete every line in which one of them is found, and write the result to a new file 'C'.

So far I have tried this:

#!/usr/local/bin/perl
foreach $file (@ARGV) {
    # open a file and assign the filehandle F
    open(F, $file) or die("can't open myfile.txt: $!\n");
    # read in the whole file into an array of lines
    @lines = ();
    while(<F>) {
        push(@lines, $_);
    }
    close(F); # close the filehandle
    foreach (@lines) {
        my $string = "@lines";
        open(my $infile,"<", file1) or die $!;
        open (my $outfile,">>", file2) or die $!;
        while (<$infile>) {
            if ($_ !~/$string/) {
                print $outfile $_;
            }
        }
        close $infile;
        close $outfile;
    }
}

But it's giving me an error, and it also isn't producing any results.

Please help!

20070313 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Replies are listed 'Best First'.
Re: Search and delete lines based on string matching
by imp (Priest) on Mar 13, 2007 at 14:20 UTC
    In addition to davorg's advice above you should also always use both strict and warnings, as they can help you identify many common problems.
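    To make that concrete, here is a small sketch (not from the thread) that compiles two lines adapted from the original post under strict, without running them, and reports what Perl complains about. The helper name strict_complaint is hypothetical, and because the exact wording of Perl's messages differs a little between versions, only the start of each message should be relied on.

```perl
use strict;
use warnings;

# Compile a snippet under "use strict" without running it, and return the
# compiler's complaint (an empty string if it compiled cleanly).
sub strict_complaint {
    my ($code) = @_;
    # Wrapping the snippet in an anonymous sub means it is compiled but never
    # called, so nothing is actually opened or assigned.
    my $compiled = eval "use strict; sub { $code }";
    return $compiled ? '' : $@;
}

# Lines adapted from the original post: an undeclared global array, and a
# bareword (file1) where a filename string was intended.
print strict_complaint('@lines = ();');
print strict_complaint('open(my $infile, "<", file1) or die $!;');
```

    Without strict, both lines compile silently; the bareword file1 is then treated as the string "file1", which is easy to miss.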

    If you are searching for the words from file A in file B then you will need a different regex. The code you provided is using the entire file A as the regex.

    Here's an example that uses one pattern file, one input file, one output file:

    use strict;
    use warnings;

    if (@ARGV != 3) {
        print "Usage: $0 <pattern file> <input file> <output file>\n";
        exit;
    }

    my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

    open my $pattern_fh, '<', $pattern_filename
        or die "Failed to open $pattern_filename: $!";
    my @tokens = ();
    while (my $line = <$pattern_fh>) {
        # split ' ' skips leading whitespace and never yields empty tokens
        # (an empty token would make the alternation match almost any line)
        push @tokens, split ' ', $line;
    }

    # Create a pattern with an alternation of tokens, wrapped in a non-capturing group,
    # and require a word boundary before and after the word to prevent matching pieces
    # of other words. quotemeta escapes any regex metacharacters in the tokens.
    my $pattern = '\b(?:' . join('|', map { quotemeta } @tokens) . ')\b';
    print "Search pattern: $pattern\n";

    open my $infile, "<", $source_filename
        or die "Failed to open $source_filename: $!";
    open my $outfile, ">>", $dest_filename
        or die "Failed to open $dest_filename: $!";
    while (my $line = <$infile>) {
        if ($line !~ /$pattern/) {
            print "adding: $line";
            print $outfile $line;
        }
    }
    close($infile);
    close($outfile);
      Thanks a lot, buddy... It did help.
      Just adding to the above problem: suppose that, in the same code, I need to add strings instead of deleting the matched ones. That is, for strings that are in A but not in B, I want to add them to the contents of file B, preceded by some marker text like "This is new addition", and write the result out to C.
        Here's an approach that initializes a hash of output tokens, prepopulated with marker text for new additions. The output data is overridden for existing entries, and a sorted list is appended to the specified output file (open with '>' instead of '>>' if you don't want this). Note that it isn't fully compatible with the delete code I posted earlier, since that code didn't take comments into account.
        #!/usr/local/bin/perl
        use strict;
        use warnings;

        if (@ARGV != 3) {
            print "Usage: $0 <pattern file> <input file> <output file>\n";
            exit;
        }

        my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

        open my $pattern_fh, '<', $pattern_filename
            or die "Failed to open $pattern_filename: $!";

        # Prepopulate every token from the pattern file with the marker text
        my %output_tokens = ();
        while (my $line = <$pattern_fh>) {
            chomp $line;
            $output_tokens{$line} = "$line # Added by script";
        }
        print "Expected tokens: ", join(', ', keys %output_tokens), "\n";

        open my $infile, "<", $source_filename
            or die "Failed to open $source_filename: $!";
        open my $outfile, ">>", $dest_filename
            or die "Failed to open $dest_filename: $!";

        # Entries already present in the input file overwrite the marker text
        while (my $line = <$infile>) {
            chomp $line;
            $output_tokens{$line} = $line;
        }

        # Append the sorted list to the output file
        for my $token (sort keys %output_tokens) {
            print $outfile $output_tokens{$token}, "\n";
        }
        close($infile);
        close($outfile);
        As suggested above: what code have you tried?
        How is your input data formatted?
        1. bin den mig
        2. bin
          den
          mig
        3. bin deg
          mig
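        One way to make the layout question matter less, sketched here under the assumption that the words themselves never contain whitespace, is to split every line on whitespace so that all three layouts above yield the same kind of token list. The sub name read_tokens is hypothetical, and in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# Read whitespace-separated tokens from a filehandle, whether they are one
# per line, all on one line, or a mix of both.
sub read_tokens {
    my ($fh) = @_;
    my @tokens;
    while (my $line = <$fh>) {
        # split ' ' (a single-space string) is awk-style: it ignores leading
        # whitespace, splits on runs of whitespace, and drops the newline.
        push @tokens, split ' ', $line;
    }
    return @tokens;
}

# The three layouts from the question, as in-memory files.
my @layouts = ("bin den mig\n", "bin\nden\nmig\n", "bin deg\nmig\n");
for my $layout (@layouts) {
    open my $fh, '<', \$layout or die "Cannot open in-memory file: $!";
    print join(',', read_tokens($fh)), "\n";
}
```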
Re: Search and delete lines based on string matching
by davorg (Chancellor) on Mar 13, 2007 at 13:54 UTC

    It's really difficult to help you as your code is pretty much unreadable. You should edit your node to put <code> tags around your source code.

    It's also a bad idea to say "But its giving me error" without telling us what the error says.

    I assume that the words you are trying to filter end up in @lines (not the best name for that variable!), but it looks to me as tho' all of the elements in that array will still have newline characters on the end - which makes it harder for them to match other text.
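    A small illustration of that point, using the words from the question and a made-up line from file B (the sample line is hypothetical): interpolating @lines joins the newline-terminated words into one long pattern, which only matches if file B contains that exact multi-line sequence. After chomp, each word can be tested on its own; \Q...\E escapes any regex metacharacters in the word.

```perl
use strict;
use warnings;

# Simulate reading file A as the original code does: each element keeps
# its trailing newline.
my @lines = ("hye\n", "bye\n", "bin\n");

# Interpolating the array joins the elements with spaces, so the "regex"
# becomes the single string "hye\n bye\n bin\n".
my $string = "@lines";

my $line_from_b = "bin there, done that\n";
print "unchomped: ", ($line_from_b =~ /$string/ ? "match" : "no match"), "\n";

# After chomp, count how many individual words occur in the line.
chomp(my @words = @lines);
my $hits = grep { $line_from_b =~ /\b\Q$_\E\b/ } @words;
print "chomped words matched: $hits\n";
```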

    But, like I say, it's hard to be sure what the problem is until you tidy up the node and give us some better information.

Re: Search and delete lines based on string matching
by ptum (Priest) on Mar 13, 2007 at 14:00 UTC

    Since you posted as Anonymonk, you can't go back and edit your post, but next time, please use <code> tags.

    What error are you seeing? You didn't tell us.

    Is this a homework problem? We don't mind helping, but we're not particularly inclined to do your homework.

    To solve a problem like this, I would generally read in the contents of file A into a hash, since you just want to use those words as a lookup. Then I would open files B and C, step through the contents of file B a line at a time, and, whenever the line of B contains a word in my hash, drop it on the floor -- otherwise, write that line to file C. I don't think that opening the file handles inside your loop is a good idea.

    You're not really clear as to whether file B contains single words or longer strings -- if longer strings, then you might want to split the line into individual tokens (which can then be individually compared to your hash from file A) or (if the number of words in file A is small enough) you may prefer to build a regular expression by which you evaluate each string. A little more detail might help us to help you more effectively.
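    A minimal sketch of the hash-lookup approach described above, assuming that lines of B are split on whitespace only (so "bin," with trailing punctuation would not count as "bin", unlike with a \b regex). The sub name filter_lines is hypothetical; the filehandles are opened once, outside any loop.

```perl
use strict;
use warnings;

# The words from file A become hash keys; a line from file B is dropped if
# any of its whitespace-separated tokens is one of those keys.
sub filter_lines {
    my ($words_fh, $in_fh, $out_fh) = @_;

    # Build the lookup table once, before touching file B.
    my %delete;
    while (my $line = <$words_fh>) {
        $delete{$_} = 1 for split ' ', $line;
    }

    # Step through file B a line at a time; only surviving lines reach C.
    while (my $line = <$in_fh>) {
        next if grep { exists $delete{$_} } split ' ', $line;
        print {$out_fh} $line;
    }
}

# Usage with the file names from the question:
# open my $a_fh, '<', 'A' or die "A: $!";
# open my $b_fh, '<', 'B' or die "B: $!";
# open my $c_fh, '>', 'C' or die "C: $!";
# filter_lines($a_fh, $b_fh, $c_fh);
```

    Each lookup is a single hash access, so the cost does not grow with the number of words in A the way a long regex alternation can.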

Re: Search and delete lines based on string matching
by jdporter (Chancellor) on Mar 13, 2007 at 15:39 UTC

    Sounds like you're trying to reimplement fgrep -v -f.

    Here's a quick-and-dirty:

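    use Getopt::Long;
    # -f/--file names the file of lines to delete (file A)
    GetOptions( 'file=s' => \my $patfile );
    # Read file A through <> by temporarily pointing @ARGV at it
    chomp( my @del = do { local @ARGV = ($patfile); <> } );
    # Build a set of lines to delete (hash slice; the values are undef)
    my %del; @del{ @del } = ();
    # Separate and terminate printed items with $/ (a newline by default)
    $, = $\ = $/;
    # Print the remaining input (STDIN, i.e. file B) minus the deleted lines
    print grep { chomp; not exists $del{$_} } <>;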

    call it like so:

    perl this_script.pl -f A < B > C

    (given A, B, C, per your root post)

    A word spoken in Mind will reach its own level, in the objective world, by its own weight

Node Type: perlquestion [id://604526]
Approved by kyle