PerlMonks  

Search and delete lines based on string matching

by Anonymous Monk
on Mar 13, 2007 at 13:44 UTC (#604526=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file 'A' which contains n lines of single-word strings, like

    hye
    bye
    bin
    .
    .
    n

Now I want to look for these words in some other file 'B', delete the lines wherever they are found, and write the new file out as 'C'.

So far I have been trying to use this:

    #!/usr/local/bin/perl
    foreach $file (@ARGV) {
        # open a file and assign the filehandle F
        open(F, $file) or die("can't open myfile.txt: $!\n");
        # read in the whole file into an array of lines
        @lines = ();
        while(<F>) {
            push(@lines, $_);
        }
        close(F); # close the filehandle
        foreach (@lines) {
            my $string = "@lines";
            open(my $infile,"<", file1) or die $!;
            open (my $outfile,">>", file2) or die $!;
            while (<$infile>) {
                if ($_ !~/$string/) {
                    print $outfile $_;
                }
            }
            close $infile;
            close $outfile;
        }
    }

But its giving me error and also not giving results.

Please help !!!

20070313 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: Search and delete lines based on string matching
by davorg (Chancellor) on Mar 13, 2007 at 13:54 UTC

    It's really difficult to help you as your code is pretty much unreadable. You should edit your node to put <code> tags around your source code.

    It's also a bad idea to say "But its giving me error" without telling us what the error says.

    I assume that the words you are trying to filter end up in @lines (not the best name for that variable!), but it looks to me as tho' all of the elements in that array will still have newline characters on the end - which makes it harder for them to match other text.

    But, like I say, it's hard to be sure what the problem is until you tidy up the node and give us some better information.
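    To illustrate davorg's point about trailing newlines, here is a small sketch (the sample words are made up, not taken from the poster's files). A line read with <F> keeps its "\n", so it never compares equal to the bare word until it has been chomped:

```perl
use strict;
use warnings;

# Hypothetical sample data: words as read from file A, newline still attached.
my @words = ("hye\n", "bye\n", "bin\n");

# Without chomp, "bin\n" never equals the bare word "bin".
my $raw_match = grep { $_ eq 'bin' } @words;

# chomp on the freshly assigned copy strips the trailing newline from each
# element of @clean; @words itself is left untouched.
chomp(my @clean = @words);
my $clean_match = grep { $_ eq 'bin' } @clean;

print "without chomp: $raw_match match(es)\n";   # 0
print "with chomp:    $clean_match match(es)\n"; # 1
```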

Re: Search and delete lines based on string matching
by ptum (Priest) on Mar 13, 2007 at 14:00 UTC

    Since you posted as Anonymonk, you can't go back and edit your post, but next time, please use <code> tags.

    What error are you seeing? You didn't tell us.

    Is this a homework problem? We don't mind helping, but we're not particularly inclined to do your homework.

    To solve a problem like this, I would generally read in the contents of file A into a hash, since you just want to use those words as a lookup. Then I would open files B and C, step through the contents of file B a line at a time, and, whenever the line of B contains a word in my hash, drop it on the floor -- otherwise, write that line to file C. I don't think that opening the file handles inside your loop is a good idea.

    You're not really clear as to whether file B contains single words or longer strings -- if longer strings, then you might want to split the line into individual tokens (which can then be individually compared to your hash from file A) or (if the number of words in file A is small enough) you may prefer to build a regular expression by which you evaluate each string. A little more detail might help us to help you more effectively.
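    For the case ptum mentions where file B holds longer strings, here is a minimal sketch of the token-splitting option, assuming words are separated by whitespace. The sub name line_has_deleted_word and the sample data are hypothetical; the hash stands in for the words read from file A:

```perl
use strict;
use warnings;

# Hypothetical delete list, as if read (and chomped) from file A.
my %delete_words = map { $_ => 1 } qw(hye bye bin);

# Returns true if any whitespace-separated token of $line is in %$delete.
sub line_has_deleted_word {
    my ($line, $delete) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace,
    # so the trailing newline never ends up inside a token.
    for my $token (split ' ', $line) {
        return 1 if exists $delete->{$token};
    }
    return 0;
}

# Hypothetical lines from file B; only lines with no listed word are kept.
my @b_lines = ("bin den\n", "mig\n", "cat bye dog\n");
my @kept    = grep { !line_has_deleted_word($_, \%delete_words) } @b_lines;
print @kept;    # mig
```

    Because whole tokens are looked up in the hash, a word like "binary" is not treated as a match for "bin", which is the same effect the \b anchors have in the regex approach.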

      Hey man, sorry for the untidy question. I have single-word strings in both files A and B, and both are sorted. For example, A will have bin and hye, and B will have something like bin, den, mig. C shouldn't contain any of the things from A that are matched in B, so C should be den and mig, since bin was matched from A. This is not homework but for work. I'd really appreciate it if you could provide the code for your solution: "To solve a problem like this, I would generally read in the contents of file A into a hash, since you just want to use those words as a lookup. Then I would open files B and C, step through the contents of file B a line at a time, and, whenever the line of B contains a word in my hash, drop it on the floor -- otherwise, write that line to file C. I don't think that opening the file handles inside your loop is a good idea."

        Hmmmm. You didn't answer our question about what error you were seeing from your original code, and (based on the simplicity of the problem) I'm not entirely convinced it isn't homework. Generally, if you want help here at PerlMonks, it is better to show a little more effort, rather than just asking us to provide code. Even so, I'll help to steer you in the right direction with a few untested code snippets.

        Read the contents of file A into a hash:

        use strict;
        use warnings;

        my $fh;
        my $myfile = '/path/to/file/a';
        unless (open($fh,"<",$myfile)) {
            die "Can't open $myfile: $!\n";
        }
        my %delete_words = ();
        while (<$fh>) {
            chomp;
            $delete_words{$_}++;
        }
        close($fh);

        So now you have all the words in your delete list in the hash. Next you want to open file B for reading and file C for writing (in much the same way as we opened file A) and step through the lines of file B, one at a time. Each time you have a line of file B, you want to test whether it exists in your hash. If file B contained multiple words per line, you would have to jump through more hoops, but since your file B isn't very complicated, for each line in file B you can just do something like this:

        chomp;    # strip the newline so $_ can match the chomped hash keys
        if (exists($delete_words{$_})) {
            # do nothing
        } else {
            # write to file C (remember to add the "\n" back)
        }

        That's really all there is to it, except you'll want to explicitly close files B and C.
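        Putting ptum's steps together, here is a minimal sketch of the whole hash-lookup approach, assuming one word per line in both A and B. The sub names read_delete_words and filter_lines are hypothetical; they take filehandles so the same logic can be exercised on in-memory data as well as on real files:

```perl
use strict;
use warnings;

# Read the delete list (one word per line) into a hash.
sub read_delete_words {
    my ($fh) = @_;
    my %delete_words;
    while (my $word = <$fh>) {
        chomp $word;
        $delete_words{$word}++;
    }
    return \%delete_words;
}

# Copy every line of $in_fh to $out_fh unless it is in the delete hash.
sub filter_lines {
    my ($delete, $in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        chomp $line;                        # match the chomped hash keys
        next if exists $delete->{$line};    # drop it on the floor
        print {$out_fh} "$line\n";          # otherwise write it to C
    }
}

# Only runs when given three file names: perl filter.pl A B C
if (@ARGV == 3) {
    my ($file_a, $file_b, $file_c) = @ARGV;
    open my $a_fh, '<', $file_a or die "Can't open $file_a: $!\n";
    my $delete = read_delete_words($a_fh);
    close $a_fh;

    open my $b_fh, '<', $file_b or die "Can't open $file_b: $!\n";
    open my $c_fh, '>', $file_c or die "Can't open $file_c: $!\n";
    filter_lines($delete, $b_fh, $c_fh);
    close $b_fh;
    close $c_fh or die "Can't close $file_c: $!\n";
}
```

        With the example from the thread (A holding bin and hye, B holding bin, den and mig), C should end up containing den and mig. The script name filter.pl is a placeholder. C is opened with '>', so it is overwritten on each run rather than appended to.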

Re: Search and delete lines based on string matching
by imp (Priest) on Mar 13, 2007 at 14:20 UTC
    In addition to davorg's advice above, you should also always use both strict and warnings, as they can help you identify many common problems.

    If you are searching for the words from file A in file B then you will need a different regex. The code you provided is using the entire file A as the regex.

    Here's an example that uses one pattern file, one input file, one output file:

    use strict;
    use warnings;

    if (@ARGV != 3) {
        print "Usage: $0 <pattern file> <input file> <output file>\n";
        exit;
    }
    my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

    open my $pattern_fh, '<', $pattern_filename
        or die "Failed to open $pattern_filename: $!";
    my @tokens = ();
    while (my $line = <$pattern_fh>) {
        # split ' ' splits on runs of whitespace and skips leading whitespace,
        # so double spaces or indented lines can't add an empty token
        push @tokens, split ' ', $line;
    }

    # Create a pattern with an alternation of tokens, wrapped in a non-capturing group,
    # and require a word boundary before and after the word to prevent matching pieces
    # of other words. quotemeta escapes any regex metacharacters in the tokens.
    my $pattern = '\b(?:' . join('|', map { quotemeta($_) } @tokens) . ')\b';
    print "Search pattern: $pattern\n";

    open my $infile, "<", $source_filename
        or die "Failed to open $source_filename: $!";
    open my $outfile, ">>", $dest_filename
        or die "Failed to open $dest_filename: $!";
    while (my $line = <$infile>) {
        if ($line !~ /$pattern/) {
            print "adding: $line";
            print $outfile $line;
        }
    }
    close($infile);
    close($outfile);
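    To see why the original code never removed anything (imp's point that it used the entire file A as the regex), here is a small sketch comparing the two patterns. The sample data is hypothetical and mirrors how the original loop stored file A, newlines included:

```perl
use strict;
use warnings;

# Hypothetical contents of file A as the original code stored them:
# one element per line, trailing newline still attached.
my @lines = ("hye\n", "bye\n", "bin\n");

# "@lines" joins the elements with spaces, giving the single pattern
# "hye\n bye\n bin\n", which only matches text containing all of it in a row.
my $whole_file = "@lines";

# One alternative per chomped word, escaped with quotemeta, anchored on word boundaries.
chomp(my @words = @lines);
my $alternation = '\b(?:' . join('|', map { quotemeta($_) } @words) . ')\b';

my $b_line = "bin\n";    # a hypothetical line from file B
my $old = ($b_line =~ /$whole_file/)  ? 'matched' : 'no match';
my $new = ($b_line =~ /$alternation/) ? 'matched' : 'no match';
print "whole file as regex: $old\n";    # no match
print "alternation:         $new\n";    # matched
```

    Since no single line of B can contain the whole of file A, the original `$_ !~ /$string/` test was true for every line, so every line was copied.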
      Thanks a lot, buddy... It did help.
      Just adding to the above problem: suppose that in the same code I want to add strings instead of deleting the matched ones. That is, take the strings that are in A but not in B, and add them to the contents of file B, preceded by some string like "This is new addition", writing the result out to C.
        How is your input data formatted?
        1. bin den mig
        2. bin
          den
          mig
        3. bin deg
          mig
        As suggested above: what code have you tried?
        Here's an approach that initializes a hash of output tokens, prepopulated with marker text for new additions. The output data is overridden for existing entries, and a sorted list is appended to the specified output file (open with '>' instead of '>>' if you don't want this). Note that it isn't fully compatible with the delete code I posted earlier, since that code didn't take comments into account.
        #!/usr/local/bin/perl
        use strict;
        use warnings;

        if (@ARGV != 3) {
            print "Usage: $0 <pattern file> <input file> <output file>\n";
            exit;
        }
        my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

        # Every token from the pattern file starts out marked as a new addition
        open my $pattern_fh, '<', $pattern_filename
            or die "Failed to open $pattern_filename: $!";
        my %output_tokens = ();
        while (my $line = <$pattern_fh>) {
            chomp $line;
            $output_tokens{$line} = "$line # Added by script";
        }
        print "Expected tokens: ", join(', ', keys %output_tokens), "\n";

        # Tokens already present in the input file overwrite the marked version
        open my $infile, "<", $source_filename
            or die "Failed to open $source_filename: $!";
        open my $outfile, ">>", $dest_filename
            or die "Failed to open $dest_filename: $!";
        while (my $line = <$infile>) {
            chomp $line;
            $output_tokens{$line} = $line;
        }

        # Append the merged, sorted list to the output file
        for my $token (sort keys %output_tokens) {
            print $outfile $output_tokens{$token}, "\n";
        }
        close($infile);
        close($outfile);
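        imp's script tags each new word with a "# Added by script" suffix and sorts the output. The asker's wording suggested a marker string placed before the new words instead. Here is a minimal sketch of that variant, assuming one word per line in both files, that B's original order should be kept, and that the marker goes on its own line after B's contents. The sub name merge_new_words is hypothetical, and the word lists would come from chomped reads of A and B as in the snippets above:

```perl
use strict;
use warnings;

# Returns the lines for C: all of B, then (only if there are any) the marker
# line followed by the words from A that do not appear in B.
# ASSUMPTION: one word per line in both files; marker on its own line.
sub merge_new_words {
    my ($a_words, $b_words, $marker) = @_;
    my %in_b = map { $_ => 1 } @$b_words;
    my @new  = grep { !$in_b{$_} } @$a_words;
    return @new ? (@$b_words, $marker, @new) : @$b_words;
}

# Hypothetical data taken from the example earlier in the thread.
my @words_a = qw(bin hye);
my @words_b = qw(bin den mig);
print "$_\n" for merge_new_words(\@words_a, \@words_b, 'This is new addition');
```

        Using a hash for B keeps the check for "is this word from A already in B?" to a single lookup per word, the same idea as the delete version.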
Re: Search and delete lines based on string matching
by jdporter (Canon) on Mar 13, 2007 at 15:39 UTC

    Sounds like you're trying to reimplement fgrep -v -F.

    Here's a quick-and-dirty:

    use Getopt::Long;
    GetOptions( 'file=s' => \my $patfile );
    chomp( my @del = do { local @ARGV = ($patfile); <> } );
    my %del;
    @del{ @del } = ();
    $, = $\ = $/;
    print grep { chomp; not exists $del{$_} } <>;

    call it like so:

    perl this_script.pl -f A < B > C

    (given A, B, C, per your root post)

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
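    The line `$, = $\ = $/;` in jdporter's script is what puts the newlines back after the lines have been chomped. `$/` is the input record separator ("\n" by default); copying it into `$,` (printed between print's arguments) and `$\` (printed after the last argument) makes `print` emit one kept line per output line. A small sketch of that effect, writing to an in-memory filehandle with hypothetical data:

```perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Can't open in-memory file: $!";
{
    local $, = $/;    # output field separator: printed between print's arguments
    local $\ = $/;    # output record separator: printed after the last argument
    my %del  = map { $_ => undef } qw(bin hye);               # hypothetical delete list
    my @kept = grep { not exists $del{$_} } qw(bin den mig);  # hypothetical lines of B, already chomped
    print {$out} @kept;
}
close $out;
print $buffer;    # den and mig, each on its own line
```

    Like jdporter's `@del{ @del } = ()`, the hash values here are undef, which is why the test uses `exists` rather than a plain truth check.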

Node Type: perlquestion [id://604526]
Approved by kyle