
Filtering Source Text File with 2nd Text File of Terms

by Loops303 (Novice)
on Apr 02, 2012 at 22:57 UTC (#963141)
Loops303 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am a novice at Perl.

I have a SOURCE text file with a list of strings, such as

and a 2nd text file FILTER TERMS with a list of terms such as


What I want to do is read the 2nd file and use its list of terms to filter the first file: any line in SOURCE that contains one of the terms should be left out of the output.

The desired end result OUTPUT would be

It would not write --- because it matches the "google" --- because it matches the "manish" --- because it matches the ""

Can anyone please help me figure out how to do this?

Here is the code I have thus far (this is the 20th iteration of various attempts, having spent about 5 hours on this already today --- see, I am new at this!)
#!/usr/bin/perl
open (F1, "<filterTerms.txt");
open (F2, "<source.txt");
my %terms = ();
my %source = ();
while (<F1>) {
    my $term=$_;
    chomp ($term);
    $terms{$term}=$term;
}
while (<F2>) {
    my $item=$_;
    chomp ($item);
    $source{$item}=$item;
    foreach (keys %source) {
        if ($source=~m/($term{$term})/) {
            #do nothing
        } else {
            print $1."\n";
        }
    }
}
close (F1);
close (F2);
Thank you.

Replies are listed 'Best First'.
Re: Filtering Source Text File with 2nd Text File of Terms
by Riales (Hermit) on Apr 02, 2012 at 23:49 UTC

    Your main problem is when you check to see if the source matches any of the terms, you're only checking the last term in the file.

    You're also trying to print a match with the $1 but that's not really what you want.

    Beyond that, is there a particular reason you're choosing to use hashes instead of arrays? I would think arrays are more what you want.

    # Building the array of terms:
    my @terms = ();
    while (my $term = <F1>) {
        chomp $term;
        push @terms, $term;
    }

    This way, when you are checking each term against the source, you just need to do this:

    # Printing sources that do not match any of the terms:
    while (my $source = <F2>) {
        chomp $source;
        print "$source\n" unless grep { $source =~ /$_/ } @terms;
    }
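For anyone who wants to try the two fragments above as one script, here is a minimal sketch. It makes a few assumptions that are not in the original posts: the sample data is made up, in-memory filehandles stand in for filterTerms.txt and source.txt so the script runs without any files, and lexical filehandles with three-argument open replace the bareword F1 and F2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for filterTerms.txt and source.txt (made-up sample data).
my $terms_text  = "google\nmanish\n";
my $source_text = "www.google.com\nwww.example.org\nmanish.example.net\nperlmonks.org\n";

# Opening a reference to a string gives an in-memory filehandle. With real
# files, pass the file name instead: open(my $fh, '<', 'filterTerms.txt').
open(my $terms_fh,  '<', \$terms_text)  or die "Cannot open terms: $!";
open(my $source_fh, '<', \$source_text) or die "Cannot open source: $!";

# Read every term once, one per line.
my @terms;
while (my $term = <$terms_fh>) {
    chomp $term;
    next if $term eq '';    # an empty term is not a useful filter
    push @terms, $term;
}
close $terms_fh;

# Keep a source line only if no term matches it.
my @kept;
while (my $source = <$source_fh>) {
    chomp $source;
    push @kept, $source unless grep { $source =~ /$_/ } @terms;
}
close $source_fh;

print "$_\n" for @kept;
```

With this sample data the script prints www.example.org and perlmonks.org. The terms are still used as raw regular expressions here, exactly as in the fragment above.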
      while my $term (<F1>) {

      That is a syntax error.    Perhaps you meant:

      while ( my $term = <F1> ) {

      foreach my $source (<F2>) {

      Why would you read in the whole file instead of just reading one line at a time?    Perhaps you meant:

      while ( my $source = <F2> ) {

        Argh, you're absolutely right. I guess I was just too eager to fire off my response. I'll change my original post.

        Thanks for catching that.
      I think I figured a hash would let me check all the terms at once, as opposed to doing it a line at a time and writing the unfiltered items to the output... but thanks for the tip, I will give it a try. Very helpful. This site rules.
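On checking all the terms at once: a hash lookup only finds exact key matches, so it cannot tell whether a term occurs somewhere inside a longer line. One way to get a single test per line is to join the terms into one alternation. The sketch below is only an illustration under assumptions: build_filter is a hypothetical helper name, the sample data is made up, and the terms are treated as literal text rather than as patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches if any term occurs.
sub build_filter {
    my @terms = @_;
    # With no terms, return a pattern that can never match.
    return qr/(?!)/ unless @terms;
    # quotemeta escapes regex metacharacters so each term matches literally.
    my $alternation = join '|', map { quotemeta($_) } @terms;
    return qr/$alternation/;
}

my $filter = build_filter('google', 'manish');

# Made-up sample lines; only those that match no term are printed.
for my $line ('www.google.com', 'www.example.org', 'manish.example.net') {
    print "$line\n" unless $line =~ $filter;
}
```

The regex is compiled once with qr// and then reused for every line, so each line needs only one match instead of one per term.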
Re: Filtering Source Text File with 2nd Text File of Terms
by vitoco (Friar) on Apr 03, 2012 at 17:53 UTC

    Please note that unescaped special characters in strings used as patterns could give unpredictable results!!!

    Example: the term "" will also match lines with "wwwithisisannoyingacom"...
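A small sketch of the problem and of one common fix, \Q...\E (the same quoting that quotemeta performs). The term and the line are made-up values chosen for illustration, not the ones from the original question.

```perl
use strict;
use warnings;

my $term = 'www.example.com';    # made-up term containing regex metacharacters
my $line = 'wwwXexampleYcom';    # made-up line with no literal dots in it

# Unescaped: each "." in the term matches any character.
my $loose = $line =~ /$term/ ? 1 : 0;

# Escaped: \Q...\E quotes the metacharacters, so the dots must be literal dots.
my $literal = $line =~ /\Q$term\E/ ? 1 : 0;

print "unescaped: $loose, escaped: $literal\n";
```

This prints "unescaped: 1, escaped: 0": the raw term matches a line it was never meant to match, while the quoted term does not.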

    If the terms from the list are single words, probably the test from previous posts should be:

    print "$source\n" unless grep { $source =~ /\b$_\b/ } @terms;

    where \b checks for word boundaries, so "googleeee" won't be matched by the "google" term.
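The word-boundary test can be combined with \Q...\E so that the term is also taken literally. A minimal sketch, assuming the terms begin and end with word characters (letters, digits or underscore); matches_word is a hypothetical helper name and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $term occurs in $line as a whole word.
sub matches_word {
    my ($line, $term) = @_;
    return $line =~ /\b\Q$term\E\b/ ? 1 : 0;
}

print matches_word('www.google.com',    'google'), "\n";    # prints 1
print matches_word('www.googleeee.com', 'google'), "\n";    # prints 0
```

Note that \b only makes sense next to a word character, so a term such as ".com" that starts or ends with punctuation would need a different boundary test.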
