PerlMonks  

Re: Filtering out stop words

by Eily (Monsignor)
on Feb 25, 2020 at 14:04 UTC (#11113404)


in reply to Filtering out stop words

Yet another way to do it, which might be much faster if you have only a few words to check (your example has only one, but this might be done in a loop), is to do it the other way around. Collect the words to be tested first, so that only that small list is held in a hash, and then check whether any of the words in your dictionary match:

    my @words = get_words_to_check();
    my %hash = map { $_ => 1 } @words;
    while (my $line = <>)
    {
        chomp $line;
        # The if exists isn't required here, but it does make it look cleaner
        delete $hash{$line} if exists $hash{$line};
    }
    my @good_words = grep { exists $hash{$_} } @words; # Keep the original order
    my @good_words_2 = keys %hash; # Don't care about the original order
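For anyone who wants to try this without a dictionary file at hand, here is a minimal runnable sketch of the same idea. The sub name filter_stop_words, the sample words and the in-memory dictionary are all made up for the demo, and the early exit once no candidates remain is my own addition rather than part of the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words of @$words that do not appear
# as a line on the filehandle $dict_fh (one stop word per line).
sub filter_stop_words {
    my ($words, $dict_fh) = @_;
    my %candidate = map { $_ => 1 } @$words;
    while (my $line = <$dict_fh>) {
        chomp $line;
        delete $candidate{$line};
        # Assumption: once every candidate has been ruled out, there is
        # no point in reading the rest of the dictionary
        last unless %candidate;
    }
    # grep over the original list keeps the original order
    return grep { exists $candidate{$_} } @$words;
}

# Sample data, made up for the demo; an in-memory filehandle stands in
# for the dictionary file
my $dictionary = "a\nof\nthe\n";
open my $fh, '<', \$dictionary or die "Cannot open in-memory file: $!";
my @kept = filter_stop_words([qw(the quick fox of doom)], $fh);
close $fh;
print "@kept\n";   # prints "quick fox doom"
```

The memory cost here depends on the number of words to check, not on the size of the dictionary, which is the whole point of turning the lookup around.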

Or, if both word lists are sorted (in the same string order that ge compares with), something like this might work:

    my @good_words;
    my $index = 0;
    LINE: while (my $line = <>)
    {
        # Nothing to check at all
        last LINE unless @words;
        chomp $line;
        # While the word in the dictionary is past (or equal) to the word to check
        while ($line ge $words[$index])
        {
            # Store the word as OK, unless it is equal to the current dict word
            push @good_words, $words[$index] unless $line eq $words[$index];
            # Use the next word from the list of words to check, if any
            last LINE if ++$index == @words;
        }
    }
    # Words that sort after the last dictionary line were never compared, so they are OK too
    push @good_words, @words[$index .. $#words];
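Here is a self-contained sketch of the sorted variant, restructured slightly so that it can be run against sample data. The sub name filter_sorted and the data are hypothetical, and it assumes both lists were sorted with Perl's default string sort so that lt and eq agree with the ordering.

```perl
use strict;
use warnings;

# Hypothetical helper: $words is a sorted array ref, $dict_fh a filehandle
# on a sorted dictionary (one word per line). Returns the words that are
# not in the dictionary.
sub filter_sorted {
    my ($words, $dict_fh) = @_;
    my @good;
    my $i = 0;
    while (my $line = <$dict_fh>) {
        last if $i == @$words;   # no candidates left to check
        chomp $line;
        # A candidate sorting before the current dictionary line cannot
        # show up later in the dictionary, so it is kept
        while ($i < @$words and $words->[$i] lt $line) {
            push @good, $words->[$i++];
        }
        # Candidates equal to the current line are in the dictionary: skip them
        $i++ while $i < @$words and $words->[$i] eq $line;
    }
    # Dictionary exhausted: whatever is left was never matched
    push @good, @{$words}[$i .. $#{$words}];
    return @good;
}

# Sample data, made up for the demo
my $sorted_dict = "a\nof\nthe\n";
open my $dict_fh, '<', \$sorted_dict or die "Cannot open in-memory file: $!";
my @sorted_kept = filter_sorted([sort qw(the quick fox of doom)], $dict_fh);
close $dict_fh;
print "@sorted_kept\n";   # prints "doom fox quick"
```

Unlike the hash version, this needs no lookup structure at all, but the result comes back in sorted order rather than in the order the words were originally given.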
