PerlMonks  

Re: Matching Exact Word

by jonadab (Parson)
on Oct 09, 2014 at 11:01 UTC (#1103283)


in reply to Re^2: Matching Exact Word
in thread Matching a Word Exactly

This will fail in some cases.

Geographically, Guinea is thousands of miles from here. (This fails immediately because of the comma; if the comma were removed, it would still fail.)

If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial ". Similarly, a negative lookahead assertion at the end can rule out Guinea Pig and Guinea-Bissau.
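A minimal sketch of that idea follows. The helper name mentions_guinea is hypothetical, and the sketch assumes these four neighbouring names are the only ones that need excluding. The \b anchors are an addition of mine so that punctuation next to the word, such as the comma in the sentence above, does not get in the way.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $text mentions Guinea on its own,
# 0 when the only occurrences belong to one of the excluded names.
sub mentions_guinea {
    my ($text) = @_;
    return $text =~ m{
        (?<!New\s)           # not preceded by "New "
        (?<!Equatorial\s)    # not preceded by "Equatorial "
        \bGuinea\b           # the word itself; adjacent punctuation is fine
        (?![\s-]Bissau)      # not followed by "-Bissau" or " Bissau"
        (?!\sPig)            # not followed by " Pig"
    }x ? 1 : 0;
}

for my $text (
    'Geographically, Guinea is thousands of miles from here.',
    'Papua New Guinea',
    'Equatorial Guinea',
    'Guinea-Bissau',
    'Guinea Pig',
) {
    print mentions_guinea($text), " $text\n";
}
```

Only the first sentence should be reported as a match. The pattern is case-sensitive as written; add the /i modifier if lowercase input matters.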

Re^2: Matching Exact Word
by Jim (Curate) on Oct 09, 2014 at 18:33 UTC
    "If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial "."

    One caveat: you can't use alternation in the look-behind assertion here, because "New " and "Equatorial " differ in length and variable-length look-behind isn't supported. Instead, you must list the alternatives as separate look-behind assertions. You can, of course, use alternation in the look-ahead assertion.

    use strict;
    use warnings;

    my $pattern = qr{
        (?<!New\s)
        (?<!Equatorial\s)
        Guinea
        (?![\s-](?:Bissau|pig))
    }ix;

    while (my $text = <DATA>) {
        my $match = $text =~ m/$pattern/ ? 1 : 0;
        print "$match $text";

        # This prints...
        #   0 Papua New Guinea
        #   1 I live in Guinea.
        #   1 i live in guinea, but i don't have a shift key.
        #   0 Guinea-Bissau
        #   0 Guinea Bissau
        #   0 Equatorial Guinea
        #   0 I love guinea pigs!
    }

    __DATA__
    Papua New Guinea
    I live in Guinea.
    i live in guinea, but i don't have a shift key.
    Guinea-Bissau
    Guinea Bissau
    Equatorial Guinea
    I love guinea pigs!
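    If the list of exclusions grows, the same pattern can be assembled from data rather than typed out. The following is only a sketch: the array names @not_before and @not_after and the helper matches_guinea are hypothetical. It emits one fixed-length look-behind per prefix, as the caveat above requires, and a single look-ahead holding the alternation.

```perl
use strict;
use warnings;

# Hypothetical exclusion lists; extend as needed.
my @not_before = ('New', 'Equatorial');
my @not_after  = ('Bissau', 'pig');

# One look-behind per prefix, because the prefixes differ in length.
my $behind = join '', map { '(?<!' . quotemeta($_) . '\s)' } @not_before;

# A single look-ahead can hold all the suffixes as an alternation.
my $ahead = '(?![\s-](?:' . join('|', map { quotemeta } @not_after) . '))';

my $pattern = qr/${behind}Guinea${ahead}/i;

# Hypothetical helper: 1 if $text contains a stand-alone Guinea, else 0.
sub matches_guinea {
    my ($text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

print matches_guinea($_), " $_\n"
    for 'Papua New Guinea', 'I live in Guinea.', 'Guinea-Bissau';
```

    quotemeta keeps any punctuation in the listed names from being read as regex syntax. The generated pattern is equivalent to the hand-written one above, minus the /x layout.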
Re^2: Matching Exact Word
by kzwix (Sexton) on Oct 09, 2014 at 11:20 UTC

    You are right. However, even this code may fail if somebody misspells the country names.

    It is more a linguistic problem than a pattern-recognition one and, as such, seems extraordinarily difficult to tackle in a fail-proof way (that would require an AI, syntactic and contextual analysis, etc.).

    However, as you mentioned, using negative look-ahead and negative look-behind assertions should allow the original poster to avoid the most common other names.
