
Regex matching end of sentence

by Dr Manhattan (Beadle)
on Jan 31, 2013 at 09:12 UTC (#1016262, perlquestion)
Dr Manhattan has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I need a regex to match sentences ending with a period, but it must not treat abbreviations that occur in the middle of a sentence as the end of that sentence.

For instance, if I have the sentence 'I like Mr. Smith's dog.', the regex should match the whole sentence, not just the 'I like Mr.' part.

if ($in =~ /(\w+)(\!|\?|\.)(\s)((([A-Z])(\w|\s|\d|\(|\)|\+|\=|\-|\@|\#|\%|\&|\*|\<|\>|\,|\\|\/|\"|\`|'n)+(\s)(\w+\.))(\w|\s|\d|\(|\)|\+|\=|\-|\@|\#|\%|\&|\*|\<|\>|\,|\\|\/|\"|\`|'n)+(\s)(\w+\.))(\s)([A-Z])/) {
    if (!exists ($abbreviations{$9})) {
        $hash{$5}++;
    }
    elsif (!exists ($abbreviations{$12})) {
        $hash{$4}++;
    }
}

I tried this, but it still doesn't work.

%abbreviations is a hash whose keys are the known abbreviations.

%hash is where correctly matched sentences are stored.

Any help would be appreciated
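
One possible way to approach this is to avoid a single large regex. Instead, look at each candidate terminator ('.', '!' or '?') in turn and check the word just before it against %abbreviations. The following is only a minimal sketch under several assumptions: the sub name split_sentences and the sample abbreviation list are made up for illustration, the keys of %abbreviations are assumed to be stored without the trailing period, and a sentence is assumed to end only when the terminator is followed by whitespace and a capital letter, or by the end of the text.

```perl
use strict;
use warnings;

# Hypothetical abbreviation list (an assumption for this sketch);
# keys are stored without the trailing period.
my %abbreviations = map { $_ => 1 } qw(Mr Mrs Dr Prof etc);

# Split $text into sentences. A '.', '!' or '?' counts as a sentence end only
# when it is followed by whitespace plus a capital letter, or by the end of
# the text, and the word directly before a '.' is not a known abbreviation.
sub split_sentences {
    my ($text) = @_;
    my @sentences;
    my $start = 0;    # offset where the current sentence begins

    while ($text =~ /(\w*)([.!?])(?=\s+[A-Z]|\s*\z)/g) {
        my ($word, $punct) = ($1, $2);

        # 'Mr.' and friends do not end a sentence; keep scanning.
        next if $punct eq '.' && exists $abbreviations{$word};

        my $end      = pos($text);    # offset just past the terminator
        my $sentence = substr($text, $start, $end - $start);
        $sentence =~ s/^\s+//;        # drop the gap left by the previous sentence
        push @sentences, $sentence if length $sentence;
        $start = $end;
    }

    # Keep any trailing text that has no terminator.
    my $rest = substr($text, $start);
    $rest =~ s/^\s+|\s+$//g;
    push @sentences, $rest if length $rest;

    return @sentences;
}

my $in = "I like Mr. Smith's dog. It barks at Dr. Jones! Does it bite?";

# Count each sentence, mirroring the %hash from the question.
my %hash;
$hash{$_}++ for split_sentences($in);
print "$_\n" for sort keys %hash;
```

With the sample input this yields three sentences, and 'I like Mr. Smith's dog.' stays in one piece. The sketch has known blind spots: an abbreviation that really does end a sentence (for example 'etc.' followed by a capitalized word) will not be split, and a sentence that starts with a digit or a quote will not be detected as a new sentence.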

Replies are listed 'Best First'.
Re: Regex matching end of sentence
by tmharish (Friar) on Jan 31, 2013 at 09:34 UTC
Re: Regex matching end of sentence
by Anonymous Monk on Jan 31, 2013 at 10:17 UTC
Re: Regex matching end of sentence
by ww (Bishop) on Jan 31, 2013 at 18:20 UTC
    ...and what if the writer of the sentence(s) has (as I do) a weakness for using ellipsis?

Node Type: perlquestion [id://1016262]
Approved by Corion