PerlMonks  

Another regex to solve ...

by pat_mc (Pilgrim)
on Aug 18, 2011 at 16:27 UTC ( #921004=perlquestion )
pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks ...

Here's an easy regex question for you ... or is it (I mean: 'easy')?

I want to detect all words ending in a vowel followed by the letter 'p'. So far so good. However, I only want the regex to match those words that do not have a double vowel. The regex should thus match 'step' and 'tip' but not 'stoop' or 'steep'. To me it looks like a combination of backreference and look-behind should do the trick, but I can't work out how to put the two together. Can you please advise?

Thanks in advance and kind regards -

Pat

EDIT:
I tried something to the effect of /(?<!\1)([aeiou])p$/ but, of course, this did not work, since variable-length look-behind is not available in the (admittedly very old) version of Perl that I'm using.

Re: Another regex to solve ...
by AR (Friar) on Aug 18, 2011 at 16:34 UTC

    If you define vowel as one of 'a', 'e', 'i', 'o' and 'u' (because I know some other monks will point out that I'm being anglocentric :)), a regex that fits your criteria is:

    /(?<![aeiou])[aeiou]p\b/

    Please adjust for case sensitivity as necessary.

    Edit: Did you mean two vowels in a row or the same vowel twice before the 'p'? The regex above solves the former, but not the latter.

      Hi, AR -

      Thanks for your proposal ... but that's not precisely what I wanted ... I do want words like 'feap' to pass ... only words in which the same vowel is duplicated before the 'p' should be filtered out. Sorry for not making this perfectly clear right from the start.

      Any alternative suggestions from your side then?

      Cheers -

      Pat
        /((\b|[^aeiou])[aeiou]|[eiou]a|[aiou]e|[aeou]i|[aeiu]o|[aeio]u)p\b/

        but it's not even remotely elegant. I'll keep working on it.
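For anyone comparing AR's two patterns side by side, here is a minimal sketch. The variable names and the word list are illustrative assumptions, not part of AR's posts; the two regexes are copied from the replies above.

```perl
use strict;
use warnings;

# Illustrative word list (assumption): includes 'feap', which Pat wants to pass.
my @words = qw(step tip up feap stoop steep);

# First proposal: no vowel at all directly before the final vowel.
my $no_vowel_before = qr/(?<![aeiou])[aeiou]p\b/;

# Second proposal: a different vowel is allowed, the same vowel is not.
my $no_same_vowel =
    qr/((\b|[^aeiou])[aeiou]|[eiou]a|[aiou]e|[aeou]i|[aeiu]o|[aeio]u)p\b/;

my @first  = grep { /$no_vowel_before/ } @words;
my @second = grep { /$no_same_vowel/ } @words;

print "first:  @first\n";    # step tip up
print "second: @second\n";   # step tip up feap
```

The only difference on this list is 'feap': the look-behind version rejects it because any vowel precedes the 'a', while the alternation accepts it because 'e' and 'a' differ.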

Re: Another regex to solve ...
by toolic (Chancellor) on Aug 18, 2011 at 16:35 UTC
    Not the most elegant:
    use warnings;
    use strict;

    while (<DATA>) {
        chomp;
        if (/[^aeiou][aeiou]p$/i) {
            print "$_ match\n";
        }
        else {
            print "$_ no match\n";
        }
    }
    __DATA__
    carp
    step
    tip
    stoop
    steep
    asp
    food
    mop
    up
    prints:
    carp no match
    step match
    tip match
    stoop no match
    steep no match
    asp no match
    food no match
    mop match
    up no match
    As you can see, the 2-letter up does not match. Is that ok? UPDATE: I like AR's solution better because it does match for up.
      Yeah, sorry ... same thing: by 'double vowel' I meant the same vowel occurring twice ...
Re: Another regex to solve ... (\2)
by tye (Cardinal) on Aug 18, 2011 at 16:59 UTC
    local $_= "Stop tip stoop put up, tops steep 'creap' sleep soap!";
    my @words;
    push @words, $1
        # while /(\b\w*(?!(.)\2)\w[aeiou]p\b)/g;
        while /(\b(?:\w*(?!(.)\2)\w)?[aeiou]p\b)/g;
    print "( @words )\n";
    __END__
    ( Stop tip up creap soap )

    Update: Changed . to \w so that it would not match, for example, "no-op" (if you want '-' allowed in words, then replace \w with, for example, [-\w] in both places). Then: added (?:...)? to match two-letter words (my original attempt at handling two-letter words fails because (?<!\2) is not smart enough to realize that \2 has a fixed length).

    - tye        
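The working pattern above is dense, so here is the same regex spread out with the /x flag and commented. This is only an annotated restatement under my own reading of it; the variable names $text and $re are illustrative, and the input string is the one from tye's post.

```perl
use strict;
use warnings;

my $text = "Stop tip stoop put up, tops steep 'creap' sleep soap!";

my $re = qr/
    (                     # group 1: the whole word
        \b                # start at a word boundary
        (?:
            \w*           # leading word characters, if any
            (?! (.) \2 )  # group 2: the next two characters must differ
            \w            # the character right before the vowel
        )?                # optional, so two-letter words such as "up" match
        [aeiou] p \b      # vowel, then 'p' at the end of the word
    )
/x;

my @words;
push @words, $1 while $text =~ /$re/g;
print "( @words )\n";     # ( Stop tip up creap soap )
```

The negative look-ahead sits just before the last \w, so the two characters it compares are that character and the vowel in front of the 'p'.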

      Here are some other ways to do it, including one that doesn't work and one that almost works...

      local $_= "Up stop 'Oop' tip stoop put\nup, tops steep 'creap' sleep soap!";
      my @words;
      push @words, $1
          # Misses "up" if first word in string:
          # while /(\b\w*(?<=(.))(?!\2)[aeiou]p\b)/gi;
          # Would work if (?<=...|...) were smarter:
          # while /(\b\w*(?<=^|(.)(?!\2))[aeiou]p\b)/gsi;
          # How to work around (?<=...|...) being dumb:
          # while /(\b\w*(?:(?<=^)|(?<=(.)(?!\2)))[aeiou]p\b)/gsi;
          # (?<=^) can be shortened to just ^:
          while /(\b\w*(?:^|(?<=(.)(?!\2)))[aeiou]p\b)/gsi;
          # Or just skip the complex check for 2-letter words:
          # while /(\b(?:\w*(?<=(.))(?!\2))?[aeiou]p\b)/gsi;
      print "( @words )\n";
      __END__
      ( Up stop tip up creap soap )

      - tye        

Re: Another regex to solve ...
by sundialsvc4 (Abbot) on Aug 18, 2011 at 17:03 UTC

    Can you solve your problem, sufficiently well, using a combination of regexes and procedural code? One regular expression could, for example, locate all source lines in the document which contain “a vowel that is not immediately followed by another vowel,” leading to an if-statement in which the matching lines are further examined by whatever means seem appropriate.

    Sure, “regex golf” is instructive. It can even be entertaining. But it can also devolve into a waste of time...
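One way to read that suggestion is a simple regex followed by an ordinary string comparison. The sketch below is only an illustration of that split: the helper name ends_in_single_vowel_p and the word list are hypothetical, and it assumes "vowel" means a, e, i, o, u.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word ends in a vowel followed by 'p'
# and the letter before that vowel is not the same vowel.
sub ends_in_single_vowel_p {
    my ($word) = @_;

    # Step 1 (regex): the word must end in vowel + 'p'.
    return 0 unless $word =~ /([aeiou])p\z/i;
    my $vowel = lc $1;

    # Step 2 (procedural): look at the character before the vowel, if any.
    my $before = lc substr($word, 0, -2);
    return 0 if length($before) && substr($before, -1) eq $vowel;

    return 1;
}

my @hits = grep { ends_in_single_vowel_p($_) } qw(step tip stoop steep feap up stp);
print "@hits\n";    # step tip feap up
```

Neither back-references nor look-behind are needed here, which may matter on a very old Perl.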

      The answer would be "Yes" on all accounts :-)
Re: Another regex to solve ...
by johngg (Abbot) on Aug 18, 2011 at 17:49 UTC

    I've not done much testing but this seems to work by putting the capture in the look-behind.

    knoppix@Microknoppix:~$ perl -E '
    > for ( qw{ soap creep top groat loop } )
    > {
    >     say unless m{(?<=([aeiou]))\1p\z};
    > }'
    soap
    top
    groat
    knoppix@Microknoppix:~$

    I hope this is helpful.

    Cheers,

    JohnGG

          say unless m{(?<=([aeiou]))\1p\z};

      That allows e.g. 'help' or 'slurp'.

      Update: I've no idea why though, or why it also allows 'hops' and 'hoops'...

      Update 2: Got it! It also of course allows 'rabbit', or any other string that doesn't end in 'p' preceded by whatever. It was the unless that momentarily confused me. :)

        Yes, that was a pretty woeful attempt on my part. I must have been thinking about a sub-set of words all ending with 'p' rather than the general case :-(

        Cheers,

        JohnGG

      Hi, johngg -

      I like the approach of back-referencing the match from the look-behind. The only issue I have with the code you propose is that it over-generates: because of the unless, it also passes every string that simply does not match the regex, like 'stp'. That, of course, it shouldn't, since we only want words to pass that have a non-doubled vowel in front of the word-final 'p'.
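A possible way to repair the over-generation is to pair a positive test with the negated one. This is a sketch under my own assumptions, not johngg's code: it uses the plain pattern ([aeiou])\1p\z in place of the look-behind capture, and the word list is illustrative.

```perl
use strict;
use warnings;

my @candidates = qw(soap creep top groat loop help stp up);

# Keep a word only if it ends in vowel + 'p' (positive test)
# and does not end in a doubled vowel + 'p' (negative test).
my @kept = grep { /[aeiou]p\z/ && !/([aeiou])\1p\z/ } @candidates;

print "@kept\n";    # soap top up
```

With the positive test in place, 'groat', 'help' and 'stp' no longer slip through.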
Re: Another regex to solve ...
by Not_a_Number (Parson) on Aug 18, 2011 at 17:58 UTC
    use feature 'say';

    my @tests = qw/ p up hip hop hoop heap help hops /;
    for ( @tests ) {
        say if reverse =~ /^p([aeiou])(?!\1)/;
    }
      Gooooood thinking, Not_a_Number!!!!

      I like this one ... reversing the string before matching it ... how neat is that? Excellent! Goooood on ya! And thanks for the idea. This is coooooool stuff!

      And who ever said 'regex golfing was a waste of time'? Far from it ... it's like Perl philosophy ... pure aesthetics ... sheer bliss. Thanks again for this neat little twist!
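Why the reversal helps: once the word is reversed, the "same vowel before it" check becomes a look-ahead, which has no fixed-length restriction. The sketch below spells that out with an explicit scalar reverse and named variables; those names are my own, and the test words are the ones from Not_a_Number's post.

```perl
use strict;
use warnings;

my @tests = qw(p up hip hop hoop heap help hops);

my @kept;
for my $word (@tests) {
    # In scalar context, reverse returns the string back to front.
    my $reversed = scalar reverse $word;

    # Reversed word must start with 'p', then a vowel,
    # and that vowel must not be followed by itself.
    push @kept, $word if $reversed =~ /^p([aeiou])(?!\1)/;
}

print "@kept\n";    # up hip hop heap
```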
