PerlMonks  

Re: regular expressions

by toolic (Bishop)
on Jun 06, 2015 at 20:47 UTC [id://1129313]


in reply to regular expressions

Your grep filters out entire lines. Do you have multiple words on each line? If so, a single word with 4 consecutive consonants is enough for the whole line to match.

Another way, using a negated character class:

    use warnings;
    use strict;
    use Data::Dumper;

    my @words;
    while (<DATA>) {
        chomp;
        push @words, grep { /[^aeiouy]{4}/i } split;
    }
    print Dumper(\@words);

    __DATA__
    abc def ghi AAAAAA jlkm opqr jhggjyg 123 annn jkjkkj bcdefgh

Prints:

    $VAR1 = [
              'jlkm',
              'jhggjyg',
              'jkjkkj'
            ];
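toolic's first point, that a grep over whole lines keeps a line as soon as any one word in it matches, can be seen in a small sketch. The sample lines and the explicit consonant class are assumptions for illustration only; the original poster's pattern is not shown in this thread.

```perl
use strict;
use warnings;

# Assumed stand-in for the OP's pattern (not shown in the thread):
# an explicit consonant class, treating 'y' as a vowel like the replies do.
my $four_consonants = qr/[b-df-hj-np-tv-xz]{4}/i;

# Hypothetical input: only one word ('jlkm') has 4 consonants in a row.
my @lines = ('abc jlkm opqr', 'abc def ghi');

# Filtering lines keeps the WHOLE first line, because one of its words matched.
my @matching_lines = grep { $_ =~ $four_consonants } @lines;

# Splitting each line into words first keeps only the word that matched.
my @matching_words = grep { $_ =~ $four_consonants } map { split ' ', $_ } @lines;

print "lines: @matching_lines\n";    # lines: abc jlkm opqr
print "words: @matching_words\n";    # words: jlkm
```

Splitting before filtering is what toolic's loop does with grep { ... } split.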

Re^2: regular expressions
by Laurent_R (Canon) on Jun 07, 2015 at 15:34 UTC
    I do not think that a negated character class is a good idea for finding groups of consonants, because "not a vowel" also covers characters that are not letters at all. For example, it will pick up groups of digits, as shown below under the Perl debugger:
        DB<1> $_ ="123 annn jkjkkj bcdefgh 2015 ";
        DB<2> push @words, grep { /[^aeiouy]{4}/i } split;
        DB<3> x \@words;
        0  ARRAY(0x600500b18)
           0  'jkjkkj'
           1  2015
        DB<4>
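    The same comparison as a small script, putting the negated class from the thread next to one alternative: an explicit list of consonants written as ranges (again treating y as a vowel). The explicit class is an illustrative assumption, not necessarily the pattern Laurent_R had in mind.

```perl
use strict;
use warnings;

# Tokens taken from the debugger session above.
my @tokens = split ' ', '123 annn jkjkkj bcdefgh 2015';

# Negated class: "not a vowel" also matches digits, punctuation, etc.
my @negated = grep { /[^aeiouy]{4}/i } @tokens;

# Alternative: list the consonants explicitly (b-d, f-h, j-n, p-t, v-x, z).
my @explicit = grep { /[b-df-hj-np-tv-xz]{4}/i } @tokens;

print "negated:  @negated\n";     # negated:  jkjkkj 2015
print "explicit: @explicit\n";    # explicit: jkjkkj
```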

      I agree that doubly-negated character classes can be very tricky, but with care, they can be managed to good effect.

      I think of it this way: start with [^\W], which is the same as [\w] (or just \w). As you point out, this includes digits and _ (underscore) as well as alphas. "Subtract", as it were, the digits with [^\W\d] and the underscore with [^\W\d_], and you're left with only alphabetic characters. Then subtract your chosen vowels, [^\W\d_aeiouyAEIUOY], and you're done!

          c:\@Work\Perl\monks>perl -wMstrict -le "my $s = '123 annn xyzzy wwwewww xxx9xxx vvv_vvv eieio p pp ppp 2015 vwxz vwxzpdq'; ;; my $consonant = qr{ [^\W\d_aeiouyAEIUOY] }xms; ;; printf qq{'$_' } for $s =~ m{ $consonant{4,} }xmsg; "
          'vwxz' 'vwxzpdq'
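      The step-by-step "subtraction" can also be checked one character at a time. In this sketch the sample characters are arbitrary choices; each step's class is anchored with \A and \z so that it tests a single character.

```perl
use strict;
use warnings;

# Sample characters: letters (vowel and consonant), a digit, underscore,
# space, hyphen, and 'y' (counted as a vowel in this thread).
my @chars = ('b', 'E', '7', '_', ' ', '-', 'y', 'Z');

# Each step of the subtraction, anchored to match exactly one character.
my @steps = (
    [ '[^\W]'                => qr/\A[^\W]\z/ ],
    [ '[^\W\d]'              => qr/\A[^\W\d]\z/ ],
    [ '[^\W\d_]'             => qr/\A[^\W\d_]\z/ ],
    [ '[^\W\d_aeiouyAEIOUY]' => qr/\A[^\W\d_aeiouyAEIOUY]\z/ ],
);

my %kept;
for my $step (@steps) {
    my ($label, $re) = @$step;
    # Concatenate the sample characters this class accepts.
    $kept{$label} = join '', grep { $_ =~ $re } @chars;
    printf "%-22s keeps '%s'\n", $label, $kept{$label};
}
```

      Each step drops one group: the digit, then the underscore, then the vowels E and y, leaving only b and Z.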

      All this is easier to manage, IMHO, with POSIX character classes or Unicode properties (if you're brave enough to venture out onto the thin, slippery ice of Unicode); both of the following definitions work the same way in the code above:
          my $consonant = qr{ [^[:^alpha:]aeiouyAEIUOY] }xms;
          my $consonant = qr{ [^\P{PosixAlpha}aeiouyAEIUOY] }xms;
      YMMV. See perlrecharclass, perluniprops.
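      A quick sketch to confirm that the three definitions agree: it runs each against the test string from the one-liner, then compares them on every ASCII character. The comparison is deliberately limited to ASCII; \P{PosixAlpha} is ASCII-only by definition, while \w and [:alpha:] can match non-ASCII letters under Unicode rules, so outside ASCII they need not agree.

```perl
use strict;
use warnings;

# The three consonant definitions from the post (vowels written as in the post).
my %consonant = (
    bracketed => qr{ [^\W\d_aeiouyAEIUOY] }xms,
    posix     => qr{ [^[:^alpha:]aeiouyAEIUOY] }xms,
    unicode   => qr{ [^\P{PosixAlpha}aeiouyAEIUOY] }xms,
);
my @names = sort keys %consonant;

# Same test string as the one-liner above.
my $s = '123 annn xyzzy wwwewww xxx9xxx vvv_vvv eieio p pp ppp 2015 vwxz vwxzpdq';

my %found;
for my $name (@names) {
    my $re = $consonant{$name};
    # List-context //g with no capture groups returns every whole match.
    $found{$name} = join ' ', $s =~ m{ (?:$re){4,} }xmsg;
    print "$name: $found{$name}\n";
}

# Compare the three classes on every ASCII character.
my $ascii_consonants = '';
for my $code (0 .. 127) {
    my $c = chr $code;
    my @verdicts = map { $c =~ /\A$consonant{$_}\z/ ? 1 : 0 } @names;
    die "classes disagree on chr($code)\n" if grep { $_ != $verdicts[0] } @verdicts;
    $ascii_consonants .= $c if $verdicts[0];
}
print "ASCII consonants: $ascii_consonants\n";
```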

      (See also the experimental Extended Bracketed Character Classes of version 5.18+; I can't give any examples using these ATM.)
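      For readers on Perl 5.18 or later, here is a minimal sketch (not from the original post) of how the same consonant class might be written as an Extended Bracketed Character Class, using the set-difference operator "-". The feature was experimental until Perl 5.36, so the sketch switches off its experimental warning.

```perl
use strict;
use warnings;
# (?[ ... ]) was experimental before Perl 5.36; silence that warning on older perls.
no warnings 'experimental::regex_sets';

# Set subtraction: ASCII letters minus the vowels (y counted as a vowel, as above).
my $consonant = qr/(?[ \p{PosixAlpha} - [aeiouyAEIOUY] ])/;

my $s = '123 annn xyzzy wwwewww xxx9xxx vvv_vvv eieio p pp ppp 2015 vwxz vwxzpdq';

# Runs of 4 or more consonants.
my @runs = $s =~ /(?:$consonant){4,}/g;
print join(' ', map { "'$_'" } @runs), "\n";    # 'vwxz' 'vwxzpdq'
```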


      Give a man a fish:  <%-(-(-(-<

        I agree with you: doubly-negated character classes can be tricky, but they can also be very useful. I was really reacting to the patterns proposed by Anonymous Monk and by toolic, which were just not quite right.
