Re: regex to find vowels in anyorder
by BrowserUk (Patriarch) on Dec 19, 2011 at 15:45 UTC
|
@words = do{ local @ARGV = 'words.txt'; <> }; chomp @words;;
m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and print for @words;;
aboideau
aboideaus
aboideaux
aboiteau
aboiteaus
aboiteaux
absolutive
absolutize
absolutized
absolutizes
abstemious
abstemiously
abstemiousness
...
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] |
|
You can anchor that regex to the start of the string with ^ and have it fail faster; in practice I haven't been able to measure a difference, so either perl is clever enough, or the whole thing is determined by IO performance.
| [reply] [d/l] |
|
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14288 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12593 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13659 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14437 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13993 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13856 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13786 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13947 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12269 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13944 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12400 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14011 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13754 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words;
printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13191 seconds
| [reply] [d/l] |
Re: regex to find vowels in anyorder
by SuicideJunkie (Vicar) on Dec 19, 2011 at 15:04 UTC
|
if (/a/ and /e/ and /i/ and /o/ and /u/ and (rand()>0.5 or /y/))
{
...
}
| [reply] [d/l] |
|
I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.
The performance of my solution depends on the distribution of the strings. I don't know enough statistics to tell you which is faster. The longer your strings are, the worse my solution performs. If your input contains a lot of strings and some are shorter than the number of vowels, you could skip the test for those entirely: a string of fewer than five (or six) letters evidently cannot contain five (or six) different vowels.
| [reply] |
|
I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.
Uhm, in which way does an "anded if" continue with a next vowel after a previous vowel wasn't found?
| [reply] |
|
As JavaFan and ww are discussing, your post seems to imply that you think the anded if does not short circuit when the first vowel is not found. That is not the case.
A loop to put each vowel into regexes sequentially would have the advantage of not hardcoding the vowels, though it would do at least as much work.
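A minimal sketch of such a loop, assuming lowercase input. The helper name contains_all and the sample words are made up for illustration; the early return plays the same role as the short circuit in the anded if.

```perl
use strict;
use warnings;

# contains_all is a hypothetical helper: the letters to look for are
# passed in instead of being hardcoded in the condition.
sub contains_all {
    my ($word, @letters) = @_;
    for my $letter (@letters) {
        # Give up on the first missing letter, just as a chain of
        # `and` would short-circuit.
        return 0 unless $word =~ /\Q$letter\E/;
    }
    return 1;
}

my @vowels = qw(a e i o u);
print "$_\n" for grep { contains_all($_, @vowels) } qw(aboideau facetious banana);
```

The loop runs one regex per letter, so it does at least as much work as the hardcoded version; what it buys is that the letter list becomes data.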
| [reply] [d/l] |
|
Not homework, but this question came up while some of us Perl developers were discussing regexes...
I am looking for a single expression that finds it.
| [reply] |
|
So, what code did the discussion between you Perl developers produce? What advantages are there to it? What disadvantages? If you are developers, you will surely have written code. If none of you developers has an idea, it surprises me that none of you developers has a question about perlre.
| [reply] |
Re: regex to find vowels in anyorder
by toolic (Bishop) on Dec 19, 2011 at 16:34 UTC
|
| [reply] |
Re: regex to find vowels in anyorder (obfuscated)
by eyepopslikeamosquito (Archbishop) on Dec 19, 2011 at 21:01 UTC
|
This one can be easily solved without using a regex.
For example, a one-liner featuring the good ol' tr (aka y) operator, punctuated with just & characters (sorry couldn't resist):
perl -ne 'y&a&&&&y&e&&&&y&i&&&&y&o&&&&y&u&&&&print' words.txt
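For readers who would rather not count ampersands, here is a de-obfuscated sketch of the same idea. The sub name has_all_vowels_tr and the sample words are assumptions for illustration; the technique is unchanged: tr/// with an empty replacement list and no /d modifier leaves the string alone and returns how many characters it matched.

```perl
use strict;
use warnings;

# has_all_vowels_tr is a hypothetical name. Each tr/X// counts the
# occurrences of X without modifying $word; a count of zero is false,
# so the && chain stops at the first missing vowel.
sub has_all_vowels_tr {
    my ($word) = @_;
    return (   $word =~ tr/a//
            && $word =~ tr/e//
            && $word =~ tr/i//
            && $word =~ tr/o//
            && $word =~ tr/u// ) ? 1 : 0;
}

for my $word (qw(aboiteau abstemious banana)) {
    print "$word\n" if has_all_vowels_tr($word);
}
```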
| [reply] [d/l] [select] |
Re: regex to find vowels in anyorder
by AnomalousMonk (Archbishop) on Dec 19, 2011 at 19:31 UTC
|
... all 5 vowels in it a,e,i,o,u.
Just as a cautionary side-note, the character set [a,e,i,o,u] (which is what I assume was originally posted without awareness of the effect of square brackets) includes ',' (comma) as a vowel! Please see Markup in the Monastery and Writeup Formatting Tips.
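A small sketch of that effect, with made-up test strings: inside square brackets a comma is just another member of the character class.

```perl
use strict;
use warnings;

my $with_commas    = qr/[a,e,i,o,u]/;   # also matches a literal comma
my $without_commas = qr/[aeiou]/;       # vowels only

for my $text (',', 'x,y', 'sky', 'tree') {
    printf "%-6s with commas: %-3s without: %s\n", "'$text'",
        ( $text =~ $with_commas    ? 'yes' : 'no' ),
        ( $text =~ $without_commas ? 'yes' : 'no' );
}
```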
| [reply] [d/l] |
|
I don't think the OP was posting a character set -- he's just listing what he considers vowels.
| [reply] |
Re: regex to find vowels in anyorder
by Not_a_Number (Prior) on Dec 19, 2011 at 20:20 UTC
|
open my $fh, '<', 'whatever'; # or whatever
my %hash = ( a => 0, e => 1, i => 2, o => 3, u => 4 );
while ( <$fh> ) {
chomp;
my $copy = $_;
my @array;
while ( my $char = lc chop $copy ) {
no warnings 'uninitialized';
$array[ $hash{ $char } ] = 1;
}
say if ( grep $_, @array ) == 5;
}
Pretty fast, too, you'll find...
Update: Hmm, if you take out the line no warnings 'uninitialized'; (and therefore remove use warnings or whatever from the start of the code), it seems to run nearly 20% faster still...
Comments welcome.
Update 2: Oops, just realised that my code doesn't work! Change the contents of the inner while loop to:
$array[ $hash{ $char } ] = 1 if defined $hash{ $char };
Which is what I had originally, before playing with no warnings 'uninitialized';. And then I didn't test properly.
Mea culpa. | [reply] [d/l] [select] |
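Putting Update 2 together, a corrected version might look like the sketch below. It is a rearrangement rather than the poster's exact code: the sub name has_five_vowels and the sample words are assumptions, and split replaces the chop loop (which would also stop early on a literal "0").

```perl
use strict;
use warnings;

my %slot = ( a => 0, e => 1, i => 2, o => 3, u => 4 );

# has_five_vowels is a hypothetical wrapper around the poster's loop.
sub has_five_vowels {
    my ($word) = @_;
    my @seen;
    for my $char ( split //, lc $word ) {
        # Only vowels have a slot. Skipping everything else keeps an
        # undefined index from silently becoming $seen[0].
        $seen[ $slot{$char} ] = 1 if defined $slot{$char};
    }
    return ( grep { $_ } @seen ) == 5 ? 1 : 0;
}

for my $word (qw(Education tedious abstemious)) {
    print "$word\n" if has_five_vowels($word);
}
```

Without the defined check, any consonant sets slot 0, so a word such as "tedious" (no a) would be reported as having all five vowels.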
Re: regex to find vowels in anyorder
by kennethk (Abbot) on Dec 19, 2011 at 15:28 UTC
|
Since you have 5 independent, overlapping searches you wish to perform simultaneously, you necessarily need something that won't consume letters on matching (assuming you don't want to use embedded code to cache results in a hash). Variable width match without consuming == look-ahead, so I'd start looking there. Looking ahead and looking behind. | [reply] |
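A short sketch of the look-ahead approach described above, assuming lowercase, chomped input. The helper name has_all_vowels and the sample words are made up for illustration.

```perl
use strict;
use warnings;

# Each (?=.*X) is a zero-width look-ahead: it checks that X occurs
# somewhere ahead of the current position and then hands the position
# back, so the five checks never compete for the same characters.
my $all_vowels = qr/^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/;

# has_all_vowels is a hypothetical helper name, not from the thread.
sub has_all_vowels {
    my ($word) = @_;
    return $word =~ $all_vowels ? 1 : 0;
}

for my $word (qw(abstemious sequoia rhythm education)) {
    printf "%-12s %s\n", $word, has_all_vowels($word) ? 'yes' : 'no';
}
```

Because every look-ahead starts from the same position, the order in which the vowels appear in the word does not matter.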
Re: regex to find vowels in anyorder
by pvaldes (Chaplain) on Dec 19, 2011 at 15:46 UTC
|
if (/a/i){
if(/e/i){
if(/i/i){
if(/o/i){
if(/u/i){
print "we found the five vowels";
}
}
}
}
}
| [reply] [d/l] |
|
I think BrowserUk's solution is going to be faster. But past that I would not use nested if's when you mean "and" or "&&". Get rid of 4 unnecessary levels of indentation.
| [reply] |
|
I think BrowserUk's solution is going to be faster.
That would not be my guess, unless it's the /i that's killing the performance. /a/ will not use the regexp engine, the optimizer will do it. If speed is an issue, and you want to be case insensitive, my bet would go to:
if ((/a/ || /A/) && (/e/ || /E/) && (/i/ || /I/) && (/o/ || /O/) && (/u/ || /U/)) { ... }
but I'm too lazy to come up with a good benchmark (which should test for both match and non-match). And if the query set would be English words, I'd order the vowels from least frequently occurring to most frequently (probably u-i-o-a-e, but I'd have to look that up), in order to fail faster.
Of course, it's also very likely speed doesn't matter at all. | [reply] [d/l] [select] |
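For anyone less lazy, a benchmark along those lines could start from the sketch below. Everything in it is an assumption for illustration: the helper names, the tiny sample list (which includes both matching and non-matching words), lowercase-only input, and the u-i-o-a-e order, which is the guess from the post above and not a measured frequency. It uses the core Benchmark module and only runs the timing when called with --bench.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper names; neither is from the thread.
sub by_lookahead {
    return $_[0] =~ /^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/ ? 1 : 0;
}

# Chained single-letter matches, in the guessed rarest-first order.
sub by_chain {
    local $_ = $_[0];
    return ( /u/ && /i/ && /o/ && /a/ && /e/ ) ? 1 : 0;
}

# Made-up sample; a real test would read a word list instead.
my @sample = qw(abstemious facetious education rhythm banana queue strength);

sub compare_speeds {
    # A negative count asks Benchmark to run each sub for at least
    # that many CPU seconds.
    cmpthese( -1, {
        lookahead => sub { my $n = grep { by_lookahead($_) } @sample },
        chain     => sub { my $n = grep { by_chain($_) } @sample },
    } );
}

compare_speeds() if @ARGV && $ARGV[0] eq '--bench';
```

Results will depend heavily on the word list and the Perl version, so any numbers from such a small sample should be read as a rough indication at best.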