You don't always have to use regexes

As Perl programmers, we love our regular expressions. They're one of the things that make Perl so Perly. However, they're not always necessary.

If you're writing something like

if ( $value =~ /^true$/i )

then write it as

if ( lc $value eq "true" )

instead.
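
A minimal sketch of the two checks side by side. The helper names `is_true_regex` and `is_true_eq` and the sample values are made up for illustration; they are not part of the original post. On inputs without a trailing newline the two forms should agree (the newline case is discussed further down the thread).

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two forms from the post.
sub is_true_regex { my ($value) = @_; return $value =~ /^true$/i ? 1 : 0 }
sub is_true_eq    { my ($value) = @_; return lc $value eq "true" ? 1 : 0 }

# Made-up sample inputs, chosen only for illustration.
for my $value ( 'true', 'TRUE', 'True', 'false', 'truely', ' true', '' ) {
    printf "%-8s regex=%d eq=%d\n", "'$value'",
        is_true_regex($value), is_true_eq($value);
}
```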

xoxo,
Andy


Re: You don't always have to use regexes by kvale (Monsignor) on Feb 23, 2005 at 16:26 UTC
Using the simplest op that gets the job done is always good advice, both for speed and readability. But for those who are addicted to regexes, the above situation won't bite speed too hard. The regex engine optimizes a fixed string to a Boyer-Moore match, which is a tad slower than string equality:

    use Benchmark qw(:all);

    my $value = 'FALSE';
    my $count = 10_000_000;
    cmpthese($count, {
        'regex' => sub { $value =~ /^true$/i },
        'eq'    => sub { lc $value eq "true" },
    });

yields

    1048% perl boyer.pl
    Benchmark: timing 10000000 iterations of eq, regex...
            eq:  9 wallclock secs ( 8.98 usr +  0.00 sys =  8.98 CPU) @ 1113585.75/s (n=10000000)
         regex: 16 wallclock secs (16.31 usr +  0.00 sys = 16.31 CPU) @ 613120.78/s (n=10000000)
               Rate regex    eq
    regex  613121/s    --  -45%
    eq    1113586/s   82%    --

Unless that match is inside a tight loop, program performance will not be degraded much.

-Mark
Re: You don't always have to use regexes by spurperl (Priest) on Feb 23, 2005 at 16:02 UTC
It's quite interesting to time this and see just how much performance is gained. I'm also curious whether the regex engine has, or is planned to have, optimizations for "static" expressions like this. Additionally, using `substr` can save quite a few regular expressions here and there. But the rule of thumb should be: use whatever seems more natural for the problem at hand, and optimize only if necessary.
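
As a hedged illustration of the `substr` remark, here is one way a fixed-prefix check could be written without a regex. The helper name `has_prefix` and the sample lines are hypothetical, invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $string starts with the literal $prefix.
sub has_prefix {
    my ( $string, $prefix ) = @_;
    return substr( $string, 0, length $prefix ) eq $prefix;
}

for my $line ( 'ERROR: disk full', 'WARN: low memory' ) {
    # The regex spelling of this fixed-prefix test would be: $line =~ /^ERROR:/
    print "$line\n" if has_prefix( $line, 'ERROR:' );
}
```

Whether this reads more naturally than the anchored regex is a judgment call, which is the point of the rule of thumb above.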
Re: You don't always have to use regexes by VSarkiss (Monsignor) on Feb 23, 2005 at 16:22 UTC
Overuse of regexes is one of my favorite pet peeves also. As you point out, `eq` will sometimes do everything you need. Other times all you need is `index`. For example, if your regex didn't have anchors: `if ( $value =~ /true/i )` you could write instead `if ( index( lc $value, "true" ) >= 0 )`

Do not rebuke them with harsh words ... but rather lead them gently - with URLs - so that they may learn wisdom.
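
A small sketch of the `index` form next to the unanchored regex. The helper name `contains_ci` and the sample values are assumptions made for illustration. Note the `>= 0` test: `index` returns 0 for a match at the very start of the string and -1 when nothing is found, so a plain truth test would get both cases wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive "contains" without a regex.
sub contains_ci {
    my ( $haystack, $needle ) = @_;
    return index( lc $haystack, lc $needle ) >= 0;
}

for my $value ( 'It is TRUE', 'untrue', 'true at the start', 'false' ) {
    my $by_index = contains_ci( $value, 'true' ) ? 'yes' : 'no';
    my $by_regex = $value =~ /true/i ? 'yes' : 'no';
    print "$value: index=$by_index regex=$by_regex\n";
}
```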
Re^2: You don't always have to use regexes by kvale (Monsignor) on Feb 23, 2005 at 16:32 UTC
I think that for the index case the situation is not so clear. Both the regex engine and index() will use the same Boyer-Moore routine, and for me personally the regex version is more readable. But as always, YMMV.

    use Benchmark qw(:all);

    my $value = 'FALSE';
    my $count = 1_000_000;
    cmpthese($count, {
        'regex' => sub { $value =~ /^true$/i },
        'eq'    => sub { lc $value eq "true" },
        'index' => sub { index( lc $value, "true" ) >= 0 },
    });

yields

    Benchmark: timing 1000000 iterations of eq, index, regex...
            eq: 1 wallclock secs ( 0.89 usr +  0.00 sys =  0.89 CPU) @ 1123595.51/s (n=1000000)
         index: 2 wallclock secs ( 1.65 usr +  0.00 sys =  1.65 CPU) @ 606060.61/s (n=1000000)
         regex: 2 wallclock secs ( 1.63 usr +  0.00 sys =  1.63 CPU) @ 613496.93/s (n=1000000)
               Rate index regex   eq
    index  606061/s    --   -1% -46%
    regex  613497/s    1%    -- -45%
    eq    1123596/s   85%   83%   --

Update: As AM has pointed out (thank you!), the benchmark above has a bug: the regex is anchored, so it is not the counterpart of the index test. Using the tests

    'regex'      => sub { $value =~ /true/i },
    'regex_anch' => sub { $value =~ /^true$/i },
    'eq'         => sub { lc $value eq "true" },
    'index'      => sub { index( lc $value, "true" ) >= 0 },

I get the results

    Benchmark: timing 1000000 iterations of eq, index, regex, regex_anch...
            eq: 1 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 1136363.64/s (n=1000000)
         index: 0 wallclock secs ( 1.65 usr +  0.00 sys =  1.65 CPU) @ 606060.61/s (n=1000000)
         regex: 0 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 925925.93/s (n=1000000)
    regex_anch: 2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 628930.82/s (n=1000000)
                   Rate index regex_anch regex   eq
    index      606061/s    --        -4%  -35% -47%
    regex_anch 628931/s    4%         --  -32% -45%
    regex      925926/s   53%        47%    -- -19%
    eq        1136364/s   87%        81%   23%   --

with the surprising result that the regex without the anchor is faster than the anchored version. Multiple runs yield similar results.

As the AM says, one could try many different regex-value combos, but I expect the results to be not far different, precisely because both index and the regex engine use the same BM function.

-Mark
Re^3: You don't always have to use regexes by Anonymous Monk on Feb 24, 2005 at 02:59 UTC
Your benchmark is significantly flawed for the question asked. The OR (original replier) wanted to compare `index(lc $value,"true")` with `$value =~ /true/i;`. In addition, to benchmark fairly one should try multiple test cases: set `$value` to 'true', 'ashortstringthentrue', 'averylongstringthentrue', and to strings of different sizes without 'true' in them.
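
One way the suggested comparison might be set up, as a sketch only. The three matching strings come from the reply; the two non-matching strings, the case labels, and the iteration count are arbitrary assumptions, and no timings are claimed here since they depend on the machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Matching inputs are taken from the reply; the misses are made up.
my @cases = (
    [ exact      => 'true' ],
    [ short_hit  => 'ashortstringthentrue' ],
    [ long_hit   => 'averylongstringthentrue' ],
    [ short_miss => 'ashortstring' ],
    [ long_miss  => 'x' x 200 ],
);

for my $case (@cases) {
    my ( $label, $value ) = @$case;
    print "--- $label ---\n";

    # Arbitrary iteration count; raise it for steadier numbers.
    cmpthese( 200_000, {
        'index' => sub { index( lc $value, 'true' ) >= 0 },
        'regex' => sub { $value =~ /true/i },
    } );
}
```

With so few iterations Benchmark may warn that the count is too low to be reliable; that is expected for a quick sketch.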
Re^2: You don't always have to use regexes by holli (Abbot) on Feb 23, 2005 at 16:39 UTC
I benchmarked this and it yields an interesting result. `index()` is (a bit) faster than a regex. If it's used in combination with `lc()`, as in your example, the regex with the i-modifier is faster.

    use strict;
    use warnings;
    use Benchmark;

    my $value = "somewhere here true is there!";

    timethese ( 9000000, {
        'index' => sub { index( $value, "true" ) },
        'regex' => sub { $value =~ /true/ },
    } );

    timethese ( 9000000, {
        'index' => sub { index( lc $value, "true" ) },
        'regex' => sub { $value =~ /true/i },
    } );

    Benchmark: timing 9000000 iterations of index, regex...
         index: 2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 4446640.32/s (n=9000000)
         regex: 4 wallclock secs ( 2.40 usr + -0.01 sys =  2.39 CPU) @ 3760969.49/s (n=9000000)
    Benchmark: timing 9000000 iterations of index, regex...
         index: 4 wallclock secs ( 4.55 usr +  0.00 sys =  4.55 CPU) @ 1979762.43/s (n=9000000)
         regex: 3 wallclock secs ( 3.68 usr +  0.00 sys =  3.68 CPU) @ 2448313.38/s (n=9000000)

Update: Ack. I really need to learn to type faster.

holli, /regexed monk/
Re: You don't always have to use regexes by Anonymous Monk on Feb 24, 2005 at 03:32 UTC
Code Smarter: Compulsory link to japhy's node making the same suggestion, and more. Edited by davido: fixed broken link.
Re: You don't always have to use regexes by ysth (Canon) on Feb 24, 2005 at 19:34 UTC
A proper translation of `if ( $value =~ /^true$/i )` would be: `if ( lc $value eq "true" || lc $value eq "true\n" )` (except that the former potentially sets $&, $`, and $' and the last-successful-regex).
Re^2: You don't always have to use regexes by petdance (Parson) on Feb 25, 2005 at 03:02 UTC
Yes, but that check for "\n" is really irrelevant. It's required for the two to be functionally identical, but not semantically. Semantics are the real issue here. The regex is saying "Do you have a string that matches the beginning of the string, then t, r, u, e and then the end of the string", and the compare is saying "Is the string the word 'true'?" "Is this the word I want" is the real intent.

xoxo, Andy
Re^3: You don't always have to use regexes by ysth (Canon) on Feb 25, 2005 at 04:01 UTC
My point was that that is not what the regex is saying. Just my own personal bonnet-bee, but people misinterpret $ way too often, and I feel it deserves publicity whenever it comes up.
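
To make the point about `$` concrete, a short sketch comparing the `$` anchor, the `\z` anchor, and plain `eq` on strings with and without trailing newlines. The sample strings are chosen for illustration. Without the `/m` modifier, `$` matches at the very end of the string or just before a single final newline, while `\z` matches only at the very end.

```perl
use strict;
use warnings;

for my $value ( "true", "true\n", "true\n\n" ) {
    ( my $shown = $value ) =~ s/\n/\\n/g;    # make newlines visible in output
    printf "%-12s  \$ anchor: %-3s  \\z anchor: %-3s  eq: %s\n",
        qq{"$shown"},
        ( $value =~ /^true$/i  ? 'yes' : 'no' ),
        ( $value =~ /^true\z/i ? 'yes' : 'no' ),
        ( lc $value eq 'true'  ? 'yes' : 'no' );
}
```

Of these samples, "true\n" is the one where the `$` version says yes while `\z` and `eq` say no, which is the discrepancy being pointed out. Someone who wants a regex that behaves like the `eq` test could use `\z` instead of `$`.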
Re: You don't always have to use regexes by PetaMem (Priest) on Feb 24, 2005 at 21:57 UTC
I suppose the whole meaning of this example is to show how to program efficiently, not wasting system resources (here: CPU time). If this is so, I'd like to put emphasis on the fact that NO ONE here seems to see a problem in the `"true"` expression. Please do not use interpolation if you do not need it. Try your benchmarks with `'true'` again.

Update: Of course I did the benchmarks before posting this node. The speed differences are not extraordinary, but consistently about 5%.

Bye
PetaMem
All Perl: MT, NLP, NLU
Re^2: You don't always have to use regexes by Tanktalus (Canon) on Feb 24, 2005 at 22:52 UTC
Actually, many of us saw it. But we also saw this: Re: To Single Quote or to Double Quote: a benchmark. The point is, the difference in speed is practically meaningless. In the grand scheme of the transition from `$value =~ /true/i` to `lc $value eq "true"`, changing that to `lc $value eq 'true'` is going to have a demonstrably small effect.
Re^3: You don't always have to use regexes by bmann (Priest) on Feb 24, 2005 at 23:41 UTC
And to support your point, an invariant string inside double-quotes gets compiled down to a single quoted string. Any time wasted is not wasted at run-time.

    $ cat print.pl
    print 'Hello';
    print "Hello";    # compiles to 'Hello'
    print "Hello $_";

    $ perl -MO=Deparse print.pl
    print 'Hello';
    print 'Hello';
    print "Hello $_";
    print.pl syntax OK

5.005_03, 5.6.1 and 5.8.4 produce identical results.
Re^4: You don't always have to use regexes by Ven'Tatsu (Deacon) on Feb 25, 2005 at 14:28 UTC
Re^2: You don't always have to use regexes by petdance (Parson) on Feb 25, 2005 at 03:06 UTC
"I suppose the whole meaning of this example is to show how to program efficiently, not wasting system resources (here: CPU time)."

Absolutely not. That has nothing to do with it. CPU efficiencies on the scale that we're talking about are irrelevant. The point is to use the construct that most closely matches the semantics of what you're trying to achieve. If you're wondering if one string is the word "true", then that's not a pattern match, it's a string comparison.

xoxo, Andy
Re^3: You don't always have to use regexes by PetaMem (Priest) on Feb 25, 2005 at 15:35 UTC
"If you're wondering if one string is the word "true", then that's not a pattern match, it's a string comparison."

OK, I second that. Probably I was misled by the immediate popup of benchmarks in this thread.

Bye
PetaMem
All Perl: MT, NLP, NLU
