Style question: regex versus string builtin function

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

From some code I inherited recently:

if ( $line =~ /$DELIMITER/ ) { ...

Now, this is an accident waiting to happen -- what if $DELIMITER contains a regex metachar? I suppose the obvious fix is:

if ( $line =~ /\Q$DELIMITER/ ) { ...

though perhaps this is better/faster:

if ( index($line, $DELIMITER) >= 0 ) { ...

How would you do it?
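For readers who want to see the failure mode concretely, here is a minimal sketch. The delimiter value `|` and the three helper names are assumptions made up for illustration; they are not from the inherited code.

```perl
use strict;
use warnings;

# Hypothetical delimiter, chosen because it is a regex metacharacter.
my $DELIMITER = '|';

# Unescaped: the pattern becomes an alternation of two empty branches,
# which matches every string, whether or not it contains '|'.
sub has_delim_raw {
    my ($line) = @_;
    return $line =~ /$DELIMITER/ ? 1 : 0;
}

# \Q backslashes the metacharacters in the interpolated value,
# so '|' is matched as a literal character.
sub has_delim_quoted {
    my ($line) = @_;
    return $line =~ /\Q$DELIMITER/ ? 1 : 0;
}

# index returns -1 when the substring is absent, a 0-based offset otherwise.
sub has_delim_index {
    my ($line) = @_;
    return index($line, $DELIMITER) >= 0 ? 1 : 0;
}

for my $line ('a|b', 'no delimiter here') {
    printf "%-20s raw=%d quoted=%d index=%d\n",
        $line, has_delim_raw($line), has_delim_quoted($line), has_delim_index($line);
}
```

The raw version reports a hit on both lines, while the quoted and `index` versions agree with each other.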


Replies are listed 'Best First'.
Re: Style question: regex versus string builtin function by shmem (Chancellor) on Oct 02, 2007 at 07:53 UTC
If `$DELIMITER` was dynamic and could contain a regex, I'd use a m//, otherwise index. --shmem
Re: Style question: regex versus string builtin function by johngg (Canon) on Oct 02, 2007 at 08:57 UTC
If `$DELIMITER` was static and was being tested for more than once in the code I might consider making a compiled regex.

my $rxDELIMITER = qr{\Q$DELIMITER\E};
...
if ( $line =~ $rxDELIMITER ) { ...

I probably reach for regexen too quickly without even considering the use of `index`. I suspect I'm not the only one with a bit of a blind spot there. Cheers, JohnGG
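A small sketch of the compiled-regex idea applied to several lines. The delimiter value, the sample lines, and the helper name `count_delimited_lines` are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical delimiter made entirely of regex metacharacters.
my $DELIMITER = '.*';

# Quote and compile once; the resulting object can be reused for every test.
my $rxDELIMITER = qr{\Q$DELIMITER\E};

# Count how many of the given lines contain the literal delimiter.
sub count_delimited_lines {
    my ($rx, @lines) = @_;
    return scalar grep { $_ =~ $rx } @lines;
}

my @lines = ('a.*b', 'plain text', '.*');
print count_delimited_lines($rxDELIMITER, @lines), "\n";    # prints 2
```

Because the `\Q` quoting happens when the `qr` object is built, the call sites stay short and cannot forget the escape.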
Re: Style question: regex versus string builtin function by throop (Chaplain) on Oct 02, 2007 at 11:47 UTC
Use `index`. Even after using `\Q`, there are other odd cases lurking. From perlreref:

`If 'pattern' is an empty string, the last successfully matched regex is used.`

Also:

`You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the corresponding variable, while escaping will cause the literal string \$ to be matched. You'll need to write something like m/\Quser\E\@\Qhost/.`

The real 'style' question here, though, is: which form is most maintainable, most understandable when somebody looks at it two years from now? And this use of `$DELIMITER` is going to be rather opaque in either case. Therefore, the most important element of style here is a generous set of comments explaining why `$DELIMITER` was broken out as a separate variable (or constant). throop

Update: lidden's point is well taken; even a zero-width assertion like `\Q` keeps the pattern from being empty. But see the discussion that follows.
Re^2: Style question: regex versus string builtin function by lidden (Curate) on Oct 02, 2007 at 12:03 UTC
But 'pattern' is not an empty string after using `\Q`.
Re^3: Style question: regex versus string builtin function by thospel (Hermit) on Oct 02, 2007 at 13:01 UTC
Silly enough, this still counts as empty. In general I think the way empty regexes work is just bad design. It should only trigger if the regex is empty at the literal code level, not after all kinds of expansion have been done on the stuff between the delimiters.
Re^3: Style question: regex versus string builtin function by kyle (Abbot) on Oct 02, 2007 at 13:55 UTC
A `\Q` does not "fill" an empty regex.

use Test::More 'tests' => 5;
ok( 'foo' =~ //, 'empty regex matches' );
ok( 'foo' =~ /foo/, '/foo/ matches' );
ok( !('bar' =~ //), 'repeated match of foo' );
ok( !('bar' =~ /\Q/), 'repeated match with \\Q' );
my $empty = '';
ok( !('bar' =~ /\Q$empty/), 'interpolated empty string same as \\Q' );
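One way to sidestep the empty-pattern surprise is to check the delimiter before testing at all. The sketch below is only an illustration: the helper name `contains_delimiter` is invented here, and treating an empty delimiter as "never present" is an assumed policy, not something anyone in this thread prescribed. Note that `index` has its own edge case, since `index($line, '')` returns 0 and so `>= 0` would be true for every line.

```perl
use strict;
use warnings;

# Hypothetical helper: does $line contain the literal $delimiter?
sub contains_delimiter {
    my ($line, $delimiter) = @_;

    # Assumed policy: an undefined or empty delimiter never counts as present.
    # Without this guard, index() would return 0 for '' and report a hit.
    return 0 unless defined $delimiter && length $delimiter;

    return index($line, $delimiter) >= 0 ? 1 : 0;
}

for my $delimiter (',', '') {
    printf "delimiter '%s' in 'a,b': %d\n",
        $delimiter, contains_delimiter('a,b', $delimiter);
}
```

The same guard works in front of a `\Q` regex, where it keeps an empty `$DELIMITER` from silently reusing the last successfully matched pattern.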
Re^4: Style question: regex versus string builtin function by throop (Chaplain) on Oct 02, 2007 at 15:24 UTC
Re^5: Style question: regex versus string builtin function by kyle (Abbot) on Oct 02, 2007 at 15:41 UTC
Re^2: Style question: regex versus string builtin function (\Q not an assertion) by lodin (Hermit) on Oct 14, 2007 at 22:55 UTC
"a zero-width assertion like `\Q`"

`\Q` isn't an assertion. It's like `\U` et al. and works in all interpolating quote operators. It's just that one almost always sees it with the regexp operators. An example:

print "\U\Qfoo.bar";
__END__
FOO\.BAR

lodin
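To illustrate that `\Q` is string-level quoting rather than a regex construct, this sketch compares it with the `quotemeta` builtin in an ordinary double-quoted string. The sample value of `$DELIMITER` is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical delimiter containing two regex metacharacters.
my $DELIMITER = 'a.b|c';

# \Q ... \E inside a double-quoted string applies the same quoting as quotemeta().
my $via_escape   = "\Q$DELIMITER\E";
my $via_function = quotemeta $DELIMITER;

print "$via_escape\n";                                           # prints a\.b\|c
print $via_escape eq $via_function ? "same\n" : "different\n";   # prints same
```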
Re: Style question: regex versus string builtin function by thospel (Hermit) on Oct 02, 2007 at 11:33 UTC
I'd definitely go for the regex. If I later would read that code, I'd have to think for half a second about the index code to see that it's not a "where is this needle", but that it's a "does the needle exist anywhere", while a regex immediately gives that kind of association. If index is faster, that is an implementation detail. If we care, we should just fix the perl optimization code to make them equivalent. But by default clarity, not speed, is the goal of writing code.
Re^2: Style question: regex versus string builtin function by rir (Vicar) on Oct 02, 2007 at 15:02 UTC
"that it's a "does the needle exist anywhere", while a regex immediately gives that kind of association"

`g`, that doesn't work for me. Regexes are inherently more complex to use than the `index` function. There are the various regular expression dialects, there are the modifiers, and there are the global variables upon which they may trample. But, like others, I tend to reach for the match operator. Be well, rir
Re: Style question: regex versus string builtin function by lima1 (Curate) on Oct 02, 2007 at 11:57 UTC
I use index when I need the match position, otherwise a regex. And it seems that index is NOT faster. Even code like

my $pos;
if ( $line =~ $regex ) {
    $pos = length $`;
}

which gets the match position with a regex is slightly faster (but much uglier of course).

Update: For better ways of getting the match position, see How do I retrieve the position of the first occurrence of a match?.

Benchmark results:

                    Rate  index  regex_pos  regex  regex_compiled_pos  regex_compiled
index               450/s    --       -38%   -39%                -40%            -41%
regex_pos           728/s   62%         --    -2%                 -3%             -5%
regex               741/s   65%         2%     --                 -1%             -3%
regex_compiled_pos  749/s   66%         3%     1%                  --             -2%
regex_compiled      763/s   70%         5%     3%                  2%              --
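For anyone who wants to try a comparison like this themselves, here is a rough sketch using the core `Benchmark` module. It is not lima1's benchmark code, which is not reproduced in this thread; the delimiter, the test string, and the candidate names are all assumptions, and the numbers will differ by perl version, input, and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical inputs: a short delimiter in the middle of a 400-character line.
my $DELIMITER = '::';
my $line      = ('x' x 200) . $DELIMITER . ('y' x 200);
my $rx        = qr/\Q$DELIMITER\E/;

# Each candidate answers the same yes/no question in a different way.
my %candidates = (
    index          => sub { index($line, $DELIMITER) >= 0 },
    regex          => sub { $line =~ /\Q$DELIMITER/ },
    regex_compiled => sub { $line =~ $rx },
);

# A negative count asks Benchmark to run each candidate for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, \%candidates);
```

A single input like this says little about other workloads, so it is worth varying the line length and the position of the delimiter before drawing conclusions.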
Re^2: Style question: regex versus string builtin function by oha (Friar) on Oct 02, 2007 at 12:27 UTC
There are some issues with using $`; check perlre. What you want is m// then pos, which will be faster. Oha

Update: check tye's note below.
Re^3: Style question: regex versus string builtin function (pos) by tye (Sage) on Oct 02, 2007 at 13:53 UTC
Make that `m//g` (note the 'g') in a scalar context and then pos. - tye
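A sketch of that suggestion, for illustration only. The helper names are invented here, the `$-[0]` variant is an addition of this sketch rather than something proposed in the thread, and returning -1 for "not found" (mirroring `index`) is an assumed convention.

```perl
use strict;
use warnings;

# Start offset of the first literal occurrence of $delimiter, via m//g and pos().
sub match_position_pos {
    my ($line, $delimiter) = @_;
    return -1 unless length $delimiter;    # avoid the empty-pattern case
    if ( $line =~ /\Q$delimiter/g ) {      # scalar context, so pos() is set
        # pos() is the offset just past the match; step back over the delimiter.
        return pos($line) - length $delimiter;
    }
    return -1;
}

# Same question answered with $-[0], the start offset of the last successful match.
sub match_position_start {
    my ($line, $delimiter) = @_;
    return -1 unless length $delimiter;
    return $line =~ /\Q$delimiter/ ? $-[0] : -1;
}

my ($line, $delimiter) = ('abc::def', '::');
printf "pos: %d, \$-[0]: %d, index: %d\n",
    match_position_pos($line, $delimiter),
    match_position_start($line, $delimiter),
    index($line, $delimiter);
```

Neither variant touches $`, so neither raises the match-variable concerns mentioned above.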
Re^3: Style question: regex versus string builtin function by lima1 (Curate) on Oct 02, 2007 at 13:17 UTC
Well, you must be careful when you use match variables, especially when you work with big strings. But they aren't slow per se. Update: Thank you all for your comments and suggestions (here and in the CB)! See How do I get what is to the left of my match? for an updated benchmark and better explanations.
Re^4: Style question: regex versus string builtin function by ikegami (Patriarch) on Oct 02, 2007 at 14:15 UTC
Re^5: Style question: regex versus string builtin function by lima1 (Curate) on Oct 03, 2007 at 09:17 UTC
Re^4: Style question: regex versus string builtin function by eyepopslikeamosquito (Archbishop) on Oct 02, 2007 at 13:38 UTC
Re: Style question: regex versus string builtin function by apl (Monsignor) on Oct 02, 2007 at 09:45 UTC
I'd definitely use `index`. It's the simplest tool for this problem.
Re: Style question: regex versus string builtin function by graff (Chancellor) on Oct 02, 2007 at 13:04 UTC
"... what if $DELIMITER contains a regex metachar?"

What if the intention is that metacharacters in the variable should be used as such? What to use depends on what the intention is. For cases where "TMTOWTDI" really applies, the choice of approach is not likely to matter all that much (except to those who are compelled to optimize). For cases where literal-vs.-metachar handling means a difference between success vs. error (or ability vs. inability to do a task), one tool will be better than the other, and whichever one is right, you still have to provide some safeguards and checks to try to handle all contingencies as best you can.
Re: Style question: regex versus string builtin function by talexb (Chancellor) on Oct 02, 2007 at 17:33 UTC
While I don't doubt that `index` is faster, I like your first solution better, simply because it's more Perl-ish. You're seeing if a particular delimiter appears on a line. The alternative would (for me) require I look up how `index` works -- it's a logical function to have in a language, I just don't think I've ever used it, so I'm not sure what the parameters are or what it returns. That's just my preference.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
