Regexes are slow (or, why I advocate String::Index)
by japhy (Canon) on May 14, 2004 at 21:44 UTC (#353505, perlmeditation)
This comes out of the thread Truncating Last Sentence. It's a discussion of why regexes might not be so hot in one particular arena, and how my module helps out.
I'd like it to be known that, on a string with many X's in it, doing is not very fast. It tries to match at each X it finds. You'd probably be better off doing

Although in that case, it'd be sweet to use Regexp::Keep and say:

or do it the two-regex way (since \K isn't core Perl, it's not as fast):

Or you could reverse the string:

Of course, for this task, a regex is probably the wrong tool. You can use string functions:

Here's a benchmark of these methods (leaving out \K, because it's not worth showing):

Notice how much better-suited substr() is for this task.

You encounter a problem in speed, though, when X is not just a character, but a character class. First of all, our substr() approach fails immediately, because it uses index(), which looks for a substring, not one of a set of characters. Let's do the same benchmark, but change X to the character class of A-Z.

This slow-down is caused by the character class. Because we're not looking for a SINGLE character, we can't jump backwards (the regex engine knows how to handle /.*A/ quickly -- it can "jump" backward to an "A", instead of examining each character -- but it can't handle /.*[AB]/ as quickly).

So what can we do? We can use String::Index, which gives us functions that act like C's strpbrk(), but can do even more. For those of you not familiar with strpbrk() (whose name I can't decipher), it takes a string to look at and a string of characters to find in that source string. In Perl, it'd be like doing:

That is, it returns the earliest location in the source string of one of the characters in the second string. It's uncool that there's no standard C function that does this from the back of the string, or for all characters except those given... That's what the String::Index module was written to do! Let's apply it to this problem and run another benchmark:

We are restored. It's not as fast as the original substr() approach because it has to do more work, but it's faster than any other solution.
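
For anyone who wants something concrete to poke at, here is a minimal core-Perl sketch of the kinds of approach discussed above. It is not the code or the benchmark from this post. The sub names (chop_regex, chop_rindex, last_of, chop_class) are made up for illustration, and it assumes "truncating" means dropping the last X and everything after it. last_of() is a plain-Perl stand-in for the "strpbrk() from the back" idea, written as a loop rather than with String::Index itself, so it shows the behaviour and not the speed of the XS module.

```perl
use strict;
use warnings;

# Hypothetical helper names; none of these come from the original post.

# Greedy-prefix regex: .* runs to the end of the string, then
# backtracks until an X can match, i.e. it stops at the last X.
sub chop_regex {
    my ($s) = @_;
    $s =~ s/^(.*)X.*\z/$1/s;
    return $s;
}

# String functions: rindex() finds the last X, substr() keeps
# everything before it. No X means the string is returned as-is.
sub chop_rindex {
    my ($s) = @_;
    my $pos = rindex($s, 'X');
    return $pos < 0 ? $s : substr($s, 0, $pos);
}

# Stand-in for a reverse strpbrk(): position of the last character
# of $s that appears anywhere in $chars, or -1 if there is none.
sub last_of {
    my ($s, $chars) = @_;
    my %want = map { $_ => 1 } split //, $chars;
    for (my $i = length($s) - 1; $i >= 0; $i--) {
        return $i if $want{ substr($s, $i, 1) };
    }
    return -1;
}

# Same truncation as chop_rindex(), but for a set of characters.
sub chop_class {
    my ($s, $chars) = @_;
    my $pos = last_of($s, $chars);
    return $pos < 0 ? $s : substr($s, 0, $pos);
}

print chop_regex("aXbXc"), "\n";
print chop_rindex("aXbXc"), "\n";
print chop_class("one Two three Four five", join('', 'A' .. 'Z')), "\n";
```

To compare timings yourself, the core Benchmark module's cmpthese() can be pointed at these subs with a long input string; the numbers will depend on your perl and your data.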