This comes from the Truncating Last Sentence thread. It's a discussion of why regexes might not be so hot in a particular arena, and how my module helps out.
I'd like it to be known that, on a string with many X's in it, a regex that has to try a match at each X it finds is not very fast.
You'd probably be better off letting a greedy .* run to the end of the string and back up to the last X.
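A minimal sketch of that contrast, assuming the task is to cut the string at its last X. Both patterns and the sample string are my own guesses at the shape of the problem, not necessarily the exact code under discussion:

```perl
use strict;
use warnings;

# Hypothetical sample: lots of X's, and we want everything before the last one.
my $str = join('X', ('abc') x 1000) . 'Xtail';

# Slow form (assumed): the match has to start at an X, so the engine
# tries again at every X until one is followed only by non-X characters.
(my $slow = $str) =~ s/X[^X]*\z//;

# Faster form (assumed): greedy .* runs to the end of the string,
# then backs up until an X follows it.
(my $fast = $str) =~ s/(.*)X.*/$1/s;
```

Both leave the same result; only the amount of work the engine does differs.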
Although in that case, it'd be sweet to use Regexp::Keep and say:
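A sketch of the \K form, assuming Perl 5.10 or later, where \K is built into the regex engine and no extra module is needed. The sample string is hypothetical:

```perl
use strict;
use warnings;
use 5.010;    # \K is built in from this version on

my $str = 'one X two X three';

# Everything matched before \K is kept; only the last X and what
# follows it is replaced with the empty string.
(my $kept = $str) =~ s/.*\KX.*//s;
```

This avoids the capture-and-put-back of $1, since the part before \K is never considered part of the match.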
or do it the two-regex way (Regexp::Keep's \K isn't core Perl, so it's not as fast):
Or you could reverse the string:
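A sketch of the reversal trick, under the same assumption that we are cutting at the last X. Once the string is reversed, the last X becomes the first one, so a pattern anchored at the start finds it without backtracking:

```perl
use strict;
use warnings;

my $str = 'one X two X three';    # hypothetical sample

# Scalar assignment puts reverse() in scalar context, so it reverses characters.
my $rev = reverse $str;

# In the reversed string, drop everything up to and including the first X.
$rev =~ s/\A[^X]*X//;

# Flip it back.
my $cut = reverse $rev;
```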
Of course, for this task, a regex is probably the wrong tool. You can use string functions:
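A sketch of the string-function version, assuming rindex() to find the last X and lvalue substr() to chop from there. The variable names are mine:

```perl
use strict;
use warnings;

my $str = 'one X two X three';    # hypothetical sample

# rindex() returns the position of the last 'X', or -1 if there is none.
my $pos = rindex($str, 'X');

# Assigning to substr() replaces everything from $pos to the end.
substr($str, $pos) = '' if $pos >= 0;
```

The check on $pos matters: with -1, substr() would count from the end of the string and chop off the last character.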
Here's a benchmark of these methods (leaving out \K, because it's not worth showing):
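For anyone who wants to run a comparison like this themselves, here is a rough harness using the core Benchmark module. The method names, patterns, and test string are assumptions on my part, so the numbers it prints will not necessarily line up with mine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string with many X's in it.
my $str = join('X', ('abc') x 200) . 'Xtail';

# Each sub works on its own copy and returns the truncated string.
my %methods = (
    anchored => sub { (my $s = $str) =~ s/X[^X]*\z//; $s },
    greedy   => sub { (my $s = $str) =~ s/(.*)X.*/$1/s; $s },
    reversed => sub {
        my $r = reverse $str;
        $r =~ s/\A[^X]*X//;
        scalar reverse $r;
    },
    substr   => sub {
        my $s = $str;
        my $p = rindex($s, 'X');
        substr($s, $p) = '' if $p >= 0;
        $s;
    },
);

# Run each method for at least one CPU second and print a comparison table.
cmpthese(-1, \%methods);
```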
Notice how much better-suited substr() is for this task.
You encounter a problem in speed, though, when X is not just a character, but a character class. First of all, our substr() approach fails immediately, because it uses index(), which looks for a substring, not one of a set of characters. Let's do the same benchmark, but change X to the character class of A-Z.
This slowdown is caused by the character class. Because we're not looking for a SINGLE character, the engine can't jump backward: it knows how to handle /.*A/ quickly, "jumping" back to an "A" instead of examining each character, but it can't handle /.*[AB]/ as quickly.
So what can we do? We can use String::Index, which gives us functions that act like C's strpbrk(), but can do even more. For those of you not familiar with strpbrk() (whose name I can't decipher), it takes a string to look at and a string of characters to find in that source string.
In Perl, it'd be like doing:
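One way to get that behavior in plain Perl, as a sketch. The helper name first_of() is hypothetical, and it returns an offset (or -1) in the style of index() rather than a pointer as the C function does:

```perl
use strict;
use warnings;

# Hypothetical helper: position of the first character in $str that
# appears anywhere in $chars, or -1 if none does.
sub first_of {
    my ($str, $chars) = @_;
    return -1 unless length $chars;    # an empty set can never match

    # \Q...\E escapes anything that would be special inside [...].
    return $str =~ /[\Q$chars\E]/ ? $-[0] : -1;
}

my $where = first_of('hello world', 'ow');    # the 'o' in "hello"
my $none  = first_of('abc', 'xyz');           # nothing to find
```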
That is, it returns the earliest location in the source string of one of the characters in the second string. It's uncool that there's no standard C function that does this from the back of the string, or for all characters except those given...
That's what the String::Index module was written to do! Let's apply it to this problem and run another benchmark:
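To show the shape of the solution without requiring the module, here is a pure-Perl stand-in. The helper last_of() is hypothetical and only imitates what a from-the-back, set-of-characters search returns. It is built on a regex, so it demonstrates the interface rather than the speed:

```perl
use strict;
use warnings;

# Hypothetical stand-in: position of the last character in $str that
# appears in $chars, or -1 if none does. With String::Index installed,
# its from-the-back search function would presumably take this role.
sub last_of {
    my ($str, $chars) = @_;
    return -1 unless length $chars;
    return $str =~ /.*([\Q$chars\E])/s ? $-[1] : -1;
}

my $upper = join '', 'A' .. 'Z';    # the A-Z character class, spelled out
my $str   = 'one A two B three';    # hypothetical sample

# Same substr() truncation as before, but cutting at the last capital letter.
my $pos = last_of($str, $upper);
substr($str, $pos) = '' if $pos >= 0;
```

The point is that the substr() approach comes back once something can report the last position of any character from a set.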
We are restored. It's not as fast as the original substr() approach because it has to do more work, but it's faster than any other solution.
In reply to Regexes are slow (or, why I advocate String::Index) by japhy