(jcwren) Re: Re: Text::Balanced woes..
by jcwren (Prior) on May 27, 2002 at 04:41 UTC
Damian, I don't consider myself a world-class Perl programmer by any means, but I do believe I'm capable of reading the documentation.
That being said, I don't see ANYWHERE that it's "obvious" that it matches from the current string position. Read the documentation, trying to ignore the fact that you wrote it. Tell me where you see that it mentions that, or even reasonably implies that. And keep in mind that someone like myself or 914 may be reading it.
jeffa, whom I consider an experienced Perl programmer, mentioned in a /msg that he wouldn't have thought of deleting the leading words to see if it would pass. And I only got that idea from mucking around for half an hour, then running the extgen.pl test case with $DEBUG set.
I'm not criticizing the documentation, because there is a lot of good stuff there, but I do think it could be better indicated that the extract_... routines match only at the current position.
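For anyone else who trips over this, here is a minimal sketch of the behaviour as I understand it. The sample strings and variable names are made up for illustration, and the `'[^<]*'` prefix pattern is just one way to skip leading text, not necessarily the recommended one.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical sample input: a tagged pair preceded by leading words.
my $sample = 'some leading words <b>bold text</b> and the rest';

# 1. Default call: only optional whitespace is skipped, so the tag must
#    start at the current pos() (index zero here). The leading words
#    should make the extraction fail.
my $text1 = $sample;
my ($found1) = extract_tagged($text1);

# 2. Move pos() to the opening tag first, then call with the defaults.
my $text2 = $sample;
pos($text2) = index($text2, '<');
my ($found2) = extract_tagged($text2);

# 3. Leave pos() alone and pass an explicit prefix pattern (4th argument)
#    that skips everything up to the first '<'.
my $text3 = $sample;
my ($found3) = extract_tagged($text3, undef, undef, '[^<]*');

for my $result ($found1, $found2, $found3) {
    print defined $result ? "matched: $result\n" : "no match\n";
}
```

In list context a failed extraction returns undef as the first element, which is what the `defined` check relies on. Each call gets its own copy of the string so that a pos() left over from one call can't affect the next.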
One final detail. The documentation mentions that it matches valid HTML/XML pairs. Well, HTML allows upper and lower case tags, and as such, <B>/</b> should match. Under XML, where tags are required to be lower case (if I remember correctly), then <B>/</B> should fail anyway, as it's not valid XML.
Update: chromatic pointed me to this text in the Description section:
The various "extract_..." subroutines may be used to extract a delimited string (possibly after skipping a specified prefix string). The search for the string always begins at the current "pos" location of the string's variable (or at index zero, if no "pos" position is defined).
However, my interpretation of that is not that matching will occur only at the start of the string, but rather that no implicit offset is applied to the search for a matching tag. It also doesn't indicate that white space will be ignored, although that's not of terrible importance.