http://www.perlmonks.org?node_id=169484


in reply to Re: Text::Balanced woes..
in thread Text::Balanced woes..

Damian, I don't consider myself a world-class Perl programmer by any means, but I do believe I'm capable of reading the documentation.

That being said, I don't see ANYWHERE that it's "obvious" that it matches from the current string position. Read the documentation, trying to ignore the fact that you wrote it. Tell me where you see that it mentions that, or even reasonably implies that. And keep in mind that someone like myself or 914 may be reading it.

jeffa, whom I consider an experienced Perl programmer, mentioned in a /msg that he wouldn't have thought of deleting the leading words to see if it would pass. And I only got that idea from mucking around for half an hour, then running the extgen.pl test case with $DEBUG set.

I'm not criticizing the documentation, because there is a lot of good stuff there, but I do think it could indicate more clearly that the functions match at the current position.

One final detail. The documentation mentions that it matches valid HTML/XML pairs. Well, HTML allows upper and lower case tags, and as such, <B>/</b> should match. Under XML, where tags are required to be lower case (if I remember correctly), then <B>/</B> should fail anyway, as it's not valid XML.

Update: chromatic pointed me to this text in the Description section

       The various "extract_..." subroutines may be used to
       extract a delimited string (possibly after skipping a
       specified prefix string).  The search for the string
       always begins at the current "pos" location of the
       string's variable (or at index zero, if no "pos" position
       is defined).

However, my interpretation of that is not that matching will only occur at the start of the string, but rather that there is no implicit offset to the search for a matching tag. It also doesn't indicate that whitespace will be ignored, although that's not of terrible importance.

--Chris


Replies are listed 'Best First'.
Re: (jcwren) Re: Re: Text::Balanced woes..
by TheDamian (Vicar) on May 27, 2002 at 07:43 UTC
    Well, I think your interpretation is...err...imaginative, but you're without doubt a very smart person so the docs mustn't be clear enough. I'll make sure the next version leaves no room for misinterpretation:
    The various "extract_..." subroutines may be used to extract a delimited substring, possibly after skipping a specified prefix string. By default, that prefix is optional whitespace, but you can change it to whatever you wish (see below). The substring to be extracted must appear at the current "pos" location of the string's variable (or at index zero, if no "pos" position is defined). In other words, the "extract_..." subroutines *don't* extract the first occurrence of a substring anywhere in a string (like an unanchored regex would). Rather, they extract an occurrence of the substring appearing immediately at the current matching position in the string (like a "\G"-anchored regex would).
      Not to kick a man when he's down ;-) but I think the problem is that your documentation tends to be very tutorial oriented (I'm thinking of P::RD and Text::Balanced), which is excellent if you are working through it from beginning to end. But the tutorial style can get in the way when all you want is a quick and dirty answer. For instance, in Text::Balanced you have the general conventions followed by a page or more for each sub. This is compounded by pod2html, which doesn't index =item blocks. (I patched it to add an index of them at the end, which I find quite helpful.)
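    To make the anchoring concrete, here is a minimal sketch using extract_bracketed from Text::Balanced. The sample string and the variable names are made up for illustration, and the prefix pattern in the third call is just one possible choice, not the recommended one.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Illustrative input: the bracketed group does NOT start the string.
my $text = 'some words (a (nested) group) and the rest';

# 1. No pos set: extraction is attempted at index 0 (after optional
#    whitespace), where there is no "(", so nothing is extracted.
my ($found) = extract_bracketed($text, '()');
my $at_start = (defined $found && length $found) ? $found : 'no match';

# 2. Move pos to the opening bracket, then extract from there.
pos($text) = index($text, '(');
my ($at_pos) = extract_bracketed($text, '()');

# 3. Leave pos at the start, but supply a prefix pattern that skips
#    everything up to the first "(".
pos($text) = 0;
my ($with_prefix) = extract_bracketed($text, '()', '[^(]*');

print "at start:    $at_start\n";
print "at pos:      $at_pos\n";
print "with prefix: $with_prefix\n";
```

    The first call is the case that started this thread: leading words in front of the delimiter make the extraction fail unless you either set pos or tell the function what prefix to skip.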

      Incidentally, this seems to be a failing of many of the better module designers; DBI has, IMO, similar problems.

      Oh, and please don't take this as negative criticism; it's just that a terse, factual, reference-oriented doc/section can also be very helpful. Adding such a section (as you have already said you will) would be appreciated very much.

      And I'm well aware that if all you provided was such a reference text, you'd be inundated with relatively foolish questions...

      Yves / DeMerphq
      ---
      Writing a good benchmark isn't as easy as it might look.

Re: Re: Re: Text::Balanced woes..
by runrig (Abbot) on May 27, 2002 at 05:32 UTC
    XML is required to be case-sensitive, but not necessarily lower-case; you are right about the HTML though... :-)
      Well, that's a fair cop, Guv, since I do suggest it matches HTML tags at one point in the docs. It will be fixed in the next release (though whether I fix it in favour of XML or HTML remains to be seen! ;-).
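      For anyone wondering what the two choices amount to, here is a rough sketch with a plain regex rather than Text::Balanced itself. The tag_body helper is hypothetical and handles only simple, attribute-free, non-nested tags; it is not how extract_tagged is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: return the body of a simple tag pair at the start
# of $text, or undef if the opening and closing tags do not match.
sub tag_body {
    my ($text, $ignore_case) = @_;
    my $pair = $ignore_case
        ? qr{\A\s*<(\w+)>(.*?)</\1>}si    # HTML-style: </b> closes <B>
        : qr{\A\s*<(\w+)>(.*?)</\1>}s;    # XML-style: case must agree
    return $text =~ $pair ? $2 : undef;
}

my $mixed      = '<B>bold</b>';
my $xml_style  = tag_body($mixed, 0);
my $html_style = tag_body($mixed, 1);

print 'XML-style:  ', (defined $xml_style  ? $xml_style  : 'no match'), "\n";
print 'HTML-style: ', (defined $html_style ? $html_style : 'no match'), "\n";
```

      The only difference between the two patterns is the /i flag, which also makes the \1 backreference compare case-insensitively.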