Weirdness (and bugfix for) XML::DOM ...

by deprecated (Priest)
on Feb 25, 2001 at 14:43 UTC (#60747, perlquestion)
deprecated has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks... My quest to get the Everything Engine running continues.

t/astress...........Unrecognized escape \d passed through at blib/lib/XML/DOM.pm line 136.

This is what I encountered earlier today (`make test`) when building XML::DOM. I will node something in the next couple days regarding my adventures getting this code to work. However, at the moment, it still doesn't. The fix for this particular error (yes it is an error) is very simple:

    # blib/lib/XML/DOM.pm line number 136 change:
    $ReCharRef = "(?:\&#(?:\d+|x[0-9a-fA-F]+);)";
    # to ...
    $ReCharRef = qr!(?:\&#(?:\d+|x[0-9a-fA-F]+);)!;
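For anyone wondering why this matters beyond the noise in the test output, here is a minimal sketch of what the double-quoted string does to the pattern. Only the pattern itself comes from DOM.pm; the variable names and the sample character references are made up for illustration.

```perl
use strict;
use warnings;

my $double_quoted;
{
    # Silence the very warning under discussion so the demo runs quietly.
    no warnings 'misc';
    # In a double-quoted string "\d" is not a known escape, so the
    # backslash is dropped and the pattern ends up containing a plain "d".
    $double_quoted = "(?:\&#(?:\d+|x[0-9a-fA-F]+);)";
}

# qr// (or single quotes) keeps the backslash, so \d still means "digit".
my $compiled = qr!(?:\&#(?:\d+|x[0-9a-fA-F]+);)!;

# Try a decimal and a hexadecimal character reference against both forms.
for my $ref ('&#65;', '&#x41;') {
    printf "%-7s double-quoted: %-3s qr: %s\n",
        $ref,
        ($ref =~ /^$double_quoted$/ ? 'yes' : 'no'),
        ($ref =~ /^$compiled$/      ? 'yes' : 'no');
}
```

The hexadecimal form still matches either way, which is probably why a mistake like this can go unnoticed: only decimal references are affected.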
This strikes me as a particularly careless error on the author's part. However, it doesn't quite end here. When we run make test now, we still have the errors from before:
t/attr..............Can't locate object method "equals" via package "XML::Parser::ContentModel" at CmpDOM.pm line 168.
After a bit of hacking around, we see this code in ./CmpDOM.pm:
    if (ref ($p1))
    {
        return 0 unless $p1->equals ($p2, $cmp);
    }
I did a fair amount of bitching this morning in the CB about this. I will spare everyone the rant, but basically, I feel this code is extremely stylistically impaired. There is not one single line of comment in it. After some searching and yelling, we find that $p1 is looking for a method 'equals' in the file /home/perl/lib/site_perl/5.6.0/ppc-linux-thread-multi/XML/Parser/Expat.pm (which is where the XML::Parser::ContentModel package from the error message is defined), and that method, according to perl (and correctly), doesn't exist. Getting the feeling that this module was hopelessly borked anyway, I decided to add that method to Parser::Expat. I added this code:
    # roughly line 500
    sub equals {
        $_[1] && $_[2] && return $_[0];
        undef;
    }
This is about as kludgy as I am willing to get on code that will actually be used. I interpret this to mean "if the parameters we are passed are both true, return the object that called us (which will mean true to a test since we got here anyway); otherwise, return undef." Sure, it's ugly, but the method doesn't exist and the author provides no documentation for what this should actually be doing or why it is coded as such. The good news is that it then clears the error. Furthermore, it clears up a "dubious" result from the test. We are still left with two errors:
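To make the behaviour of that stub concrete, here is a small self-contained sketch. The package Demo::Model and the variables are hypothetical stand-ins, not anything from XML::Parser; only the body of equals is the one quoted above. Note that the stub never compares anything: it only checks that both arguments are true.

```perl
use strict;
use warnings;

package Demo::Model;    # hypothetical stand-in for the real class

sub new { return bless {}, shift }

# Same body as the stub above: return the invocant when both
# arguments are true, otherwise fall through and return undef.
sub equals {
    $_[1] && $_[2] && return $_[0];
    undef;
}

package main;

my $m1  = Demo::Model->new;
my $m2  = Demo::Model->new;
my $cmp = 1;            # placeholder for whatever comparator CmpDOM.pm passes

my $both = $m1->equals($m2, $cmp);      # both arguments true
my $none = $m1->equals(undef, $cmp);    # first argument false

print "both true:    ", (defined $both ? "true value" : "undef"), "\n";
print "one missing:  ", (defined $none ? "true value" : "undef"), "\n";
```

So any two content models count as "equal" as long as both exist, which is enough to quiet the test but would hide a real difference between two models.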
    t/attr..............ok 23/23FAILED test 3
            Failed 1/23 tests, 95.65% okay
    # and...
    t/print.............ok 3/3FAILED test 2
            Failed 1/3 tests, 66.67% okay
Back in the attr test, we see this code:
    my $str = <<END;
    <!DOCTYPE simpsons [
     <!ELEMENT person (#PCDATA)>
     <!ATTLIST person
      name CDATA #REQUIRED
      hair (none | blue | yellow) "yellow"
      sex CDATA #REQUIRED>
    ]>
    <simpsons>
     <person name="homer" hair="none" sex="male"/>
     <person name="marge" hair="blue" sex="female"/>
     <person name="bart" sex="almost"/>
     <person name="lisa" sex="never"/>
    </simpsons>
    END

    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parse ($str);
    assert_ok (not $@);

    my $out = $doc->toString;
    $out =~ tr/\012/\n/;
    # this line is mine...
    # warn "out - $out | str - $str";
    assert_ok ($out eq $str);
The output of this was positively infuriating. As it turns out, the reason this assertion is failing is this:
    # this...
    hair (none|blue|yellow) 'yellow'
    # is not equal to this...
    hair (none | blue | yellow) "yellow"
    # which is from the HEREDOC above.
We fix this, and the test passes. The last test, 'print', fails for a similar reason. In t/print.t:
    # change this...
    <!ELEMENT doc (beavis|butthead)*>
    # to this...
    <!ELEMENT doc (beavis | butthead)*>
    # and all is well.
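Since both failures come from a byte-for-byte string comparison, another way to look at it is to compare the declarations after squeezing out the differences that don't matter. The helper below is a rough sketch, not part of XML::DOM or its test suite; normalize_decl is a made-up name, and the regexes are crude (they would also touch pipes and quotes inside ordinary text), so treat it as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a declaration to one canonical spelling
# so that spacing around "|" and the choice of quote character no
# longer affect a string comparison.
sub normalize_decl {
    my ($text) = @_;
    $text =~ s/\s*\|\s*/|/g;          # drop whitespace around each pipe
    $text =~ s/\(\s+/(/g;             # ... and just inside the parentheses
    $text =~ s/\s+\)/)/g;
    $text =~ s/'([^'"]*)'/"$1"/g;     # prefer double quotes for literals
    return $text;
}

my $from_heredoc = q{hair (none | blue | yellow) "yellow"};
my $from_output  = q{hair (none|blue|yellow) 'yellow'};

print "raw strings equal:        ",
    ($from_heredoc eq $from_output ? "yes" : "no"), "\n";
print "normalized strings equal: ",
    (normalize_decl($from_heredoc) eq normalize_decl($from_output) ? "yes" : "no"), "\n";
```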
I am really not one to be negative. And I am also totally unfamiliar with XML. But I am going to go waaaaay out on a limb here and say that these errors are criminally negligent. I read recently here in the CB that XML requires strict adherence to RFCs (or whatever) and mandates failure. Personally, if that is the reason this module failed to test correctly, I don't want anything to do with XML. No thank you. However, if this is in fact what I think it is, egregious errors on the part of the author, I am happy to have been able to provide a fix for people having trouble with it.

Before you ask if this is perhaps something I have locally (jptxs was able to successfully install this module), I have two things to say. First, this copy of perl was installed from scratch, by hand, today. Bundle::XML was the first package to be installed because I was so hell-bent on getting XML to work. Second, it is certainly possible that XML::Parser::Expat uses some weirdness in its C code (provided it has any compiled C code at all). This is a PowerPC, so there might be some compilation differences. If that is the case, I apologize for my fiery rhetoric. Nobody, however, should ever have to spend hours fixing somebody else's undocumented, poorly styled, BROKEN code.

off my soapbox, and on to the Engine...
brother dep.

--
transcending "coolness" is what makes us cool.

Re: Weirdness (and bugfix for) XML::DOM ...
by deprecated (Priest) on Feb 25, 2001 at 14:59 UTC
    ack, you actually want to change:
        <!ELEMENT doc (beavis | butthead)*>
        # to this...
        <!ELEMENT doc (beavis|butthead)*>
    sorry. it's been a long day. :)
Re: Weirdness (and bugfix for) XML::DOM ...
by mirod (Canon) on Feb 25, 2001 at 17:18 UTC

    Bundle::XML was the first package to be installed

    That might be the problem. I think the latest version of XML::DOM can be found in libxml-enno. I have no idea which one is included in the XML bundle. The only stable combination I have personally tested is XML::DOM 1.25 and XML::Parser 2.27 on Perl 5.005.

    Note that the original author of XML::DOM does not seem to be supporting it anymore and that a new maintainer seems to be working on it, see this message. One annoying thing is that the old version does not work with XML::Parser 2.28 and above but the new version does not work with 2.27 and under (2.27 is the one that comes with the Activestate port).

    If you are interested in Perl and XML you should definitely subscribe to the perl-xml mailing list.

Re: Weirdness (and bugfix for) XML::DOM ...
by sierrathedog04 (Hermit) on Feb 26, 2001 at 15:45 UTC
    The specifications for XML 1.0 are set forth by the World Wide Web Consortium here. These specs state:
        Enumerated Attribute Types
        ...
        [59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [VC: Enumeration]
        ...
        Examples of attribute-list declarations:
        ...
        <!ATTLIST list type (bullets|ordered|glossary) "ordered">
        ...
    My reading of the XML specifications is that pipes in attribute-list declarations should not have spaces surrounding them.

    Similarly, in XML 1.0 one ought not to surround pipes with spaces in element type declarations either:

    Examples of element type declarations: ... <!ELEMENT p (#PCDATA|emph)* >

    It is unfortunate that attr test contains what may be a malformed XML Document Type Declaration. However, the XML specifications contain no requirements for user-friendly error handling. All an XML parser has to do when encountering bad XML is stop. It need do no more.

    Update: Our resident XML guru mirod has pointed out to me that the S? in the XML rule I quoted means that the pipe can, indeed, be surrounded by whitespace. He opines that there is an error, but that it is likely in XML::Parser, on which XML::DOM is built.

    Mirod points out that according to Rule 3

    (see rule 3: S ::= (#x20 | #x9 | #xD | #xA)+)

    Hence, S? means there can be an optional space around the '|'

    Moral. I should learn to read XML specs better. Thanks to mirod for correcting this error.

      Nope!

      In Rule 59 the S? is defined by:

      White Space [3] S ::= (#x20 | #x9 | #xD | #xA)+

      So (  toto  |  tata  ) is just as valid as (  toto  | tata  ). The problem is that expat and XML::Parser report _everything_ in the document, including non-significant spaces. Now Expat is required by the spec to do so. XML::Parser could probably choose to normalize those whitespaces but does not, and XML::DOM definitely should normalize them but does not. The SAX-2 extension for declaration for example normalizes declarations by removing all spaces around tokens.
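Rule 59 translates almost directly into a Perl regex, which makes it easy to check that both spellings are accepted. This is a sketch only: the variable names are invented here, and $Nmtoken is a simplified ASCII-only approximation of the spec's Nmtoken, which allows many more characters.

```perl
use strict;
use warnings;

# [3] S ::= (#x20 | #x9 | #xD | #xA)+
my $S = qr/[\x20\x09\x0D\x0A]+/;

# Simplified, ASCII-only stand-in for Nmtoken (an assumption, see above).
my $Nmtoken = qr/[A-Za-z0-9._:-]+/;

# [59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
my $Enumeration = qr/\( $S? $Nmtoken (?: $S? \| $S? $Nmtoken )* $S? \)/x;

for my $candidate ('(none|blue|yellow)',
                   '(none | blue | yellow)',
                   '(none blue yellow)') {
    printf "%-24s %s\n", $candidate,
        ($candidate =~ /^$Enumeration$/ ? 'matches rule 59' : 'does not match');
}
```

The first two candidates are the two spellings from the failing test; the third leaves out the pipes and is there only as a counter-example.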

      The bottom-line is that this is a feature of XML::DOM that could easily pass for a bug (and should be fixed as soon as somebody provides a patch, I certainly don't want to get involved in XML::DOM but line 1970 of the DOM.pm in libxml-enno-1.04pre3 seems like the place to insert the normalization for attribute declarations, then on to element content models before line 2343).

      As a side note, the best way to read the XML specification is probably to go to The Annotated XML Specification on xml.com, which has the spec along with Tim Bray's comments.

      Update: Crap! My mistake (as usual :--(, XML::DOM (or maybe XML::Parser) does normalize attribute declarations. Actually that's why the test fails. It is fixed in libxml-enno-1.04pre though. The only remaining problem is that 1.04 does not pass the test on 5.6.0
