PerlMonks  

XML::Parser::PerlSax ... not passing a simple test

by vaevictus (Pilgrim)
on Nov 07, 2001 at 19:55 UTC ( #123831=perlquestion )

vaevictus has asked for the wisdom of the Perl Monks concerning the following question:

I've got perl v5.6.1 built for i686-linux, on Slackware. I'm trying to install XML::DOM, which depends on XML::Parser::PerlSAX, which is in libxml-perl.

I'm getting an error with test 11 of stream.t (in PerlSAX).

I'm thinking this may be an encoding thing I just don't understand, but here's what I think it is choking on. My question is why the test is written this way, or why my setup doesn't handle it properly.

The test:

print (($string eq $expected) ? "ok 11\n" : "not ok 11\n");

The $string

$string = $parser->parse(Source => { Encoding => 'ISO-8859-1',
                                     String => <<"EOF;" } );
<!DOCTYPE foo [
  <!NOTATION bar PUBLIC "qrs">
  <!ENTITY zinger PUBLIC "xyz" "abc" NDATA bar>
  <!ENTITY fran "fran-def">
  <!ENTITY zoe "zoe.ent">
]>
<foo>
  First line in foo
  <boom>Fran is &fran; and Zoe is &zoe;</boom>
  <bar id="jack" stomp="jill">
  <?line-noise *&*&^&<< ?>
    1st line in bar
    <blah> 2nd line in bar </blah>
    3rd line in bar <!-- Isn't this a doozy -->
  </bar>
  <zap ref="zing" />
  This, '\240', would be a bad character in UTF-8.
</foo>
EOF;

produces:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  First line in foo
  <boom>Fran is fran-def and Zoe is zoe.ent</boom>
  <bar id="jack" stomp="jill">
  <?line-noise *&*&^&<< ?>
    1st line in bar
    <blah> 2nd line in bar </blah>
    3rd line in bar <!-- Isn't this a doozy -->
  </bar>
  <zap fubar="1" ref="zing"></zap>
  This, ' ', would be a bad character in UTF-8.
</foo>

The $expected

$expected = <<"EOF;";
<?xml version="1.0" encoding="UTF-8"?>
<foo>
  First line in foo
  <boom>Fran is fran-def and Zoe is zoe.ent</boom>
  <bar id="jack" stomp="jill">
  <?line-noise *&*&^&<< ?>
    1st line in bar
    <blah> 2nd line in bar </blah>
    3rd line in bar <!-- Isn't this a doozy -->
  </bar>
  <zap fubar="1" ref="zing"></zap>
  This, '\302\240', would be a bad character in UTF-8.
</foo>
EOF;

produces:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  First line in foo
  <boom>Fran is fran-def and Zoe is zoe.ent</boom>
  <bar id="jack" stomp="jill">
  <?line-noise *&*&^&<< ?>
    1st line in bar
    <blah> 2nd line in bar </blah>
    3rd line in bar <!-- Isn't this a doozy -->
  </bar>
  <zap fubar="1" ref="zing"></zap>
  This, ' ', would be a bad character in UTF-8.
</foo>

The extra character in the second "This ... would be a bad character" line is hard-coded into $expected (the \302). I'm confused as to why it's there.
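For readers puzzling over the same thing: \240 is the single byte 0xA0, which in ISO-8859-1 is a no-break space (U+00A0), and the UTF-8 encoding of U+00A0 is the two bytes \302\240. So $expected looks like what a correct Latin-1 to UTF-8 conversion would give, and a result with a lone \240 suggests the conversion did not happen on this install. The sketch below only illustrates the byte arithmetic. It assumes a Perl with the Encode module (5.8 or later, so not the 5.6.1 above), and latin1_to_utf8_octal is a made-up helper name, not part of the test suite.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take bytes declared as ISO-8859-1, convert them to
# UTF-8, and show the resulting bytes as octal escapes like the test uses.
sub latin1_to_utf8_octal {
    my ($latin1_bytes) = @_;

    # Bytes -> Perl characters (0xA0 becomes U+00A0, NO-BREAK SPACE).
    my $chars = decode('ISO-8859-1', $latin1_bytes);

    # Characters -> UTF-8 bytes, matching the encoding="UTF-8" declaration.
    my $utf8_bytes = encode('UTF-8', $chars);

    # One octal escape per output byte.
    return join '', map { sprintf('\\%03o', ord($_)) } split //, $utf8_bytes;
}

print latin1_to_utf8_octal("\240"), "\n";   # prints \302\240
```

Plain ASCII bytes come out unchanged in value, which is why only this one line of the test is sensitive to the encoding step.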

Re: XML::Parser::PerlSax ... not passing a simple test
by mirod (Canon) on Nov 07, 2001 at 21:09 UTC

    Which versions of XML::DOM, XML::Parser and XML::Parser::PerlSAX are you trying to install? There are annoying incompatibilities between various versions, so you have to be quite careful here (and I know those incompatibilities are not really described anywhere; I have to update the review of XML::DOM on this site for the latest release of XML::DOM). My first guess would be that you use a version of XML::Parser that has problems with UTF-8 encoded strings, maybe 2.28 or 2.30, but I might be wrong.

    In fact I am not even sure you need XML::Parser::PerlSAX to use XML::DOM; XML::Parser should be enough, so you might be able to ignore the error and still get XML::DOM to work properly.
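One way to answer the version question is to ask each module directly. This is a rough sketch, not something from the thread: module_version is a hypothetical helper, and it relies only on the standard require and VERSION mechanisms, so it reports whatever is actually on @INC.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# 'unknown' if it loads but declares no $VERSION, or undef if it
# cannot be loaded at all.
sub module_version {
    my ($module) = @_;

    # Turn Some::Module into Some/Module.pm for require.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };

    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# The three modules named in the question above.
for my $module (qw(XML::Parser XML::Parser::PerlSAX XML::DOM)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Posting that output alongside the failing test number makes it much easier to spot a known-bad combination.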

    <plug type="shameless">If you want to use XML::DOM you might want to have a look at XML::DOM::Twig, a dirty hack that still improves XML::DOM by adding a slightly higher-level interface to the DOM.

    A good article on using XML::DOM can be found on IBM developerWorks. Tony Parungar has been using the DOM for quite a while and gives insightful feedback on how to use it.

      cpan install, for the lot of them. And I'm only installing XML::DOM to try to get Everything(tm) to work.
Re: XML::Parser::PerlSax ... not passing a simple test
by mitd (Curate) on Nov 09, 2001 at 06:26 UTC
    I too went through similar heartache during a recent personal workstation upgrade. I am trying to remember the details; I seem to recall the problem began with Enno repackaging a bunch of XML::xxx modules, including XML::DOM. So, to supplement mirod's info, here are the versions I have installed that work for me.

    Coincidentally, I am going inside XML::DOM later tonight. I will look a little deeper and update this node with any more info that might help.

    BTW, XML::Twig is a great package if you are able to consider alternatives. It is amongst other things conscientiously maintained by... well you know the shameless guy a few nodes up.

    mitd-Made in the Dark
    'My favourite colour appears to be grey.'

XML::Parser::PerlSax ... still not passing a simple test
by vaevictus (Pilgrim) on Jul 19, 2004 at 17:28 UTC
    How ironic. I decided to revive my Everything server... and proceeded along the same lines to cause the same issues, the same way as 3 years ago.

    Differences: nearly 3 years of time. I'm on FreeBSD now, not Slackware. Deleting the \302 from $expected does make the test pass and lets the install go through. I have no idea whether that is a safe fix for the test, or whether it's a good idea to use.

    I'd love to have some comments or even testing, or have someone tell me this is indeed a bug.
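A note for anyone tempted by the same workaround: a lone \240 byte is not valid UTF-8, so dropping the \302 makes the test accept output that does not match its own encoding="UTF-8" declaration. Whether that matters for Everything is not settled in this thread. To see what the parser really emits on a given box, something like the following could be pasted into the test file before the comparison. It is only a sketch; show_bytes and first_difference are hypothetical helper names, and the sample strings are cut down from the test above.

```perl
use strict;
use warnings;

# Hypothetical helper: show every non-printable or non-ASCII byte
# as an octal escape so invisible differences become visible.
sub show_bytes {
    my ($bytes) = @_;
    (my $shown = $bytes) =~ s/([^\x20-\x7E])/sprintf('\\%03o', ord($1))/ge;
    return $shown;
}

# Hypothetical helper: offset of the first differing byte,
# or -1 when the two strings are identical.
sub first_difference {
    my ($got, $expected) = @_;
    my $shorter = length($got) < length($expected) ? length($got) : length($expected);
    for my $i (0 .. $shorter - 1) {
        return $i if substr($got, $i, 1) ne substr($expected, $i, 1);
    }
    return length($got) == length($expected) ? -1 : $shorter;
}

# Shortened stand-ins for $string and $expected from the test.
my $got      = "This, '\240', would be";
my $expected = "This, '\302\240', would be";

print "differs at offset ", first_difference($got, $expected), "\n";
print "got:      ", show_bytes($got), "\n";
print "expected: ", show_bytes($expected), "\n";
```

If the real $string shows \240 on its own, the parser is passing the Latin-1 byte through unconverted, which points back at the XML::Parser version question raised earlier in the thread.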
