XML::Twig problem

by BenHopkins (Acolyte)
on May 31, 2007 at 00:11 UTC (#618360=perlquestion)
BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:

I have a little program based on the XML::Twig doc section "Building An XML Filter." It sets up twig roots for the elements I need to process and uses twig_print_outside_roots. However, the output is NOT valid XML (the input is). Here's how the input starts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd" [
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
%HTMLspecial;
]>
Here's the output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd">
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
%HTMLspecial;
>
The square brackets surrounding the three !ENTITY declarations are missing. Here's the program's XML::Twig->new call:
my $t = XML::Twig->new(
    twig_roots => {
        "$nitf_root/body/body.head/hedline/hl1" => \&fix_hl1,
        "$nitf_root/body/body.head/hedline/hl2" => \&fix_hl2,
    },
    twig_print_outside_roots => 1,
    keep_encoding            => 1,
);
At first I didn't have keep_encoding, and then besides the missing square brackets, the first !ENTITY was also missing. keep_encoding restored the first !ENTITY, but not the brackets.

Any ideas?

(I do a flush after the parse, so it's not that. I don't see how it could affect anything, but I saw something about it.)
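
For readers who have not seen the filter pattern described above, here is a minimal sketch of it, not the poster's actual program. The helper name filter_hl1, the sample document, and the upper-casing handler are all made up for illustration; the root is given as a bare tag name rather than the full path used in the question, and keep_encoding is left out. Output is collected through an in-memory filehandle so the result can be inspected. The handler prints its element itself and there is no flush call.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: rewrite every <hl1> and pass the rest of the
# document through untouched. Returns the filtered XML as a string.
sub filter_hl1 {
    my ($xml) = @_;

    # Collect all output in a scalar instead of writing to STDOUT.
    my $buffer = '';
    open my $out, '>', \$buffer
        or die "cannot open in-memory filehandle: $!";

    my $twig = XML::Twig->new(
        twig_roots => {
            # Only <hl1> elements are built into twigs.
            hl1 => sub {
                my ($t, $elt) = @_;
                $elt->set_text(uc $elt->text);   # stand-in for the real fix
                $elt->print($out);               # print the modified element
            },
        },
        # Everything outside the roots is copied to the same filehandle.
        twig_print_outside_roots => $out,
    );

    $twig->parse($xml);
    close $out;
    return $buffer;
}

print filter_hl1('<nitf><body><hl1>hello</hl1><p>text</p></body></nitf>'), "\n";
```

With twig_print_outside_roots => 1 instead of a filehandle, and a plain $elt->print in the handler, the same filter writes to the currently selected output handle.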

Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 06:34 UTC

    The development version of XML::Twig (at http://xmltwig.com/xmltwig) nearly fixes this: the brackets are there, but the first entity declaration is not output properly (it comes out as <!ENTITY HTMLlat1 SYSTEM "../CCI-DTD/xhtml-lat1.ent" PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" >). The other 2 are output properly, which is quite puzzling. This occurs whether keep_encoding is used or not. I'll fix it and report back.

    I have refactored the code that outputs the internal DTD in the new version, so tests are welcome. Apparently my test suite did not cover this case, so I will add a test too.

    Thanks for the info.

Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 11:23 UTC

    OK, it looks like I did not process parameter entities properly.

    It's fixed in the development version, let me know if it works for you.

      It works. Thanks. But (you knew there would be a but, didn't you?), in the output, there was some trailing text after the final tag, which went away when I took out the flush() call.

      Also, when I tried to verify the soundness of the outputted XML with xml_pp (mine, not yours), it got this error: Undefined subroutine &Text::Wrap::wrap called at /usr/local/perl/5.8.2/lib/site_perl/5.8.2/XML/Twig.pm line 7476. When I replaced indented_c with nice, it worked.

        Indeed the flush messes things up when you're using twig_print_outside_roots. I should check for that, I'll see what I can do. In fact, with recent versions of the module, the flush after the end of the parsing is no longer needed. The module assumes that if you started flushing, then you want to keep on doing it (or you would most likely get non well-formed XML), so at the end of the parse, if flush has been used on the twig, it performs a last flush, using the filehandle that was used for the first flush. It DWIMs better than it reads ;--(.

        As for the xml_pp problem, I don't know, maybe you redefined the constants and you ended up using the one for 'wrapped' or 'cvs' instead of the one for 'indented_c' ? What's the value you are using for the style? BTW I usually use xmlwf, xmllint or perl -MXML::Parser -e'XML::Parser->new( ErrorContext => 1)->parsefile( shift())' file.xml to check the well-formedness of the XML.
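
        The XML::Parser one-liner above can also be wrapped in a small script, which is handy when checking several files. This is only a sketch: the sub name check_well_formed and the return convention (empty string on success, error text on failure) are made up here. It relies on parsefile dying on the first well-formedness error.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns '' if the file is well-formed,
# otherwise the parser's error message (with context lines).
sub check_well_formed {
    my ($file) = @_;
    my $ok = eval {
        # ErrorContext => 1 adds one line of context around the error.
        XML::Parser->new(ErrorContext => 1)->parsefile($file);
        1;
    };
    return $ok ? '' : ($@ || 'unknown parse error');
}

# Check every file named on the command line.
for my $file (@ARGV) {
    my $error = check_well_formed($file);
    print "$file: ", ($error eq '' ? "well-formed\n" : "NOT well-formed\n$error\n");
}
```

        Like xmlwf, this only checks well-formedness; it does not validate against the DTD.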
