PerlMonks  

Re: XMLin question (xmlfixup.pl)

by Anonymous Monk
on Feb 15, 2013 at 19:44 UTC ( [id://1018953]=note )


in reply to XMLin question

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Encoding 'encoding_from_http_message';
use WWW::Mechanize;
use Encode;
use HTML::Tree;

my $file = shift or die "
Usage:
    xmlfixup.pl file:in.xml > out.xml
    xmlfixup.pl http://example.com/foo.xml > out.utf8.xml
";

# Fetch the input; per the usage text it may be a file: or http: URL.
my $resp = WWW::Mechanize->new( autocheck => 1 )->get( $file );

# Decode the body to Perl characters when an encoding can be detected,
# otherwise use the content as received.
my $enco = encoding_from_http_message( $resp );
my $utf8;
if( $enco ) {
    $utf8 = decode( $enco => $resp->content );
} else {
    $utf8 = $resp->content;
}

# Build a tree while keeping unknown tags, whitespace, entities,
# comments and processing instructions intact.
my $t = HTML::TreeBuilder->new( qw(
    ignore_unknown              0
    no_space_compacting         1
    ignore_ignorable_whitespace 0
    implicit_tags               0
    no_expand_entities          1
    store_comments              1
    store_pis                   1
) );
#~ $t->xml_mode( 1 );
$t->parse_content( $utf8 );

binmode STDOUT, ':utf8';
print $_->as_XML for $t->content_list;
__END__

Replies are listed 'Best First'.
Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:43 UTC
    Fails when data contains <![CDATA[ ... ]]>
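
    One possible way around the CDATA problem, offered only as a sketch and not as the fix mentioned elsewhere in this thread, is to rewrite each CDATA section as entity-escaped text before handing the document to the parser. The helper name cdata_to_text is hypothetical, and this has not been tested against xmlfixup.pl itself; it only shows the text transformation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of xmlfixup.pl): replace every
# <![CDATA[ ... ]]> section with the same text, entity-escaped,
# so that the parser never sees a CDATA marker.
sub cdata_to_text {
    my ($xml) = @_;
    $xml =~ s{<!\[CDATA\[(.*?)\]\]>}{
        my $text = $1;            # copy before the inner substitutions run
        $text =~ s/&/&amp;/g;     # escape & first so later escapes survive
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text;
    }gse;
    return $xml;
}

print cdata_to_text('<a><![CDATA[1 < 2 & 3]]></a>'), "\n";
# prints: <a>1 &lt; 2 &amp; 3</a>
```

    In the script this would be applied to $utf8 just before $t->parse_content( $utf8 ). The trade-off is that the output no longer contains CDATA sections, only the equivalent escaped text.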
Re^2: XMLin question (xmlfixup.pl)
by tmharish (Friar) on Feb 21, 2013 at 12:45 UTC

    I would like to use this, along with a fix I have written for CDATA and a couple of other things, in XML::Smart.

    Please /msg me or reply to this so I can assign credit.

      by Anonymous Monk http://perlmonks.org/?node_id=1018953

        Sadly, this breaks in too many cases, so I am rewriting XML::Smart::HTMLParser (also available on GitHub).
