HTML::TreeBuilder, HTML::Element, as_XML()

by AlexTape (Monk)
on May 23, 2013 at 10:35 UTC (#1034929, perlquestion)
AlexTape has asked for the wisdom of the Perl Monks concerning the following question:

Dear omniscient monks,

I have some HTML/TEI-like data and want to convert it to XML. It works well for some files, but not for all. Here is my code:
    # pragma
    use strict;
    use warnings;

    # modules
    use XML::Simple;
    use XML::Tidy;
    use Data::Dumper;
    use Data::Diver qw( Dive DiveRef DiveError );
    use HTML::TreeBuilder;
    use XML::Tidy::Tiny;

    # little helper
    use constant false => 0;
    use constant true  => 1;

    ...

    # get instance of treebuilder
    my $root = HTML::TreeBuilder->new();

    # configure treebuilder
    $root->ignore_unknown( false );

    # dump data to the treebuilder
    $root->parse( $fileData );

    # get name for target file
    my $target = $file;
    $target =~ s/$fileExtension$/xml/;

    # open output filehandle
    open( $FH, '>', $target );

    # configure output
    binmode $FH, ":utf8";

    # ERROR HERE 208:
    my $data = $root->guts()->as_XML();
    print $FH xml_tidy( $data );
    close $FH;

    ...

Running it produces this error:
caption has an invalid attribute name 'n' at script.pl line 208
I substituted every 'n' in the file but still got the same error, so the 'n' itself does not seem to be the cause. I don't know what is going on here.
$root->guts() is fine; the problem is entirely in ->as_XML() :-((

kindly, perlig

$perlig =~ s/pec/cep/g if 'errors expected';

Re: HTML::TreeBuilder, HTML::Element, as_XML()
by Jenda (Abbot) on May 23, 2013 at 15:16 UTC

    IMnsHO, there is a bug in the _valid_name subroutine deep in HTML::Element. There should be

    return (0) unless ( $attr =~ /^$START_CHAR$NAME_CHAR*$/ );
    not
    return (0) unless ( $attr =~ /^$START_CHAR$NAME_CHAR+$/ );

    The XML specs say that

    Name ::= NameStartChar (NameChar)*
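
    The difference between the two patterns can be seen in a small standalone sketch. The character classes below are simplified ASCII-only stand-ins (an assumption for illustration, not the definitions used inside HTML::Element), and the two helper subs are hypothetical names:

```perl
use strict;
use warnings;

# ASSUMPTION: simplified, ASCII-only approximations of NameStartChar and
# NameChar from the XML spec. The real productions also cover many
# Unicode ranges.
my $START_CHAR = qr/[:A-Za-z_]/;
my $NAME_CHAR  = qr/[:A-Za-z_.0-9-]/;

# Hypothetical helper: the "+" form requires at least one NameChar
# after the start character, so a name needs two or more characters.
sub valid_name_plus {
    my ($name) = @_;
    return $name =~ /^$START_CHAR$NAME_CHAR+$/ ? 1 : 0;
}

# Hypothetical helper: the "*" form follows
# Name ::= NameStartChar (NameChar)*, so one character is enough.
sub valid_name_star {
    my ($name) = @_;
    return $name =~ /^$START_CHAR$NAME_CHAR*$/ ? 1 : 0;
}

for my $name (qw( n id xml:lang )) {
    printf "%-8s  plus=%d  star=%d\n",
        $name, valid_name_plus($name), valid_name_star($name);
}
```

    With these stand-ins, the single-character name 'n' is rejected by the "+" form and accepted by the "*" form, while 'id' and 'xml:lang' pass both. That would be consistent with the error appearing for an attribute named 'n'.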

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      IMnsHO, there is a bug in the _valid_name subroutine deep in HTML::Element. There should be

      I wouldn't go that far; the OP provides no data.

        The OP doesn't need to provide data; the code doesn't match the specs linked five lines above the code in question.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

Re: HTML::TreeBuilder, HTML::Element, as_XML()
by ambrus (Abbot) on May 24, 2013 at 13:07 UTC

    Look in the implementation of XML::Twig for the workarounds it uses when HTML::Tree's as_XML method dies.
