HTML::TreeBuilder, HTML::Element, as_XML()

by AlexTape (Monk)
on May 23, 2013 at 10:35 UTC
AlexTape has asked for the wisdom of the Perl Monks concerning the following question:

Dear omniscient monks,

I have some HTML/TEI-like data and want to convert it to XML. It works well for some files, but not for all. Here is my code:

    # pragma
    use strict;
    use warnings;

    # modules
    use XML::Simple;
    use XML::Tidy;
    use Data::Dumper;
    use Data::Diver qw( Dive DiveRef DiveError );
    use HTML::TreeBuilder;
    use XML::Tidy::Tiny;

    # little helper
    use constant false => 0;
    use constant true  => 1;

    ...

    # get instance of treebuilder
    my $root = HTML::TreeBuilder->new();

    # configure treebuilder
    $root->ignore_unknown( false );

    # dump data to the treebuilder
    $root->parse( $fileData );
    $root->eof();    # tell the parser that the input is complete

    # get name for target file
    my $target = $file;
    $target =~ s/$fileExtension$/xml/;

    # open output filehandle, and stop if that fails
    open( $FH, '>', $target ) or die "Cannot open $target: $!";

    # configure output
    binmode $FH, ":utf8";

    # ERROR HERE (line 208):
    my $data = $root->guts()->as_XML();
    print $FH xml_tidy( $data );
    close $FH;

    ...

The error message is:
caption has an invalid attribute name 'n' at line 208
I substituted every 'n' in the file but still got the same error, so the 'n' itself does not seem to be the cause. I don't know what is going on here.
... is okay; it is all about the ->as_XML() call :-((

kindly, perlig

$perlig =~ s/pec/cep/g if 'errors expected';

Re: HTML::TreeBuilder, HTML::Element, as_XML()
by Jenda (Abbot) on May 23, 2013 at 15:16 UTC

    IMnsHO, there is a bug in the _valid_name subroutine deep in HTML::Element. There should be

    return (0) unless ( $attr =~ /^$START_CHAR$NAME_CHAR*$/ );

    instead of

    return (0) unless ( $attr =~ /^$START_CHAR$NAME_CHAR+$/ );

    The XML specs say that

    Name ::= NameStartChar (NameChar)*

    Enoch was right!
    Enjoy the last years of Rome.
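
To make the difference between the two quantifiers concrete, here is a minimal sketch. It does not use the real HTML::Element internals: the two character classes below are simplified, ASCII-only assumptions, and `valid_name_plus` / `valid_name_star` are hypothetical helper names. It only illustrates why a one-character attribute name such as 'n' fails with `+` but passes with `*`, which is what the XML production `Name ::= NameStartChar (NameChar)*` allows.

```perl
use strict;
use warnings;

# Simplified ASCII-only stand-ins (assumption: the real definitions in
# HTML::Element cover far more of Unicode than these two classes).
my $START_CHAR = qr/[A-Za-z_:]/;
my $NAME_CHAR  = qr/[A-Za-z0-9_:.\-]/;

# Pattern with '+': one start character followed by ONE or more name
# characters, so every valid name needs at least two characters.
sub valid_name_plus {
    my ($attr) = @_;
    return $attr =~ /^$START_CHAR$NAME_CHAR+$/ ? 1 : 0;
}

# Pattern with '*': one start character followed by ZERO or more name
# characters, matching Name ::= NameStartChar (NameChar)*.
sub valid_name_star {
    my ($attr) = @_;
    return $attr =~ /^$START_CHAR$NAME_CHAR*$/ ? 1 : 0;
}

# Compare both checks on a few sample attribute names.
for my $name (qw( n id xml:lang 9bad )) {
    printf "%-9s plus=%d star=%d\n",
        $name, valid_name_plus($name), valid_name_star($name);
}
```

In this sketch only 'n' is treated differently by the two patterns: the `+` version rejects it and the `*` version accepts it. That would also explain why changing the data around the attribute does not help as long as the attribute name itself stays one character long.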

      IMnsHO, there is a bug in the _valid_name subroutine deep in HTML::Element. There should be

      I wouldn't go that far; the OP provides no data.

        The OP doesn't need to provide data; the code doesn't match the specs linked five lines above the code in question.

        Enoch was right!
        Enjoy the last years of Rome.

Re: HTML::TreeBuilder, HTML::Element, as_XML()
by ambrus (Abbot) on May 24, 2013 at 13:07 UTC

    Look in the implementation of XML::Twig for the workarounds it uses when HTML::Tree's as_XML method dies.
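
This is not XML::Twig's actual workaround, which is not reproduced here. As a smaller first step, assuming as_XML() signals the problem by dying, the call can be wrapped in an `eval` so that one bad file does not abort a whole batch. `try_as_xml` and the two `Demo::` classes are hypothetical; the stand-in classes exist only so the sketch runs without HTML::Tree installed, and in real code the argument would be something like `$root->guts()`.

```perl
use strict;
use warnings;

# Try to serialise one element. Returns (xml, undef) on success, or
# (undef, error message) if the as_XML() call died.
sub try_as_xml {
    my ($element) = @_;
    my $xml = eval { $element->as_XML() };
    return ( undef, $@ || 'unknown error' ) unless defined $xml;
    return ( $xml, undef );
}

# Hypothetical stand-ins for real HTML::Element objects: one serialises
# cleanly, the other dies with the message from the question.
package Demo::Good {
    sub new    { return bless {}, shift }
    sub as_XML { return '<p>ok</p>' }
}

package Demo::Bad {
    sub new    { return bless {}, shift }
    sub as_XML { die "caption has an invalid attribute name 'n'\n" }
}

package main;

# Convert what can be converted and report the rest.
for my $element ( Demo::Good->new, Demo::Bad->new ) {
    my ( $xml, $err ) = try_as_xml($element);
    if ( defined $xml ) {
        print "converted: $xml\n";
    }
    else {
        print "skipped: $err";
    }
}
```

Trapping the error only keeps the batch running and records which files fail. The failing files would still need one of the fixes discussed in this thread before they can be written as XML.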

Node Type: perlquestion [id://1034929]
Front-paged by Arunbear