
Vertical Tab (\x0b) in XML::LibXML

by choroba (Bishop)
on Jul 23, 2014 at 21:20 UTC
choroba has asked for the wisdom of the Perl Monks concerning the following question:

XML::LibXML handles the vertical tab (\x0b) differently depending on whether the document's encoding is specified explicitly. Without an encoding, the character is omitted when the document is serialized. With an encoding (I tried both utf-8 and iso-8859-1), the character is present in the serialization, so the output can't be parsed back (\x0b is not permitted in XML, at least in version 1.0; see Cafe con Leche).

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my @docs = (
    'XML::LibXML::Document'->createDocument('1.0'),
    'XML::LibXML::Document'->createDocument('1.0', 'utf-8'),
);

for my $doc (@docs) {
    $doc->setDocumentElement(my $root = $doc->createElement('root'));
    $root->appendText("\x0b");
}

for my $doc (@docs) {
    print ".\n";
    'XML::LibXML'->load_xml(string => $doc->toString);
}

Am I missing something? Should I file a bug report?

لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re: Vertical Tab (\x0b) in XML::LibXML 'XML::LibXML'->new( qw/ recover 2 / )->load_xml
by Anonymous Monk on Jul 23, 2014 at 22:00 UTC

    Am I missing something? Should I file a bug report?

    Use the option to not die on errors

    'XML::LibXML'->new( qw/ recover 2 / )->load_xml(string => $doc->toString)
      That's not my problem. I need to generate well-formed XML that can be loaded by other tools, too. Some of them use neither XML::LibXML nor Perl.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        So if you make two passes through libxml, you get valid XML? But then you lose the vertical tab ...

        If I try using toFile, I get: error : xmlEscapeEntities : char out of range

        The error seems to be coming from libxml2 itself ... this says sanitize your inputs first so ASCII control chars aren't in there :/

        I'd report it to the XML::LibXML maintainer, for the clues he might provide :)
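
The "sanitize your inputs first" advice could look roughly like the sketch below. It is only an illustration: strip_xml10_controls is a hypothetical helper name, not part of XML::LibXML, and it assumes that silently dropping the offending characters is acceptable for your data. It removes only the C0 control characters that XML 1.0 forbids and leaves tab, line feed and carriage return alone.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not an XML::LibXML function): delete the C0 control
# characters that XML 1.0 does not permit. Tab (\x09), line feed (\x0A)
# and carriage return (\x0D) are allowed and therefore kept.
sub strip_xml10_controls {
    my ($text) = @_;
    $text =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return $text;
}

# The vertical tab is removed, the ordinary tab survives.
my $clean = strip_xml10_controls("a\x0bb\tc");
print length($clean), "\n";    # prints 4
```

The cleaned string would then be passed to appendText instead of the raw one. Whether to strip, replace or reject such characters is a design decision; stripping loses the vertical tab, as noted above.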

Re: Vertical Tab (\x0b) in XML::LibXML
by ikegami (Pope) on Jul 24, 2014 at 17:23 UTC

    Vertical tabs (U+000B) are not allowed in XML documents.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    I don't know why I get a different error from the two inputs in your example, but I do get an error from both.

    Update: Just noticed you knew U+000B isn't allowed in XML, in which case I don't get the question.
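
The Char production quoted above translates fairly directly into a Perl character class, which makes it possible to check text before it goes into the document. This is a minimal sketch, not something XML::LibXML provides: invalid_xml10_chars and $not_xml10_char are hypothetical names, and the input is assumed to be a decoded character string rather than raw bytes.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Negation of the XML 1.0 Char production: matches any single character
# that is NOT allowed in an XML 1.0 document.
my $not_xml10_char =
    qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: list the code points of all disallowed characters
# found in the given string, formatted as U+XXXX.
sub invalid_xml10_chars {
    my ($text) = @_;
    return map { sprintf 'U+%04X', ord($_) } $text =~ /($not_xml10_char)/g;
}

print join(', ', invalid_xml10_chars("ok\x0b\x01")), "\n";    # prints U+000B, U+0001
```

An empty return list means the string is safe to serialize as XML 1.0 character data; a non-empty one can be reported or rejected before the document reaches another processor.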

      When the encoding is specified, toString generates invalid XML without any error. In real life, this gets sent to a different XML processor that chokes on it. My question is why the behaviour is different, and whether there is something that should prevent me from submitting a bug report.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        I get an error for that too.
