PerlMonks  

XML equivalent of HTML::Entities?

( #89757=categorized question )
Contributed by antihero on Jun 19, 2001 at 21:58 UTC

Description:

Is there an XML-related module which can convert characters into their entities (e.g. & turns into &amp;)? I've been using HTML::Entities, but I feel that this isn't exactly the most elegant solution.

Answer: XML equivalent of HTML::Entities?
contributed by mirod

It all depends on the result you want.

If you want to get HTML entities (é as &eacute;, etc.) then it makes sense to use HTML::Entities.

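If you want to escape only &, <, >, ", and ', then you can use the xml_escape method of XML::Parser::Expat (by default it handles only & and <; pass >, ", and ' as extra arguments to have them escaped too), but frankly it doesn't make much sense to do that.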
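
The five-character case is small enough to sketch in plain Perl without loading a parser. This is only an illustration: escape_xml_markup and %XML_ENTITY are hypothetical names, not part of XML::Parser::Expat or any other module.

```perl
use strict;
use warnings;

# Hypothetical lookup table: the five characters for which XML
# predefines entities.
my %XML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper, not taken from any module.
sub escape_xml_markup {
    my ($text) = @_;
    # One pass over the string, so the '&' of an entity that was just
    # inserted is never escaped a second time.
    $text =~ s/([&<>"'])/$XML_ENTITY{$1}/g;
    return $text;
}

print escape_xml_markup(q{<a href="x">Tom & Jerry's</a>}), "\n";
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

Text that is already escaped gets escaped again (&amp; becomes &amp;amp;), so this should only be applied to raw text.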

If you want to escape those characters and also to turn all characters outside of the 0-127 range into XML numeric character references (i.e. &#nnn;), then you can use the following subroutine, lifted from XML::DOM and not tested with all the possible system/Perl/XML-parser combinations:

    # Expects UTF-8 encoded bytes; each multi-byte sequence is replaced
    # with a numeric character reference.
    sub safe_encode {
        my $str = shift;
        $str =~ s{ ([\xC0-\xDF].|[\xE0-\xEF]..|[\xF0-\xFF]...) }
                 { XmlUtf8Decode($1) }xegs;
        $str;
    }

    # Turns one UTF-8 sequence into &#nnn; (or &#xhhh; if $hex is true).
    sub XmlUtf8Decode {
        my ( $str, $hex ) = @_;
        my $len = length $str;
        my $n;
        if ( $len == 4 ) {
            my @n = unpack "C4", $str;
            $n = ( ( $n[0] & 0x0f ) << 18 ) + ( ( $n[1] & 0x3f ) << 12 )
               + ( ( $n[2] & 0x3f ) << 6 ) + ( $n[3] & 0x3f );
        }
        elsif ( $len == 3 ) {
            my @n = unpack "C3", $str;
            $n = ( ( $n[0] & 0x1f ) << 12 ) + ( ( $n[1] & 0x3f ) << 6 )
               + ( $n[2] & 0x3f );
        }
        elsif ( $len == 2 ) {
            my @n = unpack "C2", $str;
            $n = ( ( $n[0] & 0x3f ) << 6 ) + ( $n[1] & 0x3f );
        }
        elsif ( $len == 1 ) {
            $n = ord $str;
        }
        else {
            die "bad value '$str' for XmlUtf8Decode";
        }
        $hex ? sprintf( "&#x%x;", $n ) : "&#$n;";
    }
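
The routine above works on raw UTF-8 bytes. If the text is already a decoded Perl character string, the same result can probably be had more simply. What follows is a minimal sketch under that assumption, not the XML::DOM code; encode_xml_ascii is a hypothetical name. Input read as bytes would first need decoding, for instance with decode from the core Encode module.

```perl
use strict;
use warnings;

# Hypothetical helper: expects a decoded character string, not raw
# UTF-8 bytes.
sub encode_xml_ascii {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    # Escape markup first, so the '&' of the references added in the
    # next step is left alone.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    # Every character above code point 127 becomes a decimal reference.
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

print encode_xml_ascii("caf\x{e9} & \x{263a}"), "\n";
# prints: caf&#233; &amp; &#9786;
```

Unlike safe_encode, this sketch also escapes the five markup characters, so the result is pure ASCII and can be dropped into element content or attribute values.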
