 
PerlMonks  

XML::Parser and numeric entities

by gam3 (Curate)
on Jan 14, 2010 at 01:32 UTC ([id://817319])

gam3 has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to keep XML::Parser from converting numeric entities into UTF8?

Or is there some other parser that will let me do this?

use strict;
use warnings;
use XML::Parser;

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character in print" warnings

sub handle_start {
    my ($self, $x) = @_;
    print "<$x>";
}

sub handle_end {
    my ($self, $x) = @_;
    print "</$x>";
}

sub handle_char {
    my ($self, $x) = @_;
    print $x;
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    }
);

$parser->parse(<<XML);
<start>&#8211;</start>
XML
I would like this program to output
<start>&#8211;</start>
not
<start>–</start>
-- gam3
A picture is worth a thousand words, but takes 200K.

Re: XML::Parser and numeric entities
by ikegami (Patriarch) on Jan 14, 2010 at 02:53 UTC

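    XML::Parser simply decodes the entities into characters; it doesn't then encode those characters using UTF-8. Your Char handler receives the single character U+2013 (EN DASH), and any encoding only happens when you print it.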
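    A minimal sketch, assuming XML::Parser is installed, that collects what the Char handler actually receives for &#8211; and reports its length and code point:

```perl
use strict;
use warnings;
use XML::Parser;

# Collect every piece of character data the parser reports.
my @chunks;
my $parser = XML::Parser->new(
    Handlers => {
        Char => sub { my ($expat, $chunk) = @_; push @chunks, $chunk },
    },
);
$parser->parse('<start>&#8211;</start>');

# Expat may split character data across calls, so join the pieces.
my $text = join '', @chunks;

# One decoded character, not a multi-byte UTF-8 sequence.
printf "length %d, code point U+%04X\n", length $text, ord $text;
```

    Because the handler sees a one-character Perl string, the UTF-8 bytes in the question's output come from printing that character, not from the parser.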

    If you want all non-ASCII characters written back out as numeric character references, you can use:

    use HTML::Entities qw( encode_entities_numeric );

    sub handle_char {
        my ($self, $x) = @_;
        print encode_entities_numeric($x);
    }
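    Note that encode_entities_numeric writes hexadecimal references (&#x2013;), so the result is equivalent to, but not byte-for-byte identical with, the &#8211; in the question. A minimal pure-Perl sketch of a decimal variant follows; encode_numeric_decimal is a hypothetical helper name, and it assumes only &, < and > need escaping in text content:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the XML-special characters in text content,
# then turn every non-ASCII character into a *decimal* character reference.
sub encode_numeric_decimal {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later escapes aren't re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# In the question's program, handle_char could print
# encode_numeric_decimal($x) instead of $x.
print encode_numeric_decimal("\x{2013}"), "\n";   # prints &#8211;
```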

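    Alternatively, you can register a Default handler instead of a Char handler; it receives character references still in their original form (e.g. &#8211;). But then you're not guaranteed to have all non-ASCII characters encoded, since any non-ASCII characters that appear literally in the document are passed through unchanged.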
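    A minimal sketch of that approach, assuming XML::Parser is installed: the question's Start and End handlers are kept, Char is left unregistered, and a Default handler appends whatever text it is given, so the character reference reaches it undecoded:

```perl
use strict;
use warnings;
use XML::Parser;

my $out = '';
my $parser = XML::Parser->new(
    Handlers => {
        Start   => sub { my ($expat, $tag) = @_; $out .= "<$tag>" },
        End     => sub { my ($expat, $tag) = @_; $out .= "</$tag>" },
        # No Char handler is registered, so character data, including the
        # raw text of references such as &#8211;, is passed to Default.
        Default => sub { my ($expat, $raw) = @_; $out .= $raw },
    },
);
$parser->parse('<start>&#8211;</start>');
print "$out\n";
```

    The caveat above still applies: a literal non-ASCII character in the input would be copied through as-is rather than turned into a reference.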

      Thank you for that information; I can use it to patch up my problem.

      However, what I really want is for XML::Parser NOT to decode the numeric entities at all.

      -- gam3

Node Type: perlquestion [id://817319]
Approved by herveus