
Re^4: Cleaning up non 7-bit Ascii Chars for XML-processing

by ikegami (Pope)
on Nov 11, 2010 at 21:06 UTC (#870932)


in reply to Re^3: Cleaning up non 7-bit Ascii Chars for XML-processing
in thread Cleaning up non 7-bit Ascii Chars for XML-processing

...but it seems that I misread. I thought you were generating the XML.

The XML is always output as "UTF-8"

No, it isn't.

"’" is "E2 80 99" in UTF-8.
"’" is "92" in cp1252.

You've indicated you have the latter.
You've indicated the document claims to be the former (implicitly).
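
To see the mismatch concretely, here is a minimal sketch (not from the original post) that encodes U+2019 both ways and then checks whether the lone cp1252 byte is acceptable as UTF-8. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# U+2019 RIGHT SINGLE QUOTATION MARK, as a decoded Perl character.
my $quote = "\x{2019}";

# Encode the same character both ways and render the bytes as hex.
my $utf8_hex   = uc(join(' ', unpack('(H2)*', encode('UTF-8',  $quote))));
my $cp1252_hex = uc(join(' ', unpack('(H2)*', encode('cp1252', $quote))));

print "UTF-8:  $utf8_hex\n";     # E2 80 99
print "cp1252: $cp1252_hex\n";   # 92

# A lone 0x92 byte is not valid UTF-8, so a strict decode fails.
my $bytes = "\x92";
my $valid_utf8 = eval {
    decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
} ? 1 : 0;

print "0x92 valid as UTF-8? ", ($valid_utf8 ? "yes" : "no"), "\n";   # no
```

That failed decode is the same complaint an XML parser makes when a document it takes to be UTF-8 contains a cp1252 byte.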

You can either fix the encoding, or fix what the XML says the encoding is. The former is easier.

    use strict;
    use warnings;

    use Encode qw( encode decode );

    # Escape characters that are special in XML.
    sub fix_broken_text {
        my ($self, $field) = @_;
        $field =~ s/&/&amp;/g;
        $field =~ s/</&lt;/g;
        $field =~ s/>/&gt;/g;
        $field =~ s/"/&quot;/g;
        $field =~ s/'/&#39;/g;
        return $field;
    }

    # Read the whole file as bytes and decode it from cp1252.
    my $decoded_xml;
    {
        open(my $fh, '<', $xml_qfn)
            or die("Can't open \"$xml_qfn\": $!\n");
        binmode($fh);
        local $/;
        $decoded_xml = decode('cp1252', scalar(<$fh>));
    }

    ...Try to fix problems with unescaped characters...

    # Re-encode as UTF-8, which is what the document claims to be.
    my $encoded_xml = encode('UTF-8', $decoded_xml);

    ...Pass $encoded_xml to parser...

If only parts are cp1252, convert just those parts while fixing them:

    use strict;
    use warnings;

    use Encode qw( encode decode );

    # Convert one cp1252 field to UTF-8, escaping XML-special characters.
    sub fix_broken_text {
        my ($self, $field) = @_;
        $field = decode('cp1252', $field);
        $field =~ s/&/&amp;/g;
        $field =~ s/</&lt;/g;
        $field =~ s/>/&gt;/g;
        $field =~ s/"/&quot;/g;
        $field =~ s/'/&#39;/g;
        $field = encode('UTF-8', $field);
        return $field;
    }

    # Read the whole file as bytes; the rest is treated as UTF-8 already.
    my $encoded_xml;
    {
        open(my $fh, '<', $xml_qfn)
            or die("Can't open \"$xml_qfn\": $!\n");
        binmode($fh);
        local $/;
        $encoded_xml = <$fh>;
    }

    ...Try to fix problems with unescaped characters...

    ...Pass $encoded_xml to parser...
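
For completeness, the other option mentioned above (fixing what the XML says the encoding is) might look roughly like this. It is only a sketch: declare_cp1252 is a hypothetical helper, not something from the thread, and it assumes that the whole document really is cp1252 and that your parser accepts a windows-1252 declaration.

```perl
use strict;
use warnings;

# Hypothetical helper: make raw XML bytes declare themselves as windows-1252.
# Assumes the entire document is cp1252; nothing is re-encoded here.
sub declare_cp1252 {
    my ($xml_bytes) = @_;

    # Split off an existing XML declaration, if there is one.
    if ($xml_bytes =~ s/\A(<\?xml\b[^?]*\?>)//) {
        my $decl = $1;

        # Replace an existing encoding pseudo-attribute,
        # or add one right after the version pseudo-attribute.
        $decl =~ s/\bencoding\s*=\s*(["'])[^"']*\1/encoding="windows-1252"/
            or $decl =~ s/(\bversion\s*=\s*(["'])[^"']*\2)/$1 encoding="windows-1252"/;

        return $decl . $xml_bytes;
    }

    # No declaration at all: add one.
    return qq{<?xml version="1.0" encoding="windows-1252"?>\n} . $xml_bytes;
}

print declare_cp1252(qq{<?xml version="1.0" encoding="UTF-8"?><doc/>}), "\n";
```

This only helps when nothing in the file is already UTF-8; if the file is mixed, the per-field conversion above is the safer route.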

