PerlMonks  

Re^4: Cleaning up non 7-bit Ascii Chars for XML-processing

by ikegami (Pope)
on Nov 11, 2010 at 21:06 UTC (node #870932)


in reply to Re^3: Cleaning up non 7-bit Ascii Chars for XML-processing
in thread Cleaning up non 7-bit Ascii Chars for XML-processing

...but it seems that I misread. I thought you were generating the XML.

You wrote that "the XML is always output as UTF-8."

No, it isn't.

The character "’" (U+2019, RIGHT SINGLE QUOTATION MARK) is the three bytes E2 80 99 in UTF-8.
The same character is the single byte 92 in cp1252.
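Those byte values can be checked with Perl's core Encode module. A minimal sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;
use Encode qw( encode );

my $apos = "\x{2019}";   # RIGHT SINGLE QUOTATION MARK as a Perl text string

# Encode the same character both ways and dump the resulting bytes in hex.
my $utf8_bytes   = encode('UTF-8',  $apos);
my $cp1252_bytes = encode('cp1252', $apos);

printf "UTF-8:  %s\n", join ' ', map { sprintf('%02X', ord($_)) } split //, $utf8_bytes;
printf "cp1252: %s\n", join ' ', map { sprintf('%02X', ord($_)) } split //, $cp1252_bytes;
```

Running it should print E2 80 99 for UTF-8 and 92 for cp1252.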

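You've indicated that your data contains the latter (byte 92).
You've also indicated (implicitly) that the document claims to be the former: an XML document with neither an encoding declaration nor a byte-order mark must be UTF-8.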
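That mismatch is what breaks parsing, because a lone byte 92 is not valid UTF-8. The sketch below uses made-up sample strings and a hypothetical is_valid_utf8 helper to decode both encodings of "It’s" as strict UTF-8. Encode::FB_CROAK makes decode die on malformed input instead of substituting U+FFFD, and Encode::LEAVE_SRC keeps decode from modifying its input.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: true (1) if the byte string is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    } ? 1 : 0;
}

my $from_utf8   = "It\xE2\x80\x99s";   # "It’s" encoded as UTF-8
my $from_cp1252 = "It\x92s";           # "It’s" encoded as cp1252

print "UTF-8 bytes valid as UTF-8?  ", is_valid_utf8($from_utf8),   "\n";   # 1
print "cp1252 bytes valid as UTF-8? ", is_valid_utf8($from_cp1252), "\n";   # 0
```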

You can either fix the encoding (convert the bytes to UTF-8) or fix what the XML says the encoding is. The former is easier.
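For completeness, the other option (changing what the document says its encoding is) could look roughly like this. declare_cp1252 is a hypothetical helper; because it works with regexes, it assumes the XML declaration, if any, sits at the very start of the byte string with no byte-order mark, and it only changes the label, not the bytes. windows-1252 is the registered name for cp1252, but whether your parser accepts it depends on the parser, so check its documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: relabel a cp1252 document as windows-1252.
sub declare_cp1252 {
    my ($xml) = @_;
    if ($xml =~ /\A<\?xml\s[^>]*\?>/) {
        # A declaration exists: overwrite its encoding pseudo-attribute if present...
        return $xml
            if $xml =~ s/\A(<\?xml\s[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/$1${2}windows-1252$2/;
        # ...otherwise insert one right after version, where XML requires it.
        $xml =~ s/\A(<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding="windows-1252"/;
        return $xml;
    }
    # No declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="windows-1252"?>\n} . $xml;
}
```

Usage: declare_cp1252(qq{<?xml version="1.0"?><a/>}) returns qq{<?xml version="1.0" encoding="windows-1252"?><a/>}.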

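```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Escapes XML-special characters in a text field.
sub fix_broken_text {
    my ($self, $field) = @_;
    $field =~ s/&/&amp;/g;   # '&' first, so the '&' in the entities added below isn't escaped again
    $field =~ s/</&lt;/g;
    $field =~ s/>/&gt;/g;
    $field =~ s/"/&quot;/g;
    $field =~ s/'/&#39;/g;
    return $field;
}

my $xml_qfn = shift(@ARGV);   # assumption: the XML file name is passed on the command line

# Read the raw bytes and decode them from cp1252 into a Perl text string.
my $decoded_xml;
{
    open(my $fh, '<', $xml_qfn)
        or die("Can't open \"$xml_qfn\": $!\n");
    binmode($fh);
    local $/;   # slurp the whole file
    $decoded_xml = decode('cp1252', scalar(<$fh>));
}

# ...Try to fix problems with unescaped characters...

# Encode as UTF-8, which is what the document (implicitly) claims to be.
my $encoded_xml = encode('UTF-8', $decoded_xml);

# ...Pass $encoded_xml to parser...
```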
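To try the whole-document approach without a file on disk, the sketch below runs the same read, decode, and encode steps on a made-up sample through an in-memory filehandle (open on a scalar reference).

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Made-up sample: a tiny document whose text is cp1252 (byte 0x92 is ’).
my $cp1252_doc = "<note>It\x92s done</note>";

# Same steps as above, reading from an in-memory filehandle instead of a file.
my $decoded_xml;
{
    open(my $fh, '<', \$cp1252_doc) or die("Can't open in-memory file: $!\n");
    binmode($fh);
    local $/;
    $decoded_xml = decode('cp1252', scalar(<$fh>));
}

my $encoded_xml = encode('UTF-8', $decoded_xml);
print $encoded_xml eq "<note>It\xE2\x80\x99s done</note>" ? "ok\n" : "not ok\n";
```

It should print ok: byte 92 comes out as E2 80 99.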

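If only parts of the document are cp1252, leave the rest of the bytes alone and convert just those parts inside fix_broken_text: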

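```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Decodes a cp1252 field, escapes XML-special characters,
# and returns the result encoded as UTF-8.
sub fix_broken_text {
    my ($self, $field) = @_;
    $field = decode('cp1252', $field);
    $field =~ s/&/&amp;/g;
    $field =~ s/</&lt;/g;
    $field =~ s/>/&gt;/g;
    $field =~ s/"/&quot;/g;
    $field =~ s/'/&#39;/g;
    $field = encode('UTF-8', $field);
    return $field;
}

my $xml_qfn = shift(@ARGV);   # assumption: the XML file name is passed on the command line

# Read the raw bytes without decoding; only the cp1252 fields get converted.
my $encoded_xml;
{
    open(my $fh, '<', $xml_qfn)
        or die("Can't open \"$xml_qfn\": $!\n");
    binmode($fh);
    local $/;   # slurp the whole file
    $encoded_xml = <$fh>;
}

# ...Try to fix problems with unescaped characters...

# ...Pass $encoded_xml to parser...
```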
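Decoding the whole file as cp1252 is not an option in this case, because bytes that are already UTF-8 would be decoded a second time and come out garbled: é (C3 A9 in UTF-8) read as cp1252 becomes "Ã©". The sketch below contrasts the two on made-up sample data. fix_broken_text is copied from the block above and called with undef for $self; the splicing into a document is only for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

sub fix_broken_text {
    my ($self, $field) = @_;
    $field = decode('cp1252', $field);
    $field =~ s/&/&amp;/g;
    $field =~ s/</&lt;/g;
    $field =~ s/>/&gt;/g;
    $field =~ s/"/&quot;/g;
    $field =~ s/'/&#39;/g;
    $field = encode('UTF-8', $field);
    return $field;
}

my $utf8_part   = "<name>Caf\xC3\xA9</name>";   # already UTF-8 ("Café")
my $cp1252_text = "Bob\x92s <notes> & more";    # raw cp1252 field

# Wrong: decoding everything as cp1252 double-decodes the UTF-8 part ("CafÃ©").
my $wrong = encode('UTF-8', decode('cp1252', $utf8_part));

# Right: convert only the cp1252 field and splice the UTF-8 result in.
my $encoded_xml = "<doc>$utf8_part<text>"
    . fix_broken_text(undef, $cp1252_text)
    . "</text></doc>";

# The spliced document is now entirely valid UTF-8.
my $ok = eval { decode('UTF-8', $encoded_xml, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 };
print $ok ? "spliced document is valid UTF-8\n" : "invalid UTF-8\n";
```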
