slugger415 has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I am processing some XHTML pages (using XML::Twig) that contain numerous character entities, such as:
&eacute;
When I parse these files using XML::Twig, they turn into all sorts of wonky characters that look nothing like they did in the original HTML.
réserve becomes
réserve
I've tried setting keep_encoding in Twig, and the entities get preserved, but I get another set of wonky characters when that output goes to HTML.
I'm not sure how to proceed here -- any thoughts? I'm sure there's some kind of encoding/decoding process I need to do here, but I'm unfamiliar with the process.
Many thanks.
Scott
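
The symptom described above, where an accented letter turns into two unrelated characters, is typical of UTF-8 bytes being displayed as ISO-8859-1 (Latin-1). The post does not confirm that this is the cause here, so the following is only a minimal sketch under that assumption. It uses the core Encode module rather than XML::Twig, and the helper names misread_utf8_as_latin1 and repair_mojibake are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: reproduce what a reader sees when UTF-8 bytes
# are mistakenly interpreted as ISO-8859-1 (Latin-1).
sub misread_utf8_as_latin1 {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);      # bytes a UTF-8 writer emits
    return decode('ISO-8859-1', $bytes);     # characters a Latin-1 reader shows
}

# Hypothetical helper: undo the mistake above by reversing both steps.
sub repair_mojibake {
    my ($garbled) = @_;
    my $bytes = encode('ISO-8859-1', $garbled);  # back to the original bytes
    return decode('UTF-8', $bytes);              # decode them correctly this time
}

# Print characters as UTF-8 so the terminal shows them as intended.
binmode STDOUT, ':encoding(UTF-8)';

my $original = "r\x{E9}serve";                   # "réserve", with é as U+00E9
my $garbled  = misread_utf8_as_latin1($original);

print "$garbled\n";                              # é appears as two characters (U+00C3 U+00A9)
print repair_mojibake($garbled), "\n";           # back to "réserve"
```

If this is what is happening, the data itself is probably intact and only the declared or assumed encoding of the output differs from the bytes actually written. Checking that the HTML output declares the same charset as the one being written would be a reasonable first step.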
Replies are listed 'Best First'.
Re: Encoding/decoding question
by ikegami (Patriarch) on Sep 11, 2011 at 18:23 UTC
by slugger415 (Monk) on Sep 11, 2011 at 19:13 UTC
by ikegami (Patriarch) on Sep 11, 2011 at 20:26 UTC
by tchrist (Pilgrim) on Sep 12, 2011 at 00:43 UTC
by ikegami (Patriarch) on Sep 12, 2011 at 02:35 UTC
by Anonymous Monk on Sep 12, 2011 at 20:34 UTC
by slugger415 (Monk) on Sep 12, 2011 at 15:11 UTC
by tchrist (Pilgrim) on Sep 12, 2011 at 15:51 UTC
by mirod (Canon) on Sep 13, 2011 at 08:36 UTC
Re: Encoding/decoding question
by Anonymous Monk on Sep 11, 2011 at 15:40 UTC
Re: Encoding/decoding question
by slugger415 (Monk) on Sep 13, 2011 at 14:07 UTC