http://www.perlmonks.org?node_id=998914


in reply to Re^2: UTF-8 and XML::Parser
in thread UTF-8 and XML::Parser

Maybe, you saved your script with utf-8 encoding. If you save the script as iso-8859-1, you will get iso-8859-1 result.

Below, 082.pl is utf-8 saved script and 082-1 is iso-8859-1 saved script."ü" is "c3 bc" in utf-8. "fc" in iso-8859-1.

>cat 082.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd 00000000 4d c3 bc 6c 6c 65 72 |M..ller| 00000007 >cat 082-1.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd 00000000 4d fc 6c 6c 65 72 |M.ller| 00000006 >