in reply to Re^2: UTF-8 and XML::Parser
in thread UTF-8 and XML::Parser
Maybe you saved your script with UTF-8 encoding. If you save the script as ISO-8859-1, you will get ISO-8859-1 output.
Below, 082.pl is the script saved as UTF-8 and 082-1.pl is the same script saved as ISO-8859-1. "ü" is the two bytes "c3 bc" in UTF-8 and the single byte "fc" in ISO-8859-1.
>cat 082.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd
00000000 4d c3 bc 6c 6c 65 72 |M..ller|
>cat 082-1.pl |perl -ne 'print $1 if m!<word>(.*?)</word>!' | hd
00000000 4d fc 6c 6c 65 72 |M.ller|
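If you don't have both files at hand, the same byte difference can be reproduced straight from the shell. This is only a rough sketch: `hexbytes` is a helper name I made up for illustration, and it assumes `iconv` and `od` are installed on your system.

```shell
# hexbytes: hypothetical helper, prints stdin as one compact hex string.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# "Müller" as UTF-8 bytes, written with octal escapes (\303\274 = c3 bc).
printf 'M\303\274ller' | hexbytes; echo
# prints 4dc3bc6c6c6572

# The same text converted to ISO-8859-1: "ü" becomes the single byte fc.
printf 'M\303\274ller' | iconv -f UTF-8 -t ISO-8859-1 | hexbytes; echo
# prints 4dfc6c6c6572
```

The two hex strings correspond to the hd output for 082.pl and 082-1.pl above, so whichever one your own script produces tells you how the file was saved.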