Re: Unicode With LWP

by Juerd (Abbot)
on Jan 20, 2008

in reply to Unicode With LWP

LWP::Simple gives you the content as a byte string, ignoring the charset attribute in the Content-Type header. If you want to pass the data along without decoding it, you will have to declare the same charset that your source used, but LWP::Simple doesn't tell you what that was.

You could find it out manually, hard code it, and hope they'll never change it. Or you could hop from LWP::Simple to a more advanced module, like full LWP. My favourite way of doing this is to call decoded_content on the response object, which decodes the body according to the charset the server declared, and then explicitly re-encode as UTF-8 for output, because I like to standardize on UTF-8 for web stuff.
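
Something along these lines should do it. Treat it as a rough sketch rather than a drop-in solution: the sub names utf8_body and fetch_as_utf8 are made up for this post, and the URL in the usage note is only a placeholder. It assumes the server sends a textual Content-Type with a correct charset attribute; decoded_content returns undef when it cannot decode the body.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use LWP::UserAgent;

# Hypothetical helper: turn a response into UTF-8 encoded bytes.
sub utf8_body {
    my ($response) = @_;

    # Character string, decoded using the charset from Content-Type.
    my $text = $response->decoded_content;
    die "Could not decode response body\n" unless defined $text;

    # Explicitly re-encode for output.
    return encode('UTF-8', $text);
}

# Hypothetical helper: fetch a URL and return its body as UTF-8 bytes.
sub fetch_as_utf8 {
    my ($url) = @_;

    my $response = LWP::UserAgent->new->get($url);
    die $response->status_line, "\n" unless $response->is_success;

    return utf8_body($response);
}
```

Usage would be something like `binmode STDOUT; print fetch_as_utf8('http://example.com/');`, with a matching `charset=UTF-8` in whatever Content-Type header you send yourself.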

