Re^2: www:mechanize mangles unicode

by red0hat (Initiate)
on Apr 28, 2010 at 21:07 UTC

in reply to Re: www:mechanize mangles unicode
in thread www:mechanize mangles unicode

The headers claim:

Accept-Charset: ISO-8859-1,utf-8

and the data that is being sent is "Château". Of course, whatever is reading the log might be prettifying it again.


Re^3: www:mechanize mangles unicode
by Corion (Pope) on Apr 28, 2010 at 21:10 UTC

    Yes, when dealing with encoding problems, you will need to make sure that all components show you the real thing. Look at the hexdumps of the parts and check that they show the octets that correspond to the respective encoding.
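As a rough illustration of what "the octets that correspond to the respective encoding" means for the word in question, here is a minimal sketch. It assumes a POSIX shell with `od`, `tr` and `sed`; the helper name `hexbytes` is made up for this example. The bytes are written as octal escapes so the result does not depend on the terminal's own encoding.

```shell
# hexbytes (hypothetical helper): print stdin as space-separated hex octets
hexbytes() {
    od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "Château" as UTF-8: â is the two octets c3 a2 (octal \303\242)
printf 'Ch\303\242teau' | hexbytes    # 43 68 c3 a2 74 65 61 75

# "Château" as ISO-8859-1: â is the single octet e2 (octal \342)
printf 'Ch\342teau' | hexbytes        # 43 68 e2 74 65 61 75
```

If a component claims one encoding but its dump shows the other byte pattern, that component is a likely place to look.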

Re^3: www:mechanize mangles unicode
by Hue-Bond (Priest) on Apr 28, 2010 at 21:13 UTC
    and the data that is being sent is "Château".

    But what is "Château"? How can you be sure of that? Use a hex dumper, for example vim's xxd:

    $ echo -n Château |xxd
    0000000: 4368 c3a2 7465 6175  Ch..teau

    What you specifically need to do, then, is dump your log file:

    $ grep 'teau\b' /path/to/log |xxd |less
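When reading such a dump, it can help to know which byte patterns to expect between the "h" and the "t". The sketch below is only a heuristic for this one word, assuming a POSIX shell with `od` and `tr`; the function name `classify_chateau` is hypothetical. The "double-encoded" case (UTF-8 octets read as ISO-8859-1 and encoded to UTF-8 again) is one common form of mangling, not necessarily what is happening in this thread.

```shell
# classify_chateau (hypothetical helper): read octets on stdin and report
# how the "â" between "h" (68) and "t" (74) appears to be encoded.
classify_chateau() {
    # Normalise od output to " xx xx xx " so patterns match whole octets only
    hex=$(od -An -v -tx1 | tr -s ' \n' ' ')
    case " $hex " in
        *" 68 c3 83 c2 a2 74 "*) echo "double-encoded UTF-8" ;;
        *" 68 c3 a2 74 "*)       echo "UTF-8" ;;
        *" 68 e2 74 "*)          echo "ISO-8859-1" ;;
        *)                       echo "unknown" ;;
    esac
}

printf 'Ch\303\242teau' | classify_chateau           # UTF-8
printf 'Ch\342teau' | classify_chateau               # ISO-8859-1
printf 'Ch\303\203\302\242teau' | classify_chateau   # double-encoded UTF-8
```

The same function could be fed the matching line from the log in place of the `printf` test strings.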

     David Serrano
     (Please treat my English text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).

