PerlMonks  

Bizarre Dancer encoding behavior

by xyzzy (Pilgrim)
on Jul 21, 2014 at 01:25 UTC ( #1094410=perlquestion )
xyzzy has asked for the wisdom of the Perl Monks concerning the following question:

Short version: When using return to send a response, Dancer converts a Unicode string into ISO8859-1. When setting the content directly via the Dancer::Response->new() method, the response contains the correct string.

Long version: I have an extremely minimal Dancer app. At some point, I was going to expand it to do a lot more, but as of right now the only thing it does is return the currently-playing track of an MPD server running on the same machine. A static page with an HTML5 internet radio player sends a request and updates a "Now Playing:" span at regular intervals. I needed something quick and dirty without mucking about with the two MPD modules on CPAN, so I used a system call. For those unfamiliar with MPD, it is a music player with a server-client architecture. There are a plethora of clients available for all different platforms, but the most basic is a CLI client called mpc. Called with no arguments, it returns the server status:

xyzzy@asscat:~$ mpc
ДДТ - Чёрно-белые танцы
[playing] #27/31 1:21/6:03 (21%)
volume: n/a repeat: off random: off single: off consume: off
xyzzy@asscat:~$

Here's the first version:

get '/np' => sub { return `mpc | head -n1`; };

Simple enough. But instead of the Unicode, my span looks like this:

Now playing: ””Т - Ч‘€но-бел‹е ‚а톋

I spent an hour trying to enable utf8, checking the HTTP headers, the meta tags on the page, even using Encode, but nothing worked. Then I rewrote my handler like so:

get '/np' => sub { Dancer::Response->new( status => 200, content => `mpc | head -n1`, ); };

Suddenly:

Now playing: ДДТ - Чёрно-белые танцы

Most of me only cares that it works now. But part of me is still baffled why one way works and the other way doesn't. What is it about return that mangles the string encoding? It has to be something inherent in Dancer, because if I do

xyzzy@asscat:~$ perl -e'sub a {return `mpc|head -n1`}print a'
ДДТ - Герой

it works perfectly fine. Does anyone here know enough about Dancer's internals or is clever enough to figure this out?


$,=qq.\n.;print q.\/\/____\/.,q./\ \ / / \\.,q.    /_/__.,q..
Happy, sober, smart: pick two.

Replies are listed 'Best First'.
Re: Bizarre Dancer encoding behavior (dancer assumes unicode)
by Anonymous Monk on Jul 21, 2014 at 03:00 UTC

    Hmm, backticks (i.e. qx//) return bytes, not Unicode characters ... you have to decode the bytes you get from backticks to get a character string that Dancer can return.

    Dancer assumes you're returning Unicode, so it encodes the bytes again ... thus double-encoding.

    The solution is as simple as perlunitut: Unicode in Perl#I/O flow (the actual 5 minute tutorial) teaches: decode external data, and Dancer will then encode it for you.

    The client; note how the bytes match the server output:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $ua = WWW::Mechanize->new;
    for my $url (qw[
        http://localhost:3000/unicode
        http://localhost:3000/bytes
        http://localhost:3000/unibyte
    ]) {
        $ua->get($url);
        DD($ua->res->as_string);
    }

    sub DD {
        use Data::Dumper;
        print Data::Dumper->new([@_])->Useqq(1)->Dump, "\n";
    }
    __END__
    $VAR1 = "HTTP/1.0 200 OK\nServer: Perl Dancer 1.3118\nContent-Length: 6\nContent-Type: text/html; charset=UTF-8\nClient-Date: Mon, 21 Jul 2014 02:56:39 GMT\nClient-Peer: 127.0.0.1:3000\nClient-Response-Num: 1\nX-Powered-By: Perl Dancer 1.3118\n\n\320\224\320\224\320\242\n";

    $VAR1 = "HTTP/1.0 200 OK\nServer: Perl Dancer 1.3118\nContent-Length: 12\nContent-Type: text/html; charset=UTF-8\nClient-Date: Mon, 21 Jul 2014 02:56:39 GMT\nClient-Peer: 127.0.0.1:3000\nClient-Response-Num: 1\nX-Powered-By: Perl Dancer 1.3118\n\n\303\220\302\224\303\220\302\224\303\220\302\242\n";

    $VAR1 = "HTTP/1.0 200 OK\nServer: Perl Dancer 1.3118\nContent-Length: 6\nContent-Type: text/html; charset=UTF-8\nClient-Date: Mon, 21 Jul 2014 02:56:39 GMT\nClient-Peer: 127.0.0.1:3000\nClient-Response-Num: 1\nX-Powered-By: Perl Dancer 1.3118\n\n\320\224\320\224\320\242\n";

    The server; note the Data::Dumper output of the bytes, and search for it in the client output:

    #!/usr/bin/perl --
    use utf8;
    use Dancer;
    use Encode qw/ encode decode /;

    sub DD {
        use Data::Dumper;
        print Data::Dumper->new([@_])->Useqq(1)->Dump, "\n";
    }

    config->{charset} = 'UTF-8';

    my $unicode = "\x{414}\x{414}\x{422}";   ## q{ДДТ};
    my $bytes   = encode('UTF-8', $unicode);
    DD( $unicode, $bytes, encode('UTF-8', $bytes) );

    get '/unicode' => sub { return $unicode };
    get '/bytes'   => sub { return $bytes };
    get '/unibyte' => sub { return decode('UTF-8', $bytes ); };

    dance;
    __END__
    $VAR1 = "\x{414}\x{414}\x{422}";
    $VAR2 = "\320\224\320\224\320\242";
    $VAR3 = "\303\220\302\224\303\220\302\224\303\220\302\242";

    >> Dancer 1.3118 server 300 listening on http://0.0.0.0:3000
    == Entering the development dance floor ...
    Terminating on signal SIGINT(2)

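The "decode external data, let the framework encode it" flow from this reply can also be tried without a running Dancer app. The following is only a minimal sketch using the core Encode module: the $raw literal stands in for what `mpc | head -n1` would return (an assumption, so the code runs without an MPD server), and track_from_bytes is a hypothetical helper name, not part of Dancer or of the original post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for the output of `mpc | head -n1`: the UTF-8 bytes of a
# three-letter Cyrillic name plus a trailing newline (assumed input).
my $raw = "\xD0\x94\xD0\x94\xD0\xA2\n";

# Hypothetical helper: turn external bytes into a character string.
sub track_from_bytes {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> characters
    chomp $text;                          # drop the newline from the shell
    return $text;
}

my $track = track_from_bytes($raw);

# What a layer that encodes its output exactly once would send.
my $wire = encode('UTF-8', $track);

printf "characters: %d, bytes on the wire: %d\n",
    length($track), length($wire);
```

In a route handler, the decoded string is what would be returned, leaving the single encode step to the framework.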
Re: Bizarre Dancer encoding behavior
by Anonymous Monk on Jul 21, 2014 at 02:37 UTC
    This is what happens when you (I mean, Dancer) use binary strings in 'Unicode context' (so to speak). For example

    perl -E 'my $m=q{ДДТ - Чёрно-белые танцы)}; binmode STDOUT, q{:encoding(utf-8)}; say $m'

    There are plenty of ways to produce mojibake actually... Did you try something like
    get '/np' => sub { return Encode::decode_utf8(`mpc | head -n1`); };
    ...just for testing? Anyway, this is probably a bug in Dancer.

    For some reason, Dancer imports utf8 into your file

    utf8->import; # line 232 of Dancer.pm
    Apparently, it expects Unicode strings from you, and `mpc|head -n1` produces bytes. The bad thing with Perl is that, by default, it treats each byte of a binary string as a Latin-1 character when that string is used as text.
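The Latin-1 behavior described above can be reproduced with the core Encode module alone. This is a small sketch, not Dancer's actual code path: it encodes a byte string that was never decoded, so every byte is taken as a code point in the range U+0000 to U+00FF and each byte at or above 0x80 comes out as two bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 bytes of a three-letter Cyrillic string, never decoded.
my $bytes = "\xD0\x94\xD0\x94\xD0\xA2";

# Encoding again treats each byte as one Latin-1 character,
# so the already-encoded data gets encoded a second time.
my $double = encode('UTF-8', $bytes);

printf "%d bytes in, %d bytes out\n", length($bytes), length($double);

# Hex dump of the double-encoded result.
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $double)), "\n";
```

The hex dump should correspond to the octal escapes in the /bytes response shown in the other reply (\303\220\302\224 ...).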

Re: Bizarre Dancer encoding behavior
by wjw (Priest) on Jul 21, 2014 at 02:37 UTC

    I wonder if in the first case the return from a system call is what is actually mangling the output of the call instead of dancer.

    When Dancer handles things internally, it 'knows' what to expect and what to do with it?..

    I have just begun to look at Dancer2 myself, so I have no expertise in that arena. I have noticed, however, that the output of the shell can get munged when fed into some other process. My wife works mostly in Cyrillic on her Linux laptop, and I have faced challenges similar to this.

    Just a thought....

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact

      I wonder if in the first case the return from a system call is what is actually mangling the output of the call instead of dancer.

      ...and in the second one? Come on, this is pretty clear. Backticks return bytes, and Perl treats them as Latin-1, like it always does by default.
