XML::Smart and UTF-8

by Kyshtynbai (Sexton)
Hi Monks! I have the following problem with CGI and XML::Smart. In short: I receive two parameters from a POST request and pass them to XML::Smart to store them in an XML file. The parameters are in Russian, so I use UTF-8 encoding. But when I run "cat xml/newfile.xml", the contents of the new tags come out encoded as something weird, like this:
cat xml/newfile.xml
<?xml version="1.0" encoding="UTF-8" ?>
<?meta name="GENERATOR" content="XML::Smart/1.78 Perl/5.014002 [linux]" ?>
<quotes>
  <txt>English text is fine</txt>
  <play>Really fine </play>
  <txt dt:dt="binary.base64">0KbQuNGC0LDRgtCwDQoK</txt>
  <play dt:dt="binary.base64">0J/QvtC/0L7QsdCw0LLQsAo=</play>
</quotes>
I really don't know how to handle it. Here's the part of the script that adds this XML data:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Smart;
use CGI;
use HTML::Template;
use lib ('../');
use MySite;

my $q = CGI->new();
my %params = MySite::get_params($q);

my $add_quote_success = HTML::Template->new(filename => '../templates/add_quote_success.tmpl');
my $add_quote_error   = HTML::Template->new(filename => '../templates/add_quote_error.tmpl');

if ($ENV{REQUEST_METHOD} ne 'POST') {
    # Only form submissions are accepted
    print "Content-type: text/html\n\n";
    print "Stop wgetting this script.";
}
else {
    if (($params{txt} eq '') or ($params{play} eq '')) {
        # Both fields are required
        print $q->header(-charset => 'utf-8');
        print $add_quote_error->output;
    }
    else {
        # Append the new <txt>/<play> pair after the existing ones
        my $XML = XML::Smart->new('../xml/newfile.xml');
        my $counter = @{$XML->{quotes}{txt}};
        $XML->{quotes}{txt}[$counter]  = "$params{txt}\n";
        $XML->{quotes}{play}[$counter] = "$params{play}\n";
        $XML->save('../xml/newfile.xml');
        print $q->header(-charset => 'utf-8');
        print $add_quote_success->output;
    }
}

Re: XML::Smart and UTF-8
by choroba (Archbishop) on Jun 07, 2014 at 14:53 UTC
    I don't work with XML::Smart. However, its documentation says that this kind of encoding is applied to binary data. Are you sure Perl knows your incoming data are UTF-8 encoded? (Is the data really Цитата and Попобава?)

    You can also try telling XML::Smart explicitly not to encode the data as binary, with set_binary(0).
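choroba's decoding of the two base64 nodes can be reproduced with core modules only. The sketch below assumes nothing beyond MIME::Base64 and Encode (both ship with Perl); decode_node is a hypothetical helper name, not part of XML::Smart. It takes the two dt:dt="binary.base64" values from the file in the question, turns them back into UTF-8 bytes, then into Perl characters, and prints them with the line endings shown as escapes.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: base64 text -> UTF-8 bytes -> Perl characters
sub decode_node {
    my ($b64) = @_;
    return decode('UTF-8', decode_base64($b64));
}

# The two encoded values copied from xml/newfile.xml in the question
my %nodes = (
    txt  => '0KbQuNGC0LDRgtCwDQoK',
    play => '0J/QvtC/0L7QsdCw0LLQsAo=',
);

binmode STDOUT, ':encoding(UTF-8)';
for my $tag (sort keys %nodes) {
    my $text = decode_node($nodes{$tag});
    # Make the trailing CR/LF characters visible
    (my $shown = $text) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$tag: $shown\n";
}
```

It prints play: Попобава\n and txt: Цитата\r\n\n. The final \n in each value is the one the script appends ("$params{txt}\n"); the extra \r\n in the txt value was presumably already part of the submitted form value.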

      I'm not actually sure (but it is text, not binary data; it comes from an ordinary input type=text HTML field). And yes, the data is what you decoded.
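Whether Perl "knows" the data is UTF-8 comes down to the difference between a byte string and a character string. A minimal sketch using only the core Encode module; the hard-coded $raw value is a stand-in for the form field (what MySite::get_params actually returns isn't shown in the thread). The raw UTF-8 bytes of Цитата are 12 units long, while the decoded string is 6 characters, and encoding it again gives back the original bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a raw form value: the UTF-8 bytes of the first word
# from the question, as a CGI script sees them before any decoding.
my $raw = "\xD0\xA6\xD0\xB8\xD1\x82\xD0\xB0\xD1\x82\xD0\xB0";

my $chars = decode('UTF-8', $raw);    # UTF-8 bytes -> Perl characters
my $back  = encode('UTF-8', $chars);  # Perl characters -> UTF-8 bytes

printf "bytes:      %d (%s)\n", length $raw,
    join ' ', map { sprintf '%02X', ord } split //, $raw;
printf "characters: %d (%s)\n", length $chars,
    join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
print "round trip ok\n" if $back eq $raw;
```

By default CGI.pm's param returns the byte form; its -utf8 import flag makes it return decoded characters instead. Per ikegami's reply, XML::Smart wants the byte form here, so the decoding above is only for inspecting what actually arrived.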
Re: XML::Smart and UTF-8
by ikegami (Pope) on Jun 21, 2014 at 06:57 UTC

    1. It's so, so wrong, but you need to encode the data before passing it to XML::Smart. You seem to be doing that already.

    2. You need to pass decode => 1 to save.

    use strict;
    use warnings;
    use Encode qw( encode_utf8 );
    use XML::Smart qw( );

    my $text = chr(0xC9);           # a non-ASCII character
    my $utf8 = encode_utf8($text);  # Point 1: hand XML::Smart UTF-8 bytes

    my $doc = XML::Smart->new('<?xml version="1.0" encoding="UTF-8"?><root></root>');
    $doc->{root}{node}{CONTENT} = $utf8;

    # Point 2: pass decode => 1 to save()
    $doc->save('a.xml', decode=>1);

