PerlMonks  

Data::Dumper and utf8

by thedi (Acolyte)
on Jan 30, 2010 at 07:01 UTC (node #820455)
thedi has asked for the wisdom of the Perl Monks concerning the following question:

I have a big application, all written in Perl. It uses a MySQL database and in some cases saves Perl structures to the database. These Perl structures are dumped with Data::Dumper and restored with eval.

Everything is in Unicode: program sources, data files, database. Each program and module begins with use utf8. Non-ASCII Unicode data is handled without problems, except in one case: data that is dumped with Data::Dumper, restored with eval, and then written to the database (this time without Dumper). Such data is corrupted.

After some research I finally found out that data restored with eval loses its UTF-8 flag. The characters are correct (the strings compare equal), but Perl no longer knows that the string is UTF-8 encoded, and DBI then fails when it includes such data in SQL statements.

Unfortunately the application is quite big. What is the simplest way to fix this problem? All I need is a way to tell Perl that the strings generated by an eval are Unicode (setting Useqq does not solve it).

This program demonstrates the problem:

```perl
#!/usr/bin/perl -w
use strict;
use utf8;
use Data::Dumper;
# $Data::Dumper::Useqq = 1;
binmode STDOUT, 'utf8';

our $VAR1;
my $data = 'ä';    # this is a non-ASCII a-umlaut
my $dump = Dumper( $data );
eval $dump;
if ( $data eq $VAR1 ) {
    print " == equal\n";
}
else {
    print " != not equal\n";
}
print $dump, "\n";
print Dumper( $VAR1 ), "\n";
print "original is utf8 = '" . utf8::is_utf8( $data ) . "'\n";
print "restored is utf8 = '" . utf8::is_utf8( $VAR1 ) . "'\n";
```

Output is

```
 == equal
$VAR1 = "\x{e4}";

$VAR1 = 'ä';

original is utf8 = '1'
restored is utf8 = ''
```
PS: This is on Mac OS X 10.5.8 with Perl v5.8.8 built for darwin-thread-multi-2level.
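Since the restored strings compare equal to the originals and only the internal UTF-8 flag is missing, one low-effort workaround is to upgrade the strings right after the eval. The sketch below is a suggestion, not something tested in this thread: upgrade_strings is a hypothetical helper name, and it assumes the dumped data contains only plain scalars, arrays and hashes without reference cycles. utf8::upgrade is a core Perl function that changes the internal representation of a string without changing its characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: walk a structure restored by eval and give every
# string that contains non-ASCII characters Perl's internal UTF-8 form.
# The characters do not change; only the internal flag that DBI sees does.
# Assumes plain scalars, arrays and hashes without reference cycles;
# blessed objects and hash keys are left untouched.
sub upgrade_strings {
    my ($ref) = @_;
    my $type = ref $ref;
    if ( $type eq 'SCALAR' ) {
        utf8::upgrade($$ref) if defined $$ref && $$ref =~ /[^\x00-\x7f]/;
    }
    elsif ( $type eq 'REF' ) {
        upgrade_strings($$ref);                     # a reference held in a scalar
    }
    elsif ( $type eq 'ARRAY' ) {
        upgrade_strings( \$_ ) for @$ref;           # $_ aliases each element
    }
    elsif ( $type eq 'HASH' ) {
        upgrade_strings( \$_ ) for values %$ref;    # values are aliased too
    }
    return;
}

our $VAR1;
my $data = { name => 'ä', list => [ 'Müller', 42, undef ] };

# With the default XS dumper the dump contains "\x{e4}" escapes,
# so the strings come back from eval without the UTF-8 flag.
eval Dumper($data);
die $@ if $@;

upgrade_strings( \$VAR1 );
printf "name = %s, is_utf8 = %d\n",
    $VAR1->{name}, utf8::is_utf8( $VAR1->{name} ) ? 1 : 0;
```

Only strings containing non-ASCII characters are upgraded, since pure-ASCII data is byte-for-byte identical in both representations; numbers, undef and ASCII-only strings are skipped.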

Re: Data::Dumper and utf8
by Krambambuli (Deacon) on Jan 30, 2010 at 08:43 UTC
    Although I'm not really sure why, the following code change seems to work:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode;
use Data::Dumper;
#$Data::Dumper::Useqq = 1;
binmode STDOUT, 'utf8';

our $VAR1;
my $data = 'ä';    # this is a non-ASCII a-umlaut
my $dump = Dumper( encode( 'utf8', $data ) );
print $dump, "\n";
eval $dump;
print $dump, "\n";
if ( $data eq $VAR1 ) {
    print " == equal\n";
}
else {
    print " != not equal\n";
}
#print Dumper( $VAR1 ), "\n";
print "Data: $data\nVAR1: $VAR1\n";
print "original is utf8 = '" . utf8::is_utf8( $data ) . "'\n";
print "restored is utf8 = '" . utf8::is_utf8( $VAR1 ) . "'\n";
```
    It's Dumper that generates the problems, not the eval:

    $VAR1 = "\x{e4}";

    has no way to determine that there's something utf8-related in the string.
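    A quick way to check this explanation is to evaluate the two kinds of literal directly. In this minimal sketch, an escape below 0x100, such as the "\x{e4}" that Dumper writes, yields a string without the UTF-8 flag, while an escape above 0xFF forces the flag on. The variable names are only illustrative, and the flag is an internal detail of perl, so treat this as a diagnostic rather than a guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The XS Dumper output for a UTF-8 flagged 'ä' contains the code "\x{e4}".
# q{} keeps the backslash, so eval sees exactly that double-quoted literal.
my $latin = eval q{"\x{e4}"};      # code point below 0x100
my $wide  = eval q{"\x{263a}"};    # code point above 0xFF (a smiley)

# Both strings hold the intended characters ...
printf "latin ord = %x, wide ord = %x\n", ord $latin, ord $wide;

# ... but only the one with a code point above 0xFF comes back flagged.
printf "latin is_utf8 = %d\n", utf8::is_utf8($latin) ? 1 : 0;
printf "wide  is_utf8 = %d\n", utf8::is_utf8($wide)  ? 1 : 0;
```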

    Update: There's some additional info available here.

    Update2: Looks like the proper solution is actually to just set

    $Data::Dumper::Useperl = 1;
    According to the documentation, this requires a Perl newer than 5.8.0; it works for me with Perl 5.10.0.
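    A minimal round-trip check of this setting might look like the following. It assumes a Data::Dumper that honours $Data::Dumper::Useperl, and the sample data is made up. Using local to enable the pure-Perl dumper only around the Dumper call is a design choice of this sketch, not part of the advice above; it keeps the faster XS dumper for the rest of the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';

our $VAR1;
my $data = { city => 'Zürich', tags => [ 'ä', 'ö' ] };

# Switch to the pure-Perl dumper only for this call; 'local' restores
# the previous (XS) setting when the do-block ends.
my $dump = do {
    local $Data::Dumper::Useperl = 1;
    Dumper($data);
};
print $dump;

# Restore the structure; the dumped code assigns to $VAR1.
eval $dump;
die $@ if $@;

# Each non-ASCII string should come back with the UTF-8 flag set.
for my $s ( $VAR1->{city}, @{ $VAR1->{tags} } ) {
    printf "%s is_utf8 = %d\n", $s, utf8::is_utf8($s) ? 1 : 0;
}
```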

    Krambambuli
    ---

      Replacing the method Data::Dumper::qquote, as suggested in the additional info, did it for me. Many thanks.

      Perl is great - but the best thing about Perl is its monks!

      regards

      Thedi

Re: Data::Dumper and utf8
by brycen (Monk) on Aug 13, 2010 at 21:28 UTC
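    The difference made by $Data::Dumper::Useperl = 1; shows up for strings whose characters can all be represented without UTF-8 (code points up to 0xFF). The XS dumper writes them as \x{..} escapes, and when eval reads those back it applies Perl's usual heuristic about UTF-8 and gets it wrong, producing a string without the UTF-8 flag: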
```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                   # so source code is utf-8 encoded
use Data::Dumper;
binmode STDOUT, ':utf8';    # print wide characters without warnings

my $data1 = 'ä ☺';    # a-umlaut, space, smiley
my $data2 = 'ä  ';    # a-umlaut, space, space

$Data::Dumper::Useperl = 1;
my $dump1 = Dumper( $data1 ); print $dump1;
my $dump2 = Dumper( $data2 ); print $dump2;
print "\n";

$Data::Dumper::Useperl = 0;
$dump1 = Dumper( $data1 ); print $dump1;
$dump2 = Dumper( $data2 ); print $dump2;
print "\n";
```

    Output

```
$VAR1 = "\x{e4} \x{263a}";
$VAR1 = 'ä  ';

$VAR1 = "\x{e4} \x{263a}";
$VAR1 = "\x{e4}  ";
```
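    To see the effect on the restored values rather than on the dump text, a sketch like the one below can round-trip both strings under each setting and report the flag. round_trip is a hypothetical helper, and $Data::Dumper::Terse is set only so that eval returns the value directly. Going by the output above, one would expect only $data2 with Useperl = 0 to come back without the flag, because its dump contains no escape above 0xFF.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Data::Dumper;

$Data::Dumper::Terse = 1;    # dump only the value, so eval returns it

my %sample = (
    data1 => 'ä ☺',    # a-umlaut, space, smiley (has a code point above 0xFF)
    data2 => 'ä  ',    # a-umlaut, space, space (all code points up to 0xFF)
);

# Hypothetical helper: dump $value with the given Useperl setting,
# then eval the dump back into a Perl string.
sub round_trip {
    my ( $value, $useperl ) = @_;
    local $Data::Dumper::Useperl = $useperl;
    my $copy = eval Dumper($value);
    die $@ if $@;
    return $copy;
}

my %flag;    # $flag{$useperl}{$name} is 1 if the restored copy is flagged
for my $useperl ( 0, 1 ) {
    for my $name ( sort keys %sample ) {
        my $copy = round_trip( $sample{$name}, $useperl );
        die "$name changed\n" unless $copy eq $sample{$name};
        $flag{$useperl}{$name} = utf8::is_utf8($copy) ? 1 : 0;
        print "Useperl=$useperl $name is_utf8=$flag{$useperl}{$name}\n";
    }
}
```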
