
"wide character in print" error in DBM::Deep

by jdporter (Canon)
on Mar 22, 2010 at 18:35 UTC
jdporter has asked for the wisdom of the Perl Monks concerning the following question:

I have some UTF-8 data, and when I try to store it in a DBM::Deep-tied hash I get a "wide character in print" error in DBM::Deep::File.pm. It occurs at line 193, a print to the filehandle that was opened to store the DBM database on disk. The filehandle was opened with sysopen, using every available means to ensure that the file is in "binary" mode.

It seems to me, perhaps naively, that this should work; and further, I would have thought that this would have been a scenario covered by the test cases. (I should check.)

Does anyone have any idea how I can get around this? Are there DBM::Deep options I can set?
Should I open the file myself and pass the handle to DBM::Deep? If so, how exactly should I do that?

I guess I should also ask:

Are there any other modules I could use for managing a large data structure on disk rather than in memory?

Thanks, everyone...

What is the sound of Windows? Is it not the sound of a wall upon which people have smashed their heads... all the way through?

Re: "wide character in print" error in DBM::Deep
by ikegami (Pope) on Mar 22, 2010 at 20:10 UTC

    Here's a test program:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Test::More tests => 8;

    use Data::Dumper qw( Dumper );
    use DBM::Deep    qw();
    use File::Temp   qw( tempfile );

    my ($fh, $fn) = tempfile();
    my $db = DBM::Deep->new( $fn );

    sub _u { utf8::upgrade(   my $s = $_[0] ); $s }
    sub _d { utf8::downgrade( my $s = $_[0] ); $s }

    $db->{A} = "\x61";
    $db->{B} = _d("\xA0");
    $db->{C} = _u("\xA0");
    $db->{D} = "\x{2660}";

    $db->{"\x61"    } = 1;
    $db->{_d("\xA0")} = 1;
    $db->{_u("\xA0")} = 1;
    $db->{"\x{2660}"} = 1;

    my @keys = sort keys %$db;
    is( 0+@keys, 7, "Num keys" )
        or diag('Got ' . do {
            local $Data::Dumper::Useqq  = 1;
            local $Data::Dumper::Terse  = 1;
            local $Data::Dumper::Indent = 0;
            Dumper(\@keys)
        });

    is( $keys[4], "\x61",     "7-bit key" );
    is( $keys[5], "\xA0",     "8-bit key" );
    is( $keys[6], "\x{2660}", "32-bit key" );

    is( $db->{A}, "\x61",     "7-bit val" );
    is( $db->{B}, "\xA0",     "8-bit val (UTF8=0)" );
    is( $db->{C}, "\xA0",     "8-bit val (UTF8=1)" );
    is( $db->{D}, "\x{2660}", "32-bit val" );

    Output:

    1..8
    Wide character in print at /home/eric/lib/perl5/DBM/Deep/File.pm line 193.
    # Looks like your test exited with 9 before it could output anything.

    To validate the test, I ran it after replacing
    my $db = DBM::Deep->new( $fn );
    with
    my $db = {};

    Update: Fixed bugs in test.

Re: "wide character in print" error in DBM::Deep
by dragonchild (Archbishop) on Mar 22, 2010 at 20:39 UTC
    That's odd. There should be a UTF-8 test, but I've never used UTF-8 with DBM::Deep. In theory, you should be able to pass the filehandle in, and there are tests for that. Per our email, please send me a failing test. The repo is at http://github.com/robkinyon/dbm-deep

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      In theory, you should be able to pass the filehandle in and there are tests for that.

      That fails too. Tested by changing

      my ($fh, $fn) = tempfile();
      my $db = DBM::Deep->new( $fn );
      to
      my $fh = tempfile();
      my $db = DBM::Deep->new( { fh => $fh } );
      in the test I provided earlier.

      I did a bit of studying (DBM-Deep-1.0016).

      • write_value uses class DBM::Deep::Engine::Sector::Scalar for everything but references and undef.
      • ::Scalar::_init receives the value and passes it to print_at.
      • print_at expects a string of bytes. It's getting a string that contains non-bytes.
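
      The underlying behaviour can be reproduced without DBM::Deep. The following is only a minimal sketch, not DBM::Deep's own code: it writes to a scratch file from File::Temp, and it assumes fatal warnings are in effect at the print (as they are in the Sector snippet in this post), which would explain why the test dies instead of merely warning. A string containing a code point above 0xFF cannot be written to a handle that has no encoding layer, whereas the same string encoded to UTF-8 bytes can.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Scratch file standing in for the database file (an assumption for this demo).
my ($fh, $fn) = tempfile( UNLINK => 1 );
binmode $fh;    # raw bytes, no encoding layer, like a "binary" handle

my $err;
{
    # Promote the "Wide character" warning (category utf8) to an exception.
    use warnings FATAL => 'utf8';
    eval { print {$fh} "\x{2660}"; 1 } or $err = $@;
}
# $err now starts with "Wide character in print"

# Encoding first turns the one-character string into three bytes (E2 99 A0),
# which the byte-oriented handle accepts without complaint.
my $bytes = "\x{2660}";
utf8::encode( $bytes );
print {$fh} $bytes or die "print failed: $!";
```

      Under these assumptions, the first print is the failure seen at File.pm line 193, and the second is what an encoding step inside the Scalar sector would amount to.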

      No encoding is done anywhere, as far as I've seen. Definitely a major bug. Two possible fixes:

      • Have DBM::Deep::Engine::Sector::Scalar's _init encode values.
      • Add another Sector type for strings with UTF8=1.

      The latter should be simpler and more efficient, and it allows the UTF8 flag to be preserved. Basically, adjust write_value and add

      package DBM::Deep::Engine::Sector::Unicode;

      use 5.006_000;

      use strict;
      use warnings FATAL => 'all';
      no warnings 'recursion';

      use base qw( DBM::Deep::Engine::Sector::Scalar );

      sub type { $_[0]{engine}->SIG_UNICODE }

      sub _init {
          my $self = shift;
          utf8::encode( $self->{data} ) if $] >= 5.008 && defined($self->{data});
          $self->SUPER::_init();
      }

      sub data {
          my $self = shift;
          my $data = $self->SUPER::data();
          utf8::decode( $data ) if $] >= 5.008;
          return $data;
      }

      1;
      __END__

      And that's just for values. A separate fix is needed for the keys, I believe.
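
      Until the module handles this itself, one possible workaround on the caller's side is to encode both keys and values before storing and decode them after fetching. This is only a sketch: store_text and fetch_text are hypothetical helper names, not part of DBM::Deep, and a plain hash stands in for the database here (the same substitution used to validate the test earlier in the thread). It also assumes every string stored is text; binary values would need a separate path.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 decode_utf8 );

# Hypothetical helper: store a text key/value pair as UTF-8 bytes.
sub store_text {
    my ($db, $key, $value) = @_;
    $db->{ encode_utf8($key) } = defined $value ? encode_utf8($value) : undef;
    return;
}

# Hypothetical helper: look up by text key and decode the stored bytes.
sub fetch_text {
    my ($db, $key) = @_;
    my $bytes = $db->{ encode_utf8($key) };
    return defined $bytes ? decode_utf8($bytes) : undef;
}

# Stand-in for the database; a DBM::Deep object would be passed instead.
my $db = {};

store_text( $db, "\x{2660}", "spade \x{2660}" );
my $got = fetch_text( $db, "\x{2660}" );
```

      With this approach only byte strings ever reach the storage layer, so no wide character is passed to print. The cost is that iterating over keys yields encoded bytes, which must be decoded by the caller.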
