PerlMonks  

Re^6: Understanding pack and unpack changes for binary data between 5.8 and 5.10

by ikegami (Pope)
on Mar 13, 2009 at 16:10 UTC (#750460)


in reply to Re^5: Understanding pack and unpack changes for binary data between 5.8 and 5.10
in thread Understanding pack and unpack changes for binary data between 5.8 and 5.10

A proper test:
    use strict;
    use warnings;

    use Test::More tests => 2 * ( 2 + 6 + 6 );

    use Carp qw( croak );

    sub avoid_utf8 {
       my ($s) = @_;
       utf8::downgrade($s, 1)
          or croak("Input not a string of bytes");
       return $s;
    }

    sub use_utf8 {
       my ($s) = @_;
       utf8::upgrade($s);
       return $s;
    }

    diag("Perl $]");

    for (
       [ "bj\x{f6}rk", 5,    'hibit' ],
       [ "b" x 1000,   1000, 'long'  ],
    ) {
       my ($s, $length, $test_name) = @$_;

       # length

       my %length;
       $length{'0'} = length avoid_utf8 $s;
       $length{'1'} = length use_utf8   $s;

       for my $enc (qw( 0 1 )) {
          is($length{$enc}, $length, "length $test_name $enc");
       }

       # pack 'V/a*'

       my $expected = pack('V', $length) . $s;

       my %packed;
       $packed{'?0'} =            pack "V/a*", avoid_utf8 $s;
       $packed{'?1'} =            pack "V/a*", use_utf8   $s;
       $packed{'00'} = avoid_utf8 pack "V/a*", avoid_utf8 $s;
       $packed{'01'} = avoid_utf8 pack "V/a*", use_utf8   $s;
       $packed{'10'} = use_utf8   pack "V/a*", avoid_utf8 $s;
       $packed{'11'} = use_utf8   pack "V/a*", use_utf8   $s;

       for my $enc (qw( ?0 ?1 00 01 10 11 )) {
          ok($packed{$enc} eq $expected, "pack $test_name $enc");
       }

       # print

       for my $enc (qw( ?0 ?1 00 01 10 11 )) {
          my $buf = '';
          {
             open(my $fh, '>', \$buf);
             binmode $fh;  # No mucking with crlf
             print $fh $packed{$enc};
          }

          ok($buf eq $expected, "print $test_name $enc");
       }
    }

    >c:\progs\perl588\bin\perl test.pl
    1..28
    # Perl 5.008008
    ok 1 - length hibit 0
    ok 2 - length hibit 1
    ok 3 - pack hibit ?0
    not ok 4 - pack hibit ?1
    #   Failed test 'pack hibit ?1'
    #   at test.pl line 54.
    ok 5 - pack hibit 00
    not ok 6 - pack hibit 01
    #   Failed test 'pack hibit 01'
    #   at test.pl line 54.
    ok 7 - pack hibit 10
    not ok 8 - pack hibit 11
    #   Failed test 'pack hibit 11'
    #   at test.pl line 54.
    ok 9 - print hibit ?0
    not ok 10 - print hibit ?1
    #   Failed test 'print hibit ?1'
    #   at test.pl line 67.
    ok 11 - print hibit 00
    not ok 12 - print hibit 01
    #   Failed test 'print hibit 01'
    #   at test.pl line 67.
    ok 13 - print hibit 10
    not ok 14 - print hibit 11
    #   Failed test 'print hibit 11'
    #   at test.pl line 67.
    ok 15 - length long 0
    ok 16 - length long 1
    ok 17 - pack long ?0
    ok 18 - pack long ?1
    ok 19 - pack long 00
    ok 20 - pack long 01
    ok 21 - pack long 10
    ok 22 - pack long 11
    ok 23 - print long ?0
    ok 24 - print long ?1
    ok 25 - print long 00
    ok 26 - print long 01
    ok 27 - print long 10
    ok 28 - print long 11
    # Looks like you failed 6 tests of 28.

    >c:\progs\perl5100\bin\perl test.pl
    1..28
    # Perl 5.010000
    ok 1 - length hibit 0
    ok 2 - length hibit 1
    ok 3 - pack hibit ?0
    ok 4 - pack hibit ?1
    ok 5 - pack hibit 00
    ok 6 - pack hibit 01
    ok 7 - pack hibit 10
    ok 8 - pack hibit 11
    ok 9 - print hibit ?0
    ok 10 - print hibit ?1
    ok 11 - print hibit 00
    ok 12 - print hibit 01
    ok 13 - print hibit 10
    ok 14 - print hibit 11
    ok 15 - length long 0
    ok 16 - length long 1
    ok 17 - pack long ?0
    ok 18 - pack long ?1
    ok 19 - pack long 00
    ok 20 - pack long 01
    ok 21 - pack long 10
    ok 22 - pack long 11
    ok 23 - print long ?0
    ok 24 - print long ?1
    ok 25 - print long 00
    ok 26 - print long 01
    ok 27 - print long 10
    ok 28 - print long 11

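The internal encoding surfaces in 5.8.8 whenever the string passed to pack is upgraded (the ?1, 01 and 11 cases), but not in 5.10.0, at least for the functions tested. The print failures mirror the pack failures exactly, so print appears to be writing out the already-wrong packed string rather than misbehaving itself.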
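
To make "internal encoding" concrete, here is a minimal sketch, assuming Perl 5.10 or later. It is not part of the test above; the variable names are made up for illustration. It builds the same five-character string in both internal storage formats and checks that pack 'V/a*' treats them alike.

```perl
use strict;
use warnings;

my $s = "bj\x{f6}rk";    # 5 characters, all below 256

my $down = $s; utf8::downgrade($down);   # stored one byte per character
my $up   = $s; utf8::upgrade($up);       # stored internally as UTF-8

# The storage differs, but both scalars still hold the same string.
my $flags_differ = utf8::is_utf8($up) && !utf8::is_utf8($down);
my $same_string  = $up eq $down && length($up) == length($down);

# Assumption: 5.10+ semantics. The 5.8.8 run above shows this
# comparison failing there for the high-bit string.
my $same_packed = pack('V/a*', $down) eq pack('V/a*', $up);

print "flags differ: ", ($flags_differ ? "yes" : "no"), "\n";
print "same string:  ", ($same_string  ? "yes" : "no"), "\n";
print "same packed:  ", ($same_packed  ? "yes" : "no"), "\n";
```

If the last line prints "no", the storage format is leaking through pack, which is what the 5.8.8 failures show.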

Re^7: Understanding pack and unpack changes for binary data between 5.8 and 5.10
by squentin (Sexton) on Mar 13, 2009 at 21:26 UTC
    Ok, I'll try to be clear this time :)
    What I wanted was to write the string encoded as UTF-8, along with the length, in bytes, of the binary string resulting from pack. So I was using:

        my $p=pack "V/a*", $s;
        my $l=length $p;

    when I should have been using:

        use Encode qw/encode/;
        use bytes ();   # loads bytes::length without enabling the pragma
        my $p=pack "V/a*", encode('utf8',$s);
        # using bytes::length just to be sure: $p shouldn't have its
        # utf8 flag on, but in case it does...
        my $l=bytes::length $p;
    Thinking about it a little more, I think what is disturbing me is that the 'a' in the pack format can be a multi-byte character. More generally, it is the idea that utf8 strings are strings of multi-byte characters, rather than strings of bytes in utf8 encoding.
    perl 5.10's pack behavior does seem to make more sense now.
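
A minimal sketch of the encode-then-pack approach described above, assuming Perl 5.10 or later and the core Encode module. The sample string and variable names are hypothetical. Once the text has been encoded, the count stored by 'V' and the result of plain length are both in bytes.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical sample: 7 characters, two of them outside ASCII.
my $s = "bj\x{f6}rk \x{263A}";

# Encode first, so pack only ever sees bytes.
my $octets = encode('UTF-8', $s);
my $p      = pack 'V/a*', $octets;

my ($count)   = unpack 'V', $p;       # the stored length prefix
my $l         = length $p;            # 4-byte prefix plus payload
my ($payload) = unpack 'V/a*', $p;    # the payload bytes
my $back      = decode('UTF-8', $payload);

printf "characters: %d, payload bytes: %d, total bytes: %d\n",
    length($s), $count, $l;
print "round trip ", ($back eq $s ? "ok" : "FAILED"), "\n";
```

Decoding on the way back in is the mirror image of encoding on the way out; pack and unpack themselves only handle bytes in this arrangement.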

      I think what is disturbing me is that the 'a' in the pack format can be a multi-byte character.

      Me too. You've gotta wonder what's going to happen more often: someone wanting to pack non-encoded characters, or someone accidentally packing non-encoded characters. I would say the latter, so I find it weird that it doesn't croak ("Wide char in ...") when passed non-encoded characters.

      It could be a side effect of allowing pack and unpack to work with fixed-width fields, where the width is in characters rather than bytes.

          my $rec_format = 'a4a5a1';
          my $rec_size = 10;

          binmode $fh_out, ':encoding(UTF-8)';
          print $fh_out pack($rec_format, @fields);

          ...

          binmode $fh_in, ':encoding(UTF-8)';
          read($fh_in, my $rec = '', $rec_size);
          @fields = unpack($rec_format, $rec);
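
The snippet above leaves the file handles and @fields undefined. Here is a self-contained variant of the same idea, assuming Perl 5.10 or later. The in-memory file handles and the field values are assumptions added for illustration, chosen so that each value fills its field exactly.

```perl
use strict;
use warnings;

my $rec_format = 'a4a5a1';
my $rec_size   = 10;    # 4 + 5 + 1 characters

# Hypothetical field values; two of them contain non-ASCII characters.
my @fields = ( "\x{263A}abc", "h\x{e9}llo", "Z" );

# Write one record through an encoding layer into an in-memory buffer.
my $buf = '';
open(my $fh_out, '>:encoding(UTF-8)', \$buf) or die "open for write: $!";
print $fh_out pack($rec_format, @fields);
close($fh_out) or die "close: $!";

# Read it back. With the layer in place, read() counts characters.
open(my $fh_in, '<:encoding(UTF-8)', \$buf) or die "open for read: $!";
my $got = read($fh_in, my $rec = '', $rec_size);
close($fh_in);

my @round_trip = unpack($rec_format, $rec);

printf "%d characters read from %d encoded bytes\n", $got, length $buf;
print "round trip ",
    (join('|', @round_trip) eq join('|', @fields) ? "ok" : "FAILED"), "\n";
```

The record is fixed-width in characters but not in bytes, so this layout only works when every reader goes through the same encoding layer. Values shorter than their field would come back padded with NUL characters, since 'a' does not strip padding on unpack.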
