http://www.perlmonks.org?node_id=750525


in reply to Re^6: Understanding pack and unpack changes for binary data between 5.8 and 5.10
in thread Understanding pack and unpack changes for binary data between 5.8 and 5.10

Ok, I'll try to be clear this time :)
What I wanted was to write the string encoded in UTF-8, together with the length, in bytes, of the binary string resulting from pack. So I was using:
my $p = pack "V/a*", $s;
my $l = length $p;
When I should have been using:
use Encode qw/encode/;
use bytes ();  # makes bytes::length available without enabling the bytes pragma
my $p = pack "V/a*", encode('utf8', $s);
# Using bytes::length just to be sure: $p shouldn't have its utf8 flag on, but in case it does...
my $l = bytes::length $p;
Thinking about it a little more, I think what is disturbing me is that the 'a' in the pack format can be a multi-byte character. And more generally, the idea that utf8 strings are strings of multi-byte characters, rather than strings of bytes in UTF-8 encoding.
perl 5.10's pack behavior does seem to make more sense now.
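To make the difference concrete, here is a minimal sketch, assuming perl 5.10 or later. The sample string and the variable names ($p_chars, $p_bytes) are illustrative only and are not from the code above. It packs the same text once as a character string and once as UTF-8 encoded bytes, then compares the lengths.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative sample: "caf\x{e9} \x{263A}" is 6 characters.
# U+00E9 takes 2 bytes in UTF-8 and U+263A takes 3, so the
# encoded form is 9 bytes long.
my $s = "caf\x{e9} \x{263A}";

# Packing the character string directly: on 5.10+ the 'a*' field
# holds characters, so the V prefix counts characters, not bytes.
my $p_chars = pack "V/a*", $s;

# Encoding first: 'a*' now holds plain bytes, and the V prefix
# is the byte length of the UTF-8 data.
my $bytes   = encode('UTF-8', $s);
my $p_bytes = pack "V/a*", $bytes;

my ($prefix) = unpack "V", $p_bytes;   # byte count stored in the prefix
my $total    = length $p_bytes;        # 4-byte prefix + payload

print "prefix=$prefix total=$total\n"; # prints "prefix=9 total=13"
```

Only the second form gives a length that matches what actually ends up in a binary file.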

Re^8: Understanding pack and unpack changes for binary data between 5.8 and 5.10
by ikegami (Patriarch) on Mar 13, 2009 at 21:44 UTC

    I think what is disturbing me is that the 'a' in the pack format can be a multi-byte character.

    Me too. You've gotta wonder what's going to happen more often: someone wanting to pack non-encoded characters, or someone accidentally packing non-encoded characters. I would say the latter, so I find it weird that it doesn't croak ("Wide char in ...") when passed non-encoded characters.

    It could be a side effect of allowing pack and unpack to work with fixed-width fields, where the width is in characters rather than bytes.

    my $rec_format = 'a4a5a1';
    my $rec_size = 10;

    binmode $fh_out, ':encoding(UTF-8)';
    print $fh_out pack($rec_format, @fields);
    ...
    binmode $fh_in, ':encoding(UTF-8)';
    read($fh_in, my $rec = '', $rec_size);
    @fields = unpack($rec_format, $rec);
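A self-contained version of that idea might look like the sketch below, assuming perl 5.10 or later. The in-memory filehandle, the sample field values, and the names $buffer and @got are assumptions added so the example runs on its own; they are not part of the snippet above.

```perl
use strict;
use warnings;

my $rec_format = 'a4a5a1';
my $rec_size   = 10;   # record width in characters, not bytes

# Sample fields of exactly 4, 5 and 1 characters, each with non-ASCII text.
my @fields = ("na\x{ef}f", "\x{263A}abcd", "\x{e9}");

# Write one record through an encoding layer into an in-memory buffer.
my $buffer = '';
open my $fh_out, '>:encoding(UTF-8)', \$buffer or die "open for write: $!";
print {$fh_out} pack($rec_format, @fields);
close $fh_out or die "close: $!";

# Read it back: with an encoding layer, read() counts characters.
open my $fh_in, '<:encoding(UTF-8)', \$buffer or die "open for read: $!";
my $rec = '';
read($fh_in, $rec, $rec_size);
close $fh_in;

my @got = unpack($rec_format, $rec);

# The record is 10 characters wide but takes 14 bytes once encoded.
printf "chars=%d bytes=%d\n", length($rec), length($buffer);
```

The record width stays fixed in characters while the encoded size varies with the content, which is the case character-oriented pack and unpack handle well.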