PerlMonks  

Converting utf-8 to base64 and back

by LanX (Saint)
on Jan 21, 2017 at 23:21 UTC ( [id://1180100]=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

The following code reproduces the original UTF-8 string after a round trip through base64.

But it seems overly complicated. What am I missing?

Additionally: I can probably understand why I need the encode_utf8 step, but why do I need to set the UTF8 flag manually after explicitly encoding to UTF-8?

use strict;
use warnings;
use utf8;

use Encode;
use Data::Dump qw/dd pp/;
use Devel::Peek;
use MIME::Base64;

my $str ='Ä';
warn "\n*** orig str='$str'";
Dump $str;

our $encoded = MIME::Base64::encode_base64($str);
warn "\n*** encode_b64 encoded='$encoded'";
Dump $encoded;

warn "\n*** decode_b64";
our $decoded = MIME::Base64::decode_base64($encoded);
Dump $decoded;

warn "\n*** encode_utf8";
$decoded = Encode::encode_utf8($decoded);
Dump $decoded;

warn "\n*** _utf8_on";
Encode::_utf8_on($decoded);
Dump $decoded;

*** orig str='Ä' at c:/tmp/b64_utf8.pl line 12.
SV = PV(0x54c9d8) at 0x5a8b00
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x5471d8 "\303\204"\0 [UTF8 "\x{c4}"]
  CUR = 2
  LEN = 16

*** encode_b64 encoded='xA==
' at c:/tmp/b64_utf8.pl line 17.
SV = PV(0x54caa8) at 0x26dbcb8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x547478 "xA==\n"\0
  CUR = 5
  LEN = 16

*** decode_b64 at c:/tmp/b64_utf8.pl line 21.
SV = PV(0x54cb08) at 0x26da130
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x28d2b98 "\304"\0
  CUR = 1
  LEN = 16

*** encode_utf8 at c:/tmp/b64_utf8.pl line 25.
SV = PV(0x54cb08) at 0x26da130
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x28d2c28 "\303\204"\0
  CUR = 2
  LEN = 16

*** _utf8_on at c:/tmp/b64_utf8.pl line 29.
SV = PV(0x54cb08) at 0x26da130
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x28d2c28 "\303\204"\0 [UTF8 "\x{c4}"]
  CUR = 2
  LEN = 16

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Replies are listed 'Best First'.
Re: Converting utf-8 to base64 and back
by choroba (Cardinal) on Jan 22, 2017 at 00:41 UTC
    Because you swapped the UTF-8 encoding and decoding steps. Encode to UTF-8 before the base64 encoding, and decode from UTF-8 after the base64 decoding:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use open IO => ':encoding(UTF-8)', ':std';

    use Encode;
    use Data::Dump qw/dd pp/;
    use Devel::Peek;
    use MIME::Base64;

    my $str ='ř';
    warn "\n*** orig str='$str'";
    Dump $str;

    # Characters -> UTF-8 octets, before base64 sees the data.
    my $bytes = Encode::encode_utf8($str);
    warn "\n*** encode utf8='$bytes'";
    Dump $bytes;

    our $encoded = MIME::Base64::encode_base64($bytes);
    warn "\n*** encode_b64 encoded='$encoded'";
    Dump $encoded;

    warn "\n*** decode_b64";
    our $decoded = MIME::Base64::decode_base64($encoded);
    Dump $decoded;

    # UTF-8 octets -> characters, after base64 has been undone.
    warn "\n*** decode_utf8";
    $decoded = Encode::decode_utf8($decoded);
    Dump $decoded;

    # No longer needed: decode_utf8 already returned a character string.
    warn "\n*** _utf8_on (NO-OP)";
    Encode::_utf8_on($decoded);
    Dump $decoded;

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Thanks, I just realized it. (I was composing an answer, but YOU were faster ;)

      It's confusing:

      • encode_base64 produces a base64 string
      • but encode_utf8 produces an octet string (i.e. byte-oriented)
      • while decode_utf8 produces a character string from a UTF-8 octet string

      So the naming reads "encode into base64", but encode_utf8 starts from a character string and produces UTF-8 octets!
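
The order of operations from the list above can be sketched as a pair of helpers. This is only a minimal sketch: the names `text_to_b64` and `b64_to_text` are made up for illustration, and it assumes the input is a Perl character string (for example a literal under `use utf8`).

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: character string -> UTF-8 octets -> base64 text.
sub text_to_b64 {
    my ($chars) = @_;
    # The empty second argument suppresses the trailing newline.
    return encode_base64(encode_utf8($chars), '');
}

# Hypothetical helper: base64 text -> UTF-8 octets -> character string.
sub b64_to_text {
    my ($b64) = @_;
    return decode_utf8(decode_base64($b64));
}

# "Ä" followed by "ř", written as escapes so the source file stays ASCII.
my $orig = "\x{C4}\x{159}";
my $b64  = text_to_b64($orig);
my $back = b64_to_text($b64);

print "b64 = $b64\n";
print $back eq $orig ? "round trip ok\n" : "round trip FAILED\n";
```

Each direction mirrors the other, so no manual flag handling such as `_utf8_on` should be needed.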

      update

      My initial example didn't fail because Ä (code point 196) is not a wide character (code point < 256), so encode_base64 could still treat it as a single byte, while ř (code point 345) is a wide character.
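
To see the difference between the two characters directly, here is a small sketch (not the original script) that passes both to `encode_base64` without a prior `encode_utf8`. It relies on the documented behaviour that `MIME::Base64` only handles characters below 256 and dies on wide characters; the variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

my $a_umlaut = "\x{C4}";     # code point 196, fits in one byte
my $r_caron  = "\x{159}";    # code point 345, a wide character

# Force the internal UTF-8 representation, as the Dump in the question shows
# for the literal under "use utf8".
utf8::upgrade($a_umlaut);

# Succeeds: the string is downgraded to the single byte 0xC4 first, so the
# result is the base64 of the Latin-1 byte, not of the UTF-8 encoding.
my $b64_latin1 = encode_base64($a_umlaut, '');

# Dies: code point 345 cannot be represented as a single byte.
my $b64_wide = eval { encode_base64($r_caron, '') };
my $error    = $@;

print "U+00C4 -> $b64_latin1\n";
print "U+0159 -> ", (defined $b64_wide ? "$b64_wide\n" : "died: $error");
```

This matches the `xA==` seen in the first dump: the data that went through base64 there was Latin-1, not UTF-8.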

      update

      From Encode#Basic-methods:

      > CAVEAT: When you run $octets = encode("utf8", $string), then $octets might not be equal to $string. Though both contain the same data, the UTF8 flag for $octets is always off. When you encode anything, the UTF8 flag on the result is always off, even when it contains a completely valid utf8 string. See The UTF8 flag below.

      So it's even more complicated: encode_utf8 does convert into UTF-8, but the result does not carry the UTF8 flag.
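
The caveat can be checked with `utf8::is_utf8`, which reports the internal flag. A minimal sketch, assuming nothing beyond core `Encode`; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $chars  = "\x{159}";              # one character (code point 345)
my $octets = encode_utf8($chars);    # two bytes, 0xC5 0x99
my $back   = decode_utf8($octets);   # one character again

# Print the length and the state of the internal UTF8 flag for each scalar.
for my $pair ([chars => $chars], [octets => $octets], [back => $back]) {
    my ($name, $value) = @$pair;
    printf "%-6s length %d, flag %s\n",
        $name, length($value), utf8::is_utf8($value) ? 'on' : 'off';
}
```

The flag is an internal detail; the more useful question is whether a scalar holds characters or octets, and comparing `length` is usually the more telling check.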

Re: Converting utf-8 to base64 and back
by LanX (Saint) on Jan 22, 2017 at 01:38 UTC
    Here is a more concise demonstration.

    (Please note that PM has problems displaying wide characters in code blocks, e.g. "&#345;" stands for "ř".)

    use strict;
    use warnings;
    use utf8;

    use Encode;
    use Data::Dump qw/dd pp/;
    use Devel::Peek;
    use MIME::Base64;

    use open IO => ':encoding(UTF-8)', ':std';

    roundtrip($_) for 'Ä', '&#345;';

    sub roundtrip {
        my $orig = shift;
        warn "\n\n****** START\n";
        peek(orig => $orig);

        my $encode_utf8 = Encode::encode_utf8($orig);
        peek(encode_utf8 => $encode_utf8);

        my $encode_b64 = MIME::Base64::encode_base64($encode_utf8);
        peek(encode_b64 => $encode_b64);

        my $decode_b64 = MIME::Base64::decode_base64($encode_b64);
        peek(decode_b64 => $decode_b64);

        my $decode_utf8 = Encode::decode_utf8($decode_b64);
        peek(decode_utf8 => $decode_utf8);
    }

    sub peek {
        my ($name, $str) = @_;
        my (undef, $file, $line) = caller(0);
        my $pp = pp($str);
        warn "\n*** $name = '$str' = $pp at $file line $line\n";
        Dump $_[1];
    }

    ****** START

    *** orig = 'Ä' = "\xC4" at c:/tmp/b64_utf8_2.pl line 21
    SV = PV(0x36c9d8) at 0x25c8308
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x367478 "\303\204"\0 [UTF8 "\x{c4}"]
      CUR = 2
      LEN = 16

    *** encode_utf8 = 'Ã&#132;' = "\xC3\x84" at c:/tmp/b64_utf8_2.pl line 25
    SV = PV(0x2607c48) at 0x25c8110
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x28320c8 "\303\204"\0
      CUR = 2
      LEN = 16

    *** encode_b64 = 'w4Q=
    ' = "w4Q=\n" at c:/tmp/b64_utf8_2.pl line 29
    SV = PV(0x2607c58) at 0x25c80e0
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x2832188 "w4Q=\n"\0
      CUR = 5
      LEN = 16

    *** decode_b64 = 'Ã&#132;' = "\xC3\x84" at c:/tmp/b64_utf8_2.pl line 32
    SV = PV(0x2607c38) at 0x25c7f48
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x2832398 "\303\204"\0
      CUR = 2
      LEN = 16

    *** decode_utf8 = 'Ä' = "\xC4" at c:/tmp/b64_utf8_2.pl line 36
    SV = PV(0x2607bf8) at 0x25c7f00
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x28323c8 "\303\204"\0 [UTF8 "\x{c4}"]
      CUR = 2
      LEN = 16

    ****** START

    *** orig = '&#345;' = "\x{159}" at c:/tmp/b64_utf8_2.pl line 21
    SV = PV(0x36c9d8) at 0x25c8308
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x367478 "\305\231"\0 [UTF8 "\x{159}"]
      CUR = 2
      LEN = 16

    *** encode_utf8 = 'Å&#153;' = "\xC5\x99" at c:/tmp/b64_utf8_2.pl line 25
    SV = PV(0x2607c48) at 0x25c8110
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x28324e8 "\305\231"\0
      CUR = 2
      LEN = 16

    *** encode_b64 = 'xZk=
    ' = "xZk=\n" at c:/tmp/b64_utf8_2.pl line 29
    SV = PV(0x2607c58) at 0x25c80e0
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x2832518 "xZk=\n"\0
      CUR = 5
      LEN = 16

    *** decode_b64 = 'Å&#153;' = "\xC5\x99" at c:/tmp/b64_utf8_2.pl line 32
    SV = PV(0x2607c38) at 0x25c7f48
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x2832428 "\305\231"\0
      CUR = 2
      LEN = 16

    *** decode_utf8 = '&#345;' = "\x{159}" at c:/tmp/b64_utf8_2.pl line 36
    SV = PV(0x2607bf8) at 0x25c7f00
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK,UTF8)
      PV = 0x28322a8 "\305\231"\0 [UTF8 "\x{159}"]
      CUR = 2
      LEN = 16

Node Type: perlquestion [id://1180100]
Front-paged by stevieb