PerlMonks  

Encode::FB_CROAK eating up strings?

by polettix (Vicar)
on Apr 11, 2016 at 10:26 UTC ( [id://1160099]=perlquestion )

polettix has asked for the wisdom of the Perl Monks concerning the following question:

Hi All!

I hit something weird I'd like to discuss with you. The behaviour below is present up to at least perl 5.20.2; I haven't tried anything more recent.

    use strict;
    use warnings;
    use Encode qw< encode >;

    my $chars = "whatever";
    print length($chars), " characters - initial string\n";

    encode('UTF-8', $chars);
    print length($chars), " characters - first encode, no FB_CROAK\n";

    encode('UTF-8', "$chars", Encode::FB_CROAK);
    print length($chars), " characters - second encode, FB_CROAK on, readonly string\n";

    encode('UTF-8', $chars, Encode::FB_CROAK);
    print length($chars), " characters - third encode, FB_CROAK, on original string\n";

I would have expected that all the prints above would just say that the string $chars is 8 characters long... but perl thinks differently:

    8 characters - initial string
    8 characters - first encode, no FB_CROAK
    8 characters - second encode, FB_CROAK on, readonly string
    0 characters - third encode, FB_CROAK, on original string

So, it seems that if FB_CROAK is on and the string to encode is not readonly... it gets destroyed.

I tried with perl 5.8.8, 5.18.1 and 5.20.2, with the same results. I tried to look in the docs for Encode but to no avail. Tried to Google, but nothing. Tried to supersearch here... nothing.

Am I missing anything? Is this what's expected actually, and documented anywhere? Thanks!

perl -ple'$_=reverse' <<<ti.xittelop@oivalf

Io ho capito... ma tu che hai detto?

Replies are listed 'Best First'.
Re: Encode::FB_CROAK eating up strings?
by SimonPratt (Friar) on Apr 11, 2016 at 10:48 UTC

    From here:

    any fallback will destructively modifies its argument and for a good reason. See Encode::PerlIO for why
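
One way to read that "good reason", sketched under assumptions: when decoding a stream chunk by chunk, a chunk can end in the middle of a multi-byte character, and having decode() leave the unprocessed tail in the source buffer lets the next chunk complete it. The two-chunk split and the variable names below are made up for illustration; only decode() and Encode::FB_QUIET come from Encode.

```perl
use strict;
use warnings;
use Encode qw< decode >;

# "café!" encoded as UTF-8, split in the middle of the two-byte "é" (C3 A9).
# The split point is an assumption chosen for this illustration.
my @chunks = ("caf\xC3", "\xA9!");

my $buffer = '';
my $text   = '';
for my $chunk (@chunks) {
    $buffer .= $chunk;
    # FB_QUIET without LEAVE_SRC: decode() returns what it could process
    # and overwrites $buffer with whatever it could not process yet.
    $text .= decode('UTF-8', $buffer, Encode::FB_QUIET);
}

binmode STDOUT, ':encoding(UTF-8)';
print "decoded: $text\n";
print "left over: ", length($buffer), " byte(s)\n";
```

With a non-destructive decode() the caller would have to work out how many bytes were consumed; here the buffer itself carries that information from one iteration to the next.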

Re: Encode::FB_CROAK eating up strings?
by Anonymous Monk on Apr 11, 2016 at 11:22 UTC
    Encode::FB_CROAK | Encode::LEAVE_SRC is the incantation you want.
    "I tried to look in the docs for Encode but to no avail"
    Yes, it's easy to miss, but it's documented: 'If the "Encode::LEAVE_SRC" bit is not set but CHECK is set, then the source string to encode() or decode() will be overwritten in place. If you're not interested in this, then bitwise-OR it with the bitmask.'
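
A minimal sketch of the two usual ways to keep the source string intact, reusing the variable from the original question. The names $bytes, $copy and $bytes_from_copy are introduced here for illustration only.

```perl
use strict;
use warnings;
use Encode qw< encode >;

my $chars = "whatever";

# LEAVE_SRC asks encode() not to overwrite its source argument.
my $bytes = encode('UTF-8', $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
print length($chars), " characters - FB_CROAK | LEAVE_SRC\n";

# Alternative without the flag: hand encode() a throwaway copy,
# so only the copy can be overwritten.
my $copy = $chars;
my $bytes_from_copy = encode('UTF-8', $copy, Encode::FB_CROAK);
print length($chars), " characters - FB_CROAK on a copy\n";
```

Passing "$chars" in the original question works for the same reason as the explicit copy: the interpolation creates a temporary string, and that temporary is what gets overwritten.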
      Thanks for pointing out.

      I personally think that the docs might use some clarifications directly in encode and decode to change people's expectations. I'll try to propose a doc patch where this possibility is mentioned explicitly.

      Update: just sent a Pull Request at https://github.com/dankogai/p5-encode/pull/54.

      Update: merged, yay!

