
Re: Subtle(?) issue(?) with lvalues and serialization

by Eily (Monsignor)
on May 24, 2018 at 16:40 UTC

in reply to Subtle(?) issue(?) with lvalues and serialization

Oh, fun! I didn't know about LVALUE references. For the curious:

    perl -E "$str = 'Hi World!'; $sub = \substr $str, 0, 2; $$sub = 'Hello'; say $str; say ref $sub"
    Hello World!
    LVALUE

So LVALUE is the magic behind how substr works: its result can walk and quack like a string scalar, except that changing its value partially modifies the content of another scalar.

Now, there's a simple test to see which of Sereal and Storable does the correct thing. After the data has been serialized and deserialized, it should behave like the original data. For example:

    use feature 'say';
    use Data::Dump qw( pp );
    use Storable qw/ freeze thaw /;

    my $array = [0];
    push @$array, \$array->[0];
    my $copy = thaw freeze $array;
    $array->[0]++;
    $copy->[0]++;

    say "Array:";
    say join ", ", map pp($_), @$array;
    say pp $array;

    say "\nCopy:";
    say join ", ", map pp($_), @$copy;
    say pp $copy;
    __END__
    Array:
    1, \1
    do {
      my $a = [1, 'fix'];
      $a->[1] = \$a->[0];
      $a;
    }

    Copy:
    1, \1
    do {
      my $a = [1, 'fix'];
      $a->[1] = \$a->[0];
      $a;
    }

So references to elements of a structure should turn into references to the clone element in the clone structure.

Now let's try with substr:

    my $struct = ["Hi perlmonks"];
    push @$struct, \substr($struct->[0], 0, 2);
    my $storable = thaw freeze $struct;
    my $sereal = decode_sereal encode_sereal $struct;
    pp $struct, $storable, $sereal;
    ${ $_->[1] } = "Hello", say $_->[0] for $struct, $storable, $sereal;
    __END__
    Can't handle LVALUE data at C:/Programs/Strawberry/perl/vendor/lib/Data/ line 374.
    (
      ["Hi perlmonks", '#LVALUE#'],
      ["Hi perlmonks", \undef],
      ["Hi perlmonks", \"Hi"],
    )
    Hello perlmonks
    Hi perlmonks
    Hi perlmonks

So IMHO both Sereal and Storable are incorrect, because they should at least warn about LVALUEs not being handled correctly (like Data::Dump does). In most cases I expect that Sereal is the next best thing though?
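
Since neither serializer warns, a caller could check for LVALUE references before serializing. The following is only a minimal sketch of that idea, using the core Scalar::Util module; `find_lvalue_refs` and the `$data` path prefix are hypothetical names made up for illustration, not part of Storable, Sereal or Data::Dump.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype refaddr );

# Hypothetical helper (not part of Storable or Sereal): walk a nested
# structure and return the path of every LVALUE reference found in it.
sub find_lvalue_refs {
    my ($data, $path, $seen) = @_;
    $path //= '$data';
    $seen //= {};

    my $type = reftype($data);
    return unless defined $type;               # plain value, nothing to check
    return $path if $type eq 'LVALUE';         # e.g. \substr(...)
    return if $seen->{ refaddr($data) }++;     # already visited (cycles, aliases)

    if ($type eq 'ARRAY') {
        return map { find_lvalue_refs($data->[$_], $path . "->[$_]", $seen) }
               0 .. $#$data;
    }
    if ($type eq 'HASH') {
        return map { find_lvalue_refs($data->{$_}, $path . "->{$_}", $seen) }
               sort keys %$data;
    }
    if ($type eq 'REF') {                      # reference to a reference
        return find_lvalue_refs($$data, '${' . $path . '}', $seen);
    }
    return;                                    # SCALAR, CODE, etc.
}

# Same structure as in the substr test above.
my $struct = ["Hi perlmonks"];
push @$struct, \substr($struct->[0], 0, 2);

my @paths = find_lvalue_refs($struct);
print "LVALUE reference at $_\n" for @paths;
```

With the structure from the test above, this should report the second array element. What to do with the result (warn, die, or replace the reference with a plain copy) is left to the caller.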

Re^2: Subtle(?) issue(?) with lvalues and serialization
by Veltro (Hermit) on May 24, 2018 at 22:01 UTC
    ... because they should at least warn about LVALUEs not being handled correctly ...

    Now hold on for a second:

    Warn? Yes a warning would have been nice. Not handled correctly? No.

    The documentation of Storable promises that Storable will work for SCALAR, ARRAY, HASH or REF objects:

    ... persistence to your Perl data structures containing SCALAR, ARRAY, HASH or REF objects, i.e. anything that can be conveniently stored to disk and retrieved at a later time.

    There is no promise that Storable will handle lvalues, and it doesn't, so one may actually reason that it is functioning correctly.

    The example script below first shows the effect of storing the lvalue in an array and changing the underlying string to undef outside of the array. After that it shows that the same behavior cannot be observed after a round trip through Storable: the lvalue is invalidated and becomes undef. The same happens when the lvalue is inside a hash, which is also the correct behavior if you ask me. I do believe a warning would have been nice though:

        use strict ;
        use warnings ;
        use Storable qw( freeze thaw ) ;
        use Data::Dumper ;

        my $abc = "abc" ;
        my @ar = ( \( substr( $abc, 0 ) ) ) ;
        print "ar_0 = ${$ar[0]}\n" ;

        my $def = "def" ;
        my @ar2 = ( \( substr( $def, 0 ) ) ) ;
        $def = undef ;
        print "ar2_0 = ${$ar2[0]}\n" ;
        $def = 'def' ;
        print "ar2_0 = ${$ar2[0]}\n" ;

        my $serialized1 = freeze \$abc ;
        my $serialized2 = freeze \substr $def, 0 ;

        $abc = 'ghi' ;
        $def = 'jkl' ;
        $abc = ${ thaw( $serialized1 ) } ;
        $def = ${ thaw( $serialized2 ) } ;
        print "thaw abc = $abc\n" ;
        print "thaw def = $def\n" ;

        my $xyz = "xyz" ;
        my %xyz = ( _xyz => \substr $xyz, 0 ) ;
        my $ser_xyz = freeze \%xyz ;
        $xyz = "uvw" ;
        my %copyxyz = %{ thaw( $ser_xyz ) } ;
        print Dumper( \%copyxyz ) ;
        __END__
        ar_0 = abc
        Use of uninitialized value in concatenation (.) or string at line 13.
        ar2_0 =
        ar2_0 = def
        thaw abc = abc
        Use of uninitialized value $def in concatenation (.) or string at test line 25.
        thaw def =
        $VAR1 = {
                  '_xyz' => \undef
                };

Re^2: Subtle(?) issue(?) with lvalues and serialization
by vr (Curate) on May 25, 2018 at 18:11 UTC

    Thank you everyone for valuable answers. Eily, maybe you'll find this funny, too:

        use strict;
        use warnings;
        use feature 'say';
        use Devel::Peek;
        use Storable qw/ freeze thaw /;

        my $s = 'abc';
        my $r = \substr $s, 1, 1;

        say ref $r;                  # LVALUE
        Dump $r;
        # say ${ thaw freeze $r };   # failure

        $$r = 'Z';
        say ref $r;                  # LVALUE again
        Dump $r;                     # ref target is now POK, PV is "Z"
        say ${ thaw freeze $r };     # "Z"

    After the LVALUE is first used for the very purpose it was created for (assigning through it), the substr magic is still there, yet the referent's POK flag is now set, and Storable is fooled into DWIM.

    The return value LVALUE indicates a reference to an lvalue that is not a variable. You get this from taking the reference of function calls like pos or substr.

    Why, here you are:

        use strict;
        use warnings;
        use feature 'say', 'state';
        use Devel::Peek;
        use Storable qw/ freeze thaw /;

        sub foo : lvalue { state $r; $$r }

        foo = 42;
        say ref \foo;                # SCALAR
        Dump \foo;                   # Nothing interesting. Much
                                     # shorter output than one
                                     # full of magic, above.
        say ${ thaw freeze \foo };   # 42

    Wait, but that was exactly a "reference to an lvalue that is not a variable"! Hm-m, though, they didn't say the reverse is true... So the LVALUE that confuses Storable is limited to references to substr and pos (?).


    Storable, indeed, promises only to work with "SCALAR, ARRAY, HASH or REF objects". But it successfully deep-clones references to, e.g., variables long out of scope. From there, maybe it's not too far a stretch to clone the string to which \substr refers.

    As much as the above would be interesting (if it worked), I only wanted, in OP, to pass values, not LVALUE magic. For that, Sereal DWIMs. Storable doesn't.

    But I really should have done it as in the first line here:

        @result = mce_map { with $_ } substr(...)
        @result = mce_map { with $$_ } \substr(...)
        @result = mce_map { with $$_ } \( my $s = substr(...))

    (the 2nd parameter is an "array of", of course) and not as in either of the next 2 lines. The 2nd form only works with Sereal. Depending on the length and number of the strings, a benchmark shows that any of the 3 can be up to 30% faster than the others when the mce_map block is a no-op. That is irrelevant, though: the gain is tiny compared to the time required for the real job. Consumed memory was also almost the same for all 3, as duplicates were created anyway (by Sereal?) in the 2nd case.
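
The difference between the 2nd and 3rd forms can be seen without MCE at all. This is a small sketch under that assumption, using only the core Storable module; the variable names are made up for illustration. Copying the substring into a fresh scalar first yields an ordinary SCALAR reference, which Storable handles like any other string.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

my $str = "Hi perlmonks";

# 2nd form: the reference keeps the substr LVALUE magic.
my $lvalue_ref = \substr($str, 0, 2);

# 3rd form: the substring is copied into a new scalar first,
# so the reference points at a plain string.
my $plain_ref = \( my $copy = substr($str, 0, 2) );

print ref($lvalue_ref), "\n";   # LVALUE
print ref($plain_ref),  "\n";   # SCALAR

# The plain copy round-trips through Storable as an ordinary scalar.
my $thawed = thaw( freeze($plain_ref) );
print $$thawed, "\n";           # Hi
```

The price of the 3rd form is one explicit copy per substring, which matches the observation above that duplicates end up being created anyway.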


    BrowserUk, the route you suggest would make perfect sense if the parallelism were fine-tuned by hand, especially since the ultimate source of all strings (from which substrings were further extracted) is a single large file with a kind of TOC (long story...). But mce_map is so amazingly convenient for transparently adding parallelism here: all the long, time-consuming work happens per substring inside its block.
