PerlMonks  

Re^2: Storing state of execution

by afoken (Abbot)
on Dec 10, 2015 at 05:56 UTC (#1149864)


in reply to Re: Storing state of execution
in thread Storing state of execution

One big problem with Storable is that its exact file format depends on the perl version and on the machine perl was compiled for. Changing the processor architecture and/or the perl version is asking for trouble.
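To see what a Storable image records about the machine that wrote it, here is a minimal sketch using only the core Storable module. The sample data is made up; Storable::read_magic is the documented header inspector. It compares a native image with one written in network byte order via nfreeze. Note that network order only addresses byte order and integer sizes, not every version difference.

```perl
use strict;
use warnings;
use Storable qw(freeze nfreeze);

# Made-up sample data for illustration only.
my $data = { name => 'example', values => [ 1, 2, 3 ] };

# Native image: the header records byte order and integer/pointer sizes.
my $native_info = Storable::read_magic(freeze($data));

# Network-order image: the header is flagged as "netorder" instead.
my $portable_info = Storable::read_magic(nfreeze($data));

printf "native:   format version %s, byteorder %s\n",
    $native_info->{version}, $native_info->{byteorder};
printf "portable: format version %s, netorder %d\n",
    $portable_info->{version}, $portable_info->{netorder};
```

The byteorder string printed for the native image differs between architectures, which is exactly the dependency described above.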

Data::Dumper generates executable perl code that has to be parsed back into the program using string eval. That works, sure, but it is a security nightmare: Imagine someone inserting system "rm -rf /" into the saved dump.
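A harmless sketch of that problem: the "attacker" below only sets a flag variable (the name $injected_code_ran is invented for this demo) instead of deleting anything, but the string eval runs the appended statement just the same.

```perl
use strict;
use warnings;
use Data::Dumper;

our $VAR1;                      # Data::Dumper output assigns to $VAR1
our $injected_code_ran = 0;     # hypothetical flag, stands in for real damage

# The legitimate dump, as it would be written to disk.
my $dump = Dumper({ user => 'alice' });

# Someone with write access to the file appends a statement of their own.
my $tampered = $dump . "\$main::injected_code_ran = 1;\n";

# Reading the dump back means executing it, including the appended statement.
eval $tampered;
die $@ if $@;

print "user: $VAR1->{user}\n";
print "injected code ran: $injected_code_ran\n";
```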

Data::Dumper does not dump everything. Sometimes it just generates dummy code:

>perl -MData::Dumper -E 'my $double=sub { return 2*shift }; say Dumper +($double)'
$VAR1 = sub { "DUMMY" };

JSON, XML, and YAML don't have those problems. They simply don't allow code references, and they are all independent of the perl version and the processor architecture.
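A small sketch of the "no code references" point, using the core module JSON::PP (chosen here only because it ships with perl; other JSON modules may word the error differently). With default settings the encoder refuses the code reference instead of writing something misleading:

```perl
use strict;
use warnings;
use JSON::PP;

my $double = sub { return 2 * shift };

# With default settings, encoding a data structure that contains a
# code reference throws an exception instead of producing output.
my $json  = eval { JSON::PP->new->encode({ double => $double }) };
my $error = $@;

print defined $json ? "encoded: $json\n" : "refused: $error";
```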

XML can't store raw binary data, because some characters (such as 0x00) are not allowed in XML, not even in escaped form. You have to resort to a hex dump, base64, or quoted-printable encoding.
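A minimal sketch of the base64 workaround with the core module MIME::Base64. The element name "blob" and its "encoding" attribute are invented for this example, and the regex extraction is only there to keep the demo self-contained; a real program would use an XML parser.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Binary payload containing a NUL byte, which XML cannot represent directly.
my $binary = "\x00\x01\xFFbinary";

# Second argument '' suppresses the line breaks encode_base64 adds by default.
my $xml = '<blob encoding="base64">' . encode_base64($binary, '') . '</blob>';

# Demo-only extraction of the element content.
my ($payload) = $xml =~ m{<blob[^>]*>([^<]*)</blob>};
my $decoded = decode_base64($payload);

print "$xml\n";
print "round trip ok\n" if $decoded eq $binary;
```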

XML stores some data multiple times (opening and closing tags contain the element name), wasting more disk space than other formats.

JSON has data types (string, number, array, key-value pairs, booleans, and null alias undef). It lacks some higher data types, most commonly a date and time type. Usually, one uses strings or key-value pairs ("objects") for that, but you could also use a number (counting days or seconds since an epoch value). Reading back JSON with dates in strings or objects requires some knowledge about the data. You need to know if a string is a date in disguise or just a string.
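To illustrate the "date in disguise" problem, here is a sketch that stores an epoch value as an ISO 8601 string and converts it back. It uses only core modules. The key name "created" and the helper names record_to_json and record_from_json are assumptions made for this example: the point is that the reader must know in advance which keys hold dates.

```perl
use strict;
use warnings;
use JSON::PP;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Assumption for this sketch: the application knows that "created" holds a date.
my @DATE_KEYS = qw(created);

# Convert epoch seconds in the known date fields to ISO 8601 strings, then encode.
sub record_to_json {
    my ($record) = @_;
    my %copy = %$record;
    for my $key (grep { exists $copy{$_} } @DATE_KEYS) {
        $copy{$key} = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($copy{$key}));
    }
    return JSON::PP->new->canonical->encode(\%copy);
}

# Decode, then turn the known date fields back into epoch seconds.
sub record_from_json {
    my ($json) = @_;
    my $record = JSON::PP->new->decode($json);
    for my $key (grep { exists $record->{$_} } @DATE_KEYS) {
        my ($y, $mon, $d, $h, $min, $s) =
            $record->{$key} =~ /\A(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z\z/
            or die "Not an ISO 8601 UTC timestamp: $record->{$key}";
        $record->{$key} = timegm($s, $min, $h, $d, $mon - 1, $y);
    }
    return $record;
}

my $json     = record_to_json({ title => 'Storing state of execution', created => 1449726960 });
my $restored = record_from_json($json);
print "$json\n";
```

Without the @DATE_KEYS list, the decoder has no way to tell the timestamp apart from the title: both are plain JSON strings.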

JSON does not define comments, though some JSON parsers allow them. JSON::XS accepts shell-style # comments in its relaxed mode, but that does not fit a JavaScript context (from which JSON is derived). JavaScript has /* */ and // comments, which would make the most sense to use in JSON.
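A short sketch of the comment behaviour, using the core module JSON::PP rather than JSON::XS (an assumption made so the example runs without installing anything; JSON::PP offers the same relaxed option). The "retries" key is made up.

```perl
use strict;
use warnings;
use JSON::PP;

my $text = <<'JSON';
{
    # shell-style comment, not part of standard JSON
    "retries": 3
}
JSON

# A strict parser rejects the comment ...
my $strict_ok = eval { JSON::PP->new->decode($text); 1 } ? 1 : 0;

# ... while relaxed mode skips it.
my $config = JSON::PP->new->relaxed->decode($text);

print "strict parse accepted the text: $strict_ok\n";
print "retries: $config->{retries}\n";
```

A file written this way is no longer portable JSON: any consumer without a relaxed mode will refuse it.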

YAML: I can't get it into my head. There are at least two or three ways to represent the same information, and some just don't make sense to me. I try to avoid YAML.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^3: Storing state of execution
by choroba (Bishop) on Dec 10, 2015 at 09:04 UTC
    Data::Dumper does not dump everything, sometimes, it just generates dummy code
    Unless you specify
    $Data::Dumper::Deparse = 1;
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      But the deparsed code does not contain all required information in all cases:

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use feature 'state';
      use Data::Dumper;

      $Data::Dumper::Deparse=1;

      our %data;
      our $VAR1;

      sub insert
      {
          my ($name,$href)=@_;
          $href->{$name}=$href->{'nextID'}->();
          print $name," => ",$href->{$name},"\n";
      }

      sub init_data
      {
          %data=(
              nextID => sub {
                  state $n=100;
                  $n++;
                  print "\$n is now $n\n";
                  return $n;
              }
          );
      }

      init_data();
      print "Working with the original:\n\n";
      insert(a => \%data);
      insert(b => \%data);
      insert(c => \%data);
      my $dump=Dumper(\%data);
      print "\nData::Dumper output:\n\n";
      print "$dump\n";
      print "\nWorking with the original again:\n\n";
      insert(d => \%data);
      print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
      eval $dump;
      die $@ if $@;
      insert(d => $VAR1);

      Output:

      Working with the original:

      $n is now 101
      a => 101
      $n is now 102
      b => 102
      $n is now 103
      c => 103

      Data::Dumper output:

      $VAR1 = {
                'c' => 103,
                'a' => 101,
                'nextID' => sub {
                                use warnings;
                                use strict;
                                use feature 'state';
                                state $n = 100;
                                ++$n;
                                print "\$n is now $n\n";
                                return $n;
                            },
                'b' => 102
              };


      Working with the original again:

      $n is now 104
      d => 104

      Working with the re-evaluated Data::Dumper output:

      $n is now 101
      d => 101

      Yes, this is a constructed example. But it shows that deparsing the sub reference is not sufficient to restore all state after a Data::Dumper-eval cycle. The state of $n is lost, so the restored copy hands out IDs that collide with ones already in use.

      It gets even worse without the state feature:

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use Data::Dumper;

      $Data::Dumper::Deparse=1;

      our %data;
      our $VAR1;

      sub insert
      {
          my ($name,$href)=@_;
          $href->{$name}=$href->{'nextID'}->();
          print $name," => ",$href->{$name},"\n";
      }

      sub init_data
      {
          my $n=100;
          %data=(
              nextID => sub {
                  $n++;
                  print "\$n is now $n\n";
                  return $n;
              }
          );
      }

      init_data();
      print "Working with the original:\n\n";
      insert(a => \%data);
      insert(b => \%data);
      insert(c => \%data);
      my $dump=Dumper(\%data);
      print "\nData::Dumper output:\n\n";
      print "$dump\n";
      print "\nWorking with the original again:\n\n";
      insert(d => \%data);
      print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
      eval $dump;
      die $@ if $@;
      insert(d => $VAR1);

      Output:

      Working with the original:

      $n is now 101
      a => 101
      $n is now 102
      b => 102
      $n is now 103
      c => 103

      Data::Dumper output:

      $VAR1 = {
                'c' => 103,
                'b' => 102,
                'a' => 101,
                'nextID' => sub {
                                use warnings;
                                use strict;
                                ++$n;
                                print "\$n is now $n\n";
                                return $n;
                            }
              };


      Working with the original again:

      $n is now 104
      d => 104

      Working with the re-evaluated Data::Dumper output:

      Global symbol "$n" requires explicit package name at (eval 8) line 8.
      Global symbol "$n" requires explicit package name at (eval 8) line 9.
      Global symbol "$n" requires explicit package name at (eval 8) line 10.

      On the other hand, complaining loudly is better than just generating repeated IDs.

      Stupidly removing use strict and use warnings from the code hides the error, and results in worse behaviour:

      #!/usr/bin/perl -w
      use Data::Dumper;

      $Data::Dumper::Deparse=1;

      our %data;
      our $VAR1;

      sub insert
      {
          my ($name,$href)=@_;
          $href->{$name}=$href->{'nextID'}->();
          print $name," => ",$href->{$name},"\n";
      }

      sub init_data
      {
          my $n=100;
          %data=(
              nextID => sub {
                  $n++;
                  print "\$n is now $n\n";
                  return $n;
              }
          );
      }

      init_data();
      print "Working with the original:\n\n";
      insert(a => \%data);
      insert(b => \%data);
      insert(c => \%data);
      my $dump=Dumper(\%data);
      print "\nData::Dumper output:\n\n";
      print "$dump\n";
      print "\nWorking with the original again:\n\n";
      insert(d => \%data);
      print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
      eval $dump;
      die $@ if $@;
      insert(d => $VAR1);

      Output:

      Working with the original:

      $n is now 101
      a => 101
      $n is now 102
      b => 102
      $n is now 103
      c => 103

      Data::Dumper output:

      $VAR1 = {
                'a' => 101,
                'c' => 103,
                'nextID' => sub {
                                ++$n;
                                print "\$n is now $n\n";
                                return $n;
                            },
                'b' => 102
              };


      Working with the original again:

      $n is now 104
      d => 104

      Working with the re-evaluated Data::Dumper output:

      $n is now 1
      d => 1
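For contrast, a sketch of one possible way to avoid the lost counter: keep the counter as plain data next to the IDs instead of hiding it in a closure, so any data-only format can carry it. The hash layout (last_id, ids) and the helper name insert_name are invented for this example, and JSON::PP is used only because it is a core module.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical layout: the counter is an ordinary number stored with the data.
my $state = { last_id => 100, ids => {} };

# Hand out the next ID and record it under the given name.
sub insert_name {
    my ($name, $st) = @_;
    $st->{ids}{$name} = ++$st->{last_id};
    return $st->{ids}{$name};
}

insert_name($_, $state) for qw(a b c);

# Save and restore: the counter travels with the rest of the data.
my $json     = JSON::PP->new->canonical->encode($state);
my $restored = JSON::PP->new->decode($json);

my $d = insert_name(d => $restored);
print "d => $d\n";
```

The restored copy continues counting where the original stopped, because there is no hidden state left to lose.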

      Alexander

Re^3: Storing state of execution -- Sereal?
by Discipulus (Monsignor) on Dec 10, 2015 at 08:44 UTC
    afoken usually gives very exhaustive explanations. ++ as always. But isn't Sereal worth mentioning too? I have used it with good results, but I have not reached its limits because my usage was plain.

    Do you have experience with it as well?

    L*

    PS What they say about their module is definitely intriguing! See Sereal Comparison Graphs

    L*


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      But isn't Sereal worth mentioning too?

      Never used it. I stumbled over Sereal some time ago, then forgot it, because I did not need it. It looks quite promising, and has similarities to various other binary formats (like BSON, BJSON, MessagePack).

      All of those formats promise compact data storage and easy parsing. But you lose one big advantage of text-based file formats: You cannot simply read them using less, your favorite web browser, or your favorite text editor. You need a converter and/or a special viewer.

      If storage size or data transfer volume is an issue, the text-based formats can usually be compressed quite well, resulting in sizes similar to binary formats.
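A rough sketch of that compression step with the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip. The records are made-up sample data, and how well real data compresses depends entirely on its content.

```perl
use strict;
use warnings;
use JSON::PP;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Made-up, fairly repetitive sample records.
my @records = map { { id => $_, status => 'ok', host => 'server.example.com' } } 1 .. 200;
my $json = JSON::PP->new->canonical->encode(\@records);

# Compress the JSON text in memory, then uncompress it again.
gzip(\$json => \my $compressed)    or die "gzip failed: $GzipError";
gunzip(\$compressed => \my $plain) or die "gunzip failed: $GunzipError";

printf "JSON: %d bytes, gzipped: %d bytes\n", length $json, length $compressed;
```

The compressed file is no longer readable with a text editor either, but a single pass through a standard decompressor restores the readable text.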

      As usual, Wikipedia has a big list, containing both binary and text-based formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

      Alexander

Re^3: Storing state of execution
by Anonymous Monk on Dec 10, 2015 at 07:38 UTC

      I prefer to have a storage format that by definition cannot contain executable code instead of relying on a filter that tries to prevent malicious code execution inside a string eval. One bug in Safe and the "SafestUndumper" is no longer safe, but instead happily executes malicious code.

      Also, the "non-executable" formats force the programmer to use a parser. There is no way to accidentally or intentionally use a string eval on those formats.

      So, who would intentionally use a string eval on untrusted code?

      • The new programmer who does not know enough about the project.
      • The new programmer who did not learn the style guide by heart.
      • The lazy programmer who thinks "It's just a quick hack, I'll use string eval for now because I trust my current, hand-written config file, and fix that problem later." (We all know from experience that it won't be fixed until at least a few years later.)
      • The stupid programmer who thinks "all of those stinking modules are just a stupid waste of time, eval is much faster".

      A little bit of bean counting:

      Actually, every storage format that can contain strings can - in theory - also contain executable Perl code. But when reading back formats like XML or JSON, an explicit string eval on an extracted string is required, and that string eval is not present in the library reading the file format (or, at least, it should not be present).
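A small sketch of both halves of that argument, using the core module JSON::PP. The flag $code_ran and the key "hook" are invented for the demo. Raw Perl code is rejected by the parser, and Perl code hidden inside a JSON string comes back as an inert string that only an explicit string eval could run.

```perl
use strict;
use warnings;
use JSON::PP;

our $code_ran = 0;    # hypothetical flag that the "malicious" code would set

# 1. Raw Perl code is not valid JSON, so the parser throws and runs nothing.
my $parsed = eval { JSON::PP->new->decode('$main::code_ran = 1;') };
my $parse_error = $@;

# 2. Perl code inside a JSON string is just data after parsing.
my $data = JSON::PP->new->decode('{"hook":"$main::code_ran = 1;"}');

print "raw code rejected\n" if $parse_error;
print "hook is the string: $data->{hook}\n";
print "code ran: $code_ran\n";
```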

      Oh, and string eval means more than just eval $string:

      • do $filename is a string eval on a file's content - exactly what the four programmers from above would like to use to undump Data::Dumper output.
      • require $filename - it's do $filename at the core, plus a little bit of bookkeeping to avoid repeated reading of the file.
      • use Module - require plus import inside an implicit BEGIN block (use takes a bareword module name, not a filename expression).
      • evalbytes $bytes - new since v5.16
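A harmless sketch of the do $filename case: the "config file" is written to a temporary file via the core module File::Temp, and the extra statement in it only sets a flag (the name $side_effect is made up) to show that reading the file means running it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $side_effect = 0;    # hypothetical flag, stands in for unwanted code

# A "config file" that returns a hash reference, plus one extra statement.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "\$main::side_effect = 1;\n";
print {$fh} "+{ answer => 42 };\n";      # leading + forces a hash, not a block
close $fh or die "Cannot close $filename: $!";

# do FILE compiles and runs the whole file, then returns its last value.
my $config = do $filename;
die "Cannot load $filename: " . ($@ || $!) unless defined $config;

print "answer: $config->{answer}\n";
print "side effect ran: $side_effect\n";
```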

      And finally: Any JavaScript compiler/interpreter must be able to read and execute JSON, as it is a very restricted subset of JavaScript/ECMAScript. That also means that using JavaScript's eval (always a string eval) to read JSON is a tempting but stupid idea, on the same level as using Perl's string eval to read Data::Dumper output. Since ECMAScript Fifth Edition (2009), there is a special JSON parser embedded in the JavaScript environment (see https://github.com/douglascrockford/JSON-js/blob/master/README).

      Alexander

Re^3: Storing state of execution
by stevieb (Abbot) on Dec 10, 2015 at 07:26 UTC

    ++ That's a spectacular explanation.
