PerlMonks  

Use a Serialized Hash... When It Might Not Exist?

by redapplesonly (Sexton)
on Nov 11, 2022 at 22:39 UTC ( [id://11148149]=perlquestion )

redapplesonly has asked for the wisdom of the Perl Monks concerning the following question:

Hola Perl Monks,

I am working on a Perl (v5.30.0) script that needs to use stateful information from the previous run of the script. In my imagination, I can see the script basically working like this:

>> STEP ONE: Loads stateful data from an external file

>> STEP TWO: Crunches stateful data

>> STEP THREE: Saves stateful data to the external file for the next iteration

I'm pretty confident that I can use a serialized hash to store the stateful information. I've read up on Perl's Storable module, plus the store() and retrieve() functions. Seems doable.

But here's the problem: What about the first time the script runs? There would be no external file. So the code would have to be smart enough to check for the existence of the file, and create it if necessary. I didn't think this would be a hard problem, but it's got me vexed. Can you take a look?

Here's my test code:

#!/usr/bin/perl
use warnings;
use strict;
use Storable;

my $HASHFILE='file';

package main;

sub DoIHaveThisData {
    my ($myhash, $key) = @_;
    if (exists $myhash->{$key}) {
        return 1;
    }
    return -1;
}

unless(-e $HASHFILE){
    printf "Hash file \'%s\' doesn't exist, creating...\n", $HASHFILE;
    my %emptyHash = ();
    # Serialize emptyHash, store it in $HASHFILE:
    store \%emptyHash, $HASHFILE;
    # If we reach here, there should now be an empty hash, serialized in file $HASHFILE
}

# Deserialize the hash, now access it as $hashref:
my $hashref = retrieve($HASHFILE);

# Check to see if key2 is there:
if(DoIHaveThisData($hashref, 'key2')){
    printf "Hash has key2! (Value is: \'%s\')\n", $hashref->{'key2'};
}

# Add/Overwrite some data:
$hashref->{'key1'} = 'data1';
$hashref->{'key2'} = 'data2';
$hashref->{'key3'} = 'data3';

# Save the hash for the next time this script runs:
store $hashref, $HASHFILE;

# END OF PROGRAM

This is pretty bulky, and I'm pretty sure there's got to be a smarter (or more compact) way to do all of the above. But there's a problem which concerns me:

Here's the command line output when I delete the external file, then run my script twice in a row:

me@ubuntu01$ rm file
me@ubuntu01$
me@ubuntu01$
me@ubuntu01$ ./SerialHashTest.perl
Hash file 'file' doesn't exist, creating...
Use of uninitialized value in printf at ./SerHash2.perl line 33.
Hash has key2! (Value is: '')
me@ubuntu01$
me@ubuntu01$
me@ubuntu01$ ./SerialHashTest.perl
Hash has key2! (Value is: 'data2')
me@ubuntu01$
me@ubuntu01$

Okay, so on Iteration 1, the code realized that the external file was missing, and created it. So that's good. But I worry about this part of Iteration 1:

Use of uninitialized value in printf at ./SerHash2.perl line 33.
Hash has key2! (Value is: '')

Line 33 is printf "Hash has key2!  (Value is: \'%s\')\n", $hashref->{'key2'}; in the if(DoIHaveThisData($hashref, 'key2')) block of code. DoIHaveThisData() believes that key2 is in the hash, when the hash has yet to be populated. So why does the function return a false positive? Is there a logic bug? Or is my hash populated with a key2 --> nothing key/value pair by default?

Any advice/criticism is welcome. I have a funny feeling that the design of my code is unnecessarily complicated, so if you have any design suggestions, I'll happily lap them up. Thanks in advance for your consideration.

Replies are listed 'Best First'.
Re: Use a Serialized Hash... When It Might Not Exist?
by choroba (Cardinal) on Nov 11, 2022 at 23:01 UTC
    -1 is a true value. Use 0, "", or undef for false.

    In fact, why not replace

    if(DoIHaveThisData($hashref, 'key2')){

    with

    if (exists $hashref->{key2}) {
    ?
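
The point about truthiness can be checked directly. In Perl, `undef`, `0`, `'0'` and the empty string are false in boolean context; everything else, including `-1`, is true. The sketch below uses a hypothetical helper, `is_true`, that is not part of the original script and exists only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reports how Perl treats a value
# in boolean context.
sub is_true {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

my %empty = ();

# exists() already returns a proper boolean, so it can be used directly
# in a condition without a wrapper that returns 1 or -1.
print 'exists on empty hash: ', is_true(exists $empty{key2}), "\n";

for my $candidate (-1, 1, 0, '', '0', undef) {
    my $label = defined $candidate ? "'$candidate'" : 'undef';
    print "$label is ", is_true($candidate), "\n";
}
```

This is why the `if` block ran on the first iteration: the function returned -1, the condition was true, and printing the missing key2 produced the "uninitialized value" warning.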

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Use a Serialized Hash... When It Might Not Exist?
by afoken (Chancellor) on Nov 12, 2022 at 20:50 UTC

    A very minor issue with Storable is that it depends on at least the perl version, maybe also on compile-time settings. This may cause trouble when you update perl.

    Other serialization formats do not have that specific problem.

    Using JSON would make your serialized hash human readable (at least when adding some whitespace).

    YAML and Data::Dumper output may contain executable code; reading them back may cause security problems.

    Sereal promises to be fast and compact, but it is a binary format like Storable.

    XML is bloat, can't store 0x00, *and* has security issues (Billion laughs attack).
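
For the JSON option mentioned above, here is a minimal sketch using JSON::PP, which ships with Perl. The helper names `load_state` and `save_state` and the file name `state.json` are assumptions made for illustration; they do not come from the original question. The sketch only covers plain data (strings, numbers, nested hashes and arrays), which is all JSON can represent.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helpers (names are assumptions, not from the thread).

# Load state from a JSON file, or start with an empty hash on the first run.
sub load_state {
    my ($path) = @_;
    return {} unless -e $path;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;                       # slurp mode: read the whole file at once
    my $json = <$fh>;
    close $fh;
    return JSON::PP->new->utf8->decode($json);
}

# Save state as pretty-printed JSON with sorted keys, so the file is
# human readable and stable between runs.
sub save_state {
    my ($path, $state) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} JSON::PP->new->utf8->pretty->canonical->encode($state);
    close $fh or die "Cannot close $path: $!";
    return;
}

my $state_file = 'state.json';      # assumed file name
my $state = load_state($state_file);
$state->{runs}++;                   # count how often the script has run
save_state($state_file, $state);
```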


    If your hash is as simple as shown (no references, just a key-value-store), also consider SQLite with a simple two-column table. (See DBI and DBD::SQLite). As a minor and probably welcome side-effect, the database file will automatically be created at first use.
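
A rough sketch of the two-column SQLite idea follows. It assumes DBI and DBD::SQLite are installed (neither is a core module). The file name, the table and column names, and the helpers `set_value` and `get_value` are all hypothetical and chosen only for this example.

```perl
use strict;
use warnings;
use DBI;    # also requires DBD::SQLite to be installed

# Assumed file name, chosen for illustration only.
my $db_file = 'state.sqlite';

# Connecting creates the database file if it does not exist yet.
my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Creating the table does nothing once it already exists.
$dbh->do('CREATE TABLE IF NOT EXISTS state (k TEXT PRIMARY KEY, v TEXT)');

# Hypothetical helpers wrapping the two-column table.
sub set_value {
    my ($key, $value) = @_;
    $dbh->do('INSERT OR REPLACE INTO state (k, v) VALUES (?, ?)',
        undef, $key, $value);
    return;
}

sub get_value {
    my ($key) = @_;
    my ($value) = $dbh->selectrow_array(
        'SELECT v FROM state WHERE k = ?', undef, $key);
    return $value;    # undef when the key is absent
}

my $previous = get_value('key2');
print defined $previous ? "key2 was '$previous'\n" : "key2 not set yet\n";
set_value('key2', 'data2');
```

With this layout there is no separate "does the file exist?" check: the first run simply finds an empty table.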

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Use a Serialized Hash... When It Might Not Exist?
by kcott (Archbishop) on Nov 12, 2022 at 02:33 UTC

    G'day redapplesonly,

    "I am working on a Perl (v5.30.0) script that needs to use stateful information from the previous run of the script."

    All of your steps, and other design notes, seem spot-on to me. :-)

    "But here's the problem: What about the first time the script runs?"

    That's a very valid point. I don't use Storable very often, but when I do I encounter the same issue. I usually handle this by dealing with it before runtime execution starts in an INIT block.

    Here's a rough example of how I might have written your code (pm_11148149_storable_first_run_0.pl):

    #!/usr/bin/env perl

    use 5.030;
    use warnings;

    use Storable;

    my ($hashfile, $serial_data_for);

    INIT {
        $hashfile = 'pm_11148149_storable_first_run.sto';
        $serial_data_for = -e $hashfile ? retrieve($hashfile) : {};
    }

    my @keys = qw{key1 key2 key3};
    my @vals = qw{data1 data2 data3};

    $serial_data_for->@{@keys} = @vals;

    store($serial_data_for, $hashfile);

    I added some extra statements for reporting/demo purposes (pm_11148149_storable_first_run_1.pl):

    #!/usr/bin/env perl

    use 5.030;
    use warnings;

    use Storable;
    use Data::Dump;

    warn "Perl version: $^V\n";

    my ($hashfile, $serial_data_for);

    INIT {
        $hashfile = 'pm_11148149_storable_first_run.sto';
        $serial_data_for = -e $hashfile ? retrieve($hashfile) : {};
    }

    say 'Serialised data:';
    dd $serial_data_for;

    my @keys = qw{key1 key2 key3};
    my @vals = qw{data1 data2 data3};

    $serial_data_for->@{@keys} = @vals;

    store($serial_data_for, $hashfile);

    Here's some equivalent output to your sample run:

    ken@titan ~/tmp $ rm pm_11148149_storable_first_run.sto
    ken@titan ~/tmp $ ls -l pm_11148149_storable_first_run.sto
    ls: cannot access 'pm_11148149_storable_first_run.sto': No such file or directory
    ken@titan ~/tmp $ ./pm_11148149_storable_first_run_1.pl
    Perl version: v5.30.0
    Serialised data:
    {}
    ken@titan ~/tmp $ ls -l pm_11148149_storable_first_run.sto
    -rw-r--r-- 1 ken None 69 Nov 12 13:00 pm_11148149_storable_first_run.sto
    ken@titan ~/tmp $ ./pm_11148149_storable_first_run_1.pl
    Perl version: v5.30.0
    Serialised data:
    { key1 => "data1", key2 => "data2", key3 => "data3" }
    ken@titan ~/tmp $

    Notes:

    • I concur with ++choroba's suggestion to use the builtin exists() function directly, rather than rolling your own DoIHaveThisData(). While abstracting functions into subroutines for reuse is often a very good idea, I don't see any benefit in this instance. If you did consider it to be necessary, I'd keep it very simple along the lines of:
      sub DoIHaveThisData {
          my ($hash, $key) = @_;
          return exists $hash->{$key};
      }
    • You may need to look at "perlref: Postfix Reference Slicing", and perhaps even "perldata: Slices", if you're unfamiliar with how I've populated the hashref.
    • A .sto extension, to identify Storable files, is widely used. It's not a requirement; it might qualify as a convention.
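
For readers unfamiliar with the postfix slice mentioned in the notes above, the following sketch shows three ways of populating a hashref from parallel key and value lists. The variable names `$postfix`, `$circumfix` and `$looped` are made up for the comparison; all three end up with the same contents.

```perl
use 5.024;    # postfix dereferencing is enabled by default from 5.24
use warnings;

my @keys = qw{key1 key2 key3};
my @vals = qw{data1 data2 data3};

# Postfix slice, as used in the reply above.
my $postfix = {};
$postfix->@{@keys} = @vals;

# Equivalent circumfix (traditional) hash-reference slice.
my $circumfix = {};
@{$circumfix}{@keys} = @vals;

# The same assignment written as an explicit loop over the indices.
my $looped = {};
$looped->{ $keys[$_] } = $vals[$_] for 0 .. $#keys;

# Print each hash with sorted keys so the three lines can be compared.
for my $href ($postfix, $circumfix, $looped) {
    say join ', ', map { "$_=$href->{$_}" } sort keys %$href;
}
```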

    — Ken

Re: Use a Serialized Hash... When It Might Not Exist?
by tybalt89 (Monsignor) on Nov 12, 2022 at 05:40 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11148149
    use warnings;
    use Storable;

    my $filename = '/tmp/d.11148149.testfile';

    my $hashref = eval { retrieve $filename } || {};

    use Data::Dump 'dd'; dd $hashref; # was

    $hashref->{+time} = 1; # do something with $hashref...

    use Data::Dump 'dd'; dd $hashref; # is now

    store $hashref, $filename;
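
One thing to be aware of with the `eval { retrieve ... } || {}` idiom: it falls back to an empty hash on any failure, not only when the file is missing. Below is a slightly more defensive sketch, assuming you would rather stop than silently discard a state file that exists but cannot be read. The helper name `load_state` and the file name `state.sto` are hypothetical.

```perl
use strict;
use warnings;
use Storable qw(retrieve store);

# Hypothetical helper: an absent file means "first run";
# any other failure to load a hash is treated as fatal.
sub load_state {
    my ($path) = @_;
    return {} unless -e $path;              # first run: nothing stored yet
    my $state = eval { retrieve($path) };
    die "Cannot load state from $path: " . ($@ || "unknown error\n")
        unless ref($state) eq 'HASH';
    return $state;
}

my $filename = 'state.sto';                 # assumed file name
my $state = load_state($filename);
$state->{runs}++;                           # do something with the state
store($state, $filename);
```

Whether a damaged state file should be fatal or quietly replaced depends on how valuable the stored data is; the one-liner above is perfectly reasonable when losing the state is harmless.
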
Re: Use a Serialized Hash... When It Might Not Exist?
by karlgoethebier (Abbot) on Nov 14, 2022 at 11:36 UTC

    A while ago I wrote this about serializing a hash. Probably it is still helpful. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»
