Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Optimization Question (Data Serialization)

by jonadab (Parson)
on May 15, 2016 at 23:21 UTC ( #1163100=perlquestion: print w/replies, xml ) Need Help??

jonadab has asked for the wisdom of the Perl Monks concerning the following question:

I originally developed this serialization/deserialization code for small configuration files. An important goal was to be able to save and restore arbitrary nested hash/array data structures, without updating the save/restore code every time I added new config data. It worked fine.

But now I'm working on a roguelike game, which needs to be able to save and restore much more substantial quantities of information. Performance instantly becomes a problem.

The save is a little slow; I could work around that probably, by storing each level in a separate file and updating it only when it changes. (Assuming nothing happens on levels where the player is not, this would mean only one level's worth of data would be saved most turns, two levels' worth on turns where the player changes levels.)

Restore, however, is unacceptably slow. It's so slow, I don't actually know how long it takes, because I've never let it run to completion.

Here's my code, such as it is:

sub readconfigfile { my ($cfgkey, $file) = @_; my $path = catfile($config{configfiles}{directory}, $file); if (-e $path) { open CFG, "<", $path or return nonfatalerror("Unable to read confi +g $path: $!"); #print " Opened config file $file\n"; my $slurp = join "", <CFG>; close CFG; $config{cfgkey}{fromfile} = $path; readconfigtext($config{$cfgkey}, $slurp); } } sub readconfigscalar { my ($text) = @_; return if not $text; my %closer = ("[" => "]", "{" => "}", '"' => '"', "'" => "'"); if ($text =~ m/^\s*([[{"'])/) { my ($type) = ($1); $text =~ s/^\s*([[{"'])//; if ($type eq "[") { return readconfiglist($text, $closer{$type}); } elsif ($type eq "{") { my ($value, $remainingtext) = readconfiglist($text, $closer{$typ +e}); return (+{ @$value }, $remainingtext); } else { return readconfigstring($text, $closer{$type}); } } else { my ($line) = $text =~ m/^(.*?)$/; nonfatalerror("Failed to parse config scalar: $line"); } } sub readconfiglist { my ($text, $closer) = @_; my @list; while ($text and not $text =~ m/^\s*[$closer]/) { if ($text =~ m/^\s*(?:[,]|[=][>])\s*/) { push @list, undef; } else { my ($value, $rest) = readconfigscalar($text); push @list, $value; $text = $rest; } $text =~ s/^\s*(?:[,]|[=][>])\s*//; } $text =~ s/^\s*[$closer]\s*//; return (\@list, $text); } sub readconfigstring { my ($text, $closer) = @_; my $string = ""; while ($text and not $text =~ m/^[$closer]/) { my ($substring) = $text =~ m/^([^"'\\])/; $string .= $substring; $text =~ s/^([^"'\\])//; if ($text =~ m/^[\\]([\\"'])/) { my ($escaped) = $1; $string .= $escaped; $text =~ s/^[\\]([\\"'])//; } } $text =~ s/^[$closer]//; return ($string, $text); } sub readconfigtext { my ($cfghash, $text) = @_; $text =~ s/^\s*(?:[#].*|)(?:$)\s*//; return if not $text; print " <" . length($text) . ">\n"; my %closer = ("[" => "]", "{" => "}", '"' => '"', "'" => "'"); if ($text =~ m/^\s*(\w+)\s*[=][>]?\s*([[{"'])/) { my ($key, $type) = ($1, $2); $text =~ s/^\s*(\w+)\s*[=][>]?\s*([[{"'])//; if ($type eq "[") { my ($value, $remainingtext) = readconfiglist($text, $closer{$typ +e}); if ($value) { $$cfghash{$key} = $value; readconfigtext($cfghash, $remainingtext); } } elsif ($type eq "{") { my ($value, $remainingtext) = readconfiglist($text, $closer{$typ +e}); if ($value) { $$cfghash{$key} = +{ %{$$cfghash{$key}}, @$value }; readconfigtext($cfghash, $remainingtext); } } else { my ($value, $remainingtext) = readconfigstring($text, $closer{$t +ype}); if ($value) { $$cfghash{$key} = $value; readconfigtext($cfghash, $remainingtext); } } } else { my ($line) = $text =~ m/^(.*?)$/; nonfatalerror("Failed to parse config text: $line"); } } sub cfgscalartostring { my ($scalar, $indentlevel) = @_; $indentlevel ||= ""; if ((ref $scalar) eq "ARRAY") { return "[" . (join ", ", map { cfgscalartostring($_, $indentlevel +. " ") } @$scalar) . "]\n$indentlevel"; } elsif ((ref $scalar) eq "HASH") { return "{" . (join ", ", map { my $k = $_; cfgscalartostring($k) . + " => " . cfgscalartostring($$scalar{$k}, $indentlevel . " ") } key +s %$scalar) . "}\n$indentlevel"; } elsif (not defined $scalar) { return ""; } elsif (not ref $scalar) { my $string = "" . $scalar; $string =~ s/[\\]/\\\\/; $string =~ s/(['"])/\\$1/; return '"' . $string . '"'; } } sub writeconfigfile { my ($cfgkey, $file) = @_; my $path = catfile($config{configfiles}{directory}, $file); open CFG, ">", $path or return nonfatalerror("Unable to write config + file $path: $!"); my $cfghash = $config{$cfgkey}; for my $key (keys %$cfghash) { print CFG "$key = " . cfgscalartostring($$cfghash{$key}) . "\n"; } close CFG; }

An example of a working program that uses this code for configuration purposes is available (for a limited time) on my scratchpad (a few Unicode characters seem to have got mangled to HTML entities, sorry) or here (no such caveat). Yes, it's a Tetris game. But it demonstrates how the code works.

I've thought about trying to rip out all the regex based parsing and replace it with substr() and such, but I don't know how much that will actually help. I also thought about trying to find a suitable serialization module on the CPAN that I can just use, but I would need something that preserves nested data structures containing both hashes and arrays intact. (I don't need to preserve the distinction between numbers and strings. If 1 becomes "1" that's fine with me. I go out of my way to avoid writing code that cares about the difference.)

It's crucial that I don't need to change the save/restore code every time I add some more data somewhere, because I expect to be doing a lot of that as this roguelike game develops. That code, in its current highly preliminary and not-entirely-working state, can be seen here. (If you comment out the save/restore stuff, it actually kind of sort of works, as far as it goes, though barely any features are implemented yet, and ingestion doesn't currently work due to an unrelated bug.)

I'd kind of like to avoid using Data::Dumper if possible, on the grounds that allowing configuration files to run arbitrary code gives me the willies. (If the game ever has, say, a public server, it'd be nice to let users copy/paste some of their config files from their local systems, such as the one for key bindings...)

How should I go about optimizing this (it needs to improve by at least a couple of orders of magnitude), or what should I replace it with?

Replies are listed 'Best First'.
Re: Optimization Question (Data Serialization)
by stevieb (Canon) on May 16, 2016 at 00:02 UTC

    JSON or Storable?

    If you don't have complex circular references or other oddities, both should work (JSON is cross-language, where Storable isn't).

      Hmm. I looked at YAML briefly, but it didn't seem to quite do what I needed, unless I missed something. I thought JSON was basically a more Javascript-oriented variant of YAML, so I didn't consider it. I'll look at that.

      Update: Yes, I think JSON will indeed do what I need. Thanks!

        YAML is essentially a superset of JSON. The default formatting of YAML is considered more readable than JSON (at least by me). Either protocol turns out to be somewhat less easy to hand edit accurately than it first glance it seems it should be. That may not be an issue, but readability will make it easier to debug configuration issues at least.

        Premature optimization is the root of all job security
Re: Optimization Question (Data Serialization)
by Discipulus (Abbot) on May 16, 2016 at 06:51 UTC
    I have no time for the moment to review your code and probably i would catch no issue, but if you are interested in performance, gives a try to Sereal a relatively new serialization module with impressive performances.

    You can also profit the read of the thread where afoken describes Data::Dumper JSON Storable YAML differences


    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1163100]
Approved by Marshall
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (7)
As of 2023-01-27 07:45 GMT
Find Nodes?
    Voting Booth?

    No recent polls found