Data::Dumper Efficiency Problem

by madhatter (Sexton)
on Jan 03, 2001 at 23:54 UTC (#49602=perlquestion)

madhatter has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Data::Dumper for variable persistence, with indentation set to 0. What is up with this? Why does its CPU/RAM intensity increase exponentially when it parses twice as many log lines? I could understand an increase of several seconds, but this is ridiculous. Is there something I'm missing?

Here is the run:

    C:\Perl>perl
    0.0400 Data Loaded
    0.4200 Lines Parsed: 10347
    0.4210 Data Dumped
    0.8810 TOTAL ; Finished Execution

    C:\Perl>perl
    0.0400 Data Loaded
    1.7620 Lines Parsed: 20337
    50.5330 Data Dumped
    52.3350 TOTAL ; Finished Execution

.. and the relevant code:

    use Data::Dumper;
    $Data::Dumper::Indent = 0;
    $encoding = Data::Dumper->Dump( [\%DATA ], [qw(*DATA)]);

%DATA is a hash of hashes of hashes of strings and integers, basically. (heh)

If nothing can be done about this, what sort of (more efficient) alternatives are there? Preferably as simple as dumping to a file and a

do "data.dat";
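
For reference, the round trip described above (dump to a file, reload with `do`) might look roughly like the sketch below. The file name `data.dat` comes from the question; the sample contents of `%DATA` and the temporary directory are made up for illustration, and this shows only the mechanism, not a fix for the slowdown.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);
use File::Spec;

$Data::Dumper::Indent = 0;

# Hypothetical sample data standing in for the real %DATA.
our %DATA = (
    host1 => { hits => { '/index.html' => 12, '/faq.html' => 3 } },
    host2 => { hits => { '/index.html' => 7 } },
);

# Use a throwaway directory so the sketch leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'data.dat');

# Write the dump. The '*DATA' name makes the output read "%DATA = (...);".
# The trailing "1;" guarantees that do() returns a true value.
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} Data::Dumper->Dump([\%DATA], ['*DATA']), "\n1;\n";
close $out or die "Cannot close $file: $!";

# Empty the hash, then reload it by executing the file.
%DATA = ();
do $file or die "Cannot load $file: " . ($@ || $!);

my $hits = $DATA{host1}{hits}{'/index.html'};
print "Reloaded hits: $hits\n";
```

Note that the hash has to be a package variable (`our`, not `my`), because code loaded with `do` cannot see the caller's lexicals.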


Replies are listed 'Best First'.
(tye)Re: Data::Dumper Efficiency Problem
by tye (Sage) on Jan 04, 2001 at 00:03 UTC

    My partially wild guess is that Data::Dumper's stuffing everything into one big string causes lots of realloc()s which can't be done in-place due to Perl malloc()ing things in between so the growing string is repeatedly copied around to new places where there is enough space to hold it all in one piece.

    The correct solution is for Data::Dumper to be fixed to know how to write to a Perl file handle!

            - tye (but my friends call me "Tye")
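
Until Data::Dumper can write to a file handle itself, one way to approximate the idea is to dump each top-level entry separately and print it straight away, so that no single huge string is ever built. The sketch below assumes that no references are shared between top-level entries; `dump_hash_to_fh` is a hypothetical helper, not part of Data::Dumper, and the sample data is invented. Whether this avoids the slowdown seen in the original run is untested.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

$Data::Dumper::Indent = 0;

# Write a hash as a loadable "%NAME = (...);" statement, one top-level
# entry at a time, so only one entry's dump is held in memory at once.
# Hypothetical helper; assumes entries do not share references.
sub dump_hash_to_fh {
    my ($fh, $name, $hash) = @_;
    local $Data::Dumper::Terse = 1;    # emit bare values, no "$VAR1 = "
    print {$fh} "%$name = (\n";
    for my $key (sort keys %$hash) {
        print {$fh} Dumper($key), ' => ', Dumper($hash->{$key}), ",\n";
    }
    print {$fh} ");\n1;\n";
}

# Hypothetical sample data standing in for the real %DATA.
our %DATA = (
    host1 => { hits => { '/index.html' => 12 } },
    host2 => { hits => { '/index.html' => 7 } },
);

my ($fh, $file) = tempfile(UNLINK => 1);
dump_hash_to_fh($fh, 'DATA', \%DATA);
close $fh or die "Cannot close $file: $!";

# Reload to confirm the file is valid Perl.
%DATA = ();
do $file or die "Cannot load $file: " . ($@ || $!);
print "Entries reloaded: ", scalar(keys %DATA), "\n";
```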
Re (tilly) 1: Data::Dumper Efficiency Problem
by tilly (Archbishop) on Jan 04, 2001 at 00:08 UTC
Re: Data::Dumper Efficiency Problem
by Trinary (Pilgrim) on Jan 04, 2001 at 00:12 UTC
    I used to swear by Data::Dumper, but I don't anymore. I'm not sure about the internals, but I have to say that recently I've really come to be frustrated by it. Doing performance analysis under Win32 (Win32::PerfLib, not in CPAN), I end up dumping large hash structures all the time. When I was doing research into the format of these things I tried to get a dump of one base level object (System, for those who care).

    It ended up running out of memory, swap...everything. Wouldn't finish running, it was using well over 200M of memory. I have since then written my own (somewhat dumb) replacement, took an hour or two, and suggest either following suit or searching around here for something that has enough functionality for what you need and is simpler than Data::Dumper. If there's interest, I'll post my lil snippet, but it's basically trivial.


      Please do post! I'm very interested in this.
        Ask, and ye shall receive: this is just a sub, pretty basic actually and probably broken in a couple of ways. It takes a ref as its argument and starts a-printin'. I haven't done any performance testing vs. Data::Dumper.

        Begin code

        sub dumpref {
            my $testref = shift;
            my $levels  = shift || 0;    # nesting depth; 0 on the first call

            if (ref($testref) eq 'HASH') {
                print "{\n";
                $levels++;
                my $maxlevel = scalar(keys %$testref);
                my $curlevel = 0;
                foreach my $key (keys %$testref) {
                    $curlevel++;
                    print " " x $levels;
                    print $key;
                    print " => ";
                    my $val = $testref->{$key};
                    if (ref($val)) {
                        &dumpref($val, $levels);
                    } else {
                        # /g escapes every backslash and quote, not just the first
                        $val =~ s#\\#\\\\#g;
                        $val =~ s#'#\\'#g;
                        print "'$val'";
                    }
                    print "," if $curlevel < $maxlevel;
                    print "\n";
                }
                print " " x ($levels - 1) . "}";
            } elsif (ref($testref) eq 'ARRAY') {
                print "[\n";
                $levels++;
                my $maxlevel = scalar(@$testref);
                my $curlevel = 0;        # was never declared in this branch
                foreach my $item (@$testref) {
                    my $val = $item;     # copy, so escaping leaves the caller's array alone
                    $curlevel++;
                    print " " x $levels;
                    if (ref($val)) {
                        &dumpref($val, $levels);
                        print " " x ($levels - 1);
                    } else {
                        $val =~ s#\\#\\\\#g;
                        $val =~ s#'#\\'#g;
                        print "'$val'";
                    }
                    print "," if $curlevel < $maxlevel;
                    print "\n";
                }
                print " " x ($levels - 1) . "]";
            } else {
                # Anything else (code refs, scalar refs, ...): print only the type
                print ref($testref);
                print "\n";
            }
        }

        End Code

        Use at your own risk, but it handles basic stuff ok, I think. =b


Re: Data::Dumper Efficiency Problem
by repson (Chaplain) on Jan 04, 2001 at 06:34 UTC
    Another method depending on data is XML::Simple.
    XMLout can take a filename or filehandle which may reduce memory used during running by immediate output instead of storing (I don't know if it does). XMLin is supposed to always create the original data structure...

    It does allow buzzword compliance, and a structure parseable without needing perl.

    As to your original question, Data::Dumper may be creating self-referential output, which means that it has to remember and constantly process everything that has already passed through it. Read the module docs to find out whether this may be happening and what you should do about it (possibly calling $OBJ->Reset under the OO interface, depending on how you are doing things).
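
A minimal sketch of the OO interface mentioned above, in case it helps. The sample data is invented, and this only shows where `Reset` fits in; it makes no claim that calling it cures the slowdown in the original run.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data standing in for the real %DATA.
my %DATA = (a => { b => { c => 1 } });

# Same settings as the question, expressed through the OO interface.
my $dumper = Data::Dumper->new([\%DATA], ['*DATA']);
$dumper->Indent(0);

my $first = $dumper->Dump;

# Clear the object's record of references it has already seen,
# so the next Dump starts from a clean slate.
$dumper->Reset;

my $second = $dumper->Dump;

print $first eq $second ? "Dumps match\n" : "Dumps differ\n";
```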

      This is not the way XML::Simple works. XML::Simple is designed to let you read an XML file (with some restrictions) and use the data it contains, or update it and write it back out. Although I have never tried it, I would bet it will not output arbitrary data structures as XML (although that might be fun!).

      On the other hand, XML::Dumper and Data::DumpXML will dump data to XML. I have no idea how fast they are though (and considering Data::DumpXML is also written by Gisle Aas, I don't think it will be faster than Data::Dumper).

        Directly from the XML::Simple docs:


        Takes a data structure (generally a hashref) and returns an XML encoding of that structure. If the resulting XML is parsed using XMLin(), it will return a data structure equivalent to the original.

        That sounds similar to what Data::Dumper is being used for here.
