http://www.perlmonks.org?node_id=433547


in reply to Re: In need of a Dumper that has no pretentions to being anything else.
in thread In need of a Dumper that has no pretentions to being anything else.

Re^3: In need of a Dumper that has no pretentions to being anything else.
by fergal (Chaplain) on Feb 23, 2005 at 02:19 UTC
    That sounds like a serious bug in DD. In order to catch circularity it only needs to keep a 4 byte hash key and a short string for every reference in the structure. Unless your structure is full of almost empty arrays, hashes and scalar refs, this should take less memory than your structure.

    Update: Just dug into DD and I see it's storing a 2-element arrayref for each ref it finds. That's still not very much and, unless you have an unusual structure, it should be negligible. The only other thing is that it stores a copy of the hash key in that 2-element array. If your keys are very big then that could be a problem; however, you'd still only be at most doubling things.
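To make the bookkeeping cost described above concrete, here is a minimal sketch of per-reference cataloguing. It is not Data::Dumper's actual code: the helper name catalog_refs and its record layout are hypothetical, chosen only to mirror the "one small record per reference" idea.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper (not part of Data::Dumper): walk a structure and
# store one two-element record per reference, keyed by its address.
sub catalog_refs {
    my ($data, $name, $seen) = @_;
    $seen ||= {};
    return $seen unless ref $data;

    my $id = refaddr $data;
    return $seen if exists $seen->{$id};    # already visited: shared or circular
    $seen->{$id} = [ $name, $data ];        # the per-reference overhead

    if (ref $data eq 'ARRAY') {
        catalog_refs($data->[$_], $name . '->[' . $_ . ']', $seen)
            for 0 .. $#$data;
    }
    elsif (ref $data eq 'HASH') {
        catalog_refs($data->{$_}, $name . '->{' . $_ . '}', $seen)
            for keys %$data;
    }
    return $seen;
}

my %h = ( a => [ 1, 2 ], b => { c => [ 3 ] } );
$h{self} = \%h;    # deliberate cycle; it is detected, not followed
my $seen = catalog_refs(\%h, '$VAR1');
print scalar(keys %$seen), " references catalogued\n";
```

The catalogue grows with the number of references, not with the number of plain scalars, so a structure made of many tiny arrays pays proportionally more than one made of a few large ones.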

    Update again: It was a DD bug; see below.

      Okay. Try this:

      #! perl -slw
      use strict;
      use Data::Dumper;

      my %h;
      $h{ $_ } = [ 1 .. 10 ] for 'aaaa' .. 'zzzz';
      print Dumper \%h;

      Add whatever Dumper options you like. Prior to the dump, this hash, with somewhat under 500,000 keys and a smallish array for each value, consumes ~177 MB of RAM.

      Attempting to dump it pushes that memory consumption (transiently on Win32) to well over 800 MB (and consumption is still climbing after 3/4 of an hour!).

      My real hash has close to a million keys and nested arrays. It consumes over 500 MB to start with. Trying to dump it blows 2 GB of virtual memory before it crashes Perl--and the time taken even before swapping starts is measured in the half-lives of plutonium. I'd like to avoid both. I just need to be able to dump the structure to a file, preferably in a reasonably compact format.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        Try the following patch to Data/Dumper.pm. Also, turn off Deepcopy as in my first post. It should make a huge difference. I'll file a bug and submit the patch.
        --- ./ok/Data/Dumper.pm.orig 2005-02-22 20:17:13.000000000 -0800
        +++ ./ok/Data/Dumper.pm 2005-02-22 20:16:47.000000000 -0800
        @@ -405,7 +405,7 @@
             my $ref = \$_[1];
             # first, catalog the scalar
        -    if ($name ne '') {
        +    if ($s->{deepcopy} and ($name ne '')) {
               ($id) = ("$ref" =~ /\(([^\(]*)\)$/);
               if (exists $s->{seen}{$id}) {
                 if ($s->{seen}{$id}[2]) {
        Updated: And do $Data::Dumper::Useperl = 1;
Re^3: In need of a Dumper that has no pretentions to being anything else.
by merlyn (Sage) on Feb 23, 2005 at 02:14 UTC
      but your request is self-conflicting, and you haven't said exactly what you don't like about Data::Dumper.

      Actually, I thought I had spelt out what I was looking for pretty carefully.

      I am dealing with a huge (> 500 MB), heavily nested data structure consisting of lots of small hashes and arrays. Data::Dumper:

    • consumes huge amounts of memory (pushing my machine into swapping) checking for circular references when I know there will be none.
    • either dumps everything one element per line, indented, or totally flattened without any structure.
    • produces this:
      use Math::Pari qw[ :int factorint sqrtint divisors PARI ];
      $f = factorint 1000000;
      print Dumper $f;
      $VAR1 = bless( [
                       bless( [
                                bless( do{\(my $o = 33884400)}, 'Math::Pari' ),
                                bless( do{\(my $o = 33884376)}, 'Math::Pari' )
                              ], 'Math::Pari' ),
                       bless( [
                                bless( do{\(my $o = 33884388)}, 'Math::Pari' ),
                                bless( do{\(my $o = 33884364)}, 'Math::Pari' )
                              ], 'Math::Pari' )
                     ], 'Math::Pari' );
      Attempt to free unreferenced scalar: SV 0x19f41f4 at c:\Perl\bin\p1.pl line 14, <STDIN> line 3.
      Attempt to free unreferenced scalar: SV 0x19f42cc at c:\Perl\bin\p1.pl line 14, <STDIN> line 3.

      Or this

      $Data::Dumper::deepcopy=1;
      print Dumper $f;
      $VAR1 = bless( [
                       bless( [
                                bless( do{\(my $o = 33884400)}, 'Math::Pari' ),
                                bless( do{\(my $o = 33884376)}, 'Math::Pari' )
                              ], 'Math::Pari' ),
                       bless( [
                                bless( do{\(my $o = 33884388)}, 'Math::Pari' ),
                                bless( do{\(my $o = 33884364)}, 'Math::Pari' )
                              ], 'Math::Pari' )
                     ], 'Math::Pari' );
      Attempt to free unreferenced scalar: SV 0x19f43bc at c:\Perl\bin\p1.pl line 14, <STDIN> line 5.

      Or this

      $Data::Dumper::Indent=0;
      print Dumper $f;
      $VAR1 = bless( [bless( [bless( do{\(my $o = 33884400)}, 'Math::Pari' ),bless( do{\(my $o = 33884376)}, 'Math::Pari' )], 'Math::Pari' ),bless( [bless( do{\(my $o = 33884388)}, 'Math::Pari' ),bless( do{\(my $o = 33884364)}, 'Math::Pari' )], 'Math::Pari' )], 'Math::Pari' );
      Attempt to free unreferenced scalar: SV 0x19f37c8 at c:\Perl\bin\p1.pl line 14, <STDIN> line 7.

      When what I want is something more akin to this:

      print "[@$_]" for @$f;
      [2 5]
      [6 6]

      Or this

      print "[@{[ join', ', map{ \"[@$_]\" } @$f ]}]";
      [[2 5], [6 6]]

      Except that there are thousands of arrays at varying depths of nesting.

      I can write one myself, perhaps based around Data::Rmap or similar, but I thought I'd look and see if there is an existing one available. My search didn't turn up anything promising, but it seems a simple enough requirement that someone might know of one, or have one already written?
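As an illustration of the kind of dumper being asked for, here is a minimal sketch. The function compact_dump is hypothetical, not an existing CPAN module. It assumes the data contains no circular references (it does no checking at all, so a cycle would recurse forever), and it writes straight to a filehandle so that no large output string is built in memory.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper: print nested arrays and hashes in a compact
# bracketed form. No circularity check is made; that is an assumption
# about the input, not a safety feature.
sub compact_dump {
    my ($fh, $data) = @_;
    my $type = reftype($data) // '';    # looks through any blessing

    if ($type eq 'ARRAY') {
        print {$fh} '[';
        for my $i (0 .. $#$data) {
            # Comma before nested containers, plain space before scalars.
            my $sep = ref($data->[$i]) ? ', ' : ' ';
            print {$fh} $sep if $i;
            compact_dump($fh, $data->[$i]);
        }
        print {$fh} ']';
    }
    elsif ($type eq 'HASH') {
        print {$fh} '{';
        my $first = 1;
        for my $key (sort keys %$data) {
            print {$fh} ', ' unless $first;
            $first = 0;
            print {$fh} "$key => ";
            compact_dump($fh, $data->{$key});
        }
        print {$fh} '}';
    }
    elsif ($type eq 'SCALAR' or $type eq 'REF') {
        print {$fh} '\\';
        compact_dump($fh, $$data);
    }
    else {
        print {$fh} defined $data ? $data : 'undef';
    }
}

compact_dump(\*STDOUT, [ [ 2, 5 ], [ 6, 6 ] ]);
print "\n";
```

Opaque objects such as Math::Pari values would still print as whatever their underlying scalar holds, so this only helps with the layout and memory problems, not with making such objects readable.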


        Apart from the "Attempt to free unreferenced scalar" stuff (which seems like a bug), what's wrong with this? Math::Pari objects are just a simple scalar blessed into a class as far as Perl is concerned. Nothing is going to dump them out as anything more sensible unless it specifically knows how to understand Math::Pari objects.

        You'll need to play with DD's Freezer stuff to get anywhere on that.
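For anyone unfamiliar with the Freezer hook, here is a rough sketch of how it can be used. The Opaque class, its dd_freeze method and the %value_for lookup are all hypothetical stand-ins for an XS-backed class like Math::Pari. Only $Data::Dumper::Freezer and $Data::Dumper::Indent are real Data::Dumper settings. Note that the freezer rewrites the object in place, so the object is no longer usable as a handle afterwards.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for an opaque XS-backed class: the object is a
# blessed scalar holding a meaningless handle number.
package Opaque;

my %value_for;    # handle => real value, standing in for data held in C

sub new {
    my ($class, $value) = @_;
    my $handle = 1000 + keys %value_for;
    $value_for{$handle} = $value;
    return bless \$handle, $class;
}

# Freezer method: Data::Dumper calls this on the object before dumping.
# It replaces the handle with the readable value, in place.
sub dd_freeze {
    my $self = shift;
    $$self = $value_for{$$self};
    return;
}

package main;

$Data::Dumper::Freezer = 'dd_freeze';    # method name to call on objects
$Data::Dumper::Indent  = 0;
print Dumper( Opaque->new(42) ), "\n";
```

A real Math::Pari freezer would need to convert the PARI value to a Perl number or string by whatever means that module provides, which this sketch does not attempt.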