PerlMonks  

In need of a Dumper that has no pretentions to being anything else.

by BrowserUk (Pope)
on Feb 23, 2005 at 01:23 UTC ( #433537=perlquestion )
BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone know of a module/function which will dump a perl data structure in a compact, human readable format that doesn't aspire to being a serialisation module?

  1. Dumps what's in the structure without attempting to analyse it.

    I don't care if the elements of two or more nested arrays are references to the same array. I just want to see what's in there.

    And I have no use for a module that takes up 20 times as much memory to dump the contents of a structure as the structure itself takes up.

  2. Dumps the contents in a compact, but "structured" manner (read indented).
  3. Produces sensibly compact output that looks roughly the way I might key it in--without trying to make the output evalable.

    Keeps nested structures on one line if it is practical. Preferably with a settable wrap limit that I can set to 1000 or 2000 if I wish--not hard-coded to wrap at 72!

    i.e. something like:

    { a => [ 1, 2, 3 ], b => [ { X => 1, Y=> 2 } { X => [ 1, 2, 3], Y=> [ 4, 5, 6 ], Z => [ 7, 8, 9 ] }, c => 'fred', }

    Not YAML!

    Preferably a function rather than an object.
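
A minimal sketch of what requirements 2 and 3 might look like as a plain function. It is not taken from any CPAN module: the names flat_dump and compact_dump are hypothetical, the width handling is approximate (the trailing comma is not counted), and there is no cycle detection, so it assumes the structure is a tree.

```perl
use strict;
use warnings;

# One-line rendering of a structure (no cycle detection: assumes a tree).
sub flat_dump {
    my ($d) = @_;
    my $type = ref($d);
    if ( $type eq 'ARRAY' ) {
        return '[ ' . join( ', ', map { flat_dump($_) } @$d ) . ' ]';
    }
    if ( $type eq 'HASH' ) {
        return '{ '
          . join( ', ', map { "$_ => " . flat_dump( $d->{$_} ) } sort keys %$d )
          . ' }';
    }
    return "$d" if $type;                # other refs: plain stringification
    return 'undef' unless defined $d;
    return $d =~ /^-?\d+(?:\.\d+)?\z/ ? $d : "'$d'";
}

# Use the one-line form if it fits within $width columns; otherwise put each
# element on its own line, indented two spaces deeper, and recurse.
sub compact_dump {
    my ( $d, $width, $indent, $lead ) = @_;
    $width  = 80 unless defined $width;     # settable wrap limit
    $indent = 0  unless defined $indent;    # column where this value's line starts
    $lead   = 0  unless defined $lead;      # length of a "key => " prefix, if any

    my $flat = flat_dump($d);
    my $type = ref($d);
    return $flat
      if ( $type ne 'ARRAY' and $type ne 'HASH' )
      or $indent + $lead + length($flat) <= $width;

    my $pad = ' ' x ( $indent + 2 );
    my @lines;
    if ( $type eq 'ARRAY' ) {
        @lines = map { $pad . compact_dump( $_, $width, $indent + 2, 0 ) } @$d;
    }
    else {
        @lines = map {
            my $key = "$_ => ";
            $pad . $key . compact_dump( $d->{$_}, $width, $indent + 2, length($key) )
        } sort keys %$d;
    }
    my ( $open, $close ) = $type eq 'ARRAY' ? ( '[', ']' ) : ( '{', '}' );
    return join( "\n", $open, ( map { "$_," } @lines ), ( ' ' x $indent ) . $close );
}

print compact_dump( { a => [ 1, 2, 3 ], c => 'fred' }, 1000 ), "\n";
```

Recomputing the one-line form at every level is wasteful on large structures; a real implementation would probably cache or measure lengths instead.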


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Re: In need of a Dumper that has no pretentions to being anything else.
by fergal (Chaplain) on Feb 23, 2005 at 01:58 UTC
    You can set $Data::Dumper::Deepcopy = 1 and Data::Dumper will dump the full contents even if they are just a copy of some other part of the structure already dumped. The only exception is if a structure contains a reference to itself, then it has to show a variable, otherwise it would loop forever.
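
A small sketch of the Deepcopy setting described above, using only documented Data::Dumper configuration variables. The wrapper name dump_with_deepcopy is made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper: dump $data with Deepcopy switched on or off.
sub dump_with_deepcopy {
    my ( $data, $deepcopy ) = @_;
    local $Data::Dumper::Deepcopy = $deepcopy;
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    local $Data::Dumper::Indent   = 0;    # everything on one line
    return Dumper($data);
}

my $shared = [ 1, 2, 3 ];
my $data = { a => $shared, b => $shared };

# Deepcopy off: the second occurrence is written as a back-reference.
print dump_with_deepcopy( $data, 0 ), "\n";

# Deepcopy on: both occurrences are written out in full.
print dump_with_deepcopy( $data, 1 ), "\n";
```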

    I wouldn't worry about the memory use of Data::Dumper.

      if a structure contains a reference to itself ... otherwise it would loop forever.

      That's my problem if I give it a self-referential structure--which I won't.

      I wouldn't worry about the memory use of Data::Dumper.

      Um. Er. But... it keeps crashing my program by exhausting all the memory! But that's okay fergal says: "Don't worry about it!".

      I assume that means you don't know a proper dumper module?


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        That sounds like a serious bug in DD. In order to catch circularity it only needs to keep a 4 byte hash key and a short string for every reference in the structure. Unless your structure is full of almost empty arrays, hashes and scalar refs, this should take less memory than your structure.

        Update: Just dug in DD and I see it's storing a 2 element arrayref for each ref it finds. That's still not very much and unless you have an unusual structure, it should be negligible. The only other thing is that it stores a copy of the hash key in that 2 element array. If your keys are very big then that could be a problem; however, you'd still only be at most doubling things.

        Update again: It was a DD bug, see below
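
To illustrate why catching circularity need not cost much memory, here is a sketch that remembers only the references on the current path, keyed by address. This is not how Data::Dumper does it (as noted above, it keeps an entry for every ref it sees); has_cycle is a hypothetical helper that handles only arrays and hashes.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Returns 1 if the structure reachable from $d contains a reference cycle,
# 0 otherwise. Only refs on the current path are remembered, so extra
# memory is proportional to nesting depth, and a ref that is merely shared
# (seen twice, but not its own ancestor) is not reported.
sub has_cycle {
    my ( $d, $on_path ) = @_;
    $on_path ||= {};
    my $type = ref($d);
    return 0 unless $type eq 'ARRAY' or $type eq 'HASH';

    my $addr = refaddr($d);
    return 1 if $on_path->{$addr};
    $on_path->{$addr} = 1;

    my @kids = $type eq 'ARRAY' ? @$d : values %$d;
    for my $kid (@kids) {
        return 1 if has_cycle( $kid, $on_path );
    }

    delete $on_path->{$addr};    # leaving this node: no longer an ancestor
    return 0;
}
```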

Re: In need of a Dumper that has no pretentions to being anything else.
by ysth (Canon) on Feb 23, 2005 at 03:37 UTC
    What you want sounds fairly trivial, but with lots of room for variation (sort keys or not, details of indenting, etc.)

    Why not just write it yourself?

      Well, I was in the middle of doing something else at the time and thought that I might avoid having to do that.

      For the record, I now am, but it was worth asking, wasn't it?

      Especially if it also leads to a bug in Data::Dumper being fixed too.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
Re: In need of a Dumper that has no pretentions to being anything else.
by halley (Prior) on Feb 23, 2005 at 04:42 UTC
    Remember, many people do like Data::Dumper because it does presume to be a serialization standard, albeit in text form. It is mission critical for a lot of small glue applications out there.

    If I read fergal's patch right, then you'd be removing the circular reference protection for everyone who turns off Deepcopy. Just because BrowserUk doesn't break up his data for analysis doesn't mean everyone should have a new behavior here. DD should still protect the iterator from cycling around forever, even if it doesn't produce reference syntax in its output.

    I implemented a fourth format style for Data::Dumper once, which tries to fit whole arrays on one line if possible, or word-wrap the elements of arrays on a minimum number of lines if the elements are non-references.

    But two bad things happened: (1) I lost that patch at some point, and (2) the stock DD is now not written in Perl but in native code. Changing the Perl version is a waste of effort.

    I think the answer here is not to wade through 500_000 elements written in Perl syntax, looking for a programmatic error on your part. Instead, formulate some theories as to the fault and analyze the structure with a couple of lines of Perl, or find a smaller dataset which exhibits the fault.

    Also, turn on Terse, turn off Purity, and consider overloading the string-izing operator for certain objects so you don't get so much bless(do{...},'Math::Pari') noise.
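
A brief sketch of the Terse and Purity settings mentioned above, using the documented Data::Dumper configuration variables. The wrapper name terse_dump and the choice of Indent and Sortkeys values are assumptions for illustration only.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper: a quieter Data::Dumper for eyeballing data.
sub terse_dump {
    my ($data) = @_;
    local $Data::Dumper::Terse    = 1;    # drop the "$VAR1 = " prefix
    local $Data::Dumper::Purity   = 0;    # no extra statements to rebuild refs
    local $Data::Dumper::Indent   = 1;    # fixed, shallow indentation
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    return Dumper($data);
}

print terse_dump( { a => [ 1, 2, 3 ], c => 'fred' } );
```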

    --
    [ e d @ h a l l e y . c c ]

      Please note that I didn't pass any judgement upon Data::Dumper or any other existing module--I just simply asked if there was an alternative that fit my needs.

      And tried to head off the offers of the "other" modules that I have already looked at, that also do not fit my needs.

      Just because BrowserUk doesn't break up his data for analysis...

      I have a requirement. I asked if anyone knew of a module that fit those requirements. You have no concept of what I am doing, or why I want this format.

      For the record, I am not "looking for a programmatic error". I am looking for patterns in the data.

      You'd be surprised at how the human eye and brain can detect patterns in data, even if the text is displayed in a very small (unreadable) font, and the data is scrolling quite fast. Once I perceive a pattern of some kind, I can then increase the font size and view the boundaries and repeat points of the pattern, so that I can then write code to pick out those boundaries and then plot the data graphically for further analysis.

      For an example of something similar, download and run the code I posted at Re: Testing for randomness (spectral testing). Try uncommenting the constant line flagged as ## Really BAD!! and then allow the program to cycle for 2 or 3 minutes and see how it immediately shows up just how bad a Linear Congruential PRNG is with badly chosen values.

      Then for contrast, install Math::Random::MT and run that for a few minutes (or hours or days) and see that the spectral test generated remains almost totally even, with just enough evenly distributed variation to show true randomness.

      The whole point of the exercise is to view large volumes of data, together, in a consistent and repetitive format, so that any patterns become obvious. So please, keep your supercilious and judgemental comments in your closet where they belong.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        (1) I'm not going to dig any more into your task; I could not care less what the task is. But if you happen to be developing a PRNG and you do find any patterns, then it IS a programming error. A common one, an important one, but a flaw nonetheless.

        (2) I downvoted you for your last sentence. I think your whole thread smacks of being frustrated, and you're letting it get the best of you. Think rationally, not emotionally, and you wouldn't be in this situation. This is a forum, and you brought your frustration to the forum. Don't tell us to shut up. I don't care how you vote my response, but keep the discourse civil, thanks.

        --
        [ e d @ h a l l e y . c c ]

      My patch doesn't remove circular ref protection. Circular ref protection comes from keeping a track of all the refs we've seen so far and I've not touched that.

      DD also keeps track of all scalars that it sees so that when you run this

      use Data::Dumper;
      my %s = ( 'key', 1 );
      my $s = \$s{'key'};
      print Dumper( [ \%s, $s ] );

      you get

      $VAR1 = [ { 'key' => 1 }, \$VAR1->[0]{'key'} ];

      that is, it was able to spot that the 1 in $s{key} is actually the same 1 that $s refers to.

      My patch removes this ability when $Deepcopy = 0. This is consistent with the rest of the deepcopy behaviour and means that the only thing that can now cause a backreference to some other part of the structure is circularity.

        Fair enough-- the patch doesn't give a lot of context and I didn't look at the whole. It's squirrelly code. I was just putting in a note that any quick change could break a lot of people.

        --
        [ e d @ h a l l e y . c c ]

        Note that DD only gets it right with print Dumper([\%s, $s]) but not with print Dumper([$s, \%s])

        For that you need a better dumper. I wonder if it's arguable that the tracking you are talking about disabling is actually not required for Purity=0 dumps.

        ---
        demerphq

Re: In need of a Dumper that has no pretentions to being anything else.
by Aristotle (Chancellor) on Feb 23, 2005 at 09:38 UTC

    Dumpvalue mimics the debugger's dump format and is in core. Maybe that is closer to your needs?
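
A minimal sketch of calling the core Dumpvalue module from a script. Dumpvalue prints to the currently selected filehandle, so this hypothetical helper (dumpvalue_string is not part of the module) selects an in-memory handle to capture the text.

```perl
use strict;
use warnings;
use Dumpvalue;

# Hypothetical helper: return Dumpvalue's veryCompact output as a string.
sub dumpvalue_string {
    my ($data) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!";
    my $old = select $fh;         # Dumpvalue prints to the selected handle

    my $d = Dumpvalue->new;
    $d->veryCompact(1);           # keep short arrays and hashes on one line
    $d->dumpValue($data);

    select $old;                  # restore the previous handle
    close $fh;
    return $out;
}

print dumpvalue_string( { a => [ 1, 2, 3 ], b => { X => 1, Y => 2 } } );
```

If the dump can die part-way through, the select call would need to be restored in a more careful way; that is left out here for brevity.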

    Makeshifts last the longest.

      Thank you, Aristotle.

      In its veryCompact form the output is exactly what I was looking for:

      P:\test>perl -MDumpvalue -e" $d=new Dumpvalue; $h{ $_ } = [ map{ $_ & 1 ? { 'a'..'z' } : [ 1..100 ] } 1 .. 10 ] for 'a' .. 'j'; $d->veryCompact(1); $d->dumpValue( \%h )"
      'a' => ARRAY(0x1871700)
         0  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         1  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         2  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         3  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         4  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         5  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         6  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         7  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         8  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         9  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
      'b' => ARRAY(0x18757e4)
         0  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         1  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         2  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         3  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         4  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         5  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         6  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         7  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         8  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         9  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
      'c' => ARRAY(0x18794fc)
         0  'a' => 'b', 'c' => 'd', 'e' => 'f', 'g' => 'h', 'i' =>
         1  0..99 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

      And it even has an option to see through overloading and handle what comes out -- at least part way.

      P:\test>p1
      perl> use Math::Pari qw[ :int factorint sqrtint divisors PARI ];;
      perl> $f = factorint 1000000;;
      perl> use Dumpvalue;;
      perl> $d = new Dumpvalue;;
      perl> $d->veryCompact( 1 );;
      perl> $d->set( bareStringify => 1 );;
      perl> { local $\; $d->dumpValue( $f ) };;
      0  Math::Pari=ARRAY(0x1a3702c)
         0  Math::Pari=SCALAR(0x1a3c9ec)
            -> 33884400
         1  Math::Pari=SCALAR(0x1a3ec7c)
            -> 33884376
      1  Math::Pari=ARRAY(0x1a36f54)
         0  Math::Pari=SCALAR(0x1a3ec70)
            -> 33884388
         1  Math::Pari=SCALAR(0x1a3ec94)
            -> 33884364
      perl>
      Terminating on signal SIGINT(2)

      All it needs is to not de-overload the final SCALAR values so that Math::Pari will return the numbers, which as it's in Perl, I can fix.

      It also does circularity testing:

      perl> $r = \$r;;
      perl> $d->dumpValue( $r );;
      -> REF(0x22ac9c)
         -> REUSED_ADDRESS

      But dumping my testcase above, it uses less than 50% extra memory--a considerable saving over the 250% of Data::Dumper. Though I now realise that a large proportion of the extra memory DD consumes is used by building the output in memory rather than dumping straight to the select'd output handle.
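
The difference between building the output in memory and writing it straight to a handle can be sketched as follows. This is an illustration only, not Dumpvalue's or Data::Dumper's code; stream_dump is a hypothetical name and it has no cycle detection.

```perl
use strict;
use warnings;

# Hypothetical sketch: print each piece as soon as it is visited, so the
# extra memory is the recursion stack (plus one sorted key list per hash),
# not a second copy of the whole structure as text.
sub stream_dump {
    my ( $d, $fh ) = @_;
    $fh = \*STDOUT unless defined $fh;
    my $type = ref($d);
    if ( $type eq 'ARRAY' ) {
        print {$fh} '[';
        for my $i ( 0 .. $#$d ) {
            print {$fh} ' ' if $i;
            stream_dump( $d->[$i], $fh );
        }
        print {$fh} ']';
    }
    elsif ( $type eq 'HASH' ) {
        print {$fh} '{';
        my $n = 0;
        for my $k ( sort keys %$d ) {
            print {$fh} ' ' if $n++;
            print {$fh} "$k=>";
            stream_dump( $d->{$k}, $fh );
        }
        print {$fh} '}';
    }
    else {
        print {$fh} defined $d ? "$d" : 'undef';
    }
    return;
}

stream_dump( { a => [ 1, 2, 3 ], c => 'fred' } );
print "\n";
```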

      I think I can see how to reduce that further still--though it may slow it down a little.

      And it's been sat there on my machine the whole time! It's a bit embarrassing that I've never noticed it, but I don't ever recall it being mentioned.

      Not only did someone else see the need for what I was asking for, they wrote it, covered all the bases and dropped it on my machine without telling me:)

      Once again, thanks Aristotle.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.

        My pleasure. :-)

        I only recently found it myself; I was idly flipping through the list of Template Toolkit plugin distributions on CPAN and stumbled over Template::Plugin::Dumpvalue, which made me aware of the module I'd had sitting right here, all the time.

        I don't know why I'd never heard of it before from anyone else either. It's pretty damn useful as a debugging aid.

        Makeshifts last the longest.

Re: In need of a Dumper that has no pretentions to being anything else.
by grinder (Bishop) on Feb 23, 2005 at 10:47 UTC
    dump a perl data structure in a compact, human readable format

    I wrote the following pair of functions for debugging Regexp::Assemble. It only deals with hashes, arrays and undef, but it sounds like that's all you need.

    The display is about as minimal and compact as I can get it.

    sub _dump {
        my $path = shift;
        return _dump_node( $path ) if ref($path) eq 'HASH';
        my $dump = '[';
        my $d;
        my $nr = 0;
        for $d( @$path ) {
            $dump .= ' ' if $nr++;
            if( ref($d) eq 'HASH' ) {
                $dump .= _dump_node($d);
            }
            elsif( ref($d) eq 'ARRAY' ) {
                $dump .= _dump($d);
            }
            elsif( defined $d ) {
                $dump .= ( ($d =~ /\s/ or not length $d) ? qq{'$d'} : $d );
            }
            else {
                $dump .= '*';
            }
        }
        $dump . ']';
    }

    sub _dump_node {
        my $node = shift;
        my $dump = '{';
        my $nr = 0;
        my $n;
        for $n (sort keys %$node) {
            $dump .= ' ' if $nr++;
            if( $n eq '' and not defined $node->{$n} ) {
                $dump .= '*';
            }
            else {
                $dump .= "$n=>" . ( ref($node->{$n}) eq 'ARRAY' ? _dump($node->{$n}) : $node->{$n} );
            }
        }
        $dump . '}';
    }

    I wasn't able to make complete sense of your data structure; I suspect you just invented it off the top of your head. Nevertheless:

    my $ref = {
        a => [ 1, 2, 3 ],
        b => [
            { X => 1, Y=> 2 },
            { X => [ 1, 2, 3 ], Y => [ 4, 5, 6 ], Z => [ 7, 8, 9 ] },
        ],
    };
    print _dump($ref);

    produces

    {a=>[1 2 3] b=>[{X=>1 Y=>2} {X=>[1 2 3] Y=>[4 5 6] Z=>[7 8 9]}]}

    - another intruder with the mooring in the heart of the Perl
