http://www.perlmonks.org?node_id=779673

BioLion has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I wanted to get your opinions before committing to anything, so here goes:

Having read around (Burned by Storable, Re: Caching or using values across multiple programs, How to duplicate objects?, Serializing data, Data::Serialize, Data::Dumper, YAML), I am a bit put off by how often people report having a bad time with Storable, especially with serializing code refs and the need to use Safe compartments when inflating them.

So my question is this - what is the (IYHO) currently preferred method for (recursively) storing objects?

By way of warning, I want to apply this method to a very large object that is a container for ~50-100,000 smaller objects/code refs, which in turn may contain further objects... Much calculation goes into building it up, and I would like to be able to cache it at key stages, so that in future I can choose whether to bypass building it from scratch each time.

So if I could also test for the existence of a stored object at each checkpoint, that would be a huge bonus for me:

if ( store_exists('store_file') ){ ...inflate it... } else { ...make the object from scratch... }
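A minimal sketch of that checkpoint test, assuming the cached structure is plain data (no code refs) and using Storable's `nstore`/`retrieve`. The helper name `load_or_build` and the file name `stage1.sto` are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;
use File::Spec;
use Storable qw(nstore retrieve);

# Hypothetical helper: return the cached structure if a usable checkpoint
# file exists, otherwise call the builder and write a new checkpoint.
sub load_or_build {
    my ($file, $builder) = @_;

    # -e tests for existence; -s on the same stat result tests for non-zero size.
    if (-e $file && -s _) {
        my $cached = eval { retrieve($file) };
        return $cached if defined $cached;
        warn "Could not read checkpoint $file, rebuilding: $@\n";
    }

    my $data = $builder->();
    nstore($data, $file);    # network byte order, readable on other machines
    return $data;
}

# Illustrative use: the anonymous sub stands in for the expensive build step.
my $file   = File::Spec->catfile(File::Spec->tmpdir, 'stage1.sto');
my $stage1 = load_or_build($file, sub {
    return { ids => [ 1 .. 5 ], label => 'expensive result' };
});
print scalar @{ $stage1->{ids} }, " ids available\n";
```

A stale checkpoint is never invalidated here; a real version would need some way to decide that the inputs have changed.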

Thanks in advance,

Update: edited my ramble a bit.

Update 2 (16 July 09): After much helpful discussion (esp. clinton/ELISHEVA on the CB, and the posts below), I came to the realisation that storing code refs and objects, while possible, doesn't make sense in my situation. The testing needed post-inflation (Re^2: Storable Objects) would take more or less as much time as I could save by having 'storage checkpoints' during the build-up of my mecha-objects, and it would also cost me and my users a lot of flexibility (e.g. what if they have multiple DB/flat files that they want to work from?).

So in the end I decided (thanks perrin!) that it would make more sense to put that time into making my code more efficient, so that users don't need to go for a coffee while their computer thinks...

My codebase is fairly large and diverse, but it has all been handled wonderfully by Devel::NYTProf, which seems to be the profiler-du-jour round here (Re: Profiling and Performance). There was also much useful info found here. I found the nytprofhtml utility particularly useful and nice to work with, especially the way it shows your code right next to how it is being used, and you can very easily click-navigate across any code that was called! I set it up with these options:
export NYTPROF=trace=2:start=begin:file=/tmp/nytprof.out:addpid=1:savesrc=1

So, just a big thanks all round!

Just a something something...

Replies are listed 'Best First'.
Re: Storable Objects
by ikegami (Patriarch) on Jul 13, 2009 at 18:07 UTC

    I am a bit put off by the frequency of people having bad times with Storable. Especially with serializing code refs

    I imagine that's not a problem with Storable at all. Perl's internals weren't designed to be serialisable. There could be issues simply trying to access the necessary internals, to the point that support for it has always been broken and finally removed from Perl in 5.10. All modules will face these problems.

      So, more or less, there is no solution that fundamentally outstrips any of the others?

      I guess my main concern is textualising the code refs in the objects. People seem to be happy with YAML's UseCode/DumpCode/LoadCode options... and I can modify the deserialization methods so that code refs are Safely inflated. I just wondered if anyone had any experience/advice/opinions on which of the many options produces the best results (best in terms of performance/ease/maintainability etc.).

      The way YAML and Data::Dumper go about it seems pretty much the same (all the way down to B::Deparse...). So is it really a matter of personal preference?
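      For comparison, Storable takes the same B::Deparse route when `$Storable::Deparse` and `$Storable::Eval` are switched on. The sketch below round-trips one self-contained sub and one closure; the variable names are made up for illustration. The deparsed source text does not carry the value a closure captured, so the closure case is wrapped in `eval` and simply reports what happened rather than assuming a result:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Both flags are off by default. Eval executes source text taken from the
# frozen data, so only enable it for data you produced yourself.
$Storable::Deparse = 1;    # freeze: turn code refs into source text
$Storable::Eval    = 1;    # thaw: eval that source text back into subs

# A sub that depends only on its arguments.
my $plain = sub { return $_[0] * 2 };

# A closure: $factor lives outside the sub body, so it is not in the text.
my $factor  = 3;
my $closure = sub { return $_[0] * $factor };

my $copy = thaw( freeze( { plain => $plain } ) );
print "plain sub after round trip: ", $copy->{plain}->(21), "\n";

# The closure may fail to compile on thaw, or come back without its value.
my $thawed = eval { thaw( freeze( [$closure] ) ) };
if ($thawed) {
    my $result = eval { no warnings; $thawed->[0]->(21) };
    print "closure after round trip returned: ",
        ( defined $result ? $result : 'undef' ), "\n";
}
else {
    print "closure did not survive the round trip: $@\n";
}
```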

      Thanks for your advice ikegami.

Re: Storable Objects
by perrin (Chancellor) on Jul 13, 2009 at 19:42 UTC
    My advice is, don't do it. Trying to serialize coderefs is just a bad idea that's likely to break at some point. I suggest you take a step back and think of a different approach that doesn't require this kind of magic.
Re: Storable Objects
by GrandFather (Saint) on Jul 14, 2009 at 01:37 UTC

    I see that Data::Dump::Streamer is missing from your list. It is interesting to note that it expends extra effort to try and serialize code refs, although there are a few problematic areas.


    True laziness is hard work

      Thanks GrandFather, that may well be what I am looking for. I will try to incorporate tests into the objects so that they can make sure that the code refs haven't gone funky and that the data the objects hold hasn't changed either.

      I guess subclassing a specialised My::Mod::WithTests class, which includes testable data and expected code ref output, would be the best way to do this? Then I can include a few 'testable' objects in the larger collection. Or would a run_tests() method be enough (again including specialised test data and expected output)?

      I'll take perrin's advice into account too, and see if I can't avoid this whole thing somehow. Or at least strip the serialized data down to a simpler format, only including the key data, and have a new_from_skeleton() method...
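      One way the skeleton idea could look, as a rough sketch only: the class `My::Node`, its fields, and the `%SCORERS` table are all hypothetical. Code refs stay in the source and are looked up by name, so only plain data is ever frozen:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package My::Node;    # hypothetical class, for illustration only

# Behaviour lives in code, keyed by name; only the name gets stored.
my %SCORERS = (
    sum => sub { my $t = 0; $t += $_ for @_; return $t },
    max => sub { my $m = shift; for (@_) { $m = $_ if $_ > $m } return $m },
);

sub new {
    my ($class, %args) = @_;
    my $scorer = $SCORERS{ $args{scorer} }
        or die "unknown scorer '$args{scorer}'";
    return bless {
        values      => [ @{ $args{values} } ],
        scorer_name => $args{scorer},
        scorer      => $scorer,              # code ref, never serialized
    }, $class;
}

# Plain data only: safe for Storable, JSON, YAML and friends.
sub to_skeleton {
    my ($self) = @_;
    return {
        values => [ @{ $self->{values} } ],
        scorer => $self->{scorer_name},
    };
}

# Rebuild a full object, re-attaching the code ref by name.
sub new_from_skeleton {
    my ($class, $skeleton) = @_;
    return $class->new(%$skeleton);
}

sub score {
    my ($self) = @_;
    return $self->{scorer}->( @{ $self->{values} } );
}

package main;

my $node   = My::Node->new( values => [ 3, 9, 4 ], scorer => 'max' );
my $frozen = freeze( $node->to_skeleton );
my $copy   = My::Node->new_from_skeleton( thaw($frozen) );
print $copy->score, "\n";
```

      The trade-off is that every behaviour has to be registered under a name up front; anonymous subs built at run time cannot be stored this way.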

      Thanks everyone for your input!

Re: Storable Objects
by moritz (Cardinal) on Jul 13, 2009 at 20:33 UTC
    I've heard that KiokuDB can (de)serialize closures including all variables they closed over, so it might be worth your while to look at how KiokuDB does it. (Or ask on their IRC channel; they are generally rather helpful if you have a bit of patience.)