http://www.perlmonks.org?node_id=862356


in reply to Re^7: Serializing a large object
in thread Serializing a large object

Thank you BrowserUk. I greatly appreciate your kind attention.

I guess I will stick with the original, space-consuming implementation. Does it make sense to compress the store files? If so, can you think of any particular module/method that is most appropriate for compressing this kind of data? (I usually use IO::Compress::Gzip, but I know there are other options too.)

Thanks again for your help,
Dave.

Re^9: Serializing a large object
by BrowserUk (Patriarch) on Sep 28, 2010 at 06:30 UTC
    Does it make sense to compress the store files?

    Yes & no. :(

    • Yes.

      I generated a random set of 3,000 overlaps--positive & negative--with a maximum range of 10,000.

      The nstore'd file on disk was: 26/09/2010  15:26        60,783,878 fred.bin.

      gzipping that resulted in:     26/09/2010  15:26           423,984 fred.bin.gz.

      It'll certainly save you large amounts of disk space. But that's not your aim.

    • No.

      The problem is that whilst you save time reading from disk, you spend time decompressing. And in the end, much of the time spent retrieve()ing the data is the time required to allocate the memory and reconstruct the structures. (A minimal round-trip sketch follows this list.)
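
    As an aside, here is what that round trip might look like -- a sketch only; the file names and the overlap structure below are made up purely for illustration:

        use strict;
        use warnings;
        use Storable qw( nstore freeze thaw );
        use IO::Compress::Gzip     qw( gzip   $GzipError   );
        use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

        # Stand-in for the real overlap data -- a hash of small arrays.
        my %overlaps = map { $_ => [ int rand 10_000, int rand 10_000 ] } 1 .. 3_000;

        # Plain Storable file, as you have now.
        nstore( \%overlaps, 'overlaps.bin' ) or die "nstore failed: $!";

        # Compressed alternative: freeze to a scalar, then gzip that scalar to disk.
        my $frozen = freeze( \%overlaps );
        gzip \$frozen => 'overlaps.bin.gz'
            or die "gzip failed: $GzipError";

        # Reading it back: gunzip into memory, then thaw.
        my $buffer;
        gunzip 'overlaps.bin.gz' => \$buffer
            or die "gunzip failed: $GunzipError";
        my $restored = thaw( $buffer );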

    It would certainly be worth investigating the idea with your real-world datasets, and it will absolutely save huge amounts of disk space. But whether it will actually load faster will depend on many factors; you'll have to try it for yourself with real data.
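
    One way to run that comparison -- again only a sketch, assuming the 'overlaps.bin' and 'overlaps.bin.gz' files written above -- is to let Benchmark time the two load paths side by side:

        use strict;
        use warnings;
        use Benchmark qw( cmpthese );
        use Storable  qw( retrieve thaw );
        use IO::Uncompress::Gunzip qw( gunzip $GunzipError );

        # Compare loading the plain nstore'd file against gunzip + thaw of the compressed one.
        cmpthese( 10, {
            plain   => sub {
                my $data = retrieve( 'overlaps.bin' );
            },
            gzipped => sub {
                my $buffer;
                gunzip 'overlaps.bin.gz' => \$buffer
                    or die "gunzip failed: $GunzipError";
                my $data = thaw( $buffer );
            },
        } );

    With the real files substituted in, that should show fairly quickly whether the decompression time eats the I/O saving.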


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.