Compact data classes

creeble has asked for the wisdom of the Perl Monks concerning the following question:

I have a Class::Struct (using array form, fwiw) containing about a dozen string members. I build a database of these by sorting them into an array.

The total length of the strings used to create this database is about 2MB; about 150 bytes of actual data times about 15,000 records, say.

As an array of Class::Struct objects, this takes up over 15MB in ram. I am unfortunately running on an embedded processor with limited ram.

How could I store these strings in a more compact way, and still use a convenient accessor function (preferably, the same one, namely a class function) to retrieve the data? The db is read-only once it has been sorted.

Or, to perhaps put it another way, what would it take to write something like Class::Struct in XS with a non-growable string as the only data type?

Or am I already missing something from CPAN? What would be a more compact form to store

Comment on Compact data classes

Replies are listed 'Best First'.
Re: Compact data classes by BrowserUk (Patriarch) on Jun 08, 2013 at 06:07 UTC
This uses 4.4MB of ram to construct 15,000 objects each of 10 x 15-char fields. The array holding those 15k objects (containing a total of 2.15MB of string data) occupies 3.67MB of ram. The accessors are lvalue subs and their performance should be comparable with any other perl OO objects. No validation is done. use Class::Struct::Compact; use Devel::Size qw[ total_size ]; use Data::Dump qw[ pp ]; my @chars = ( 'a'..'z' ); sub dummy { my $n = shift; join '', @chars[ map int( rand @chars ), 1 .. $n ]; } Class::Struct::Compact->new( 'Test', F1 => 15, F2 => 15, F3 => 15, F4 => 15, F5 => 15, F6 => 15, F7 => 15, F8 => 15, F9 => 15, F10 => 15, ); printf "Class constructed: check mem: "; <>; our $N //= 15_000; my @db; push @db, Test->new( dummy( 150 ) ) for 1 .. $N; printf "Instances created: check mem: "; <>; @db = sort{ $a->F1 cmp $b->F1 } @db; #pp \@db; printf "Instances sorted: check mem: "; <>; print "total size: ", total_size \@db; for( 0, $#db ) { print "Record $_"; print "\t", $db[ $_ ]->F1; print "\t", $db[ $_ ]->F2; print "\t", $db[ $_ ]->F3; print "\t", $db[ $_ ]->F4; print "\t", $db[ $_ ]->F5; print "\t", $db[ $_ ]->F6; print "\t", $db[ $_ ]->F7; print "\t", $db[ $_ ]->F8; print "\t", $db[ $_ ]->F9; print "\t", $db[ $_ ]->F10; } __END__ C:\test>perl \perl64\site\lib\Class\Struct\Compact.pm Class constructed: check mem: 4.6MB Instances created: check mem: 9.0MB Instances sorted: check mem: 9.1MB total size: 3851232 Record 0 aaartlhgpkapaol jrbkelwlfklkjgn rhdrrltzezyuenc zxccfpxpbzcxoqy ysfqlfkrnhmaqhf vclpccofujbyars gwrdngknxjyxxni foiuaojwzrqouzc msbepsdptomdtbe qazhgrkywspzsts Record 14999 zzypdjmcgmgxnso yzygmkgabelvqlj xihybqagfiydipo fpgsaybyhrfawuc zyxekczeaxfomrs lwyannakxmgists nzehvwysfvpkeuf gggdblbshwhgnto vtdgbjvgwevpurx wtdgjumncxgfaih [download] The module & test code: <Reveal this spoiler or all in this thread> With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^2: Compact data classes by creeble (Sexton) on Jun 08, 2013 at 21:55 UTC
omg, you may be my best friend. Let me check that and get back.	[reply]
Re^2: Compact data classes by creeble (Sexton) on Jun 09, 2013 at 17:04 UTC
Wait, where is this magical Class::Struct::Compact? It's not in CPAN afaict?	[reply]
Re^3: Compact data classes by BrowserUk (Patriarch) on Jun 09, 2013 at 19:47 UTC
As hdb points out, it's in the spoiler at the end of my post. It's not on cpan because I wrote it -- actually adapted it from some existing code -- in reponse to your OP. I'd want to use a few times myself and see what else it needs before putting it out for general use. For starters it needs some error checking and Carp for when things go wrong. It'd also be nice to use pack templates to allow for numeric fields; but then it I'd have to drop the lvalue-ness of the accessors. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^4: Compact data classes by Anonymous Monk on Jun 10, 2013 at 16:52 UTC
Re^3: Compact data classes by hdb (Monsignor) on Jun 09, 2013 at 17:59 UTC
All in BrowserUk's post: http://perlmonks.org/?node_id=1037797;spoil=1	[reply] [d/l]
Re^2: Compact data classes by creeble (Sexton) on Jun 10, 2013 at 17:06 UTC
(I just love it when I reply thinking I'm logged in, ugh.) This is very clever! The only drawback is the fixed record length, but the savings is so significant compared to Class::Struct's ~5x overhead I should be able to find a workable size. Let me PM you about error checking, etc. Thanks so very much.	[reply]
Re^2: Compact data classes by creeble (Sexton) on Jun 10, 2013 at 17:56 UTC
Okay, one more question for BrowserUK here which may be of benefit to other similarly-clueless sub-monks like myself. Why does your solution use a compiled (eval) class? Is this faster or does it save memory vs. a solution like roboticus proposes?	[reply]
Re^3: Compact data classes by BrowserUk (Patriarch) on Jun 10, 2013 at 18:12 UTC
Because it allows me to 'hardcode' the numbers in the substrs. If you uncomment the print before the eval, you'll see something like: `package Test; use constant { F1_N => 0, F2_N => 15, F3_N => 30, F4_N => 45, F5_N => +60, F6_N => 75, F7_N => 90, F8_N => 105, F9_N => 120, F10_N => 135, } +; use constant { F1_L => 15, F2_L => 15, F3_L => 15, F4_L => 15, F5_L => + 15, F6_L => 15, F7_L => 15, F8_L => 15, F9_L => 15, F10_L => 15, }; sub new { my $class = shift; my $self = shift // ''; return bless \$self, $class } # line 1 "sub_F4" sub Test::F4 :lvalue { my $self = shift; substr( $$self, F4_N(), F4_L() ); }` [download] `F4_N()` and `F4_L()` are constant subs which get optimised away during compilation, leaving hardcoded numbers which are faster than variables. The memory saving comes from packing the fields into single strings; the performance comes from asking the sub to do as little as possible. That said, by explaining that, I've spotted another couple of optimisations; and a potential bug. I'll get back to you with a revised version 2 days from now. (A good reason for not uploading to cpan straight away :) With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^4: Compact data classes by creeble (Sexton) on Jun 10, 2013 at 18:46 UTC
Re^5: Compact data classes by BrowserUk (Patriarch) on Jun 23, 2013 at 05:38 UTC
Re: Compact data classes by roboticus (Chancellor) on Jun 08, 2013 at 00:20 UTC
creeble: I can think of a few possibilities, but I don't know if any would be good because i don't know how you intend to access the values, how fast you need to look them up, etc. For example, you could concatenate all your strings into one long one using the scheme of having a byte containing the length of the field and the text. (You did mention that each object totals about 150 bytes of data, if I understand correctl.) Then your objects would store only the offset of the first field. Each access function would simply need to skip the appropriate number of fields. $ cat t.pl #!/usr/bin/perl use strict; use warnings; my $joe = kablooie::new('Joe', 'Blow'); my $jane = kablooie::new('Jane', 'Smith'); my $name = $joe->fname(); print "joe's first name: $name\n"; $name = $jane->lname(); print "jane's last name: $name\n"; package kablooie; my $kablooie_storage; BEGIN { $kablooie_storage = "Bob's your uncle"; } sub new { my ($fname,$lname) = @_; my $offset = length($kablooie_storage); $kablooie_storage .= chr(length($fname)) . $fname; $kablooie_storage .= chr(length($lname)) . $lname; return bless \$offset, 'kablooie'; } sub fname { # FName is the first field my $self = shift; my $offset = $$self; my $len = ord(substr($kablooie_storage,$offset,1)); return substr($kablooie_storage,$offset+1,$len); } sub lname { # LName is the second field my $self = shift; my $offset = $$self; my $len = ord(substr($kablooie_storage,$offset,1)); $offset+=$len+1; $len = ord(substr($kablooie_storage,$offset,1)); return substr($kablooie_storage,$offset+1,$len); } [download] When run, I get: `$ perl t.pl joe's first name: Joe jane's last name: Smith` [download] You could add some simple compression scheme, too. (RLE might be useful if you have lots of repeated characters, and if you don't mind restricting your character set, you could compress the characters into 6 bits each.) That said, I have no idea how much space you could save using this technique, and whether it would be worth it or not. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
Re^2: Compact data classes by creeble (Sexton) on Jun 08, 2013 at 02:28 UTC
I tried a variation of that using pack and unpack, but it was unworkably slow. I do need quick access to all the fields. It did save a fair amount of memory, but less than I would have hoped. What I'm starting to envision is a somewhat-specialized XS module that builds something similar (I could afford pointers to the individual field strings) in C, and then allows access to it via XS. The db is read-only once it's sorted, so updates aren't an issue. You could malloc large blocks and just write null-terminated strings to them, updating the pointers for the fields. But I know almost nothing about XS and whether the conversion from a string in C to what perl thinks is a string would be painfully slow. I would guess not, since it must do it internally all the time?	[reply]
Re: Compact data classes by v-zor (Initiate) on Jun 10, 2013 at 01:12 UTC
HTH : http://search.cpan.org/search?query=Gzip&mode=all	[reply]
Re: Compact data classes by sundialsvc4 (Abbot) on Jun 08, 2013 at 14:43 UTC
Well, what about SQLite? It’s public-domain(!), used in cell phones all over the planet, and known to be quite efficient. Maybe you could make that your primary backing-store. (Perhaps also this would eliminate the need for sorting?) Because SQLite is used in many resource-constrained situations, it has a lot of options for making efficient use of memory. (Here, I’m referring to the software itself, not just its Perl implementation ...) (Quite seriously, “SQLite is another Swiss Army® knife of computer programming” ...) A homebrew in-memory NRU caching scheme is fairly easy to construct in Perl ... if it turns out that you actually need one when using SQLite. (Find out, first.) A hashref can provide random access to strings, while a separate array of hashrefs (to the same Perl objects) provides for round-robin recycling: when the array has reached its arbitrarily-set limit, `shift` or `pop` an element off, `delete` the key from the hashref (you must do both!), then let the now-unreferenced data disappear into the gloom while you create another key, add a reference to it to the hash, and `unshift` or `push` another reference to the same thing onto the array. (The data-records now have a reference-count of 2, one from the hash, the other from the list, as they will for the duration.) It’s NRU = Not Recently Used, not LRU = Least Recently Used, but it’s all resident-memory, so, so what.	[reply]
Re^2: Compact data classes by BrowserUk (Patriarch) on Jun 08, 2013 at 15:46 UTC
How much memory is used by constructing an SQLite DB with 15,000 records containing 10 x 15-char fields? You recommend this module so often; surely you know this off the top of your head? Even if not, it will be the work of minutes to try it out won't it? Come on. Prove me wrong. Show me that this isn't just another case of a keyword triggering one of your 6 remaining synapses to fire, and cause you to trot out the associated, autonomic "advice". With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: Compact data classes by sundialsvc4 (Abbot) on Jun 10, 2013 at 12:42 UTC
Please limit your discussions on this forum totechnical objections, not vilifying of fellow Monks. You have been courteously asked this before. Thank you.	[reply]
Re^4: Compact data classes by BrowserUk (Patriarch) on Jun 10, 2013 at 13:07 UTC
Re^2: Compact data classes by creeble (Sexton) on Jun 10, 2013 at 18:09 UTC
And just for the record, while I appreciate that SQLite is free and easy, I'm not looking for a "backing store" per se -- I need this 'db' to be all in ram, and my experience with SQLite using an in-ram database hasn't shown it to be particularly memory-efficient. That said, I'd be happy to be proven wrong; I haven't used SQLite from Perl in many years.	[reply]

Back to Seekers of Perl Wisdom