Re: Compact data classes
by BrowserUk (Patriarch) on Jun 08, 2013 at 06:07 UTC
This uses 4.4MB of RAM to construct 15,000 objects, each of 10 x 15-char fields.
The array holding those 15k objects (containing a total of 2.15MB of string data) occupies 3.67MB of RAM.
The accessors are lvalue subs and their performance should be comparable with that of any other Perl OO objects.
No validation is done.
use Class::Struct::Compact;
use Devel::Size qw[ total_size ];
use Data::Dump qw[ pp ];
my @chars = ( 'a'..'z' );
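# Return a random string of $n lowercase letters.
sub dummy {
    my $n = shift;
    join '', @chars[ map int( rand @chars ), 1 .. $n ];
}
# Generate class 'Test' with ten fixed-width fields of 15 chars each.
Class::Struct::Compact->new(
    'Test',
    F1 => 15, F2 => 15, F3 => 15, F4 => 15, F5 => 15,
    F6 => 15, F7 => 15, F8 => 15, F9 => 15, F10 => 15,
);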
printf "Class constructed: check mem: "; <>;
our $N //= 15_000;
my @db;
push @db, Test->new( dummy( 150 ) ) for 1 .. $N;
printf "Instances created: check mem: "; <>;
@db = sort {
    $a->F1 cmp $b->F1
} @db;
#pp \@db;
printf "Instances sorted: check mem: "; <>;
print "total size: ", total_size \@db;
for( 0, $#db ) {
    print "Record $_";
    print "\t", $db[ $_ ]->F1;
    print "\t", $db[ $_ ]->F2;
    print "\t", $db[ $_ ]->F3;
    print "\t", $db[ $_ ]->F4;
    print "\t", $db[ $_ ]->F5;
    print "\t", $db[ $_ ]->F6;
    print "\t", $db[ $_ ]->F7;
    print "\t", $db[ $_ ]->F8;
    print "\t", $db[ $_ ]->F9;
    print "\t", $db[ $_ ]->F10;
}
__END__
C:\test>perl \perl64\site\lib\Class\Struct\Compact.pm
Class constructed: check mem: 4.6MB
Instances created: check mem: 9.0MB
Instances sorted: check mem: 9.1MB
total size: 3851232
Record 0
aaartlhgpkapaol
jrbkelwlfklkjgn
rhdrrltzezyuenc
zxccfpxpbzcxoqy
ysfqlfkrnhmaqhf
vclpccofujbyars
gwrdngknxjyxxni
foiuaojwzrqouzc
msbepsdptomdtbe
qazhgrkywspzsts
Record 14999
zzypdjmcgmgxnso
yzygmkgabelvqlj
xihybqagfiydipo
fpgsaybyhrfawuc
zyxekczeaxfomrs
lwyannakxmgists
nzehvwysfvpkeuf
gggdblbshwhgnto
vtdgbjvgwevpurx
wtdgjumncxgfaih
The module & test code:
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
omg, you may be my best friend. Let me check that and get back.
Wait, where is this magical Class::Struct::Compact? It's not in CPAN afaict?
As hdb points out, it's in the spoiler at the end of my post.
It's not on CPAN because I wrote it -- actually adapted it from some existing code -- in response to your OP.
I'd want to use it a few times myself and see what else it needs before putting it out for general use. For starters it needs some error checking and Carp for when things go wrong.
It'd also be nice to use pack templates to allow for numeric fields; but then I'd have to drop the lvalue-ness of the accessors.
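To illustrate the trade-off with pack templates, here is a minimal sketch, not code from the module. The class name `Reading`, its template `'A8 N d'` and the accessor names are all hypothetical. Because unpack hands back copies rather than a slice of the record, the accessors become ordinary getter/setter subs instead of lvalue subs.

```perl
use strict;
use warnings;

package Reading {
    # Hypothetical layout: 8-char name, 32-bit unsigned count, native double.
    use constant TEMPLATE => 'A8 N d';

    sub new {
        my( $class, @fields ) = @_;
        my $self = pack( TEMPLATE, @fields );
        return bless \$self, $class;
    }

    # Getter/setter: a numeric field has to be re-packed on assignment,
    # so it cannot simply be exposed as an lvalue substr.
    sub count {
        my( $self, $new ) = @_;
        substr( $$self, 8, 4 ) = pack( 'N', $new ) if defined $new;
        return unpack( 'x8 N', $$self );
    }

    sub name  { return unpack( 'A8',    ${ $_[0] } ) }
    sub value { return unpack( 'x12 d', ${ $_[0] } ) }
}

my $r = Reading->new( 'sensor1', 42, 3.5 );
$r->count( $r->count + 1 );
printf "%s %d %.1f (%d bytes)\n", $r->name, $r->count, $r->value, length $$r;
```

The whole record is still a single blessed scalar, so the per-object overhead stays the same as with string-only fields.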
(I just love it when I reply thinking I'm logged in, ugh.)
This is very clever! The only drawback is the fixed record length, but the savings are so significant compared to Class::Struct's ~5x overhead that I should be able to find a workable size.
Let me PM you about error checking, etc. Thanks so very much.
Because it allows me to 'hardcode' the numbers in the substrs.
If you uncomment the print before the eval, you'll see something like:
package Test;
use constant { F1_N => 0, F2_N => 15, F3_N => 30, F4_N => 45, F5_N => 60, F6_N => 75, F7_N => 90, F8_N => 105, F9_N => 120, F10_N => 135, };
use constant { F1_L => 15, F2_L => 15, F3_L => 15, F4_L => 15, F5_L => 15, F6_L => 15, F7_L => 15, F8_L => 15, F9_L => 15, F10_L => 15, };
sub new {
    my $class = shift;
    my $self = shift // '';
    return bless \$self, $class
}
# line 1 "sub_F4"
sub Test::F4 :lvalue {
    my $self = shift;
    substr( $$self, F4_N(), F4_L() );
}
F4_N() and F4_L() are constant subs which get optimised away during compilation, leaving hardcoded numbers which are faster than variables.
The memory saving comes from packing the fields into single strings; the performance comes from asking the sub to do as little as possible.
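For anyone who wants to try the technique without the generator, here is a small hand-written sketch of the same idea. The class name `Rec`, its two 8-character fields and the padding in `new` are assumptions made for illustration and are not taken from the module.

```perl
use strict;
use warnings;

package Rec {
    # Offsets and lengths as constants; perl inlines constant subs at
    # compile time, so substr() ends up with literal numbers.
    use constant { NAME_N => 0, NAME_L => 8, CITY_N => 8, CITY_L => 8 };

    sub new {
        my( $class, $data ) = @_;
        # Assumption of this sketch: pad or truncate to the full record
        # width (16 chars) so every field always exists.
        my $self = pack( 'A16', $data // '' );
        return bless \$self, $class;
    }

    # The last expression of an :lvalue sub is returned as an lvalue,
    # so callers can assign straight into the slice of the record.
    sub name :lvalue { substr( ${ $_[0] }, NAME_N, NAME_L ) }
    sub city :lvalue { substr( ${ $_[0] }, CITY_N, CITY_L ) }
}

my $r = Rec->new( 'alice   london  ' );
print $r->name, '|', $r->city, "\n";

# Assigned values must be exactly the field width; a shorter or longer
# string would change the record length and shift the fields after it.
$r->city = 'paris   ';
print $$r, "\n";
```

Each object is one blessed scalar holding one string, which is where the memory saving over a hash or array per object comes from.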
That said, by explaining that, I've spotted another couple of optimisations; and a potential bug. I'll get back to you with a revised version 2 days from now.
(A good reason for not uploading to cpan straight away :)
Re: Compact data classes
by roboticus (Chancellor) on Jun 08, 2013 at 00:20 UTC
creeble:
I can think of a few possibilities, but I don't know if any would be good because I don't know how you intend to access the values, how fast you need to look them up, etc.
For example, you could concatenate all your strings into one long one, storing each field as a length byte followed by its text. (You did mention that each object totals about 150 bytes of data, if I understand correctly.) Then your objects would store only the offset of the first field. Each access function would simply need to skip the appropriate number of fields.
$ cat t.pl
#!/usr/bin/perl
use strict;
use warnings;
my $joe = kablooie::new('Joe', 'Blow');
my $jane = kablooie::new('Jane', 'Smith');
my $name = $joe->fname();
print "joe's first name: $name\n";
$name = $jane->lname();
print "jane's last name: $name\n";
package kablooie;
my $kablooie_storage;
BEGIN {
    $kablooie_storage = "Bob's your uncle";
}
sub new {
    my ($fname,$lname) = @_;
    my $offset = length($kablooie_storage);
    $kablooie_storage .= chr(length($fname)) . $fname;
    $kablooie_storage .= chr(length($lname)) . $lname;
    return bless \$offset, 'kablooie';
}
sub fname {
    # FName is the first field
    my $self = shift;
    my $offset = $$self;
    my $len = ord(substr($kablooie_storage,$offset,1));
    return substr($kablooie_storage,$offset+1,$len);
}
sub lname {
    # LName is the second field
    my $self = shift;
    my $offset = $$self;
    my $len = ord(substr($kablooie_storage,$offset,1));
    $offset+=$len+1;
    $len = ord(substr($kablooie_storage,$offset,1));
    return substr($kablooie_storage,$offset+1,$len);
}
When run, I get:
$ perl t.pl
joe's first name: Joe
jane's last name: Smith
You could add some simple compression scheme, too. (RLE might be useful if you have lots of repeated characters, and if you don't mind restricting your character set, you could compress the characters into 6 bits each.)
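As a rough sketch of the 6-bit idea (the 64-symbol alphabet and the helper names `squeeze` and `unsqueeze` are made up for this example), each character is mapped to a 6-bit code and the codes are packed end to end, with one length byte in front so the padding bits can be ignored when decoding:

```perl
use strict;
use warnings;

# Hypothetical 64-symbol alphabet; anything outside it cannot be stored.
my @alphabet = ( 'a'..'z', 'A'..'Z', '0'..'9', ' ', '_' );
my %code;
@code{ @alphabet } = 0 .. $#alphabet;

sub squeeze {
    my $text = shift;
    die "Field longer than 255 characters" if length( $text ) > 255;
    my $bits = '';
    for my $ch ( split //, $text ) {
        exists $code{ $ch } or die "Character '$ch' is not in the alphabet";
        $bits .= sprintf '%06b', $code{ $ch };
    }
    # One length byte, then the bit string padded out to whole bytes.
    return chr( length $text ) . pack( 'B*', $bits );
}

sub unsqueeze {
    my $packed = shift;
    my $n    = ord( substr( $packed, 0, 1 ) );
    my $bits = unpack( 'B*', substr( $packed, 1 ) );
    return join '', map {
        $alphabet[ oct( '0b' . substr( $bits, $_ * 6, 6 ) ) ]
    } 0 .. $n - 1;
}

my $field  = 'aaartlhgpkapaol';
my $packed = squeeze( $field );
printf "%d chars -> %d bytes, round trip %s\n",
    length $field, length $packed,
    unsqueeze( $packed ) eq $field ? 'ok' : 'FAILED';
```

For a 15-character field this gives 13 bytes (12 bytes of packed bits plus the length byte), so the saving is modest and every access pays for the decode.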
That said, I have no idea how much space you could save using this technique, and whether it would be worth it or not.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
I tried a variation of that using pack and unpack, but it was unworkably slow. I do need quick access to all the fields. It did save a fair amount of memory, but less than I would have hoped.
What I'm starting to envision is a somewhat-specialized XS module that builds something similar (I could afford pointers to the individual field strings) in C, and then allows access to it via XS. The db is read-only once it's sorted, so updates aren't an issue. You could malloc large blocks and just write null-terminated strings to them, updating the pointers for the fields.
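The layout described above can be tried out in plain Perl before committing to XS. The following is only a stand-in for the data structure, not an XS module and not a statement about speed: `$heap`, `$index`, `add_record` and `field` are hypothetical names, with a packed string of 32-bit offsets playing the role of the C pointers.

```perl
use strict;
use warnings;

my $FIELDS = 3;     # fields per record (hypothetical)
my $heap   = '';    # all field text, each string NUL-terminated
my $index  = '';    # packed 32-bit offsets into $heap, $FIELDS per record

sub add_record {
    my @fields = @_;
    die "Expected $FIELDS fields" unless @fields == $FIELDS;
    for my $f ( @fields ) {
        $index .= pack( 'N', length( $heap ) );   # "pointer" to this field
        $heap  .= $f . "\0";
    }
    return length( $index ) / ( 4 * $FIELDS ) - 1;   # record number
}

sub field {
    my( $rec, $n ) = @_;
    # vec() reads 32-bit elements big-endian, matching pack 'N'.
    my $start = vec( $index, $rec * $FIELDS + $n, 32 );
    my $end   = index( $heap, "\0", $start );
    return substr( $heap, $start, $end - $start );
}

add_record( 'Joe',  'Blow',  'London' );
add_record( 'Jane', 'Smith', 'Paris'  );
print field( 1, 1 ), "\n";
```

The cost per field is the text plus one terminator byte and one 4-byte offset, with only two Perl scalars for the whole data set.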
But I know almost nothing about XS and whether the conversion from a string in C to what perl thinks is a string would be painfully slow. I would guess not, since it must do it internally all the time?
Re: Compact data classes
by v-zor (Initiate) on Jun 10, 2013 at 01:12 UTC
HTH : http://search.cpan.org/search?query=Gzip&mode=all
Re: Compact data classes
by sundialsvc4 (Abbot) on Jun 08, 2013 at 14:43 UTC
Well, what about SQLite? It’s public-domain(!), used in cell phones all over the planet, and known to be quite efficient. Maybe you could make that your primary backing-store. (Perhaps also this would eliminate the need for sorting?) Because SQLite is used in many resource-constrained situations, it has a lot of options for making efficient use of memory. (Here, I’m referring to the software itself, not just its Perl implementation ...)
(Quite seriously, “SQLite is another Swiss Army® knife of computer programming” ...)
A homebrew in-memory NRU caching scheme is fairly easy to construct in Perl ... if it turns out that you actually need one when using SQLite. (Find out, first.) A hashref can provide random access to strings, while a separate array of hashrefs (to the same Perl objects) provides for round-robin recycling: when the array has reached its arbitrarily-set limit, shift or pop an element off, delete the key from the hashref (you must do both!), then let the now-unreferenced data disappear into the gloom while you create another key, add a reference to it to the hash, and unshift or push another reference to the same thing onto the array. (The data-records now have a reference-count of 2, one from the hash, the other from the list, as they will for the duration.) It’s NRU = Not Recently Used, not LRU = Least Recently Used, but it’s all resident-memory, so, so what.
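A minimal sketch of the scheme just described, under a few assumptions: the class name `RingCache` and its methods are invented for this example, each entry is a small `[ key, value ]` array so that the key is known at eviction time, and entries are evicted in insertion order.

```perl
use strict;
use warnings;

package RingCache {
    sub new {
        my( $class, $limit ) = @_;
        return bless { limit => $limit, by_key => {}, order => [] }, $class;
    }

    sub get {
        my( $self, $key ) = @_;
        my $entry = $self->{by_key}{ $key } or return;
        return $entry->[1];
    }

    sub put {
        my( $self, $key, $value ) = @_;
        if( my $hit = $self->{by_key}{ $key } ) {
            $hit->[1] = $value;     # same entry is reachable via hash and array
            return;
        }
        if( @{ $self->{order} } >= $self->{limit} ) {
            my $old = shift @{ $self->{order} };    # oldest leaves the array...
            delete $self->{by_key}{ $old->[0] };    # ...and the hash: do both
        }
        my $entry = [ $key, $value ];
        $self->{by_key}{ $key } = $entry;           # reference 1: the hash
        push @{ $self->{order} }, $entry;           # reference 2: the array
        return;
    }
}

my $cache = RingCache->new( 2 );
$cache->put( a => 'alpha' );
$cache->put( b => 'beta'  );
$cache->put( c => 'gamma' );    # cache is full, so 'a' is recycled
my $gone = defined $cache->get( 'a' ) ? 'still cached' : 'evicted';
print "a: $gone, c: ", $cache->get( 'c' ), "\n";
```

Once an entry has been removed from both the hash and the array, nothing refers to it any more and perl frees it.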
How much memory is used by constructing an SQLite DB with 15,000 records containing 10 x 15-char fields?
You recommend this module so often; surely you know this off the top of your head?
Even if not, it will be the work of minutes to try it out won't it?
Come on. Prove me wrong. Show me that this isn't just another case of a keyword triggering one of your 6 remaining synapses to fire, and cause you to trot out the associated, autonomic "advice".