
Re: Fast data structure..!!!

by NetWallah (Abbot)
on Apr 15, 2008 at 15:26 UTC (#680540)

in reply to Fast data structure..!!!

As apl indicates - your current storage mechanism is already pretty fast.

Perhaps if we had more information on your problem domain, better suggestions could be made.

For example, are all the keys numeric and under 2M? Have you tried pre-allocating the keys? An in-memory database? An array of arrays (AoA) instead of a hash of hashes (HoH)?
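To make two of those questions concrete, here is a minimal sketch of key pre-allocation and of an array of arrays. It assumes the keys are small, dense, non-negative integers, which the thread has not confirmed; the names $n, %by_hash and @by_array are made up for illustration.

```perl
use strict;
use warnings;

my $n = 1000;    # hypothetical key count; the thread mentions about 2M

# Pre-allocation: assigning to keys() asks Perl to reserve hash buckets
# up front, so the hash need not be resized repeatedly while it grows.
my %by_hash;
keys(%by_hash) = $n;
$by_hash{$_}{ $_ + 10 } = 1 for 0 .. $n - 1;    # hash of hashes

# Array of arrays: when keys are dense integers, the array index can
# replace hashing altogether. Assigning to $#array pre-extends the array.
my @by_array;
$#by_array = $n - 1;
push @{ $by_array[$_] }, $_ + 10 for 0 .. $n - 1;

printf "hash entries: %d, array slots: %d\n",
    scalar( keys %by_hash ), scalar(@by_array);
```

Whether either change helps depends on how sparse the real keys are, so it is worth timing both against the real data.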

     "How many times do I have to tell you again and again .. not to be repetitive?"

Replies are listed 'Best First'.
Re^2: Fast data structure..!!!
by MimisIVI (Acolyte) on Apr 15, 2008 at 15:36 UTC
    Hi Guys,

    Here is the real code..

    I've got a bit string in which I saved (for example, 2 million) positive integers, 4 bytes each. The code is like below:

        my $docID = 0;
        my $tf    = 0;
        my $index = 0;
        my $Aa    = time();
        for (1 .. 2000000) {
            $docID = vec($dp, $index++, 32);
            $tf    = vec($dp, $index++, 32);    # TERM FREQUENCY
            $vec{$docID} = $tf;                 ### Save data
            for (1 .. $tf) {
                my $poss = vec($dp, $index++, 32);
                $pos{$docID}{$poss} = \$order;  # save data
            }
        }
        print "unpack vector in \t", time() - $Aa, " secs...\n";

    That's what I want to speed up. vec is really very fast at reading the bit string, but saving the data makes the performance slow. :(

    Any suggestions?

      Get a faster computer. On my notebook (about 1 year old, 2 GHz CPU and enough RAM) this takes about 1.6 seconds. I don't say that to boast, but rather to tell you that your hardware isn't up to date.

      And are you sure that you actually access all items of that data structure? Right now you just build it up, but don't use it, so there's no way for us to tell.

        It may run that fast because the OP didn't supply a value for $dp. I pasted the code into a file that has use strict at the top, and it blew up immediately.

      Assuming that %pos and $poss are two different things, and %pos is defined somewhere outside of the code snippet, you can save a bundle by doing something like this:
          my $docpos = $pos{$docID} ||= {};
          for (1 .. $tf) {
              $docpos->{ vec($dp, $index++, 32) } = \$order;
          }
      It may seem trivial, but that's about 2000000*avg_tf dereferences.
      Getting rid of an unnecessary "my" variable should also pick up a few extra cycles.
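A self-contained sketch of that idea follows, run against made-up sample data. The sizes in $docs and $tf_each are hypothetical, and the layout (docID, tf, then tf positions, each 32 bits) is taken from the snippet above. The `||= {}` makes sure the inner hash exists in %pos before its reference is cached.

```perl
use strict;
use warnings;

# Hypothetical sample data in the layout described above:
# docID, tf, then tf positions, each a 32-bit integer.
my ( $docs, $tf_each ) = ( 1000, 3 );
my $dp = '';
my $w  = 0;
for my $doc ( 1 .. $docs ) {
    vec( $dp, $w++, 32 ) = $doc;
    vec( $dp, $w++, 32 ) = $tf_each;
    vec( $dp, $w++, 32 ) = $_ + 10 for 1 .. $tf_each;
}

my $order = 1;
my ( %vec, %pos );
my $index = 0;
for ( 1 .. $docs ) {
    my $docID = vec( $dp, $index++, 32 );
    my $tf    = vec( $dp, $index++, 32 );
    $vec{$docID} = $tf;

    # Look up (and create, if needed) the inner hash once per document
    # instead of once per position.
    my $docpos = $pos{$docID} ||= {};
    $docpos->{ vec( $dp, $index++, 32 ) } = \$order for 1 .. $tf;
}

printf "%d documents, %d positions for document 1\n",
    scalar( keys %pos ), scalar( keys %{ $pos{1} } );
```

The saving grows with the average tf; with only a handful of positions per document it may be hard to measure.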

      I'd like a sample value for $dp. A Data::Dumper representation of it or some code to cook up a reasonable facsimile would work. When I run this on my computer, it doesn't do anything interesting until I give it a value for that, but what particular value it has could have a big effect on performance.
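One possible way to cook up such a facsimile is sketched below. The document count and term frequency are invented, since the real values are not given in the thread. It relies on the fact that vec with 32-bit elements uses the same big-endian layout as pack's N format.

```perl
use strict;
use warnings;

# Hypothetical sizes; the real document count and term frequencies
# are not known from the thread.
my ( $docs, $tf ) = ( 1000, 3 );

# Flat list of fields: docID, tf, then $tf positions per document.
my @fields;
for my $doc ( 1 .. $docs ) {
    push @fields, $doc, $tf, map { $_ + 10 } 1 .. $tf;
}

# 'N' is an unsigned 32-bit big-endian integer, the same layout that
# vec() uses for 32-bit elements.
my $dp = pack 'N*', @fields;

printf "%d bytes; first docID %d, first tf %d\n",
    length($dp), vec( $dp, 0, 32 ), vec( $dp, 1, 32 );
```

Varying $tf per document would give a more realistic test, because the inner loop dominates the run time.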


        Here is the whole code. I should have given it earlier.

            use strict;
            use Devel::Size qw(size);

            my $df = 500000;
            my $tf = 3;
            my $wektor = '';
            my $packed = '';
            my $nr = 0;
            for (0 .. $df) {
                vec($wektor, $nr++, 32) = $_;     # DOC ID
                vec($wektor, $nr++, 32) = $tf;    # TF
                for (0 .. $tf) {                  # note: writes $tf + 1 positions
                    vec($wektor, $nr++, 32) = $_ + 10;    # POSITIONS
                }
            }
            print "Vector's size: " . size($wektor) . " bytes\n";

            ###################### UNPACK VECTOR2
            my %vec;
            my %pos;
            my $docID = 0;
            $tf = 0;    # reuse the variable declared above rather than redeclaring it
            my $index = 0;
            my $order = 1;
            my $Aa = time();
            for (0 .. $df) {
                $docID = vec($wektor, $index++, 32);
                $tf    = vec($wektor, $index++, 32);
                $vec{$docID} = $tf;
                # print "Doc id: $docID\ttf: $tf\n";
                for (0 .. $tf) {                  # matches the $tf + 1 positions written above
                    my $last = vec($wektor, $index++, 32);
                    $pos{$docID}{$last} = $order;
                }
            }
            print "unpack vector in \t", time() - $Aa, " secs..\n";
