PerlMonks  

Re^4: Fast data structure..!!!

by MimisIVI (Acolyte)
on Apr 15, 2008 at 16:40 UTC (#680570)


in reply to Re^3: Fast data structure..!!!
in thread Fast data structure..!!!

Here is the whole code. I should have posted it earlier.

use strict;
use Devel::Size qw(size);

my $df = 500000;
my $tf = 3;
my $wektor = '';
my $packed = '';
my $nr = 0;

# Build the vector: doc id, tf, then the positions for each document.
# Note that 0 .. $tf stores $tf + 1 positions; the reader below matches.
for (0 .. $df) {
    vec($wektor, $nr++, 32) = $_;            # DOC ID
    vec($wektor, $nr++, 32) = $tf;           # TF
    for (0 .. $tf) {
        vec($wektor, $nr++, 32) = $_ + 10;   # POSITIONS
    }
}
print "Vector's size: " . size($wektor) . " bytes\n";

###################### UNPACK VECTOR2
my %vec;
my %pos;
my $docID = 0;
$tf = 0;    # reuse the $tf declared above instead of redeclaring it
my $index = 0;
my $order = 1;
my $Aa = time();
for (0 .. $df) {
    $docID = vec($wektor, $index++, 32);
    $tf    = vec($wektor, $index++, 32);
    $vec{$docID} = $tf;
    # print "Doc id: $docID\ttf: $tf\n";
    for (0 .. $tf) {
        my $last = vec($wektor, $index++, 32);
        $pos{$docID}{$last} = $order;
    }
}
print "unpack vector in \t", time() - $Aa, " secs..\n";


Re^5: Fast data structure..!!!
by moritz (Cardinal) on Apr 15, 2008 at 16:59 UTC
    I still don't get why it takes 15s for you to execute that code; it runs in 3s on mine. Did your machine swap to disk or something? Or is that an ancient machine, or a debugging perl?

    A few thoughts anyway:

    1. Try to pack the vector. You're only accessing it linearly anyway.

    2. Try not to use that vector at all. You can populate your %pos hash in the first place instead.

    3. Store the data structure on disk once, for example in a BerkeleyDB. It seems to be constant, so you don't actually need to calculate it every time your program starts. Or store $wektor in a plain binary file after you created it, and in subsequent runs only retrieve that from disk and generate your hash from it.
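
    A minimal sketch of how suggestions 1 and 3 might fit together, assuming that "pack" in suggestion 1 means decoding with unpack rather than calling vec once per element. It writes the vector to a binary file, reads it back, and decodes it in one unpack call. The temporary file, the tiny $df, and the names @nums, $doc_id and $doc_tf are illustrative assumptions and not from the original post. The "N*" format works here because vec with 32 bits stores big-endian values.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small vector in the thread's layout:
# doc id, tf, then tf + 1 positions (the loops run 0 .. $tf).
my ($df, $tf) = (3, 3);    # tiny sizes, for illustration only
my ($wektor, $nr) = ('', 0);
for my $doc (0 .. $df) {
    vec($wektor, $nr++, 32) = $doc;
    vec($wektor, $nr++, 32) = $tf;
    vec($wektor, $nr++, 32) = $_ + 10 for 0 .. $tf;
}

# Suggestion 3: write the raw bytes once ...
my ($fh, $file) = tempfile(UNLINK => 1);   # hypothetical storage location
binmode $fh;
print {$fh} $wektor;
close $fh or die "close: $!";

# ... and on later runs read them back instead of rebuilding.
open my $in, '<:raw', $file or die "open $file: $!";
my $loaded = do { local $/; <$in> };
close $in;

# Suggestion 1 (as read here): decode everything in one call.
my @nums = unpack 'N*', $loaded;

my (%vec, %pos);
my $order = 1;
my $i = 0;
while ($i < @nums) {
    my ($doc_id, $doc_tf) = @nums[$i, $i + 1];
    $i += 2;
    $vec{$doc_id} = $doc_tf;
    # A hash slice assigns all positions for this doc at once.
    @{ $pos{$doc_id} }{ @nums[$i .. $i + $doc_tf] } = ($order) x ($doc_tf + 1);
    $i += $doc_tf + 1;
}

printf "%d docs, %d positions for doc 0\n",
    scalar keys %vec, scalar keys %{ $pos{0} };
```

    Unpacking the whole string builds one large list in memory, so for the full 500000 documents this trades memory for fewer vec calls. Whether it is faster on a given machine would need measuring.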

Re^5: Fast data structure..!!!
by kyle (Abbot) on Apr 15, 2008 at 17:03 UTC

    I ran this under Devel::NYTProf and found that this loop is the hot spot:

    for (0 .. $tf) {
        my $last = vec($wektor, $index++, 32);
        $pos{$docID}{$last} = $order;
    }

    I changed it to this, following a suggestion from dvryaboy and also getting rid of the block completely:

    $pos{$docID}{vec($wektor, $index++, 32)}=$order for 0 .. $tf;

    That got somewhat faster.

    I tried taking out the constant reference to $pos{$docID} like this:

    $pos{$docID} ||= {};
    my $did_ref = $pos{$docID};
    $did_ref->{vec($wektor, $index++, 32)} = $order for 0 .. $tf;

    ...but that didn't make much difference to the hot loop, and it was more expensive outside the loop than the savings it got inside.
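
    To compare the three loop variants above on your own machine, something like the following sketch with the core Benchmark module may help. The sub names with_block, with_modifier and with_ref, and the reduced $df of 2000, are assumptions made for illustration. It prints no particular expected result, since timings depend on the perl build and the hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Much smaller than the thread's 500000 so each run stays short.
my ($df, $tf, $order) = (2000, 3, 1);
my ($wektor, $nr) = ('', 0);
for my $doc (0 .. $df) {
    vec($wektor, $nr++, 32) = $doc;
    vec($wektor, $nr++, 32) = $tf;
    vec($wektor, $nr++, 32) = $_ + 10 for 0 .. $tf;
}

# Variant 1: the original inner loop with a block and a temporary.
sub with_block {
    my (%pos, $index);
    $index = 0;
    for (0 .. $df) {
        my $docID = vec($wektor, $index++, 32);
        my $n     = vec($wektor, $index++, 32);
        for (0 .. $n) {
            my $last = vec($wektor, $index++, 32);
            $pos{$docID}{$last} = $order;
        }
    }
    return \%pos;
}

# Variant 2: statement modifier, no block and no temporary.
sub with_modifier {
    my (%pos, $index);
    $index = 0;
    for (0 .. $df) {
        my $docID = vec($wektor, $index++, 32);
        my $n     = vec($wektor, $index++, 32);
        $pos{$docID}{ vec($wektor, $index++, 32) } = $order for 0 .. $n;
    }
    return \%pos;
}

# Variant 3: look up the inner hash once per document.
sub with_ref {
    my (%pos, $index);
    $index = 0;
    for (0 .. $df) {
        my $docID   = vec($wektor, $index++, 32);
        my $n       = vec($wektor, $index++, 32);
        my $did_ref = $pos{$docID} ||= {};
        $did_ref->{ vec($wektor, $index++, 32) } = $order for 0 .. $n;
    }
    return \%pos;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    block      => \&with_block,
    modifier   => \&with_modifier,
    cached_ref => \&with_ref,
});
```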

    This is without really understanding what's going on, though. I wouldn't be surprised if what you're doing would benefit from just using a better algorithm.

      Yes, this loop is the hot spot. Perhaps changing my algorithm would be the best way.
