Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

Fast data structure..!!!

by MimisIVI (Acolyte)
on Apr 15, 2008 at 15:07 UTC ( #680532=perlquestion: print w/replies, xml ) Need Help??
MimisIVI has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need to save info with a fast way in Perl and to be able to read this info fast too... For example the below loop seems in simplified way what i want to improve.. I have three values in each loop time to save and by using a hash of hash it takes 15 secs..

my %save; for(0 .. 2000000) { $save{$_}{$_+1}=$_+2; }

Do you have any idea how can i save data fast????


Replies are listed 'Best First'.
Re: Fast data structure..!!!
by apl (Monsignor) on Apr 15, 2008 at 15:13 UTC
    Call me old-fashioned, but storing 133 thousand entries a second doesn't sound bad...
      Yes its not sound bad, but i need more speed..and putting a limit to the info isnot an option...
Re: Fast data structure..!!!
by kyle (Abbot) on Apr 15, 2008 at 15:26 UTC

    If you really are storing numerical data with numerical keys, an array might serve you better than a hash.

    Have you tried profiling your code to make sure that this really is the part that needs to be optimized? I find it hard to believe that in any useful application, storage and retrieval in RAM is the bottleneck.

Re: Fast data structure..!!!
by moritz (Cardinal) on Apr 15, 2008 at 15:26 UTC
    If you use only continuous natural numbers as keys, it's (slightly) faster to use an array instead of a hash.

    But why do you want to store that data? It's faster to compute them on the fly (if you don't access all of them, at least).

Re: Fast data structure..!!!
by FunkyMonk (Canon) on Apr 15, 2008 at 15:31 UTC
    I see only four options for you:
    1. Buy a faster computer
    2. use an array, not a hash (if your data really is numerical & sequenced)
    3. Code it in C
    4. Live with it
    Here's some timings on my computer (Linux, Perl 5.8.8, AMD Dual core 4800)
    use Time::HiRes qw/time/; my $start = time; my %save; for(0 .. 2000000) { $save{$_}{$_+1}=$_+2; } print "hash: ", time - $start, "\n"; $start = time; my @save; for(0 .. 2000000) { $save[$_]{$_+1}=$_+2; } print "array: ", time - $start, "\n"; #Ouput: #hash: 4.92373299598694 #array: 3.34077191352844

Re: Fast data structure..!!!
by moklevat (Priest) on Apr 15, 2008 at 15:54 UTC
    Depending on the type of data you are working with, you might benefit from using PDL. For numerical tasks I have found that it can be very fast and memory efficient.

    Update: As requested below, let's say you want a 2D array. You can create a "piddle" (the PDL data structure) with a sequence of arbitraty length and two dimensions as simply as:

    #!/usr/bin/perl use strict; use warnings; use PDL; my $sequence = pdl [0..20],[0..20]; print "$sequence\n"; ***Outputs*** [ [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] ]
      Hi moklevat,

      I dont know anything about PDL..

      Can you give a simple example how can use the numerical task with the PDL??
Re: Fast data structure..!!!
by samtregar (Abbot) on Apr 15, 2008 at 16:34 UTC
    This is around 20% faster in my tests and should use less memory (I didn't check how much):

    my %save; for(0 .. 2000000) { $save{$_,$_+1}=$_+2; }

    The difference is that this version creates just one hash and fills it with data in compound keys (joined with $;, \034 by default). Your version created lots of sub-hashes, allocating memory for each one. Using an array is even faster though.

    I once solved a problem like this by using XS to compile a shared object containing a packed C data structure with my data, along with accessor routines also written in C. Then at runtime the Perl code just loaded the shared-object to get access to the data. It was really, really fast but I doubt I'd do it again - Berkeley DB has pretty much solved this problem.


Re: Fast data structure..!!!
by NetWallah (Abbot) on Apr 15, 2008 at 15:26 UTC
    As apl indicates - your current storage mechanism is already pretty fast.

    Perhaps if we had more information on your problem domain, better suggestions could be made.

    For example, are all the keys numeric, under 2M ? Have you tried pre-allocation of the keys ? An in-memory database ? An AOA, instead of a HOH ?

         "How many times do I have to tell you again and again .. not to be repetitive?"

      Hi Guys,

      Here is the real code..

      I 've got a bit strinbg where i saved (f.e. 2 milion)possitive integers with 4 bytes each one..The code is like below..

      my $docID=0; my $tf=0; my $index=0; my $Aa=time(); for(1 .. 2000000) { $docID = vec ($dp, $index++, 32); $tf = vec ($dp, $index++, 32); $ TERM FREQUENCY $vec{$docID}=$tf; ### SAve Data for(1 .. $tf) { my $poss=vec ($dp, $index++, 32); $pos{$docID}{$poss}=\$order;# save Data } } print "unpack vector in \t",time()-$Aa," secs...\n";

      Thats what i want to speed up...The vec is really very fast to read the bitstring but the saving make the prerfomance slow...:(

      Any suggestions????

        Get a faster computer. On my notebook (about 1 year old, 2 GHz CPU and enough RAM) this takes about 1.6 seconds. I don't say that boast, but rather to tell you that your hardware isn't up to date.

        And are you sure that you actually access all items of that data structure? Right now you just build it up, but don't use it, so there's no way for us to tell.

        Assuming that $pos and $poss are two different things, and $pos is defined somewhere outside of the code snippet, you can save a bundle by doing something like this:
        my $docpos = $pos{$docID}; for (1 .. $tf) { $docpos->{vec ($dp, $index++, 32)}=\$order; }
        It may seem trivial, but that's about 2000000*avg_tf dereferences.
        Getting rid of an unnecessary "my" variable should also pick up a few extra cycles.

        I'd like a sample value for $dp. A Data::Dumper representation of it or some code to cook up a reasonable facsimile would work. When I run this on my computer, it doesn't do anything interesting until I give it a value for that, but what particular value it has could have a big effect on performance.


Re: Fast data structure..!!!
by Thilosophy (Curate) on Apr 16, 2008 at 01:20 UTC

    Pre-sizing the hash so that it does not need to expand dynamically should save a lot of time.

    Also, try to get rid of the nesting in the hash by concatenating the components of the key and using the concatenation as key to a flat hash.

    If that is still not fast enough, you will probably have to implement your own hash in C, or reconsider if you can come up with a clever algorithm that does not depend on instant access to a huge data structure (do you really need all that data?)

Re: Fast data structure..!!!
by Gavin (Chancellor) on Apr 15, 2008 at 15:48 UTC

    Perhaps!!! a!!! Database!!! is!!! the!!! answer!!!

      I!!!use!!!already!!!a!!Database!!to!!! retrieve!!!the!!!bitstring!!!!

        Blackadder: Baldrick, have you no idea what irony is?

        Baldrick: Yes, it's like goldy and bronzy only it's made out of iron !!!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://680532]
Approved by Corion
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2017-05-28 18:40 GMT
Find Nodes?
    Voting Booth?