PerlMonks  

Re^3: Serializing a large object

by BrowserUk (Pope)
on Oct 09, 2010 at 13:53 UTC ( #864381 )


in reply to Re^2: Serializing a large object
in thread Serializing a large object

A small set (say 1000 or so) of "typical" input ranges, and a hundred or so test ranges, along with the result counts when compared against the supplied input set. Timings of how long it took to run would also be useful.

As for how to exchange them, email seems possible. One set of 1000 pairs, plus one set of 100 pairs with counts, plus a time won't take much space. You could probably even post them here. /msg me for an email address if you want to go that route.
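For what it's worth, a plain tab-separated text dump of the kind described above is trivial to produce and to post or email. A minimal sketch using only core Perl (the file names, coordinate bounds, and the zeroed count column are illustrative stand-ins, not from any real dataset):

```perl
use strict; use warnings;

my $MAX = 877_879;    # illustrative coordinate bound

# One set of ~1000 "typical" input ranges, one pair per line.
open my $out, '>', 'ranges.tsv' or die "ranges.tsv: $!";
for ( 1 .. 1000 ) {
    my $start = 1 + int rand $MAX;
    my $end   = $start + int rand 7000;
    $end = $MAX if $end > $MAX;
    print {$out} "$start\t$end\n";
}
close $out;

# One set of ~100 test ranges plus their expected counts.
open my $q, '>', 'queries.tsv' or die "queries.tsv: $!";
for ( 1 .. 100 ) {
    my $start = 1 + int rand $MAX;
    my $end   = $start + int rand 7000;
    $end = $MAX if $end > $MAX;
    my $count = 0;    # placeholder: fill in the real result count here
    print {$q} "$start\t$end\t$count\n";
}
close $q;
```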


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^4: Serializing a large object
by daverave (Scribe) on Oct 09, 2010 at 14:35 UTC
    Coordinates are in 1 .. 877879 (both inclusive). I used some real data, so the number of ranges is 2777, and it is followed by some 1000 queries and their results.

      Thanks. How long did that take using your current solution? A load time and size for the compressed nstore data, and the time taken to perform the search and produce the results, would be useful.
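      Measuring the load time and on-disk size can be done with core Storable and Time::HiRes alone. A sketch along these lines (the hash-of-arrays here is a stand-in for the real object, and the gzip layer used elsewhere in the thread is left out for simplicity):

```perl
use strict; use warnings;
use Storable qw(nstore retrieve);
use Time::HiRes qw(gettimeofday tv_interval);

# Stand-in data structure; substitute the real object being serialized.
my $data = { map { $_ => [ int rand 877_879, int rand 877_879 ] } 1 .. 10_000 };

# Write it in network (portable) byte order.
nstore( $data, 'test.store' ) or die "nstore failed";

# Time the load and report the file size.
my $t      = [gettimeofday];
my $loaded = retrieve('test.store') or die "retrieve failed";
printf "loaded in %.6f seconds; file size %d bytes\n",
    tv_interval($t), -s 'test.store';
```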


        use strict; use warnings;
        use 5.012;
        use FastBioRanges;
        # use PerlIO::gzip; # I just noticed I forgot this 'use' but it worked fine... how?
        use Storable qw(retrieve_fd);
        use Time::HiRes qw(gettimeofday tv_interval);

        my $time = [gettimeofday];
        open( my $fastbioranges_fh, "<:gzip", 'fastbioranges.store.gz' ) or die;
        my $fastbioranges = retrieve_fd($fastbioranges_fh) or die;
        close($fastbioranges_fh);
        say "loaded in ", tv_interval($time), " seconds";

        my $n = 1000;
        $time = [gettimeofday];
        #say "start\tend\tcover";
        for ( 1 .. $n ) {
            my $start = int rand(877878);
            my $size  = int rand(7000);
            my $end   = ( $start + $size ) % 877879 + 1;
            my $cover = $fastbioranges->num_ranges_containing( $start, $end );
            # say "$start\t$end\t$cover";
        }
        say "$n queries in ", tv_interval($time), " seconds";

        loaded in 0.385292 seconds
        1000 queries in 0.005204 seconds
        Recall that usually my objects are 5-10 times larger and the number of queries is in the millions. The querying is not optimized (all the 'my $...', 'rand', etc.) but still it's very fast.
