PerlMonks  

Re: Problems with SDBM

by Tux (Monsignor)
on Mar 15, 2013 at 12:48 UTC (#1023682)


in reply to Problems with SDBM

Every module that can tie a hash to some persistence mechanism has pros and cons. The right choice depends not only on the size of your data or the number of elements in the hash, but also on how the hash is used. How do reads compare to writes? Is it write once, read often? Is the reason to tie resource limits, or is it persistence?
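For readers new to tie: a minimal sketch of tying a hash to SDBM_File, which ships with perl itself. The temporary directory and the "demo" file name are illustrative assumptions; SDBM_File creates demo.dir and demo.pag there. The second tie shows the persistence side: the data is still there after untie.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                   # bundled with perl
use File::Temp qw(tempdir);

# Illustrative location (assumption); SDBM_File creates <name>.dir and <name>.pag
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo";

# Tie the hash: every store and fetch goes through SDBM_File to disk
tie my %h, "SDBM_File", $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";
$h{$_} = $_ * 2 for 1 .. 1000;
untie %h;                        # flush and close the files

# Re-tie read-only: the data survived the untie (persistence)
tie my %again, "SDBM_File", $file, O_RDONLY, 0666
    or die "Cannot re-tie $file: $!";
my $count = scalar keys %again;
my $value = $again{500};
print "keys: $count, 500 => $value\n";
untie %again;
```

Swapping the backend is mostly a matter of changing the class name in the tie call (and loading that module), which is what makes comparisons like the ones below possible.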

YMMV across architectures and the type of data stored.

See this table, this table, this graph, this graph, this graph, and this graph for speed comparisons. They compare DB_File with other persistence modules. I wanted to see these results after writing Tie::Hash::DBD, which I created after running into serious trouble when DB_File hit resource limits and corrupted data.
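A sketch of how such "records per second" numbers can be measured for one backend. This is not the script behind the tables and graphs; the rate() helper, the key/value shapes and the sizes are assumptions chosen for illustration, and only SDBM_File is exercised.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);         # sub-second wall-clock timer

# Hypothetical helper: run $code for 1 .. $n and return calls per second
sub rate {
    my ($n, $code) = @_;
    my $t0 = time();
    $code->($_) for 1 .. $n;
    my $dt = time() - $t0;
    return $dt > 0 ? int($n / $dt) : 0;   # 0 if the clock did not advance
}

my $dir = tempdir(CLEANUP => 1);
my %rate;                                 # size => [writes/s, reads/s]
for my $size (20, 200, 2000, 20000) {
    tie my %h, "SDBM_File", "$dir/bench$size", O_RDWR | O_CREAT, 0666
        or die "Cannot tie: $!";
    my $wr = rate($size, sub { $h{"key$_[0]"} = "value$_[0]" });
    my $rd = rate($size, sub { my $v = $h{"key$_[0]"} });
    $rate{$size} = [$wr, $rd];
    printf "%6d wr %8d rd %8d\n", $size, $wr, $rd;
    untie %h;
}
```

Looping the same measurement over several tie classes gives a table of the kind shown further down; absolute numbers will vary with hardware, filesystem and data.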


Enjoy, Have FUN! H.Merijn


Re^2: Problems with SDBM
by Anonymous Monk on Mar 15, 2013 at 13:00 UTC

    after I ran into serious trouble when DB_File hit resource limits and caused data corruption.

    Which backend, which db_version?

    Another benchmark: SQLite vs CDB_File vs BerkeleyDB

      I have to guess here, as it was too long ago to be sure. Trouble hit the fan at a customer site that had fewer resources than the site where the script had been tested. The script ran a long analysis on two databases that took close to 30 hours, which makes it obvious why data corruption after 20+ hours is not an option.

      I started Tie::Hash::DBD in August 2010, which makes me assume we ran perl-5.10.1/64all on HP-UX 11.11 (at the customer site) with DB_File-1.020 targeting libdb-4.2.52 (after which I stopped upgrading, as Oracle made it close to impossible to port).

      Is that what you wanted to know?


      Enjoy, Have FUN! H.Merijn

      New numbers (higher is better), now including BerkeleyDB:

      Update: added the *DBM_File columns (compressed the output a bit to make it fit)

      Linux 3.4.33-2.24-desktop [openSUSE 12.2 (Mantis)] i386 Core(TM) i7-2620M CPU @ 2.70GHz/800(4) i686 7969 Mb
      This is perl 5, version 16, subversion 3 (v5.16.3) built for i686-linux-64int

        Size op   GDBM   NDBM   ODBM   SDBM DB_File CDB_File BerkDB Redis Redis2 SQLite    Pg mysql CSV
      ------ -- ------ ------ ------ ------ ------- -------- ------ ----- ------ ------ ----- ----- ---
          20 rd  32573  27972  27855 165289   24752  1111111  18587  4754   7186  30257  6197  3003 883
          20 wr  19685  10678   9182  20855    6361    26917   5762  4848   6289  11961  2107   723 953
         200 rd 142959 113636 116822 161550   62092  1333333  53404  5033   7507  37943  6312  1143 124
         200 wr  65189  54555  64578  89007   58479   221483  37800  7700   8325  25687  4092  1417 230
         600 rd 155925 114832 120992 183486   49285  1263157  43687  6366   7551  37657 11386   428   -
         600 wr 101437  71633  83148 109950   44886   444115  41649  8717   6311  27700  5081   670   -
        2000 rd 156311  97092 102202 138169   44295  1006036  39761  6209   8277  34599 10931   142   -
        2000 wr 100376  76438  82474 107060   40711   577700  39096  8724  12205  27475  6241   260   -
       20000 rd 141094  92384  94677 123507   49629   693096  43771  6098   8201  30522  9721     -   -
       20000 wr  94704  76299  80329 103297   30815   527676  29369  8284   8595  23866  5667     -   -
      200000 rd 134909 110688  99839 138195   45577   677541  40508  5385   7658  30463  8482     -   -
      200000 wr  51296  58657  59119  99944   28033   592327  26488  7949   9728  22878  5160     -   -

      Below is the script I run


      Enjoy, Have FUN! H.Merijn
Re^2: Problems with SDBM
by BrowserUk (Pope) on Mar 15, 2013 at 13:19 UTC

    Those are the most confusing graphs I've ever seen.

    For example: both this & this are labelled "Write records per second", carry the same numbers on the x-axis, and list the same (10) DBs in the legend; but they are totally different graphs. One shows only 7 lines; the other shows 9 of the 10.

    DB_File hit resource limits and caused data corruption.

    What resource limit?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      The graphs are part of a talk, and you miss the spoken explanation here :)

      The graphs come in pairs. The second is a zoom-in on the bottom part of the first. You might notice that the lines that are high on the first graph of each pair do not appear on the second. The colors might have hinted at this.

      The resource limits were mainly memory. At the start, most memory was available. Halfway through the long-running process, the system also needed (lots of) memory for other processes and started swapping. The tied hashes were about 4 GB each (there were 4 of them).


      Enjoy, Have FUN! H.Merijn
Re^2: Problems with SDBM
by Laurent_R (Parson) on Mar 15, 2013 at 14:37 UTC

    Hi, thanks everyone for the answers already provided.

    The main reason to tie is resource limits: the input data has about 30 million records (slightly less than 2 GB), which is just too large for an ordinary (untied) hash. That said, persistence would also be a bonus, because later processes could reuse the same data without having to load it again. But persistence is not the primary reason for using tied hashes.

    I am not too concerned with speed at this point (although it might become important later, given the large data volume). My concern is that the process fails when only about half of the data (15.8 million records) has been loaded, presumably because of the large data volume. I could use several tied hashes to get around this volume limit, but that would be awkward and unwieldy (and not very scalable).
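The "several tied hashes" workaround can at least be hidden behind two small helpers. A minimal sketch, assuming 8 shards, SDBM_File as the backend and a throwaway directory (all illustrative choices); store_rec, fetch_rec and shard_for are hypothetical names. A checksum of the key picks the shard, so a given key always lands in the same file.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Assumptions for illustration: 8 shards in a throwaway directory
my $SHARDS = 8;
my $dir    = tempdir(CLEANUP => 1);

# One tied SDBM hash per shard file
my @shard;
for my $i (0 .. $SHARDS - 1) {
    my %h;
    tie %h, "SDBM_File", "$dir/shard$i", O_RDWR | O_CREAT, 0666
        or die "Cannot tie shard $i: $!";
    push @shard, \%h;
}

# Route a key to a shard via a 32-bit checksum of its bytes,
# so the same key always maps to the same file
sub shard_for {
    my ($key) = @_;
    return $shard[ unpack("%32C*", $key) % $SHARDS ];
}

sub store_rec { my ($k, $v) = @_; shard_for($k)->{$k} = $v; return }
sub fetch_rec { my ($k) = @_; return shard_for($k)->{$k} }

store_rec("rec$_", "payload $_") for 1 .. 10_000;

my $got   = fetch_rec("rec4242");
my $total = 0;
$total += scalar keys %$_ for @shard;   # each shard holds a subset
print "rec4242 => $got, total records: $total\n";

untie %$_ for @shard;
```

Raising the shard count is a one-line change, which addresses some of the scalability concern. Note that sharding only spreads the volume over more files; it does not change any per-record limits of the backend itself.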

    It seems that Berkeley DB is not available on our system, so it will not be an option.
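To see which DBM-style tie backends a given perl actually has, one can simply try to load each one. A small sketch; the candidate list is an assumption covering the *DBM_File modules from the table above. DB_File is only built when the Berkeley DB library was found at build time, while SDBM_File is bundled with perl.

```perl
use strict;
use warnings;

# Tie backends that may or may not have been built with this perl
my @candidates = qw(DB_File GDBM_File NDBM_File ODBM_File SDBM_File);

# require() at run time inside eval, so a missing module is skipped, not fatal
my @available = grep { eval "require $_; 1" } @candidates;
print "Available tie backends: @available\n";
```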
