Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^3: DBI::SQLite slowness

by erix (Vicar)
on Sep 21, 2013 at 13:13 UTC ( #1055127=note: print w/ replies, xml ) Need Help??


in reply to Re^2: DBI::SQLite slowness
in thread DBI::SQLite slowness

If I simply have "$dbh->commit" following the For loop, is that effectively the same as explicitly using transactions?

No; BEGIN WORK (or BEGIN TRANSACTION or just plain BEGIN) starts a transaction in SQL. Then (normally after 1 or more insert/delete/update's) COMMIT commits the transaction (or, when an error occurs, ROLLBACK rolls the transaction back to the state just before the BEGIN).

See DBI's begin_work

(I don't use SQLite; in the above I am assuming SQLite does this the same way that other (RDBMS) databases do)


Comment on Re^3: DBI::SQLite slowness
Re^4: DBI::SQLite slowness
by Endless (Beadle) on Sep 21, 2013 at 18:20 UTC

      That's actually the same behaviour as other DB's have.

      But only now do I see the initial thread/problem (Scaling Hash Limits). (It's useful to link to original threads in follow-up posts, you know). With the relatively small sizes involved, a database doesn't seem necessary.

      If the problem is that simple, can't you just run

      sort -u dupslist > no_dupslist

      on your id list? Perhaps not very interesting, or fast (took about 7 minutes in a 100M test run here), but about as simple as it gets.

      (BTW, just another datapoint (as I did the test already): PostgreSQL (9.4devel) loads about 9000 rows/s, on a slowish, low-end desktop. That's with the laborious INSERT-method that your script uses; bulk-loading (with COPY) loads ~ 1 million rows /second (excluding any de-duplication):

      perl -e 'for (1..50_000_000){ printf "%012d\n", $_; }' > t_data.txt; echo " drop table if exists t; create unlogged table t(klout integer); " | psql; echo "copy t from '/tmp/t_data.txt'; " | psql time < t_data.txt psql -c 'copy t from stdin' real 0m25.661s
      That's a rate of just under 2 million per second

      )

      UPDATE: added 'unlogged', adjusted timings (it makes the load twice as fast)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1055127]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (13)
As of 2015-07-01 21:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (22 votes), past polls