Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: DBI::SQLite slowness

by Endless (Beadle)
on Sep 20, 2013 at 17:59 UTC ( #1055057=note: print w/ replies, xml ) Need Help??


in reply to Re: DBI::SQLite slowness
in thread DBI::SQLite slowness

If I simply have "$dbh->commit" following the For loop, is that effectively the same as explicitly using transactions?


Comment on Re^2: DBI::SQLite slowness
Re^3: DBI::SQLite slowness
by erix (Vicar) on Sep 21, 2013 at 13:13 UTC
    If I simply have "$dbh->commit" following the For loop, is that effectively the same as explicitly using transactions?

    No; BEGIN WORK (or BEGIN TRANSACTION or just plain BEGIN) starts a transaction in SQL. Then (normally after 1 or more insert/delete/update's) COMMIT commits the transaction (or, when an error occurs, ROLLBACK rolls the transaction back to the state just before the BEGIN).

    See DBI's begin_work

    (I don't use SQLite; in the above I am assuming SQLite does this the same way that other (RDBMS) databases do)

        That's actually the same behaviour as other DB's have.

        But only now do I see the initial thread/problem (Scaling Hash Limits). (It's useful to link to original threads in follow-up posts, you know). With the relatively small sizes involved, a database doesn't seem necessary.

        If the problem is that simple, can't you just run

        sort -u dupslist > no_dupslist

        on your id list? Perhaps not very interesting, or fast (took about 7 minutes in a 100M test run here), but about as simple as it gets.

        (BTW, just another datapoint (as I did the test already): PostgreSQL (9.4devel) loads about 9000 rows/s, on a slowish, low-end desktop. That's with the laborious INSERT-method that your script uses; bulk-loading (with COPY) loads ~ 1 million rows /second (excluding any de-duplication):

        perl -e 'for (1..50_000_000){ printf "%012d\n", $_; }' > t_data.txt; echo " drop table if exists t; create unlogged table t(klout integer); " | psql; echo "copy t from '/tmp/t_data.txt'; " | psql time < t_data.txt psql -c 'copy t from stdin' real 0m25.661s
        That's a rate of just under 2 million per second

        )

        UPDATE: added 'unlogged', adjusted timings (it makes the load twice as fast)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1055057]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (6)
As of 2015-07-06 09:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (71 votes), past polls