Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: Help on selecting and deleting many records in a DB.

by neilwatson (Curate)
on Apr 05, 2014 at 17:02 UTC ( #1081264=note: print w/ replies, xml ) Need Help??


in reply to Re: Help on selecting and deleting many records in a DB.
in thread Help on selecting and deleting many records in a DB.

Since a unique record is based on a combination of most fields the current indexes do not work well for this operation. Would adding another field, that is a concatenation of the other fields, and indexing that be a good approach?

Neil Watson
watson-wilson.ca


Comment on Re^2: Help on selecting and deleting many records in a DB.
Re^3: Help on selecting and deleting many records in a DB.
by vincent_veyron (Acolyte) on Apr 06, 2014 at 16:38 UTC

    No, you need to create an index that matches your query terms. See: http://www.postgresql.org/docs/9.1/static/indexes.html

    You can try this :

    delete from agent_log t1 where WHERE timestamp < now() - interval '7 days') and timespan < (select max(timestamp) from agent_log t2 where t2.class=t1.class and t2.hostname=t1.hostname and t2.id_address=t1.ip_address)

    I suppose you have an index on timestamp and the other fields? A surrogate key might help, or even using timespan as primary key, if possible.

    If you can create a history table, then you could:

    SELECT class, max(timestamp) as timestamp, hostname, ip_address, promise_handle, promiser, promisee, policy_server, promise_outcome FROM agent_log WHERE timestamp < now() - interval '7 days' into history_table ; delete from agent_log WHERE timestamp < now() - interval '7 days'

    That would be a good start towards partitioning your data, which you might need to consider.



    libremen.com : legal case, contract and insurance claim management software
      SELECT * FROM agent_log t1 WHERE timestamp < now() - interval '7 days' AND timestamp < ( SELECT max(timestamp) FROM agent_log t2 WHERE t2.class=t1.class AND t2.promiser=t1.promiser AND t2.hostname=t1.hostname AND t2.ip_address=t1.ip_address ) ORDER by class,timestamp;
      You almost had me there, but I don't think this will keep the maximum timestamp per day.

      Neil Watson
      watson-wilson.ca

        Well, it does if you run it every day and clean up afterwards, with a cron job for instance.

        If possible, it seems you could solve a big part of your problem using partitioning of your table, probably by day; see:
        http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html



        libremen.com : legal case, contract and insurance claim management software

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1081264]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (8)
As of 2014-11-23 07:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (129 votes), past polls