http://www.perlmonks.org?node_id=1081264


in reply to Re: Help on selecting and deleting many records in a DB.
in thread Help on selecting and deleting many records in a DB.

Since a unique record is based on a combination of most fields the current indexes do not work well for this operation. Would adding another field, that is a concatenation of the other fields, and indexing that be a good approach?

Neil Watson
watson-wilson.ca

  • Comment on Re^2: Help on selecting and deleting many records in a DB.

Replies are listed 'Best First'.
Re^3: Help on selecting and deleting many records in a DB.
by vincent_veyron (Sexton) on Apr 06, 2014 at 16:38 UTC

    No, you need to create an index that matches your query terms. See: http://www.postgresql.org/docs/9.1/static/indexes.html

    You can try this :

    delete from agent_log t1 where WHERE timestamp < now() - interval '7 days') and timespan < (select max(timestamp) from agent_log t2 where t2.class=t1.class and t2.hostname=t1.hostname and t2.id_address=t1.ip_address)

    I suppose you have an index on timestamp and the other fields? A surrogate key might help, or even using timespan as primary key, if possible.

    If you can create a history table, then you could:

    SELECT class, max(timestamp) as timestamp, hostname, ip_address, promise_handle, promiser, promisee, policy_server, promise_outcome FROM agent_log WHERE timestamp < now() - interval '7 days' into history_table ; delete from agent_log WHERE timestamp < now() - interval '7 days'

    That would be a good start towards partitioning your data, which you might need to consider.



    libremen.com : legal case, contract and insurance claim management software
      SELECT * FROM agent_log t1 WHERE timestamp < now() - interval '7 days' AND timestamp < ( SELECT max(timestamp) FROM agent_log t2 WHERE t2.class=t1.class AND t2.promiser=t1.promiser AND t2.hostname=t1.hostname AND t2.ip_address=t1.ip_address ) ORDER by class,timestamp;
      You almost had me there, but I don't think this will keep the maximum timestamp per day.

      Neil Watson
      watson-wilson.ca

        Well, it does if you run it every day and clean up afterwards, with a cron job for instance.

        If possible, it seems you could solve a big part of your problem using partitioning of your table, probably by day; see:
        http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html



        libremen.com : legal case, contract and insurance claim management software