Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

[OT] SQL "on duplicate key" custom action

by haukex (Chancellor)
on Oct 21, 2019 at 07:18 UTC ( #11107746=perlquestion: print w/replies, xml ) Need Help??

haukex has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

My SQL-fu is too weak for this one and Google hasn't helped yet. I'm inserting data records into a table, and there will be lots of duplicates: if the primary key exists, it should silently ignore the record if all the other columns match, but throw an error if they don't. Basically, of the following, the second should be ignored and the third should throw an error. I feel like there should be an easy solution but I'm not seeing it yet.

INSERT INTO records (PriKey, Foo, Bar) VALUES (1, 'Hello', World'); INSERT INTO records (PriKey, Foo, Bar) VALUES (1, 'Hello', World'); INSERT INTO records (PriKey, Foo, Bar) VALUES (1, 'Hello', Quz');

Also, related: Since I can choose the DB, what is your opinion on MySQL vs. Postgres? I'm thinking I'll probably use the latter but would like to hear any opinions.

Replies are listed 'Best First'.
Re: [OT] SQL "on duplicate key" custom action
by daxim (Curate) on Oct 21, 2019 at 09:30 UTC

      Thanks for all the references! Yeah, I'll go with Postgres then :-)

Re: [OT] SQL "on duplicate key" custom action
by erix (Parson) on Oct 21, 2019 at 09:43 UTC

    if the primary key exists, it should silently ignore the record if all the other columns match, but throw an error if they don't.

    I'd think a unique index including any extra columns would make this version of your INSERT work:

    drop table if exists testtable; create table testtable (pk int primary key, foo text, bar te +xt); create unique index foobar_idx on testtable (pk, foo, bar); INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON C +ONFLICT (pk,foo,bar) DO NOTHING; INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON C +ONFLICT (pk,foo,bar) DO NOTHING; INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'Quz' ) ON C +ONFLICT (pk,foo,bar) DO NOTHING;

    which yields:

    drop table if exists testtable; DROP TABLE create table testtable (pk int primary key, foo text, bar te +xt); CREATE TABLE create unique index foobar_idx on testtable (pk, foo, bar); CREATE INDEX INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON C +ONFLICT (pk,foo,bar) DO NOTHING; INSERT 0 1 INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'World') ON C +ONFLICT (pk,foo,bar) DO NOTHING; INSERT 0 0 INSERT INTO testtable (pk, foo, bar) VALUES (1, 'Hello', 'Quz' ) ON C +ONFLICT (pk,foo,bar) DO NOTHING; ERROR: duplicate key value violates unique constraint "testtable_pkey +" DETAIL: Key (pk)=(1) already exists.

    Postgres 12.0 (but this works from 9.5.x onward)

    Manual PostgreSQL 12.0: INSERT # SQL-ON-CONFLICT

    update: The first INSERT should be the same, of course, fixed. Also, removed vestiges of the original bash-version.

    update 2: Re 'MySQL vs. Postgres': I think open-sourced and completely-free PostgreSQL is a better/safer choice than proprietary Oracle MySQL. For one thing, over time Pg will get better -- even better than Oracle -- but how could Oracle ever allow cheapo MySQL to overtake its venerable and expensive Oracle database product?

      Thank you for this, I figured there was probably a fairly easy way to do it that I was missing! I'll try and compare it with Corion's trigger suggestion in terms of disk space and speed.

        I tried to get that trigger-version working but I could not (well, not in a satisfactory way). I'm beginning to think it's not possible -- so if you hit upon a trigger-solution, I'd very much like to know. Thanks!

Re: [OT] SQL "on duplicate key" custom action
by Corion (Pope) on Oct 21, 2019 at 07:36 UTC

    I think the "best" approach (if that is supported by your DBMS) are ON UPDATE triggers, or if that import is a one time thing only, creating a unique key across all columns:

    create or replace function skip_insert() returns trigger language plpg +sql as $$ begin return null; end $$; create or replace function raise_exception() returns trigger language +plpgsql as $$ begin RAISE EXCEPTION 'UPDATE changing row values not allowed'; end $$; CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE WHEN old.PriKey = new.PriKey and old.Foo=new.Foo and old.Bar=new.B +ar FOR EACH ROW EXECUTE PROCEDURE skip_insert() CREATE TRIGGER mytrigger ON records INSTEAD OF UPDATE WHEN old.PriKey = new.PriKey and (old.Foo!=new.Foo or old.Bar!=new +.Bar) FOR EACH ROW EXECUTE PROCEDURE raise_exception()

    ... but this approach does not handle NULL values.

      Do you perhaps have a tested version? I can't get this to run.

      The trigger functions are all right, it seems, but I don't see how to create the triggers themselves onto the table 'records' so that they react in the desired way. (For one thing, you can't do an 'INSTEAD OF UPDATE' on a table, and when you use a BEFORE INSERT or an AFTER INSERT trigger, this old. and new. business is not going to work.)

        No, sorry - I don't have a tested version, and I'm not on the machine where I did the searches.

        I worked from the Postgres 12 documentation and some Stackoverflow response, but I don't find them now :( I think it should be a before update trigger and the statement would be insert or update maybe. But again, I didn't test any of this, sorry.

      Thanks very much! It's not just a single import, it'll be a large import roughly once a month. So I understand you'd suggest a trigger over a unique key, may I ask why? (Performance?)

        An index across the complete table will eat up lots of disk space, and also will make each insert slow(er). Think of an index as (somewhat structured) hash keys, where you can also quickly look up strings starting with a specific string.

        If your table is largely read-only and you have the disk space to spare, the index will speed up queries maybe a bit more than an index only on the primary keys, because the DB can satisfy the query completely from the index without hitting the table storage.

        The triggers will slow down all UPDATE statements but leave SELECT untouched and also will not use additional storage.

Re: [OT] SQL "on duplicate key" custom action
by cavac (Curate) on Oct 21, 2019 at 15:12 UTC

    If you have the choice, choose PostgreSQL. It might not matter in the beginning, but projects tend to grow in complexity and PostgreSQL is a much better fit when it comes to complex data handling tasks and keeping your data integrity intact.

    Also, if you like PerlMonks, you will most likely enjoy working with the PostgreSQL community.

    perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
      Also, if you like PerlMonks, you will most likely enjoy working with the PostgreSQL community.

      That sounds nice :-) What's the Postgres equivalent to PerlMonks?

        There are a lot of places to look at. https://www.postgresql.org/community/ is as good a place to start as any. Especially the mailing lists (yup, those things still exist) are very helpful. The mailing lists are pretty much the equivalent of PerlMonks in my opinion

        If you need help semi-urgent or just want to hang out, #postgresql on irc.freenode.net is also a good place. Saved my bacon on a couple of occasions.

        Last, but not least, if you need commercial PostgreSQL help for a big project, PostgreSQL got you covered: https://www.postgresql.org/support/professional_support/

        perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11107746]
Approved by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2019-11-19 00:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Strict and warnings: which comes first?



    Results (93 votes). Check out past polls.

    Notices?