jimbus has asked for the
wisdom of the Perl Monks concerning the following question:
hello
I'm having an issue with insertions/updates using DBI. With Blokhead's help (sorry for getting snippy, Blokhead, but I do try to look things up before I ask) I turned up tracing and I'm not understanding what I'm seeing.
This is the table:
| Field | Type | Null | Key | Default
+---------+--------------+------+-----+------------------
| tstamp | timestamp | YES | PRI | CURRENT_TIMESTAMP
| url | varchar(64) | | PRI |
| average | double(17,1) | YES | | NULL
| maximum | bigint(20) | YES | | NULL
| total | double(17,0) | YES | | NULL
Here is the code in question
foreach $key (sort keys %urls)
{
my @tmpData = @{$urls{$key}};
print "\nDATA: @tmpData\n";
eval
{
$i2_stmt->execute(@tmpData[0],@tmpData[1],@tmpData[2],@tmpData[3],
+@tmpData[4
]);
};
if ($@)
{
$u2_stmt->execute(@tmpData[2],@tmpData[4],@tmpData[4],@tmpData[3],
+@tmpData[4],@tmpData[0],@tmpData[1]);
}
}
the general idea is to insert incoming data, but if a row already exists, update it instead. For test data I'm loading the same data twice, so in the trace output I should be seeing a nice flow of insert, fail, update, insert, fail, update... but I'm seeing insert, fail, update, insert, insert, insert, fail update, fail, exit.
DATA: 05-09-30 10:00 cnet.com 1.0000 1 2
-> execute for DBD::mysql::st (DBI::st=HASH(0x838fd60)~0x804d26c '
+05-09-30 1
0:00' 'cnet.com' '1.0000' '1' '2')
-> dbd_st_execute for 0815d350
Binding parameters: insert tmp_urls (tstamp,url,average,maximum,
+total) val
ues ('05-09-30 10:00','cnet.com','1.0000','1','2')
Duplicate entry '2005-09-30 10:00:00-cnet.com' for key 1 error 1062 re
+corded: Du
plicate entry '2005-09-30 10:00:00-cnet.com' for key 1
<- dbd_st_execute -2 rows
!! ERROR: 1062 'Duplicate entry '2005-09-30 10:00:00-cnet.com' for
+ key 1' (e
rr#0)
<- execute= undef at coalData.pl line 68 via ./coalData.pl line 67
-> execute for DBD::mysql::st (DBI::st=HASH(0x83c3110)~0x838fe20 '
+1.0000' '2
' '2' '1' '2' '05-09-30 10:00' 'cnet.com')
-> dbd_st_execute for 0838fe14
Binding parameters: update tmp_urls set average = round((average
+*total+'1.
0000'*'2')/(total+'2'),2), maximum = greatest(maximum,'1'),total = tot
+al + '2' w
here tstamp = '05-09-30 10:00' and url = 'cnet.com'
<- dbd_st_execute 1 rows
<- execute= 1 at coalData.pl line 70
DATA: 05-09-30 12:00 cnet.com 1.0000 1 3
-> execute for DBD::mysql::st (DBI::st=HASH(0x838fd60)~0x804d26c '
+05-09-30 1
2:00' 'cnet.com' '1.0000' '1' '3')
-> dbd_st_execute for 0815d350
Binding parameters: insert tmp_urls (tstamp,url,average,maximum,
+total) val
ues ('05-09-30 12:00','cnet.com','1.0000','1','3')
<- dbd_st_execute 1 rows
<- execute= 1 at coalData.pl line 68 via ./coalData.pl line 67
DATA: 05-09-30 13:00 cnet.com 1.2000 2 6
-> execute for DBD::mysql::st (DBI::st=HASH(0x838fd60)~0x804d26c '
+05-09-30 1
3:00' 'cnet.com' '1.2000' '2' '6')
-> dbd_st_execute for 0815d350
Binding parameters: insert tmp_urls (tstamp,url,average,maximum,
+total) val
ues ('05-09-30 13:00','cnet.com','1.2000','2','6')
Duplicate entry '2005-09-30 13:00:00-cnet.com' for key 1 error 1062 re
+corded: Du
plicate entry '2005-09-30 13:00:00-cnet.com' for key 1
<- dbd_st_execute -2 rows
!! ERROR: 1062 'Duplicate entry '2005-09-30 13:00:00-cnet.com' for
+ key 1' (e
rr#0)
<- execute= undef at coalData.pl line 68 via ./coalData.pl line 67
-> execute for DBD::mysql::st (DBI::st=HASH(0x83c3110)~0x838fe20 '
+1.2000' '6
' '6' '2' '6' '05-09-30 13:00' 'cnet.com')
-> dbd_st_execute for 0838fe14
Binding parameters: update tmp_urls set average = round((average
+*total+'1.
2000'*'6')/(total+'6'),2), maximum = greatest(maximum,'2'),total = tot
+al + '6' w
here tstamp = '05-09-30 13:00' and url = 'cnet.com'
Duplicate entry '2005-10-13 15:58:42-cnet.com' for key 1 error 1062 re
+corded: Du
plicate entry '2005-10-13 15:58:42-cnet.com' for key 1
<- dbd_st_execute -2 rows
!! ERROR: 1062 'Duplicate entry '2005-10-13 15:58:42-cnet.com' for
+ key 1' (e
rr#0)
<- execute= undef at coalData.pl line 70
DBD::mysql::st execute failed: Duplicate entry '2005-10-13 15:58:42-cn
+et.com' fo
r key 1 at ./coalData.pl line 72.
-> DESTROY for DBD::mysql::st (DBI::st=HASH(0x838fe20)~INNER)
ERROR: 1062 'Duplicate entry '2005-10-13 15:58:42-cnet.com' for
+ key 1' (e
rr#0)
<- DESTROY= undef
-> DESTROY for DBD::mysql::st (DBI::st=HASH(0x804d26c)~INNER)
ERROR: 1062 'Duplicate entry '2005-10-13 15:58:42-cnet.com' for
+ key 1' (e
rr#0)
<- DESTROY= undef
-> DESTROY for DBD::mysql::st (DBI::st=HASH(0x804d254)~INNER)
ERROR: 1062 'Duplicate entry '2005-10-13 15:58:42-cnet.com' for
+ key 1' (e
rr#0)
<- DESTROY= undef
-> DESTROY for DBD::mysql::db (DBI::db=HASH(0x838fb20)~INNER)
&imp_dbh->mysql: 806405c
ERROR: 1062 'Duplicate entry '2005-10-13 15:58:42-cnet.com' for
+ key 1' (e
rr#0)
<- DESTROY= undef
The DATA lines are a print line of the input data. I'm also seeing fresh lines being inserted with current time... which might indicate I'm seeing null timestaps... but I don't see that in the DATA lines or the binds to the update
thanks
Jimbus
Re: DBI strangeness by pg (Canon) on Oct 14, 2005 at 01:46 UTC |
Quite possible, it is because of the fact that you are using timestamp as part of the primary key.
You don't do this, NEVER do this! I have dealt with this kind of database design before. When your program execute fast enough and the timestamps of the two operations will be the same, and can cause failure (duplicate key); When it is not that fast, and you get two different timestamps, your program will work. Unpredicatable is the word, and you don't want it.
| [reply] |
|
Yes,
primary keys could be numbers generated from sequence (which garants their uniqueness) in 99% of cases. Exceptionally, you can use characters or some composite types.
Good theme for a little flame, but not perl-related :)
| [reply] |
|
| [reply] |
|
Never say NEVER
I probably should have detailed it better, but posting here is often trying to find a balance between providing enough information to get relavant answers and putting too much information and having a million and six tangents questioning every aspect of the code except the question asked. On the other hand, if you would have read as far as the code or the traces, you would have seen that your issue was anticipated and there was more involved.
The application is not real time, it is processing timestamped log data. The most important information coming from the file is urls/second. I'm deriving url/second as over all and by url. I build hashes with the per socond counts based on epic seconds and dump the data out to a database. Collisions can only occur if there is a time overlap between incoming logs. The insert is inside an eval block so that if it fails, the original and new record are summed with an update. This logical and it has been tested and works.
In the end, the app users only see data on an hourly basis, so keeping the data on a second basis was expense in both diskspace and query time. In a fit of laziness, I decide to leave the original script alone and run the coalation query in a new script and create a new database based on the hourly averages, peak and totals. I leveraged the eval insert, else update logic from the previous script and it works for the over all table, which has a single component key, the timestamp.
One difference between the new and old script is the database they are pointing at. The original one uses datatime for the tstamp field, but the new one uses timestamp. Are you inferring that there is an issue with the timestamp data type that makes it unstable? I did notice that when I made it a primary key I noticed that its default is the current time. but if you look at the data I print out and the trace's bind statements, I'm never putting a null tstamp.
I googled on tstamp and didn't find much. Mysql's reference page pretty much ignores it. It says it is like datetime, but that it will cover the differences that and when you foolow the link to timestamp properties as of 4.1, it mostly talks about 5.0. I'll alter the table so that tstamp is a datetime and test that, but I still don't see whats wrong with what I am doing
Jimbus
| [reply] |
|
The UPDATE statement can set a timestamp field to the current time automatically, maybe this is causing the problem.
poj
| [reply] |
Re: DBI strangeness by jimbus (Friar) on Oct 14, 2005 at 18:15 UTC |
Changing the column to datetime fixed the problem. There's one mythical man day I'll never get back. Now, after acknowledging my laziness in my previous comment, I feel obligated to revisit this and do it all in one step... that should finish the rest of this day
Jimbus
| [reply] |
|
|