Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: DBI vs Bulk Loading

by jimbus (Friar)
on Sep 15, 2005 at 14:51 UTC ( #492253=note: print w/ replies, xml ) Need Help??


in reply to DBI vs Bulk Loading

By summing, I mean that if I had all of the records in a table without tstamp as a primary key, but wanted to create a new table that was modifiable to have it, I would:

Create table new_table as select tstamp, sum(op100), sum(op700), sum(total) from old_table group by tstamp

And then add my primary key rule

The script runs every fifteen minutes. The data is from a web services broker that appears to only put one of 8 or 9 opcode types per file. so you'll see a file with 4-5k of WAP 1.X requests followed by a bunch of files that have 2 or 4 (no k) ldap updates or other such maintenance traffic, followed by a file with 2-4k of WAP 2.0 requests... All in All, I end up with a couple thousand files per day and I'm pulling around 800-850k lines of data and digesting it down to a max of 86,400. I'm thinking that I'm running about 1500 pk violations per day, which is why I can to the conclusion that my original algorithm was too inefficient.

I don't think IGNORE is a valid option, because I don't want to lose any data. And it appears I was remembering Oracle when I thought it would write erring records to a file... or it was just wishful thinking.

Since I know my range of time stamps and they should be limited to a query size of ~900 per run, that LanceDeeply's suggestion is my best option.

Thanks!

jimbus

Never moon a werewolf!


Comment on Re: DBI vs Bulk Loading
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://492253]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (15)
As of 2015-07-01 19:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (17 votes), past polls