Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^3: find difference in dates/items in same column

by dragonchild (Archbishop)
on Nov 14, 2008 at 20:14 UTC ( #723719=note: print w/ replies, xml ) Need Help??


in reply to Re^2: find difference in dates/items in same column
in thread find difference in dates/items in same column

Add indices to your columns. Also, why do you need the COUNT(*)? Removing that if you don't need it will speed you significantly.


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?


Comment on Re^3: find difference in dates/items in same column
Re^4: find difference in dates/items in same column
by zerocred (Beadle) on Nov 15, 2008 at 07:12 UTC
    Indexes is a good idea but I already have the relevant columns already indexed. The count is there to sequence the same brands like:
    id brand rank=(count*) 1 A 1 2 B 1 3 A 2 4 A 3 5 B 2 ...
    etc. That way the dates from consecutive brand appearances can be compared by the rank=rank+1 for the datediff() comparison. There already is an autoincrement id column but it can't be used for comparing consecutive dates of the same brand.

      How fast would be the following query?

      select brand, id, date, datediff(day, (select max(T2.date) from test as T2 where T1.brand = T2.brand and T2.date < T1.date) ,date) from test as T1 order by brand, id

      I am not sure if that syntax works for MySQL. May be it must be translated to INNER JOIN

        Great! 120k rows in 10 seconds on MySQL on Linux and 17 seconds on MS Access
        (there's no 'day' parameter mysql's DATEDIFF - when i cracked that it worked!)
        select brand, id, date, datediff( date, (select max(T2.date) from test as T2 where T1.brand = T2.brand and T2.date < T1.date) ) from test as T1 order by brand, id
        Your solution is vastly superior (simpler/faster) to the one in MYSQL cookbook or any SQL forum/discussion I have seen!
        Thanks that's great!
      Ok - I'm interested now. I'm going to assume that the following is the table definition:
      CREATE TABLE `test` ( `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,`brand` VARCHAR(255) NOT NULL ,`date` DATE NOT NULL ,INDEX (`brand`,`date`) ,INDEX (`date`,`brand`) );
      Given that, you can do something assuming that the number of reads is at least 20x the number of writes (otherwise it's really not that performant). Basically, the idea is to maintain the `rank` column on each INSERT, UPDATE, or DELETE. Your writes get slow, but your reads get fast. This, essentially, is a cache. You can do this maintenance either in code or using a trigger. MySQL has triggers, so that works out nicely.

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        I think your brand/date and reverse date/brand index is a great idea for this and all sorts of other applications! I hadn't thought to set them up quite that way in this case.

        To maintain the rank would mean searching backward through the table to find the last 'brand' entry and its corresponding rank. I'll investgate triggers and think about how to do it when the new records are being inserted rather than after the fact.

        Thanks for all the help! it has been great... Apologies for contaminating a perl forum with what turned out to be a SQL question!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://723719]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (9)
As of 2014-08-23 10:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (173 votes), past polls