Syntactic Confectionery Delight PerlMonks

### Re: find difference in dates/items in same column

by dragonchild (Archbishop)
 on Nov 14, 2008 at 15:04 UTC ( #723673=note: print w/replies, xml ) Need Help??

While I can't offer any better solutions (because I haven't really spent any time thinking on the problem), it might help to know why you're having issues. In relational theory, a SELECT statement is a view transformation on each row in the resulting (often temporary) table from the JOIN clause (modified by the GROUP BY clause). The WHERE clause is a limiter on that table. Given that you want to do a transformation crossing rows, that's where your problem is. Essentially, you need to create a table where the column you're interested in appears twice in each row so that your SELECT function can work on a given row.

My criteria for good software:
1. Does it work?
2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
• Comment on Re: find difference in dates/items in same column

Replies are listed 'Best First'.
Re^2: find difference in dates/items in same column
by Anonymous Monk on Nov 14, 2008 at 18:44 UTC
I tried the techniques here:
http://www.onlamp.com/pub/a/onlamp/excerpt/mysqlckbk/index3.html?page=2
it looks like this:
```SELECT a.brand,
a.date AS newdate,
b.date AS olddate,
DATEDIFF(a.date,b.date) AS diff
FROM(
SELECT brand, date,
(
SELECT count(*)
FROM t_detail_hkjc
WHERE brand=T.brand
AND date<T.date
)+1
AS rank1
FROM test
AS T
)
AS a INNER JOIN
(
SELECT brand, date,
(
SELECT count(*)
FROM t_detail_hkjc
WHERE brand=T1.brand
AND date<T1.date
)
AS rank2
FROM test2
AS T1
)
AS b ON (a.rank1=b.rank2)
AND (a.brand = b.brand)
ORDER BY a.brand,
a.date DESC
;
This does work like you suggest by creating the T and T1 alias/copy tables then you line up the sequence numbers (rank1 and rank2) and process the columns of interest.

The problem is it takes about 1 hour to return the query on about 120,000 records... Even merely sequencing the brands alone (the inner Select count(*) part) takes about 7 minutes). I think the SQL is slow because the whole table is searched each time to find the next in sequence. By using my perl program and executing the Select to return the table in the sorted order you know the next record is the one you are looking for. It runs in about 30secs - 1min on the same 120k record table and writes the answer to a new table. Fortunately the data table is only updated about once per week so the query table can be generated immediately after.

When I started out looking to solve this problem I thought it would be a relatively easy piece of SQL to solve a very common problem, e.g. calculate the time between time stamps in a database file for example. But in SQL it seems not;) - I'm no expert in SQL found it tough going. I wish I'd thought of the perl solution earlier!
Re^2: find difference in dates/items in same column
by zerocred (Beadle) on Nov 14, 2008 at 18:48 UTC
oopps error in the earlier code I accidentally poseted while not logged in.

I tried the techniques here:
http://www.onlamp.com/pub/a/onlamp/excerpt/mysqlckbk/index3.html?page=2
it looks like this:
```SELECT a.brand,
a.date AS newdate,
b.date AS olddate,
DATEDIFF(a.date,b.date) AS diff
FROM(
SELECT brand, date,
(
SELECT count(*)
FROM test
WHERE brand=T.brand
AND date<T.date
)+1
AS rank1
FROM test
AS T
)
AS a INNER JOIN
(
SELECT brand, date,
(
SELECT count(*)
FROM test
WHERE brand=T1.brand
AND date<T1.date
)
AS rank2
FROM test2
AS T1
)
AS b ON (a.rank1=b.rank2)
AND (a.brand = b.brand)
ORDER BY a.brand,
a.date DESC
;
This does work like you say by creating the T and T1 alias/copy tables then you line up the sequence numbers (rank1 and rank2) and process the columns of interest.

The problem is it takes about 1 hour to return the query on about 120,000 records... Even merely sequencing the brands alone (the inner Select count(*) part) takes about 7 minutes). I think teh SQL is slow because the whole table is searched each time to find the next in sequence. By using my perl program and executing the Select to return the table in the sorted order you know the next record is the one you are looking for. It runs in about 30secs -1min onthe same table.

When I started out looking to solve this problem I thought it would be a relatively easy piece of SQL to solve a very common problem, e.g. calculate the time between time stamps in a log file for example. But in SQL it seems not;)
Add indices to your columns. Also, why do you need the COUNT(*)? Removing that if you don't need it will speed you significantly.

My criteria for good software:
1. Does it work?
2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Indexes is a good idea but I already have the relevant columns already indexed. The count is there to sequence the same brands like:
```id brand rank=(count*)
1  A     1
2  B     1
3  A     2
4  A     3
5  B     2
...
etc. That way the dates from consecutive brand appearances can be compared by the rank=rank+1 for the datediff() comparison. There already is an autoincrement id column but it can't be used for comparing consecutive dates of the same brand.

Create A New User
Node Status?
node history
Node Type: note [id://723673]
help
Chatterbox?
and one hand claps...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2018-02-21 05:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
When it is dark outside I am happiest to see ...

Results (275 votes). Check out past polls.

Notices?