Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

find difference in dates/items in same column

by zerocred (Beadle)
on Nov 14, 2008 at 09:06 UTC ( [id://723592]=perlquestion: print w/replies, xml ) Need Help??

zerocred has asked for the wisdom of the Perl Monks concerning the following question:

The problem is finding differences (delta days, whatever) between two dates (or numbers) in the same column of a mysql database. I would have thought this was a common problem but couldn't find a good way to do it in SQL.
I couldn't find a suitable solution for so made one myself... (apologies for the rough non-optimum perl code)..

Has anyone got a better way, improvements, comments and optimizations appreciated!

The real code creates another table with the differences in.

I tried this SQL code from http://forums.databasejournal.com/showthread.php?t=40845 it was helpful, but it was v-e-r-y s-l-o-w for my big table:
# sql to calculate date differences in same column # SELECT B.* # FROM # (SELECT ordernum, agent, MIN([datetime]) AS mindatetime # FROM YourTable # GROUP BY ordernum, agent) AS A # JOIN YourTable AS B # ON A.ordernum = B.ordernum # AND A.agent = B.agent # AND (A.mindatetime = B.[datetime] OR DATEADD(minute, 120, A.mindatet +ime) <= B.[datetime])
Actually using the hashes to keep track of things allows lots of cool things to be calculated in from the database (like delta of dates, time, weight etc, )
#!/usr/bin/perl #query to get the difference between dates and things in the same colu +mns... use strict; use DBI; use Time::Local; ##################### mysql definitions #password and access my $db = "new" ; my $host = "192.168.1.20" ; my $user = "user" ; my $pass = "pass" ; my $dbh = DBI->connect ("DBI:mysql:database=$db:host=$host",$use +r,$pass)or die "Can't connect to database: $DBI::errstr\n"; ############################ mysql connection definitions #the key thing is to do a select query that results in the items being + in order in the resultset... my $sql_query ="SELECT brand, date, weight FROM test2 ORDER BY brand, +date ASC"; my $sth_query = $dbh->prepare($sql_query); $sth_query->execute(); ####################################################### my $brand=""; my $date_old=timelocal(0,0,0,10,10,1980); #just set it to something my $date; my $date_diff; my $weight; my $weight_old; my %samples; my %dates; my @row; while(@row = $sth_query->fetchrow_array()){ $brand = $row[0]; $date = $row[1]; $weight = $row[2]; $samples{$brand}++; #increment the number of times this brand +has appeared if($samples{$brand} ==1){ #if it is the first time then make old = new + so that no big step changes in weight etc recorded $date_old = $date; $weight_old = $weight; } #difference in two datestrings calculation - convert to seconds - I'm +sure there are better ways to do this.. $date =~ m|(\d{4}).(\d{2}).(\d{2})|; my $date_new1= timelocal(0,0,0,$3, $2-1, $1); #the '-1' cos locatime(0 +,0,0,dd,mm,yyyy) months count from zero $date_old =~ m|(\d{4}).(\d{2}).(\d{2})|; my $date_old1= timelocal(0,0,0,$3, $2-1, $1); $date_diff = $date_new1- $date_old1; # and subtract to get difference +in seconds $date_diff = $date_diff / 60 / 60 / 24; # then divide by number of sec +onds in a day to get days # voila! print "$brand,\t$samples{$brand}\t$date\t$date_old\t$date_diff,\t$weig +ht,\t$weight_old,\t".($weight-$weight_old)."\n"; $date_old = $date; $weight_old = $weight; } $sth_query->finish; $dbh->disconnect;
#here's table crating code for your conveinience
CREATE TABLE `new`.`test2` ( `id` int(11) NOT NULL auto_increment, `brand` varchar(32) NOT NULL, `date` date NOT NULL, `weight` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1; INSERT INTO `new`.`test2` VALUES (1,'C','2008-02-06',15), (2,'C','2008-02-05',20), (3,'A','2008-02-04',140), (4,'C','2008-02-03',10), (5,'B','2008-02-01',190), (6,'A','2008-02-01',150), (7,'B','2008-01-03',200), (8,'A','2008-01-01',110), (9,'A','2007-01-01',100);
output:
A, 1 2007-01-01 2007-01-01 0, 100, 100, + 0 A, 2 2008-01-01 2007-01-01 365, 110, 100, + -10 A, 3 2008-02-01 2008-01-01 31, 150, 110, + -40 A, 4 2008-02-04 2008-02-01 3, 140, 150, + 10 B, 1 2008-01-03 2008-01-03 0, 200, 200, + 0 B, 2 2008-02-01 2008-01-03 29, 190, 200, + 10 C, 1 2008-02-03 2008-02-03 0, 10, 10, + 0 C, 2 2008-02-05 2008-02-03 2, 20, 10, + -10 C, 3 2008-02-06 2008-02-05 1, 15, 20, + 5
Ooops got the weight subtraction back to front... and there may need to be a -1900 in the date delta calc.

Replies are listed 'Best First'.
Re: find difference in dates/items in same column
by ccn (Vicar) on Nov 14, 2008 at 09:56 UTC

    You can add a calculated column to your SQL query

    SELECT brand, date, weight, datediff(day, '2001/01/01', date) AS daysS +inceMillemium FROM test2 ORDER BY brand, date ASC

    And then use [daysSinceMillemium] column to calculate differences in days. Thus you use simple arithmetic instead of Time::Local.

      Thanks, unfortunately its more complicated...

      The SELECT datediff() is fine for finding ages from a fixed date but not for finding deltas between dates in the same column. Datediff can be used in SQL but then it begins to look like the SQL example and takes ages to run on my 120,000 records with about ~10,000 brands.

        I think there is no better way than you have choosen already. All that I can suggest is cosmetic improvements only.

Re: find difference in dates/items in same column
by dragonchild (Archbishop) on Nov 14, 2008 at 15:04 UTC
    While I can't offer any better solutions (because I haven't really spent any time thinking on the problem), it might help to know why you're having issues. In relational theory, a SELECT statement is a view transformation on each row in the resulting (often temporary) table from the JOIN clause (modified by the GROUP BY clause). The WHERE clause is a limiter on that table. Given that you want to do a transformation crossing rows, that's where your problem is. Essentially, you need to create a table where the column you're interested in appears twice in each row so that your SELECT function can work on a given row.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      I tried the techniques here:
      http://www.onlamp.com/pub/a/onlamp/excerpt/mysqlckbk/index3.html?page=2
      it looks like this:
      SELECT a.brand, a.date AS newdate, b.date AS olddate, DATEDIFF(a.date,b.date) AS diff FROM( SELECT brand, date, ( SELECT count(*) FROM t_detail_hkjc WHERE brand=T.brand AND date<T.date )+1 AS rank1 FROM test AS T ) AS a INNER JOIN ( SELECT brand, date, ( SELECT count(*) FROM t_detail_hkjc WHERE brand=T1.brand AND date<T1.date ) AS rank2 FROM test2 AS T1 ) AS b ON (a.rank1=b.rank2) AND (a.brand = b.brand) ORDER BY a.brand, a.date DESC ;
      This does work like you suggest by creating the T and T1 alias/copy tables then you line up the sequence numbers (rank1 and rank2) and process the columns of interest.

      The problem is it takes about 1 hour to return the query on about 120,000 records... Even merely sequencing the brands alone (the inner Select count(*) part) takes about 7 minutes). I think the SQL is slow because the whole table is searched each time to find the next in sequence. By using my perl program and executing the Select to return the table in the sorted order you know the next record is the one you are looking for. It runs in about 30secs - 1min on the same 120k record table and writes the answer to a new table. Fortunately the data table is only updated about once per week so the query table can be generated immediately after.

      When I started out looking to solve this problem I thought it would be a relatively easy piece of SQL to solve a very common problem, e.g. calculate the time between time stamps in a database file for example. But in SQL it seems not;) - I'm no expert in SQL found it tough going. I wish I'd thought of the perl solution earlier!
      oopps error in the earlier code I accidentally poseted while not logged in.

      I tried the techniques here:
      http://www.onlamp.com/pub/a/onlamp/excerpt/mysqlckbk/index3.html?page=2
      it looks like this:
      SELECT a.brand, a.date AS newdate, b.date AS olddate, DATEDIFF(a.date,b.date) AS diff FROM( SELECT brand, date, ( SELECT count(*) FROM test WHERE brand=T.brand AND date<T.date )+1 AS rank1 FROM test AS T ) AS a INNER JOIN ( SELECT brand, date, ( SELECT count(*) FROM test WHERE brand=T1.brand AND date<T1.date ) AS rank2 FROM test2 AS T1 ) AS b ON (a.rank1=b.rank2) AND (a.brand = b.brand) ORDER BY a.brand, a.date DESC ;
      This does work like you say by creating the T and T1 alias/copy tables then you line up the sequence numbers (rank1 and rank2) and process the columns of interest.

      The problem is it takes about 1 hour to return the query on about 120,000 records... Even merely sequencing the brands alone (the inner Select count(*) part) takes about 7 minutes). I think teh SQL is slow because the whole table is searched each time to find the next in sequence. By using my perl program and executing the Select to return the table in the sorted order you know the next record is the one you are looking for. It runs in about 30secs -1min onthe same table.

      When I started out looking to solve this problem I thought it would be a relatively easy piece of SQL to solve a very common problem, e.g. calculate the time between time stamps in a log file for example. But in SQL it seems not;)
        Add indices to your columns. Also, why do you need the COUNT(*)? Removing that if you don't need it will speed you significantly.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://723592]
Approved by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (6)
As of 2024-03-19 08:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found