Thanks for the offer of extra eyes - I am in fact somewhat databatially
challenged.
The goal is to take a list of real events and compare this with a list of
potential events in order to do a statistical analysis of the probability of
events occurring.
The two tables are created like this:
CREATE TABLE Real_Events (
Id INT,
Name1 VARCHAR,
Name2 VARCHAR,
Date INT,
Group VARCHAR,
)
CREATE TABLE Potential_Events (
Name1 VARCHAR,
Name2 VARCHAR,
Group VARCHAR,
)
The SELECTs I need are, for a given event in Real_Events:
- all the real events in the same Group with occur later, but not more than
a certain number of days later. e.g.
SELECT * from Events WHERE Group="physicists" AND Date>$event_date
+AND
Date<=($event_date + $window))
- all the events in Potential_Events in the same Group with all combinations of
Names found in the previous SELECT, e.g. if a real Alice-Bob event and a real
Bob-Charlie event were found, I would then have
SELECT * from Potential_Events WHERE
Group="physicists" AND ( ( Name1="Alice" AND Name2="Bob" ) OR (
Name1="Alice" AND Name2="Charlie" ) OR ( Name1="Bob" and Name2="Ch
+arlie" ) )
- All potential events found then have to be checked against real events. If
a potential event is found in the table of real events, it is discarded if has
already taken place relative to the date of the real event being considered.
So for a given potential event I have, say,
SELECT * from Events WHERE Group="physicists" AND Name1="Alice" AND
Name2="Bob" AND Date>$event_date)
I currently read the entire table of actual events into an array of hashrefs.
This allows me to define a look-up table on, say, 'Group':
push (@{$lookup_group{$_->{"Group"}}}, $_) for @rows;
I can use this to get the rows for a given group and then loop over these to
check the date. A similar approach can be used to create a look-up table
with "$Name1$Name2$Group" as the key, which can be used to check whether a potential event has already occured
I hope that gives a rough idea of what I'm trying to do. The program is orginally from a user whom I am trying to help get his data crunched before a deadline. So optimisation time plus run time has to comply with this restriction. Apart from that, unfortunately I also have other work to do, so the time I have to work on this also limited, but I would be greatful for any pointers to low-hanging fruit.
Thanks,
loris |