Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^6: Creating SELECT-like look-up tables as hashes

by loris (Hermit)
on Dec 05, 2013 at 09:13 UTC ( #1065730=note: print w/ replies, xml ) Need Help??


in reply to Re^5: Creating SELECT-like look-up tables as hashes
in thread Creating SELECT-like look-up tables as hashes

Thanks for the offer of extra eyes - I am in fact somewhat databatially challenged.

The goal is to take a list of real events and compare this with a list of potential events in order to do a statistical analysis of the probability of events occurring.

The two tables are created like this:

CREATE TABLE Real_Events ( Id INT, Name1 VARCHAR, Name2 VARCHAR, Date INT, Group VARCHAR, ) CREATE TABLE Potential_Events ( Name1 VARCHAR, Name2 VARCHAR, Group VARCHAR, )

The SELECTs I need are, for a given event in Real_Events:

  • all the real events in the same Group with occur later, but not more than a certain number of days later. e.g.
    SELECT * from Events WHERE Group="physicists" AND Date>$event_date +AND Date<=($event_date + $window))
  • all the events in Potential_Events in the same Group with all combinations of Names found in the previous SELECT, e.g. if a real Alice-Bob event and a real Bob-Charlie event were found, I would then have
    SELECT * from Potential_Events WHERE Group="physicists" AND ( ( Name1="Alice" AND Name2="Bob" ) OR ( Name1="Alice" AND Name2="Charlie" ) OR ( Name1="Bob" and Name2="Ch +arlie" ) )
  • All potential events found then have to be checked against real events. If a potential event is found in the table of real events, it is discarded if has already taken place relative to the date of the real event being considered. So for a given potential event I have, say,
    SELECT * from Events WHERE Group="physicists" AND Name1="Alice" AND Name2="Bob" AND Date>$event_date)

I currently read the entire table of actual events into an array of hashrefs. This allows me to define a look-up table on, say, 'Group':

push (@{$lookup_group{$_->{"Group"}}}, $_) for @rows;

I can use this to get the rows for a given group and then loop over these to check the date. A similar approach can be used to create a look-up table with "$Name1$Name2$Group" as the key, which can be used to check whether a potential event has already occured

I hope that gives a rough idea of what I'm trying to do. The program is orginally from a user whom I am trying to help get his data crunched before a deadline. So optimisation time plus run time has to comply with this restriction. Apart from that, unfortunately I also have other work to do, so the time I have to work on this also limited, but I would be greatful for any pointers to low-hanging fruit.

Thanks,

loris


Comment on Re^6: Creating SELECT-like look-up tables as hashes
Select or Download Code
Re^7: Creating SELECT-like look-up tables as hashes
by BrowserUk (Pope) on Dec 05, 2013 at 09:46 UTC

    Are you using any indexes on those tables?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yes, on Id for Real_Events and on Name1, Name2, Group for Potential_Events.

      loris

        My best DB days are long behind me (DB2), and I've done very little with sqlite so take this with a big pinch of salt, but seems to me that given your queries, an index in (Real_?)Events.Id is doing you no good at all.

        I'd be tempted to try adding an index on (at least) the date field. (And possibly dropping the one on Id.)

        As far as your HoAs is concerned, it might be worth considering sorting the arrays by data and using a binary search.

        I suspect that this thread has gotton too deep to be getting the eyeballs you really need to get good suggestions for this problem. If you can find the time, you might try consolidating the information you've given in this sub-thread and making a post to the sqlite mailing list asking for the best way to optimise those queries.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1065730]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (9)
As of 2014-12-26 21:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls