Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re^4: Creating SELECT-like look-up tables as hashes

by loris (Hermit)
on Dec 03, 2013 at 09:47 UTC ( #1065393=note: print w/ replies, xml ) Need Help??


in reply to Re^3: Creating SELECT-like look-up tables as hashes
in thread Creating SELECT-like look-up tables as hashes

I did try the in-memory approach, but this didn't seem any faster than reading from the file system, presumably because the entire database file, which is only 1.4 GB, is being cached anyway.

But I shall look into indexing.

Thanks,

loris


Comment on Re^4: Creating SELECT-like look-up tables as hashes
Re^5: Creating SELECT-like look-up tables as hashes
by BrowserUk (Pope) on Dec 03, 2013 at 16:07 UTC
    presumably because the entire database file, which is only 1.4 GB, is being cached anyway.

    Quite probably. And if this machine/these machines have no other processes that are making demands upon the system cache then its probably okay to rely upon that.

    Perhaps the thing to do would be post your schema and samples of your typical queries. I don't know your DB/design expertise level, but I do know from experience, that fresh eyes and alternate ways of thinking can often suggest small changes that make huge differences.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks for the offer of extra eyes - I am in fact somewhat databatially challenged.

      The goal is to take a list of real events and compare this with a list of potential events in order to do a statistical analysis of the probability of events occurring.

      The two tables are created like this:

      CREATE TABLE Real_Events ( Id INT, Name1 VARCHAR, Name2 VARCHAR, Date INT, Group VARCHAR, ) CREATE TABLE Potential_Events ( Name1 VARCHAR, Name2 VARCHAR, Group VARCHAR, )

      The SELECTs I need are, for a given event in Real_Events:

      • all the real events in the same Group with occur later, but not more than a certain number of days later. e.g.
        SELECT * from Events WHERE Group="physicists" AND Date>$event_date +AND Date<=($event_date + $window))
      • all the events in Potential_Events in the same Group with all combinations of Names found in the previous SELECT, e.g. if a real Alice-Bob event and a real Bob-Charlie event were found, I would then have
        SELECT * from Potential_Events WHERE Group="physicists" AND ( ( Name1="Alice" AND Name2="Bob" ) OR ( Name1="Alice" AND Name2="Charlie" ) OR ( Name1="Bob" and Name2="Ch +arlie" ) )
      • All potential events found then have to be checked against real events. If a potential event is found in the table of real events, it is discarded if has already taken place relative to the date of the real event being considered. So for a given potential event I have, say,
        SELECT * from Events WHERE Group="physicists" AND Name1="Alice" AND Name2="Bob" AND Date>$event_date)

      I currently read the entire table of actual events into an array of hashrefs. This allows me to define a look-up table on, say, 'Group':

      push (@{$lookup_group{$_->{"Group"}}}, $_) for @rows;

      I can use this to get the rows for a given group and then loop over these to check the date. A similar approach can be used to create a look-up table with "$Name1$Name2$Group" as the key, which can be used to check whether a potential event has already occured

      I hope that gives a rough idea of what I'm trying to do. The program is orginally from a user whom I am trying to help get his data crunched before a deadline. So optimisation time plus run time has to comply with this restriction. Apart from that, unfortunately I also have other work to do, so the time I have to work on this also limited, but I would be greatful for any pointers to low-hanging fruit.

      Thanks,

      loris

        Are you using any indexes on those tables?


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1065393]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (9)
As of 2014-10-20 23:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (92 votes), past polls