Daily Counters

by docbrown25 (Initiate)
on May 06, 2013 at 15:13 UTC ( #1032321=perlquestion )
docbrown25 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All - Looking for some suggestions on how I can/should implement a daily counter solution for a large number of items/users (i.e. millions). I need to be able to quickly look up the current count for a user and either increment it or skip the user based on a set limit. All the counters will be reset every day at midnight.

My first thought was to just use a BerkeleyDB file, tie it as a hash, check the user_id and count, then either increment or skip if a limit is reached. Should I create a new file daily or reset all the values in the file?

Does anyone have any suggestions for alternatives that might be better? Should I use a database such as MySQL or SQLite?

Here are my criteria for this:


- must be able to be accessed by multiple scripts at the same time
- fast lookups
- in the future, might need to be accessed from multiple servers

Any thoughts or suggestions are appreciated. Thanks

Re: Daily Counters
by BrowserUk (Pope) on May 06, 2013 at 16:55 UTC

    I had a similar requirement a few years ago and (back then) the fastest mechanism available to me that provided shared access and fast lookup was to use the file system.

    For the sake of discussion, assume that your userids consist of mixed-case ANSI alphanumerics, i.e. 62 characters. If you have 10 million users and use the first 3 characters of each name as an index into a first level of subdirectories, you'll have (on average) 42 user directories at the second level under each one (10,000,000 / 62^3 ≈ 42), so lookup is fast.
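
    The arithmetic behind that average can be checked with a few lines of Perl. This is only a sketch: it uses the figures quoted above and assumes userids are spread evenly across prefixes, which real names rarely are. The variable names are illustrative.

```perl
use strict;
use warnings;

# Figures taken from the post: 62-character alphabet, 3-character prefix,
# 10 million users. An even spread of userids is an assumption.
my $alphabet   = 62;
my $prefix_len = 3;
my $users      = 10_000_000;

my $buckets    = $alphabet**$prefix_len;    # number of first-level directories
my $per_bucket = $users / $buckets;         # average user directories in each

printf "%d prefix directories, about %.0f users in each\n",
    $buckets, $per_bucket;
# prints "238328 prefix directories, about 42 users in each"
```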

    The directory structure looks like this:

        /yourapp/index/ash/ashford/7/
                      /bre/brent/3/
                      /cra/crawford/4/

    And the process of lookup/increment is:

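        use constant LIMIT => 10;    # assumed daily limit; set to your own value

        my $prefix = '/yourapp/index';
        my $userid = ...;
        my $idx    = substr $userid, 0, 3;
        my $limitReached = 1;
        {
            opendir my $dh, "$prefix/$idx/$userid/" or die $!;
            # skip '.' and '..'; the one remaining entry is named for the current count
            my( $count ) = grep !/^\./, readdir $dh;
            closedir $dh;
            last if $count >= LIMIT;
            # parentheses matter: '.' and '+' share precedence
            rename "$prefix/$idx/$userid/$count", "$prefix/$idx/$userid/" . ( $count + 1 )
                or redo;
            $limitReached = 0;
        }
        ## use $limitReached to decide further action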

    If your data is to persist, you are going to have to do at least one directory lookup to find the DB file (and usually more than one), so the directory lookup is effectively free. And as rename is atomic, the shared-data problems are taken care of without the need for time-costly locking and polling.
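
    For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch of the same rename-based increment. The sub name bump_counter, the temporary directory, and the limit of 3 are all hypothetical, chosen only for the demonstration. It assumes each user directory holds exactly one entry whose name is the current count, and it treats a missing entry as an error rather than retrying.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Try to consume one unit of a user's quota (hypothetical helper).
# Returns the new count, or undef if the limit has already been reached.
# $dir must contain exactly one entry, whose name is the current count.
sub bump_counter {
    my ( $dir, $limit ) = @_;
    while (1) {
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        my ($count) = grep { /^\d+$/ } readdir $dh;
        closedir $dh;
        die "No counter entry in $dir" unless defined $count;
        return undef if $count >= $limit;

        my $next = $count + 1;
        # If another process renamed the entry first, this rename fails
        # and the loop re-reads the directory to pick up the new count.
        return $next if rename "$dir/$count", "$dir/$next";
    }
}

# Demonstration in a throwaway directory with a limit of 3.
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/0" or die "Cannot create counter: $!";

my @results = map { bump_counter( $dir, 3 ) } 1 .. 5;
print join( ',', map { defined $_ ? $_ : 'skip' } @results ), "\n";
# prints "1,2,3,skip,skip"
```

    The first three calls succeed and the last two are skipped, which is the increment-or-skip behaviour the question asked for.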

    The more characters in the alphabet available for your userids, the better spread your directory structure and the faster the lookups. The only real restriction is that the alphabet must be compatible with your file system's naming conventions, which isn't usually a problem.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Interesting. Thanks for the reply. I'm going to look into implementing like this.

      My user ids will be all numerics. Should I break each user_id up by each digit of the id?

      For example: $user_id = 5989358

      /pathtocountdir/$date/5/9/8/9/3/5/8/5989358/$count/

      This will also allow me to just clear out the whole /pathtocountdir/$date/ dir for previous days.

      Thoughts?
        Should I break each user_id up by each digit of the id?

        No. It just creates extra levels for the filesystem to look up, which slows things down for no benefit.

        Adding the date into the path however is a brilliant idea.
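
        A rough sketch of what the daily cleanup could look like with the date in the path. The YYYYMMDD directory naming, the sub name purge_old_days, and the sample tree are assumptions for illustration; nothing in the thread fixes them.

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Temp qw(tempdir);
use POSIX qw(strftime);

# Remove every dated subdirectory of $base except the one for $today
# (hypothetical helper). Returns the names of the directories removed.
sub purge_old_days {
    my ( $base, $today ) = @_;
    opendir my $dh, $base or die "Cannot open $base: $!";
    my @old = sort grep { /^\d{8}$/ && $_ ne $today } readdir $dh;
    closedir $dh;
    remove_tree("$base/$_") for @old;
    return @old;
}

my $base  = tempdir( CLEANUP => 1 );           # stands in for /pathtocountdir
my $today = strftime( '%Y%m%d', localtime );   # assumed format: YYYYMMDD

# Build a small sample tree: two earlier days plus today.
make_path("$base/$_/59/89/5989358/0") for '20130505', '20130506', $today;

my @removed = purge_old_days( $base, $today );
print "removed: @removed\n";
```

        One consequence of this layout is that each new day starts with no directories at all, so the lookup would have to create a user's directory and an initial counter the first time that user is seen.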



        By the way, as you only have 10 characters in your alphabet, you might want to consider using the first 4 digits split into two groups of 2:

        /pathtocountdir/date/11/22/1122333/

        Or perhaps two groups of 3:

        /pathtocountdir/date/111/222/1112223/
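
        Either split can be produced by the same small helper. This is a sketch only: counter_dir and its parameters are hypothetical names, the date string is an assumed format, and it assumes every id has at least width x levels digits (shorter ids would need zero-padding first).

```perl
use strict;
use warnings;

# Build the directory for a numeric user id, using its leading digits as
# $levels groups of $width digits each (hypothetical helper).
sub counter_dir {
    my ( $base, $date, $user_id, $width, $levels ) = @_;
    my @groups = map { substr $user_id, $_ * $width, $width } 0 .. $levels - 1;
    return join '/', $base, $date, @groups, $user_id;
}

print counter_dir( '/pathtocountdir', '20130506', '1122333', 2, 2 ), "\n";
# prints "/pathtocountdir/20130506/11/22/1122333"
print counter_dir( '/pathtocountdir', '20130506', '1112223', 3, 2 ), "\n";
# prints "/pathtocountdir/20130506/111/222/1112223"
```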

