PerlMonks  

Re: large perl module

by DrHyde (Prior)
on Mar 05, 2010 at 12:53 UTC


in reply to large perl module

Sounds like you're doing something remarkably similar to Number::Phone::UK::Data. That originally had a humungous hash in it. It got faster and smaller when I switched to embedding a DBM::Deep database in the module, as a __DATA__ segment. Check out what I've done in that module, and we should probably collaborate further too, as I also have Number::Phone::NANP::* modules in the same distribution ...

Replies are listed 'Best First'.
Re^2: large perl module
by techcode (Hermit) on Mar 05, 2010 at 18:50 UTC

    It depends on what you are searching for, that is, on how you are searching it. I tried DBM::Deep for prefix search precisely because of the speed its docs mention. However, when tested, even from a RAM disk (aka /dev/shm), it was at least two or three orders of magnitude slower (not sure exactly) than loading everything into a HoHoH... (a trie, see http://en.wikipedia.org/wiki/Trie) and searching by going "up the tree".

    Of course, I had the luxury of all requests going through one process that does the prefix search and routes requests depending on the result. That might also be a solution for the original question.

    You could implement the mod_perl handler only to pass the requests on: put them in a work queue and wait for the results in a results queue. Then have one process (or several, possibly on several different servers) do the actual (is it prefix?) searches, reading requests from the in/work queue and putting the results back in the out_queue.

    The trie implementation was able to find something like 50K random prefixes per second in a loop, from a pool of 45K to 50K prefixes in the database (loaded on startup into a large HoH trie). 50,000 per second should outperform any web server. And that's on an x2 AMD with 1 GB of RAM ...
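For anyone who hasn't built one, here is a minimal sketch of a hash-of-hashes trie with longest-prefix lookup. It is not the code from my gateway, just an illustration: the sub names `trie_insert` and `trie_longest_match`, the reserved `_value` key, and the sample prefixes are all made up for this example. It walks the number one digit at a time and remembers the deepest node that carried a value.

```perl
use strict;
use warnings;

# Insert a digit-string prefix into a hash-of-hashes trie.
# Each digit is one level of nesting; the payload lives under the
# reserved key '_value' (a hypothetical name chosen for this sketch).
sub trie_insert {
    my ($trie, $prefix, $value) = @_;
    my $node = $trie;
    for my $digit (split //, $prefix) {
        $node = ($node->{$digit} ||= {});
    }
    $node->{_value} = $value;
    return;
}

# Walk the number digit by digit and return the value stored at the
# longest matching prefix, or undef if no prefix matches at all.
sub trie_longest_match {
    my ($trie, $number) = @_;
    my $node = $trie;
    my $best;
    for my $digit (split //, $number) {
        $node = $node->{$digit} or last;    # no deeper branch: stop
        $best = $node->{_value} if exists $node->{_value};
    }
    return $best;
}

# Made-up sample data, only to show the calls.
my %trie;
trie_insert(\%trie, '1',   'NANP');
trie_insert(\%trie, '44',  'UK');
trie_insert(\%trie, '447', 'UK mobile');
```

Each lookup costs at most one hash access per digit, regardless of how many prefixes are loaded, which is presumably why it beats anything that has to touch a file for every step.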

    You could use memcached for both the input and the output queue; an implementation (in Perl!) can be seen here: http://3.rdrail.net/blog/memcached-based-message-queues/

    PS - I've been contracting for the past year for a company that's in the SMS gateway business. So I get to tweak code that searches prefixes a lot ;)


    Have you tried freelancing/outsourcing? Check out Scriptlance - I've worked there since 2003. For more info about Scriptlance and freelancing in general, check out my home node.
Re^2: large perl module
by minek (Novice) on Mar 06, 2010 at 01:28 UTC
    Hi Dave, I actually use a couple of your modules in my project, and I'm doing exactly what you did for UK mobile data, but for USA/CAN. It's also an SMS messaging server.
    I couldn't find any reliable data for the carrier lookup for NANP numbers, and I ended up buying the data from a commercial supplier.

    Instead of DBM I was using my own object persistence module, which works basically the same as DBM::Deep but faster. I stored the data in one data file per area code.

    Each data file was on average 20-40kB. The lookups were doing fine and were fast; the only problem was with IMPORTING huge CSV files with e.g. 20k phone numbers.
    The whole import procedure had to finish within 20s, including saving the numbers in the DB, checking whether they are unique, etc., and for 10k numbers that means 10k lookups, reading on average 300MB of data.
    At that point it was taking on average 70s, so I moved the data into something like this:
    use constant _nanp_area => {
        201 => { 4143 => 267, 206 => 357, .... },
        604 => { ... },
        ....
    };
    # {
    #   area_code1 => { prefix => carrier_id, ... },
    #   area_code2 => { prefix => carrier_id, ... },
    # }
    So it's basically a hash of hashes, mapping 3- or 4-digit prefixes to the corresponding carriers.
    I didn't do exact measurements, but it's fast enough, and works for us just fine. 10k phone numbers get imported and stored in the DB within 14s.
    The lookup function first tries to find the carrier based on the area code and the 4-digit prefix, and if that gives undef, it tries again with the 3-digit prefix.
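Roughly, the lookup can be sketched like this. This is not my production code: the constant name `NANP_AREA`, the sub name `carrier_for`, the carrier ids, and the way the number is normalised (strip non-digits, drop a leading 1) are assumptions made just for the example.

```perl
use strict;
use warnings;

# Made-up sample table: area code => { 3- or 4-digit prefix => carrier id }.
# The real table is far larger and comes from purchased data.
use constant NANP_AREA => {
    201 => { 4143 => 267, 206 => 357 },
    604 => { 555  => 12 },
};

# Return the carrier id for a NANP number, or nothing if unknown.
# Tries the 4-digit prefix first, then falls back to the 3-digit one.
sub carrier_for {
    my ($number) = @_;
    (my $digits = $number) =~ s/\D//g;     # keep digits only
    $digits =~ s/^1(?=\d{10}$)//;          # drop a leading country code 1
    return unless length($digits) == 10;

    my $area = NANP_AREA()->{ substr($digits, 0, 3) } or return;
    my $rest = substr($digits, 3);         # the 7 digits after the area code

    for my $len (4, 3) {
        my $carrier = $area->{ substr($rest, 0, $len) };
        return $carrier if defined $carrier;
    }
    return;
}
```

With the sample table, `carrier_for('201-414-3999')` hits the 4-digit prefix 4143, while `carrier_for('201-206-1234')` misses on 2061 and falls back to the 3-digit prefix 206. That is at most three hash lookups per number, all in memory.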
    At this point everything seems to work OK and fast, but I will keep my eye on it...
