Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

GeoIP revisited

by cavac (Curate)
on Nov 23, 2018 at 12:35 UTC ( #1226223=CUFP: print w/replies, xml ) Need Help??

A couple of days ago i looked into GeoIP in Perl+PostgreSQL+GeoIP = Awesome!. Since then i have learned that the GeoIP lists i was using were out of support. The new public lists are in a new format. So i took the opportunity to rewrite the whole thing and do pretty much everything with Perl, not using external commands like "unzip" and "wget". This should make things a bit more portable.

I'm sorry, it isn't written "nice" and isn't really documented. I designed it as a cron job for a single private server ;-)

I'm still calling the perl interpreter from a bash script so i can set the correct environment variables and stuff. But it's a lot smaller now:

#!/usr/bin/env bash . ~/.bashrc_activestate cd /home/myuser/src/geoip perl updategeoip.pl

The database tables stays exactly the same as in the last post, here again for reference:

CREATE TABLE geoip ( netblock cidr NOT NULL, country_code text NOT NULL, country_name text NOT NULL, CONSTRAINT geoip_pk PRIMARY KEY (netblock) USING INDEX TABLESPACE "NAMEOFINDEXTABLESPACE" ) WITH ( OIDS=FALSE ) TABLESPACE "NAMEOFDATATABLESPACE"; ALTER TABLE geoip OWNER TO "mydatabaseuser";

And here is the new all-in-one script:

#!/usr/bin/env perl use strict; use warnings; use diagnostics; use LWP::Simple qw[get]; use Archive::Zip; use Data::Dumper; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=Cables_DB;host=localhost", 'myda +tabaseuser', 'secretpassword', {AutoCommit => 0, RaiseError => 0}) or die("can't connect to DB"); my $dodownload = 1; if($dodownload) { print "Downloading...\n"; my $url = 'http://geolite.maxmind.com/download/geoip/database/GeoL +ite2-Country-CSV.zip'; my $zipdata = get $url; if(!defined($zipdata) || !length($zipdata)) { die("Could not get ZIP data\n"); } open(my $OFH, '>', 'geoip.zip') or die($!); binmode $OFH; print $OFH $zipdata; close $OFH; } my $zip = Archive::Zip->new(); $zip->read('geoip.zip'); my @fnames = $zip->memberNames(); #print Dumper(\@fnames); # Read english "locations" file my ($locmembername) = $zip->membersMatching('.*\/GeoLite2\-Country\-Lo +cations\-en\.csv$'); $zip->extractMember($locmembername, 'locations.csv'); open(my $locfh, '<', 'locations.csv') or die($!); my %locations; my $tmp = <$locfh>; # Ignore header; while((my $line = <$locfh>)) { chomp $line; my ($id, undef, $continentcode, $continentname, $countrycode, $cou +ntryname) = split/\,/, $line; if($countrycode eq '' && $continentcode ne '') { $countrycode = $continentcode; } if($countryname eq '' && $continentname ne '') { $countryname = $continentname; } if($id eq '' || $countrycode eq '' || $countryname eq '') { print("Locations read error on line: $line\n"); next; } my %country = ( code => $countrycode, name => $countryname, ); $locations{$id} = \%country; } close $locfh; # Extract IPv4 my ($ipv4membername) = $zip->membersMatching('.*\/GeoLite2\-Country\-B +locks\-IPv4\.csv$'); $zip->extractMember($ipv4membername, 'ipv4.csv'); # Extract IPv6 my ($ipv6membername) = $zip->membersMatching('.*\/GeoLite2\-Country\-B +locks\-IPv6\.csv$'); $zip->extractMember($ipv6membername, 'ipv6.csv'); # Load both files my @netblocks; foreach my $fname (qw[ipv4.csv ipv6.csv]) { open(my $ipfh, '<', $fname) or die($!); $tmp = <$ipfh>; # Ignore header while((my $line = <$ipfh>)) { chomp $line; my ($cidr, $countryid, $countryid2, $countryid3) = split/\,/, +$line; if($countryid eq '' && $countryid2 ne '') { $countryid = $countryid2; } if($countryid eq '' && $countryid3 ne '') { $countryid = $countryid3; } if($cidr eq '' || $countryid eq '') { print ("Unknown country in File $fname on line: $line\n"); next; } if(!defined($locations{$countryid})) { die("Fail in file $fname, country code $countryid unknown +in line: $line\n"); } my %netblock = ( cidr => $cidr, code => $locations{$countryid}->{code}, name => $locations{$countryid}->{name}, ); push @netblocks, \%netblock; } close $ipfh; } print 'Loaded ', scalar @netblocks, " entries\n"; unlink('locations.csv'); unlink('ipv4.csv'); unlink('ipv6.csv'); print "Truncating table...\n"; $dbh->do("TRUNCATE geoip"); my $insth = $dbh->prepare_cached("INSERT INTO geoip (netblock, country +_code, country_name) VALUES (?,?,?)") or die($dbh->errstr); my $linecount = 0; print "Inserting data into table...\n"; foreach my $block (@netblocks) { $linecount++; if($linecount % 10000 == 0) { my $percent = int(($linecount / scalar @netblocks) * 10000) / +100; print " $linecount = $percent%...\n"; } my $countrycode = sanitize($block->{code}); my $countryname = sanitize($block->{name}); my $cidr = $block->{cidr}; if(!$insth->execute($cidr, $countrycode, $countryname)) { $dbh->rollback; print "Fail on $linecount: ", $dbh->errstr, "\n", Dumper($bloc +k), "\n"; exit(1); } } $dbh->commit; $dbh->disconnect; unlink('geoip.zip'); print "Done\n"; exit(0); sub sanitize { my ($val) = @_; $val =~ s/^\"//; $val =~ s/\"$//; return $val; }

This script uses the newish "GeoLite2" databases from MaxMind. If you use them, please make sure you comply to the open source licensing stated on their official page

.
perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

Replies are listed 'Best First'.
Re: GeoIP revisited
by hippo (Chancellor) on Nov 23, 2018 at 14:11 UTC
    So i took the opportunity to rewrite the whole thing and do pretty much everything with Perl, not using external commands like "unzip" and "wget". This should make things a bit more portable.

    Very good (++). I must say that this was one of the first things which struck me while reading your original post and thought that it would be a good exercise to bring everything into Perl instead. Now having both posts here is a nice example for how to do just that.

      Thanks :-)

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
Re: GeoIP revisited
by erix (Parson) on Nov 23, 2018 at 15:53 UTC

    Just a thought: you could insert a 'IF NOT EXISTS'-create just before the TRUNCATE-statement so that the program 'Just Works' if the table isn't there.

    $dbh->do(" CREATE TABLE IF NOT EXISTS geoip ( netblock cidr NOT NULL, country_code text NOT NULL, country_name text NOT NULL, CONSTRAINT geoip_pk PRIMARY KEY (netblock) );" ) or die($dbh->errstr);

    IF NOT EXISTS was added to CREATE TABLE long ago and all supported versions of postgres support it.

    (I removed the oids- and tablespace-stuff, as I didn't need that.)

      Good catch. A lot of other tables in my database already have triggers that access the geoip table (productive system with a slight dose of feature creep), so i never thought of adding the table creation into this script...

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
Re: GeoIP revisited
by Tux (Abbot) on Nov 26, 2018 at 12:59 UTC

    My first stab at GeoIP2 is avalable on github:

    $ git clone github.com:Tux/GeoIP2 geoip $ cd geoip $ wget -q -m -L -nd -np -nH \ http://geolite.maxmind.com/download/geoip/database/GeoLite2-ASN-CSV.z +ip \ http://geolite.maxmind.com/download/geoip/database/GeoLite2-City-CSV. +zip \ http://geolite.maxmind.com/download/geoip/database/GeoLite2-Country-C +SV.zip $ echo "create database geoip;" | psql -f - CREATE DATABASE $ perl geoip Create table stamps Create table continent Create table country Create table ipv4 Create table provider Create table city Create table ipc4 Reading Country info ... Reading Country IPv4 info ... Reading Provider IPv4 info ... Reading City info ... Reading City IPv4 info ...

    Note that loading the database will require quite a bit of memory and may cause your machine to feel sluggish while loading. On my laptop this took 400 seconds. Once the data is in the database, the queries are quick.

    $ time perl geoip perlmonks.org GeoIP data for 66.39.54.27 - www.perlmonks.org: CIDR : 66.39.0.0/16 IP range : 66.39.0.0 - 66.39.255.255 Provider : pair Networks City : Pittsburgh, 508, 15203 Country : US United States Continent : North America Location : 40.4254 / -79.9799 (1000) 4025'31.44" / -7958' +47.64" https://www.google.com/maps/place/@40.4254,-79.9799,10z Timezone : America/New_York EU member : No Satellite : No Anon Proxy: No 0.112u 0.008s 0:00.69 15.9% 0+0k 0+0io 0pf+0w

    Enjoy, Have FUN! H.Merijn

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1226223]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2019-10-17 08:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?