Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Perl+PostgreSQL+GeoIP = Awesome!

by cavac (Curate)
on Nov 21, 2018 at 09:23 UTC ( #1226112=CUFP: print w/replies, xml ) Need Help??


EDIT: WARNING, THIS USES A LEGACY DATABASE THAT IS NOT UPDATED ANYMORE. Please take a look at GeoIP revisited for an updated version that uses an up-to-date version of the MaxMind GeoIP database.


Sometimes you have to work with GeoIP, e.g. mapping an IP address to the origin country. Be it for legal reasons (geoblocking) or just so you know where your target audience is coming from.

You could just make online lookups for every request. But if you are running a PostgreSQL database backend anyway, there is a simple way to do it in DB, since PostgreSQL supports a CIDR column type.

First, let us define a database table:

CREATE TABLE geoip ( netblock cidr NOT NULL, country_code text NOT NULL, country_name text NOT NULL, CONSTRAINT geoip_pk PRIMARY KEY (netblock) USING INDEX TABLESPACE "NAMEOFINDEXTABLESPACE" ) WITH ( OIDS=FALSE ) TABLESPACE "NAMEOFDATATABLESPACE"; ALTER TABLE geoip OWNER TO "mydatabaseuser";

Next, we need a bash script we can run from crontab for our daily update:

#!/usr/bin/env bash cd /home/myuser/src/geoip wget http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCS +V.zip wget http://geolite.maxmind.com/download/geoip/database/GeoIPv6.csv.gz gunzip GeoIPv6.csv.gz unzip GeoIPCountryCSV.zip rm GeoIPCountryCSV.zip perl inserttodb.pl rm *.csv

And of course some perl script to parse it all and write it to the database. Small problem here, the GeoIP files list IP ranges, but we need to convert it to subnet notation (CIDR). Net::CIDR to the rescue!

#/usr/bin/env perl use DBI; use Net::CIDR; my $dbh = DBI->connect("dbi:Pg:dbname=Cables_DB;host=localhost", 'myda +tabaseuser', 'secretpassword', {AutoCommit => 0, RaiseError => 0}) or die("can't connect to DB"); $dbh->do("TRUNCATE geoip"); my $insth = $dbh->prepare_cached("INSERT INTO geoip (netblock, country +_code, country_name) VALUES (?,?,?)") or die($dbh->errstr); foreach my $file (qw[GeoIPCountryWhois.csv GeoIPv6.csv]) { print "Running on file $file...\n"; open(my $ifh, '<', $file) or die("Can't open $file"); my $linecount = 0; while((my $line = <$ifh>)) { $linecount++; chomp $line; my ($firstip, $lastip, undef, undef, $countrycode, $countrynam +e) = split/\"\,\ ?\"/, $line; $firstip = sanitize($firstip); $lastip = sanitize($lastip); $countrycode = sanitize($countrycode); $countryname = sanitize($countryname); my @cidr = Net::CIDR::range2cidr($firstip . '-' . $lastip); foreach my $subcidr (@cidr) { $insth->execute($subcidr, $countrycode, $countryname) or d +ie("Fail on $linecount $line: " . $dbh->errstr); } } close $ifh; } $dbh->commit; sub sanitize { my ($val) = @_; $val =~ s/^\"//; $val =~ s/\"$//; return $val; }

And add a crontab entry with crontab -e:

20 1 * * 1 /bin/bash /home/myuser/src/geoip/updategeoip.sh

Now we can request the country code for any IP address we encounter:

my $geoip_country = ''; my $geosth = $dbh->prepare("SELECT country_code FROM geoip WHERE ? << +netblock LIMIT 1") or croak($dbh->errstr); if(!$geosth->execute($host)) { $dbh->rollback; # Not a big problem, GEOIP is just for information + anyway } else { my $line = $geosth->fetchrow_hashref; if(defined($line->{country_code})) { $geoip_country = $line->{country_code}; } else { $geoip_country = '??'; } $dbh->rollback; }

Of course, now that the up-to-date geoip lists are in the database, it's even possible to use an ON INSERT OR UPDATE trigger to any table that needs geoip data. But that i will leave as an excercise for the reader...

perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

Replies are listed 'Best First'.
Re: Perl+PostgreSQL+GeoIP = Awesome!
by Tux (Abbot) on Nov 21, 2018 at 12:19 UTC

    NICE! I will use this, but with a twist:

    use 5.14.2; use warnings; use Text::CSV_XS qw( csv ); use DBI; use Socket; use Net::CIDR; my $tbl = "geoip"; my $dbh = DBI->connect ("dbi:Pg:dbname=cidr", undef, undef, { AutoCommit => 0, RaiseError => 1, PrintError => 1, ShowErrorStatement => 1, }); if (grep m/\b $tbl $/ix => $dbh->tables (undef, undef, undef, undef)) +{ say "Clear table $tbl"; $dbh->do ("truncate table $tbl"); } else { say "Create table $tbl"; $dbh->do (qq; create table $tbl ( netblock cidr not null primary key, type smallint not null, ip_from text not null, ip_to text not null, ip_from_n bigint, ip_to_n bigint, country_code text not null, country_name text not null); ); } my $sth = $dbh->prepare ("insert into $tbl values (?, ?, ?, ?, ?, ?, ? +, ?)"); foreach my $file (qw( GeoIPCountryWhois.csv GeoIPv6.csv )) { print "Inserting from $file...\n"; csv (in => $file, out => undef, allow_whitespace => 1, headers => [qw( firstip lastip x1 x2 iso name )], on_in => sub { foreach my $cidr (Net::CIDR::range2cidr ("$_{firstip}-$_{l +astip}")) { my @rng = Net::CIDR::cidr2range ($cidr); my ($f, $t) = split m/\s*-\s*/ => $rng[0]; my ($type, $F, $T) = $f =~ m/:/ ? (6, undef, undef) : (4, map { unpack "L>", inet_aton $_ } $f, $t); $sth->execute ($cidr, $type, $f, $t, $F, $T, $_{iso}, +$_{name}); } }, ); } $dbh->commit;

    because I already have some scripts that use "L>" representations of IPv4 addresses as those are easy to check.

    I did not know that this dataset was publicly available. I regularly check http://whois.domaintools.com/$ip in tools that analyze break-in attempts.


    Enjoy, Have FUN! H.Merijn

      Oooops, i just saw that this version of the databases isn't updated anymore: Deprecation notice: "Updated versions of the GeoLite Legacy databases are now only available to redistribution license customers, although anyone can continue to download the March 2018 GeoLite Legacy builds. Starting January 2, 2019, the last build will be removed from our website. GeoLite Legacy database users will need to switch to the GeoLite2 or commercial GeoIP databases and update their integrations by January 2, 2019."

      Working on an updated version of this post right now.

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
Re: Perl+PostgreSQL+GeoIP = Awesome!
by erix (Parson) on Nov 21, 2018 at 14:52 UTC

    NICE indeed. But TIMTOWTDI and here's one more (not better, just shorter and how I would probably have done it)

    (bash + perl, Pg connects via PG* envvars)

    wget --timestamping http://geolite.maxmind.com/download/geoip/database +/GeoIPCountryCSV.zip wget --timestamping http://geolite.maxmind.com/download/geoip/database +/GeoIPv6.csv.gz echo " create table geoip ( netblock cidr not null, country_code text not null, country_name text not null, constraint geoip_pk primary key (netblock) );"| psql \ && ( unzip -p GeoIPCountryCSV.zip ; gunzip -c GeoIPv6.csv.gz ) \ | perl -MNet::CIDR -ne ' chomp; my @arr = split( /\"\,\ ?\"/, $_ ); my @cidr = Net::CIDR::range2cidr(sanitize($arr[0]) . "-" . sani +tize($arr[1])); for my $subcidr (@cidr) { print $subcidr, "\t", sanitize($arr[4]), "\t", sanitize($arr +[5]), "\n"; } sub sanitize { my ($val) = @_; $val =~ s/^\"//; $val =~ s/\"$//; return $val; } ' | psql -c "copy geoip from stdin with (format csv, header false, del +imiter E'\t');"

    (I get 326604 rows (in 12 sec))

      As just noted in the update/edit in my thread starter, i just found out that we need to switch to the new GeoLite2 databases, because the ones i have used here are no longer supported. I'll post a new version in CUFP after i get it working and fully tested.

      Otherwise, nice use of the psql COPY command ;-)

      perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1226112]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (6)
As of 2019-02-19 17:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I use postfix dereferencing ...









    Results (105 votes). Check out past polls.

    Notices?
    • (Sep 10, 2018 at 22:53 UTC) Welcome new users!