Category: WWW scripts
Author/Contact Info: Diego Zamboni
Description:
Where I work, Apache is configured not to resolve IP addresses
into names for the access logs. To be able to properly process
the access logs for my pages with a log-processing program
(I use webalizer) I wrote the following script to resolve the
IP addresses. Note that $localdomain needs to be changed to
your own domain; it is appended when the resolved name is local
(machine name only, without domain). This happens sometimes when
the abbreviated name is before the full name in /etc/hosts, for example.
Updated: as suggested by kudra, added a comment
to the code about double-checking the name obtained, and why
we don't do it in this case. |
#!/usr/local/perl/bin/perl -w
#
# Resolve IP addresses in web logs.
# Diego Zamboni, Feb 7, 2000

use strict;
use Socket;

# Local domain name, appended to unqualified host names
my $localdomain = ".your.local.domain";

# Names already resolved, keyed by IP address
my %cache;

while (<>) {
    my @f = split;
    # Only touch lines whose first field looks like a dotted IP address
    if (@f && $f[0] =~ /^[\d.]+$/) {
        if ($cache{$f[0]}) {
            $f[0] = $cache{$f[0]};
        }
        else {
            my $addr = inet_aton($f[0]);
            if ($addr) {
                my $name = gethostbyaddr($addr, AF_INET);
                if ($name) {
                    # NOTE: To ensure the veracity of $name, we really
                    # would need to do a gethostbyname on it and compare
                    # the result with the original $f[0], to prevent
                    # someone spoofing us with false DNS information.
                    # See the comments below. For this application,
                    # we don't care too much, so we don't do this.
                    # Fix local names
                    if ($name !~ /\./) {
                        $name .= $localdomain;
                    }
                    $cache{$f[0]} = $name;
                    $f[0] = $name;
                }
            }
        }
        print join(" ", @f)."\n";
    }
    else {
        print $_;
    }
}
|
RE: Resolve addresses in web access logs (risk of gethostbyaddr)
by kudra (Vicar) on May 10, 2000 at 15:27 UTC
|
Maybe I'm wrong about this, but it looks like you don't
verify the name returned by gethostbyaddr. You probably
don't need to if it's just for web statistics, but if
you, like me, are in the habit of looking back over old
code to remember how to do something, it might be a good
idea to put that in or at least put in a comment about
it, in case you need a more certain resolution for the ip
in the future.
There's a discussion of this in Perl Cookbook,
section 17.7 ('Identifying the Other End of a Socket').
It basically says that because a name lookup goes to the
name owner's DNS server, there's the possibility that the
machine could give false information. Calling gethostbyname
on the returned name and checking that the answer includes the
original IP guards against that. It also mentions that it's
still not 100% secure.
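
For anyone adapting the script to something that does need the check, here is a minimal sketch of the double lookup described above. It is not the Cookbook's code and not part of the original script. The name verified_name and its two optional callback arguments are made up for this example; the callbacks exist only so the logic can be tried without a live DNS. It also assumes the IP is written in plain dotted-quad form, since the comparison is done on strings.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Hypothetical helper: return the name for $ip only if a forward lookup
# of that name gives back $ip; otherwise return undef.
# $reverse and $forward are optional lookup callbacks (an assumption of
# this sketch); by default they use the real resolver.
sub verified_name {
    my ($ip, $reverse, $forward) = @_;

    # Default reverse lookup: IP string -> host name (or nothing)
    $reverse ||= sub {
        my $packed = inet_aton($_[0]) or return;
        return scalar gethostbyaddr($packed, AF_INET);
    };

    # Default forward lookup: host name -> list of IP strings
    $forward ||= sub {
        my @h = gethostbyname($_[0]) or return;
        return map { inet_ntoa($_) } @h[4 .. $#h];
    };

    my $name = $reverse->($ip);
    return undef unless defined $name && length $name;

    # Accept the name only if one of its addresses is the original IP
    my $matches = grep { $_ eq $ip } $forward->($name);
    return $matches ? $name : undef;
}
```

In the script above, one way to use it would be to replace the gethostbyaddr call with verified_name($f[0]) and leave the IP untouched when it returns undef. As the Cookbook notes, this is still not 100% secure, and it doubles the number of lookups per new address.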
I wish I'd checked the code catacombs yesterday before I
wrote my own version of this for exactly the same purpose.
Bleh. Bad Kudra.
|
Maybe you are wrong, or maybe I am, but:
gethostbyaddr returns the names matching the IP.
Reverse name entries are as secure as DNS gets. If
the IP has a reverse name, the IP for that name
will match the IP.
The discussion in the Cookbook is about
checking whether the IP address you got when looking up by
name matches the original name, which it
will not unless you have a reverse entry for the same name.
|
I don't think there's any looking up by name--in the
example the IP was grabbed with getpeername and
the name isn't known. If you were to get the IP with
gethostbyname and then use gethostbyaddr on the
result, you would be verifying it as they suggest, just
in reverse.
Quoting extensively from the Cookbook:
"...If you want the name of the remote end, call
gethostbyaddr to look up the name of the machine
in the DNS tables, right?
"Not really. That's only half the solution.
Because a name lookup goes to the name's owner's DNS
server and a lookup of an IP address goes to the
address's owner's DNS server, you have to contend with
the possibility that the machine that connected to you
is giving incorrect names. For instance, the machine
evil.crackers.org could belong to malevolent
cyberpirates who tell their DNS server that its IP address
(1.2.3.4) should be identified as
trusted.dod.gov. If your program trusts
trusted.dod.gov, a connection from
evil.crackers.org will cause getpeername to
return the right IP address (1.2.3.4), but
gethostbyaddr will return the duplicitous name
(my italics).
"To avoid this problem, we take the (possibly deceitful)
name returned by gethostbyaddr and look it up again
with gethostbyname..."
I'm just repeating, but it looks to me as if this is
talking about gethostbyaddr having the potential
to give incorrect information.
|
You are correct. Reverse DNS can easily give wrong information
(if the bad guy controls his DNS server, he also controls the
reverse table). I know about this, but I don't care too much
about it for web access statistics.
--ZZamboni
|
I wouldn't care either with web stats. I was just
suggesting that as an example piece of code you might
want to add a comment about that problem/feature so
that if someone adapted it for an application which did
require checking, s/he would know about it.