PerlMonks  

how to resolve IP's in an HTTPd that doesn't resolve them?

by taint (Chaplain)
on Jun 13, 2018 at 19:09 UTC ( #1216581=perlquestion )

taint has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

I'm experimenting with a couple of HTTPd's (names removed to protect the guilty innocent) that are fairly light, but don't resolve the connecting IP addresses (provide the HOST name) in the logs. I've contacted the devs of both, asking why not, or if and when they might. The answers were basically: too heavy && too slow. To me, that seemed more of an excuse, as the facility could simply have been implemented as an option. But they're not my HTTPd's.

So, I did some experimentation, and was able to get my web pages to return the HOST names of the connecting IP's in the following manner (all my pages are CGI/Perl):

    ...
    use strict;
    use feature qw(say);
    use Socket;
    ...
    my $remote_ip = $ENV{REMOTE_ADDR};
    # note: don't try this at home kids, unless you have a fast local DNS (server)
    my $remote_host = gethostbyaddr(inet_aton($remote_ip), AF_INET);

Which, as you might imagine, allows me to return the HOST name contained in $remote_host.

So, Perl will give me what the HTTPD won't -- I love Perl! So how might I best achieve the same simultaneously within my HTTPd logs? I'm guessing a pipe is probably involved here. But I'd really rather run this whole thing past the many brilliant minds and monks here, in hopes of getting the best solution. :-)

Thanks!

Evil is good, for without it, Good would have no value
λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH

Re: how to resolve IP's in an HTTPd that doesn't resolve them?
by stevieb (Canon) on Jun 13, 2018 at 19:46 UTC

    Why not share the names of the culprit software(s)? I mean, if they are open source and written in a language somebody else knows, perhaps they can see where a couple of C-type calls could be made to do what you want.

    Otherwise, doing what you're doing will be fine; just run a DNS cache alongside your application/module, check/cache the result, then tee off to a custom log file as you said.

    That, or write a log parser instead, that either rewrites the log file when you want to read it, or one that reads line-by-line and displays to the consumer after the transformation has occurred.

    Perhaps stating your overall objective would be handy here to get more appropriate feedback.

      "Perhaps stating your overall objective would be handy here to get more appropriate feedback"

      Hmm... judging by your answer, I do indeed need to better define my objective -- Sorry. :-(

      What I'm ultimately hoping to achieve is to have the logging the HTTPd already provides resolve the connecting IP addresses it currently dumps to the log(s). Maybe an example would be prudent here:

          66.249.69.38 my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

      When what I'd really like to see, is the following:

          crawl-66-249-69-38.googlebot.com my.web.host - [12/Jun/2018:12:32:26 -0700] "GET / HTTP/1.1" 200 3306 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

      Seems that this should be possible. I could simply:

          #!/bin/sh -
          cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...

      or some such to feed the logs to a resolver. But I'm ideally looking for a way to process the log(s) (connections) in "real time", so that the logs have the correct access times. I can imagine filtering or piping it, but am not sure whether those are the only/best solutions. So here I ask. :-)

      Thanks, stevieb, for taking the time to respond!


        I could simply:

            #!/bin/sh -
            cat /var/log/my.web.host-access-log | awk '{ print $1; }' | ...

        or some such to feed the logs to a resolver. But I'm ideally looking for a way to process the log(s) (connections) in "real time". So that the logs have the correct access times. I can imagine filtering , or piping it.

        <beancounting>You don't need cat in that pipe, just let awk read directly from the logfile.</beancounting>

        Back on topic: Name resolving takes time, causes some extra load, and can fail. Hence web servers generally prefer not to resolve the remote address for performance reasons. However, you could simply log to a pipe instead of logging into a file. Apache comes with logresolve, which is intended to run offline, but you could also use it "live". It's a simple filter. It might be a little bit too simple-minded:

        To minimize impact on your nameserver, logresolve has its very own internal hash-table cache. This means that each IP number will only be looked up the first time it is found in the log file.

        In other words: logresolve completely ignores any TTLs and so your live log will contain nonsense after running for a while. It's not a bug, as logresolve is intended to run offline and only for a short time.
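One way to avoid the stale-cache problem described above in a home-grown filter is to give every cached answer a maximum age. The following is only a minimal sketch under stated assumptions: gethostbyaddr does not expose the real DNS TTL, so a fixed lifetime stands in for it, and the names make_cached_resolver, max_age, clock and lookup are hypothetical, not part of logresolve or any module.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Returns a closure that maps an IPv4 address string to a host name and
# remembers each answer for at most max_age seconds.
# "lookup" and "clock" can be injected so the cache can be tried offline.
sub make_cached_resolver {
    my (%opt) = @_;
    my $max_age = $opt{max_age} // 300;    # assumed lifetime, NOT a DNS TTL
    my $clock   = $opt{clock}   // sub { time };
    my $lookup  = $opt{lookup}  // sub {
        my $packed = inet_aton($_[0]) or return;
        return scalar gethostbyaddr($packed, AF_INET);
    };
    my %cache;                             # ip => [ name, expiry time ]

    return sub {
        my ($ip) = @_;
        my $now   = $clock->();
        my $entry = $cache{$ip};
        if (!$entry || $entry->[1] <= $now) {
            # Failures are cached too (as the bare IP), so an address
            # without a name does not cost a lookup on every log line.
            my $name = $lookup->($ip) // $ip;
            $entry = $cache{$ip} = [ $name, $now + $max_age ];
        }
        return $entry->[0];
    };
}
```

With the defaults, my $resolve = make_cached_resolver(max_age => 600); gives a resolver that asks the system resolver at most once per address every ten minutes. A local DNS cache that honours real TTLs remains the more accurate option.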

        Have a look at the daemontools. multilog, at least, is usable: it takes care of reliable logging, including rotating log files. There is no IP resolving program in daemontools, but djb also published djbdns, a modular DNS resolver. It contains dnsfilter, which should do quite exactly what you want: resolve an IP address at line start to a host name. You should perhaps install a local cache on the webserver. That way, DNS requests are cached by djb's dnscache, dnsfilter reads most responses from the local cache, and so DNS requests become a lot less expensive.

        To recap: Install a local DNS cache. Then log to a pipe that writes into dnsfilter. dnsfilter then logs into multilog, which creates a nice set of log files.
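Where djbdns is not an option, the filter stage of the pipeline recapped above could be approximated in Perl. This is a rough sketch, not a replacement for dnsfilter: it assumes the client address is an IPv4 address at the very start of each line (as in the log sample earlier in the thread), and the names resolve_stream and system_resolver are hypothetical.

```perl
use strict;
use warnings;
use IO::Handle;
use Socket qw(inet_aton AF_INET);

# Copy log lines from $in to $out, replacing a leading IPv4 address with
# whatever $resolve returns for it. Lines that do not start with an
# address, or whose address does not resolve, pass through unchanged.
sub resolve_stream {
    my ($in, $out, $resolve) = @_;
    $out->autoflush(1);    # a live log should not sit in a buffer
    while (my $line = <$in>) {
        if ($line =~ /^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)/) {
            my $ip   = $1;
            my $name = $resolve->($ip);
            substr($line, 0, length($ip), $name) if defined $name;
        }
        print {$out} $line;
    }
    return;
}

# Blocking lookup through the system resolver; returns undef on failure.
sub system_resolver {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}
```

Used as the target of a piped log, the script would end with resolve_stream(\*STDIN, \*STDOUT, \&system_resolver); and its output would be handed on to multilog or redirected to a file. Note that this sketch looks up one address at a time, so a slow lookup delays every line behind it; a local DNS cache matters even more here.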

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: how to resolve IP's in an HTTPd that doesn't resolve them?
by Anonymous Monk on Jun 14, 2018 at 08:30 UTC
    So, Perl will give me what the HTTPD won't -- I love Perl! So how might I best achieve the same simultaneously within my HTTPd logs?

    Use Perl Glue! DO NOT resolve logs with your httpd. <- Period. Use File::Tail to give your logfile a Perl API so you can watch it in realtime from your own program. Parse each line, capture the IP, resolve it and print whatever you want:

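        use strict;
        use feature qw(say);
        use Socket;
        use File::Tail;

        my $file = File::Tail->new("/some/log/file");
        my $line;
        while (defined($line = $file->read)) {
            # DATE, IP and WHATEVER are placeholders for your own log format
            if ($line =~ /^DATE (IP) (WHATEVER)/) {
                my $remote_ip = $1;
                my $whatever = $2;
                # gethostbyaddr returns undef when the lookup fails; fall back to the IP
                my $remote_host = gethostbyaddr(inet_aton($remote_ip), AF_INET) // $remote_ip;
                # a plain list rather than qw//, which would not interpolate the variables
                say join "\t", $remote_ip, $remote_host, $whatever;
            }
        }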

      AGREE! I'm no expert at this, but when I did it long ago with my own personal web server, Apache just logged IPs, and the Perl script I wrote for post-processing the logs (long since lost) did name resolution, concatenation and statistics, and emailed a summary to me.

      As long as we're on the topic of name resolution, you may not *yet* care about IPv6, but rather than rewriting your code later when you do, start with the address family independent resolution calls rather than the legacy ones:

      • gethostbyname => getaddrinfo
      • gethostbyaddr => getnameinfo
      • inet_ntoa => inet_ntop
      • inet_aton => inet_pton

      Perl's Socket module has supported these since around version 1.94 (Perl 5.14 or thereabouts). Here is a brief example. It could be reduced, but as written it should also work with earlier Socket versions that had the new routines but no IPv6 support:

          #!/usr/bin/perl
          use strict;
          use warnings;

          use Socket qw(inet_ntoa AF_INET IPPROTO_TCP);

          my $AF_INET6 = eval { Socket::AF_INET6() };
          my $AF_UNSPEC = eval { Socket::AF_UNSPEC() };
          my $AI_NUMERICHOST = eval { Socket::AI_NUMERICHOST() };
          my $NI_NUMERICHOST = eval { Socket::NI_NUMERICHOST() };
          # Required for reverse lookup
          my $NI_NAMEREQD = eval { Socket::NI_NAMEREQD() };

          my %hints = (
              family => $AF_UNSPEC,
              protocol => IPPROTO_TCP
          );

          my ( $err, @getaddr ) = Socket::getaddrinfo( $ARGV[0], undef, \%hints );

          if ( defined( $getaddr[0] ) ) {
              for my $addr (@getaddr) {
                  my ( $err, $address ) = Socket::getnameinfo( $addr->{addr}, $NI_NUMERICHOST );
                  printf "getaddrinfo()/getnameinfo() Address = %s\n",
                    ( defined($address) ) ? $address : $err;

                  # Reverse Lookup
                  my ( $host, $service );
                  ( $err, $host, $service ) = Socket::getnameinfo( $addr->{addr}, $NI_NAMEREQD );
                  printf " |_> getnameinfo() Name = %s\n",
                    ( defined($host) ) ? $host : $err;
              }
          }
          else {
              print "$0: getaddrinfo() failed - error = $err\n";
          }

      and run ...

          C:\> test.pl www.google.com
          getaddrinfo()/getnameinfo() Address = 172.217.15.100
           |_> getnameinfo() Name = iad30s21-in-f4.1e100.net

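For the log-processing case in this thread, the input is an address rather than a name, so the same two calls can be wrapped into one helper that handles both IPv4 and IPv6. This is a minimal sketch, assuming a Socket version recent enough to export getaddrinfo and getnameinfo by name; reverse_lookup is a hypothetical name, not something from the example above.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo AI_NUMERICHOST NI_NAMEREQD SOCK_STREAM);

# Reverse-resolve a numeric IPv4 or IPv6 address string.
# Returns the host name, or undef if the string is not a numeric address
# or no name could be found for it.
sub reverse_lookup {
    my ($ip) = @_;

    # AI_NUMERICHOST makes getaddrinfo parse the string without a DNS
    # query, so a host name passed in by mistake is rejected here.
    my ($err, @res) = getaddrinfo($ip, undef,
        { flags => AI_NUMERICHOST, socktype => SOCK_STREAM });
    return undef if $err || !@res;

    # NI_NAMEREQD turns "no name available" into an error instead of
    # handing the numeric address straight back.
    my ($nerr, $host) = getnameinfo($res[0]{addr}, NI_NAMEREQD);
    return $nerr ? undef : $host;
}
```

A caller would typically write my $name = reverse_lookup($ip) // $ip; so that unresolvable addresses stay in the log as they were.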
        Wonderful example, VinsWorldcom. Thanks!

        Right you are; IPv6 should indeed be considered. I hadn't really blocked the whole process out yet, as I wanted to first look at how it might best be implemented up front, and at what possibilities might be available.

        I too used Socket in my OP. As well as inet_aton. But hadn't (yet) bothered with IPv6 resolution. But will be adding it. Thank you for such an elaborate example ++ !

        Thanks again, VinsWorldcom! With yours, as well as the other excellent examples above, I feel pretty well armed for the task! :-)

        --Chris


      Good advice, Anonymous Monk!

      ...and only adds slightly more than I'm already using in the web page example I posted in the OP

      Thanks!

      --Chris


Node Type: perlquestion [id://1216581]
Approved by Discipulus
Front-paged by haukex