
parsing whois data

by Discipulus (Monsignor)
on Dec 06, 2017 at 20:11 UTC
Discipulus has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks and nuns,

I'm in a situation where I want to extract some whois data, live. I tested Net::Whois::Raw and Net::Whois::Parser: the first returns all the results in a single big string (as "raw" in the namespace suggests), while the latter parses the results and returns a hash of scalars/arrays.

The problem is that Net::Whois::Parser does not return all the information for every TLD: for example, .it domains return no nameserver fields, because in every .it domain I tested the nameservers come as a multiline record.

On the other hand, Net::Whois::Parser provides a way to specify a custom parser for specific whois servers.
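As a sketch of that hook for the .it case: a parser sub that reads the multiline "Nameservers" block. It assumes, as the Net::Whois::Parser POD describes it, that a custom parser receives the raw answer text and returns a hash reference, with nameservers given as a list of { domain => ... } hashes. The name parse_it_whois and the exact .it layout (header line, then one indented host per line) are illustrative assumptions, not something taken from the module.

```perl
use strict;
use warnings;

# Hypothetical custom parser for whois.nic.it answers (name is illustrative).
# Assumption: Net::Whois::Parser passes the raw answer text to a registered
# parser and expects a hash ref, nameservers as a list of { domain => ... }.
sub parse_it_whois {
    my ($text) = @_;
    my (%info, @ns, $in_ns_block);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^Nameservers\s*$/) {      # header line opens the block
            $in_ns_block = 1;
            next;
        }
        if ($in_ns_block) {
            if ($line =~ /^\s+(\S+)\s*$/) {     # one indented host per line
                push @ns, { domain => lc $1 };
                next;
            }
            $in_ns_block = 0;                   # anything else closes the block
        }
        # unindented "Key:   value" records, e.g. "Status:   ok"
        if ($line =~ /^(\w[\w ]*?):\s+(.+?)\s*$/) {
            my ($key, $value) = (lc $1, $2);    # copy captures before anything else
            $key =~ tr/ /_/;                    # "Last Update" -> "last_update"
            $info{$key} = $value;
        }
    }
    $info{nameservers} = \@ns if @ns;
    return \%info;
}
```

After loading Net::Whois::Parser it would be registered with something like `$Net::Whois::Parser::PARSERS{'whois.nic.it'} = \&parse_it_whois;` (the %PARSERS hash is the hook the POD documents; worth checking against the installed version).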

Let's say I need the Domain status and Nameserver fields (and maybe more): is there a better, universal way to get them parsed for every top level domain?
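For the status field alone, a format-agnostic heuristic can go some way: gTLD answers following the ICANN layout carry lines like "Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited", while .it answers use "Status: ok". The sketch below (extract_statuses is a hypothetical name) keeps only the first word after either label; registries using other labels are simply not recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: collect status values from a raw whois answer.
# Recognises only two assumed layouts:
#   ICANN gTLD style: "Domain Status: clientTransferProhibited https://icann.org/epp#..."
#   .it style:        "Status:             ok"
# The first word after the label is kept; a trailing URL is dropped.
sub extract_statuses {
    my ($raw) = @_;
    my @status;
    for my $line (split /\r?\n/, $raw) {
        push @status, $1 if $line =~ /^\s*(?:Domain\s+)?Status:\s*(\S+)/i;
    }
    return @status;
}
```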

In my ignorance, I don't know whether each TLD has only one whois server and one data format, or whether I can get multiple formats for different domains of the same TLD, for example different .it domains (that would be a pain..).


There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re: parsing whois data
by shmem (Chancellor) on Dec 06, 2017 at 20:37 UTC

    It is up to each NIC's WHOIS service what information it serves. Maybe a combination of the modules you use and Net::DNS::Dig fits your needs.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: parsing whois data
by Discipulus (Monsignor) on Dec 07, 2017 at 12:43 UTC

    What follows is a partial implementation for parsing the status and nameserver fields from whois data. It's a very elementary approach, but it seems to work.

    I leave it here for any future readers of the thread. Please note that I chose to parse the nameservers from the whois data even though, as shmem suggested, there are better modules for that.

    I'm definitely waiting for the advent of the RDAP protocol: currently, whois data is a mess.

        use strict;
        use warnings;
        use Net::Whois::Raw;      # exports whois()
        use Net::Whois::Parser;   # exports parse_whois()

        # the s///r modifier used below needs perl 5.14 or later
        sub get_whois_infos {
            my $dom = shift;
            # strip any third-level label here?
            my ($raw_dominfo, $whois_server) = whois($dom);
            # tell the parser which server answered, so that a custom parser
            # registered for that server is picked up
            my $parsed_info = parse_whois( raw => $raw_dominfo, server => $whois_server );

            # extract domain status data; the key name differs between servers
            my $pre_status = $parsed_info->{status} || $parsed_info->{domain_status};
            my $status;
            if (ref $pre_status eq 'ARRAY') {
                # drop the first whitespace-separated token after each status
                # (typically the ICANN URL) and join the results
                $status .= $_ for map { s/\s\S+/ /r } @{$pre_status};
            }
            else {
                $status = $pre_status ? $pre_status =~ s/\s\S+/ /r : '-not defined-';
            }

            # extract nameservers data
            my @ns;
            if ($parsed_info->{nameservers} and ref $parsed_info->{nameservers} eq 'ARRAY') {
                # the parsed data should be ok for .com .org .info .name
                push @ns, $_->{domain} for @{ $parsed_info->{nameservers} };
            }
            elsif ($dom =~ /\.(?:it|eu)$/) {
                # .it format: a "Nameservers" header line
                # .eu format: a "Name servers:" header line
                # then one server per line, until an empty line
                my $switch = 0;
                foreach my $line (split /\n/, $raw_dominfo) {
                    if ($line =~ /name\s?servers/i) { $switch = 1; next }
                    next unless $switch;
                    push @ns, $1 if $line =~ /^\s*(\S+)\s*$/;
                    $switch = 0 if $line =~ /^\s*$/;
                }
            }
            elsif ($dom =~ /\.(?:biz|net)$/) {
                # .biz and .net: one "Name Server:" line per server; both can
                # also answer "Maximum Daily connection limit reached. Lookup refused."
                foreach my $line (split /\n/, $raw_dominfo) {
                    push @ns, $1 if $line =~ /^\s*Name Server:\s+(\S+)/;
                    push @ns, "-$1-"
                        if $line =~ /(Maximum Daily connection limit reached\. Lookup refused)/;
                }
            }
            else {
                # unsupported TLD: report it once, not once per line
                push @ns, "-Unable to parse whois data for $dom-";
            }

            return ( { status => $status, ns => \@ns },
                     "($whois_server answer)\n\n$raw_dominfo" );
        }
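For comparison with the RDAP wish above: RDAP answers are JSON, and in a domain object (RFC 9083, formerly RFC 7483) "status" is an array of strings and "nameservers" an array of nameserver objects, each carrying an "ldhName". A minimal sketch using the core JSON::PP module, assuming the response body has already been fetched; the HTTP request and the choice of RDAP server are left out, and rdap_status_and_ns is a hypothetical name.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module since perl 5.14

# Sketch: pull status and name servers out of an RDAP domain response body.
# Assumes $json_text is the (already fetched) JSON of an RDAP domain object.
sub rdap_status_and_ns {
    my ($json_text) = @_;
    my $domain = decode_json($json_text);
    my @status = @{ $domain->{status} || [] };
    my @ns     = map  { lc $_->{ldhName} }
                 grep { defined $_->{ldhName} }
                 @{ $domain->{nameservers} || [] };
    return { status => \@status, ns => \@ns };
}
```

No per-TLD line parsing is needed here: the field names are fixed by the specification rather than by each registry's text layout.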


