yasysad has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks ..
I am a sysadmin with the task of consolidating a large file of IP address ranges, and what better tool than Perl for the job?
I have been through the FAQs and got a few pointers on file operations, etc. What really kills me is the algorithm, as IP addresses have their own arithmetic.
Well, here's hoping for enlightenment ..
Input File format : From address, To Address, Name
202.1.2.0,202.2.255.255,a
202.3.0.0,202.3.255.255,a
202.4.0.0,202.4.0.255,b
The above example shows that line 2 is the continuation of line 1, and as the name is the same, the output needs to be:
Output File format : From address, To Address, Name
202.1.2.0,202.3.255.255,a
202.4.0.0,202.4.0.255,b
I have tried using a lot of ifs and ands, but I'm not able to pick two lines and check them against the rest of the file. Thanks a lot.
Re: IP Address consolidation
by tadman (Prior) on Aug 20, 2001 at 14:48 UTC
If you have the capacity, which you should, it would be
fairly straightforward to load all the files into memory,
and then write them out. Since these are grouped by name,
why not use a Hash of Arrays (HoA):
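my %data;
foreach my $file (@file_list)
{
    # Skip files that cannot be opened instead of reading a dead handle
    unless (open (my $input, '<', $file))
    {
        warn "Could not read $file: $!\n";
        next;
    }
    while (<$input>)
    {
        chomp;
        my ($start,$end,$name) = split (/,/);
        push (@{$data{$name}}, "$start,$end");
    }
    close ($input);
}
foreach my $name (sort keys %data)
{
    # One output line per range, in the same from,to,name format
    print "$_,$name\n" foreach @{$data{$name}};
}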
If you have overlapping entries in the different files,
then you will have to check on insert. This could be done
with a Hash of Hashes (HoH):
use Socket;
my %data;
foreach my $file (@file_list)
{
    # Skip files that cannot be opened instead of reading a dead handle
    unless (open (my $input, '<', $file))
    {
        warn "Could not read $file: $!\n";
        next;
    }
    while (<$input>)
    {
        chomp;
        my ($start,$end,$name) = split (/,/);
        # Pack both addresses into their 4-byte form
        $start = inet_aton($start);
        $end = inet_aton($end);
        if (defined $data{$name}{$start})
        {
            # Resolve conflict?
        }
        else
        {
            $data{$name}{$start} = $end;
        }
    }
    close ($input);
}
foreach my $name (sort keys %data)
{
    foreach my $start (sort keys %{$data{$name}})
    {
        # The end of each block is the value stored under its start
        print join (',',
            inet_ntoa($start),
            inet_ntoa($data{$name}{$start}),
            $name),
            "\n";
    }
}
The reason for using inet_aton from the Socket module, which
converts a dotted-quad string into a packed 4-byte address,
is to simplify comparisons. "202.1.2.0" and "202.01.002.0"
are equivalent, and removing redundant zeros is a lot more
complicated than just "packing" them into their native
format (4 bytes). They are easily unpacked with the
complementary inet_ntoa, and should always come out clean
with no extraneous zeros.
Additionally, if you want to sort them, which I'm doing
here with the regular sort operator, the packed values sort
ASCII-betically, byte by byte. Since the most significant
octet comes first, that puts them in numeric order. Sorting
the dotted text numerically is more complicated, because
each of the four parts has to be compared separately.
Update:
- Fixed inet_ntoa calls in first loop.
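To illustrate the point about sorting packed addresses, here is a minimal sketch. The sample addresses and variable names are made up for the example; only inet_aton and inet_ntoa come from the Socket module.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Sample addresses, chosen for illustration only.
my @dotted = ('202.4.0.0', '202.1.2.0', '10.20.30.100', '10.20.30.40');

# Pack each dotted quad into its 4-byte network-order form.
my @packed = map { inet_aton($_) } @dotted;

# The default sort compares the packed strings byte by byte,
# which matches numeric order of the addresses.
my @sorted = map { inet_ntoa($_) } sort @packed;

print "$_\n" for @sorted;
```

Sorting the dotted text directly would put 10.20.30.100 ahead of 10.20.30.40, which is the ordering problem the packed form avoids.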
I tried your code and got this error at this line
$start = inet_ntoa($start);
the error output is :
Bad arg length for Socket::inet_ntoa, length is 10, should be 4 at ipconsnew.pl line 16, <INPUT> line 1.
I am working on ActivePerl on Windows NT .. Am I doing anything wrong ??
Thanks tadman .. the functions are a revelation .. I may be able to work on them for the resolve conflict ..
actually, it's the algorithm I was looking for ..
Re: IP Address consolidation
by tadman (Prior) on Aug 20, 2001 at 15:03 UTC
Your update helped clarify one particular thing here.
A little post-processing can help set things straight.
Since all the data is stored in organized structures, it
can be cleaned up before being printed out. In this case,
joining adjacent blocks is a no-brainer.
The idea is that since IP addresses are just numbers,
you can do math on them to add and subtract. In this case,
what you want to do is add one to the "end" to see if
it matches the next "start". You could write your own
add function, but this is a little tedious, with up to three
possible carries. Instead, it is much simpler to
render the IP address as a plain 32-bit number and work
with it that way. This can be done with unpack, which will
extract the "raw" 32-bit value from the result of inet_aton.
An "N"-type pack is a network-order (big-endian) 32-bit
number, a standard way of transporting numbers across the
Internet.
So, once unpacked, you add one, and feed the result back
into pack with the same "N" template, which will give you a
new repacked address. This can be converted, if you like,
into the pretty human-readable version we've come to know,
using inet_ntoa.
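As a rough sketch of that arithmetic, the helpers below (ip_to_int, int_to_ip and next_ip are names invented for this example) convert a dotted quad to a 32-bit integer, add one, and convert back. The repacking is done with pack and the "N" template. It makes no attempt to handle 255.255.255.255, which has no successor.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: dotted quad -> 32-bit integer.
sub ip_to_int { return unpack('N', inet_aton($_[0])); }

# Hypothetical helper: 32-bit integer -> dotted quad.
sub int_to_ip { return inet_ntoa(pack('N', $_[0])); }

# Hypothetical helper: the address right after the given one.
# The carries between octets fall out of the integer addition.
sub next_ip { return int_to_ip(ip_to_int($_[0]) + 1); }

print next_ip('202.2.255.255'), "\n";
```

With the sample data from the question, the address after 202.2.255.255 comes out as 202.3.0.0, which is the start of the next line.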
This code merely compares the data in the hash for any
adjacent matches, and when it finds them, puts the end
from the second as the end of the first, and deletes
the second.
# This function returns the 32-bit value
# of the IP address for numeric comparisons.
sub addr_value { return unpack("N", $_[0]); }
foreach my $name (sort keys %data)
{
    foreach my $start (sort keys %{$data{$name}})
    {
        # Skip keys deleted after keys was calculated
        next unless exists $data{$name}{$start};
        # Keep absorbing blocks for as long as one is adjacent
        while (1)
        {
            my $end_value = addr_value($data{$name}{$start});
            # Nothing can follow 255.255.255.255
            last if $end_value == 0xFFFFFFFF;
            # Add one to the end to determine the next start
            my $next_start = pack("N", $end_value + 1);
            # Stop when no block starts right after this one
            last unless exists $data{$name}{$next_start};
            # End this block where that block ended, and delete that block
            $data{$name}{$start} = delete $data{$name}{$next_start};
        }
    }
}
This is just off the top, so your mileage may vary.
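For comparison, here is a self-contained sketch of the same idea done as a sort followed by a single pass, run on the sample data from the question. The consolidate function is a hypothetical helper written for this example, not code from the thread. It assumes well-formed "from,to,name" lines and only joins ranges that are exactly adjacent; overlapping ranges are left alone.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: takes "from,to,name" lines and returns
# consolidated lines in the same format.
sub consolidate {
    my @rows;
    for my $line (@_) {
        my ($from, $to, $name) = split /,/, $line;
        push @rows, [ unpack('N', inet_aton($from)),
                      unpack('N', inet_aton($to)),
                      $name ];
    }

    # Sort by name, then by numeric start address.
    @rows = sort { $a->[2] cmp $b->[2] or $a->[0] <=> $b->[0] } @rows;

    my @merged;
    for my $row (@rows) {
        my $last = $merged[-1];
        if ($last && $last->[2] eq $row->[2] && $last->[1] + 1 == $row->[0]) {
            $last->[1] = $row->[1];    # extend the previous range
        }
        else {
            push @merged, [@$row];     # start a new range
        }
    }

    my @out;
    for my $r (@merged) {
        push @out, join(',', inet_ntoa(pack('N', $r->[0])),
                             inet_ntoa(pack('N', $r->[1])),
                             $r->[2]);
    }
    return @out;
}

print "$_\n" for consolidate(
    '202.1.2.0,202.2.255.255,a',
    '202.3.0.0,202.3.255.255,a',
    '202.4.0.0,202.4.0.255,b',
);
```

Because the rows are sorted first, each range only has to be compared with the one before it, so a chain of three or more adjacent blocks collapses in the same pass.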
Re: IP Address consolidation
by dga (Hermit) on Aug 20, 2001 at 20:27 UTC
This little program has all the elements needed. It will accumulate IP ranges by the id name and expand the range to include the min and max IP addresses.
It does not check for holes between ranges in the same id, which is not needed if the input data is sane.
#!/usr/bin/perl
use strict;
my($start, $end, $id, %range);
while(<>)
{
chomp;
($start, $end, $id)=split(',');
$start=pack("C4", (split('\.',$start)));
$end=pack("C4", (split('\.',$end)));
if($range{$id})
{
my($os, $oe)=@{$range{$id}};
$os=$start if($start lt $os);
$oe=$end if($end gt $oe);
@{$range{$id}}=($os, $oe);
}
else
{
@{$range{$id}}=($start, $end);
}
}
foreach my $r ( sort keys %range )
{
printf "%vd,%vd,%s\n", @{$range{$r}},$r;
}
The packed addresses, and apparently v-strings in general, are strings, so you have to use string comparison operators on them.
But, due to the conversion, 10.20.30.40 is smaller than 10.20.30.100, which would not be true for the plain dotted strings.
%vd prints dotted decimal, %vb prints each byte in binary, and %vX prints each byte in hex. The sprintf documentation uses the %*vX form with a ":" separator as its example for an IPv6 address.
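A small sketch of the comparison point, using the two addresses mentioned above. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Pack four octets into a 4-character string, one character per octet.
my $small = pack('C4', split(/\./, '10.20.30.40'));
my $large = pack('C4', split(/\./, '10.20.30.100'));

# Packed form: comparing character by character matches numeric order.
print "packed: 10.20.30.40 sorts first\n" if $small lt $large;

# Dotted text: '4' sorts after '1', so the order is reversed.
print "text: 10.20.30.100 sorts first\n" if '10.20.30.100' lt '10.20.30.40';

# %vd joins the ordinal of each character with dots.
my $shown = sprintf('%vd', $large);
print "$shown\n";
```

Both messages print, and %vd turns the packed string back into 10.20.30.100.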
Re: IP Address consolidation
by claree0 (Hermit) on Aug 20, 2001 at 14:24 UTC
I think you need to give us some more detail on what
you are trying to do - e.g. what makes line 2 a follow-on
from line 1?
What comparisons are you doing?
Clare
My apologies ..
The clarifications are thus :
1. IP Address file has the range of IP Addresses held by the name that appears in the same line.
2. I need to join adjacent ranges belonging to the same name; thus reducing the number of lines in the file.
3. In the first example, Line 2 follows Line 1 because 202.3.0.0 is the next IP Address after 202.2.255.255.
Similarly, if the "From" of one line is the next IP after the "To" of the previous line, and the name is the same, the two records need to be replaced by one with the "From" of the first line, the "To" of the following line, and the common "Name".
Hope this helps