How to remove duplicate IPs

sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn to do things the Perl way. I have a text file of 300,000 lines and am trying to extract the IP addresses and remove the duplicates. So far I am getting all the IPs, but the duplicates are not removed. Here is the code.

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;
open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Replies are listed 'Best First'.
Re: How to remove duplicate IPs by rjt (Curate) on Aug 07, 2013 at 22:53 UTC
You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

I used the following test data:

    127.0.0.1 host_a
    whoops, no IP on this line
    127.0.1.1 host_b
    127.0.0.1 host_a (duplicate!)
    junk 127.0.0.2 host_c

And got this output:

    127.0.0.1
    127.0.1.1
    127.0.0.2

`use strict; use warnings;` omitted for brevity.
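For readers who want to try the `%seen` idiom without an input file, here is a minimal, self-contained sketch. The sub name `unique_ips` and the sample lines are made up for illustration, and the hand-written pattern is only a simplified stand-in for `$RE{net}{IPv4}` (it does not check that each octet is at most 255), used so the example runs without installing Regexp::Common.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption: good enough for a demo;
# it does not reject octets above 255).
my $ipv4 = qr/\b(\d{1,3}(?:\.\d{1,3}){3})\b/;

# Hypothetical helper: return the first IP found on each line,
# in first-seen order, with repeats dropped.
sub unique_ips {
    my @lines = @_;
    my %seen;      # IP => number of times encountered so far
    my @unique;
    for my $line (@lines) {
        next unless $line =~ $ipv4;           # skip lines with no IP
        push @unique, $1 unless $seen{$1}++;  # keep only the first sighting
    }
    return @unique;
}

my @sample = (
    "127.0.0.1 host_a\n",
    "no IP here\n",
    "127.0.1.1 host_b\n",
    "127.0.0.1 host_a again\n",
);
print "$_\n" for unique_ips(@sample);
```

The post-increment is what makes this work: `$seen{$1}++` returns the old count (false the first time) and then bumps it, so each address passes the test exactly once. Memory use grows with the number of distinct IPs, not with the number of input lines.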
Re^2: How to remove duplicate IPs by sinhass (Initiate) on Aug 07, 2013 at 23:05 UTC
Hi RJT, thank you very much for your help. That solves my problem.
Re: How to remove duplicate IPs by jwkrahn (Abbot) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";

    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

Why are you sorting the two values `$1` and `"\n"`? Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
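Putting the advice in this thread together, one possible shape for the whole script is sketched below. It is not anyone's posted solution: the sub name `write_unique_ips` is hypothetical, lexical filehandles (`my $in`) are swapped in for the bareword handles used above, and the pattern is again a simplified stand-in for `$RE{net}{IPv4}` so the sketch runs without extra modules. The output name `ipdata` comes from the original question.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption; no octet range check).
my $ipv4 = qr/\b(\d{1,3}(?:\.\d{1,3}){3})\b/;

# Hypothetical helper: copy the first IP on each line of $in_path to
# $out_path, skipping repeats. Returns the number of unique IPs written.
sub write_unique_ips {
    my ($in_path, $out_path) = @_;

    # Three-argument open with an explicit mode, as suggested above.
    open my $in,  '<', $in_path  or die "Cannot open '$in_path' because: $!";
    open my $out, '>', $out_path or die "Cannot write '$out_path' because: $!";

    my %seen;
    my $count = 0;
    while (my $line = <$in>) {
        next unless $line =~ $ipv4;   # no IP on this line
        next if $seen{$1}++;          # already written once
        print {$out} "$1\n";
        $count++;
    }

    close $in;
    close $out or die "Cannot close '$out_path' because: $!";
    return $count;
}

# Run only when a filename is given on the command line.
if (@ARGV) {
    my $n = write_unique_ips($ARGV[0], 'ipdata');
    print "Wrote $n unique IPs to ipdata\n";
}
```

Reading line by line keeps only the set of distinct addresses in memory, so a 300,000-line input should not be a problem. If sorted output is wanted, the keys of `%seen` could be collected and sorted after the loop instead of printing as you go.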
Re^2: How to remove duplicate IPs by sinhass (Initiate) on Aug 08, 2013 at 03:15 UTC
Hi jwkrahn, thanks for your advice. Frankly speaking, I am still learning Perl, so I don't have a clear idea yet.
