
How to remove duplicate IPs

by sinhass (Initiate)
on Aug 07, 2013 at 22:36 UTC (#1048449=perlquestion)
sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn the Perl way of doing things. I have a text file of 300,000 lines and am trying to extract the IP addresses and remove the duplicates. So far I get all the IPs, but the duplicates are not removed. Here is the code:

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;
open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Replies are listed 'Best First'.
Re: How to remove duplicate IPs
by rjt (Deacon) on Aug 07, 2013 at 22:53 UTC

    You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

    I used the following test data:

    host_a
    whoops, no IP on this line
    host_b
    host_a (duplicate!)
    junk host_c

    And got this output:
    Note: use strict; use warnings; omitted for brevity.
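
For anyone who wants to try the %seen idiom without an input file, here is a minimal, self-contained sketch. The sample lines and addresses are invented for illustration (they are not the test data used in this reply), unique_ips is a hypothetical helper name, and a simplified IPv4 pattern stands in for $RE{net}{IPv4} so the snippet runs without Regexp::Common. Unlike the real pattern, the stand-in does not range-check the octets.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption: no octet range check).
my $ipv4 = qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/;

# Return the first IP found on each line, keeping only first occurrences.
sub unique_ips {
    my @lines = @_;
    my %seen;      # IPs we have already emitted
    my @unique;
    for my $line (@lines) {
        next unless $line =~ /($ipv4)/;
        # $seen{$1}++ is false (0) the first time, true afterwards.
        push @unique, $1 unless $seen{$1}++;
    }
    return @unique;
}

# Invented sample input.
my @sample = (
    "host_a 192.0.2.1",
    "whoops, no IP on this line",
    "host_b 192.0.2.2",
    "host_a 192.0.2.1 (duplicate!)",
    "junk 192.0.2.3 host_c",
);

print "$_\n" for unique_ips(@sample);
```

The hash lookup is constant time on average, so this approach should scale comfortably to a 300,000-line file, and it keeps the addresses in the order they first appear.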

      Hi RJT, thank you very much for your help. That solves my problem.

Re: How to remove duplicate IPs
by jwkrahn (Monsignor) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

    That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

    But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";
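
As a possible further step (not part of the reply above), the same three-argument open can be written with a lexical filehandle instead of the bareword NMAP_DATA. The sketch below assumes the filename arrives as the first command-line argument; read_lines is a hypothetical helper name introduced only for illustration.

```perl
use strict;
use warnings;

# Open a file for reading with an explicit mode and a lexical handle,
# then return its lines. Dies with the filename and the OS error on failure.
sub read_lines {
    my ($path) = @_;
    open my $in, '<', $path
        or die "Cannot open '$path' because: $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

# Usage: perl script.pl somefile.txt
if (@ARGV) {
    print read_lines($ARGV[0]);
}
```

The explicit '<' mode means characters such as > or | in a filename are never interpreted as an open mode, and $ARGV[0] avoids joining several arguments into one string the way "@ARGV" does.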

    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

    Why are you sorting the two values $1 and "\n"?    Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
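
To illustrate the scalar-versus-array point, here is a small sketch that captures the match straight into a scalar. A match in list context returns its captures, so no array and no sort are needed. It also keeps the my declaration separate from any statement modifier, since perlsyn documents the behaviour of "my ... if ..." as undefined. The helper name first_ip, the sample lines, and the simplified pattern standing in for $RE{net}{IPv4} are assumptions made for illustration.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption: no octet range check).
my $ipv4 = qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/;

# Return the first IP on a line, or undef when the line has none.
sub first_ip {
    my ($line) = @_;
    my ($ip) = $line =~ /($ipv4)/;   # list-context match returns the capture
    return $ip;
}

# Invented sample lines.
for my $line ("host_a 192.0.2.1", "no address here") {
    my $ip = first_ip($line);
    print "$ip\n" if defined $ip;
}
```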

      Hi jwkrahn, thanks for your advice. Frankly speaking, I am still learning Perl, so I have no clear idea yet.

Node Type: perlquestion [id://1048449]
Approved by McDarren