How to remove duplicate IPs

by sinhass (Initiate)
on Aug 07, 2013 at 22:36 UTC (#1048449=perlquestion)
sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn the Perl way. I have a 300,000-line text file and am trying to extract the IP addresses and remove the duplicates. So far I am getting all the IPs, but the duplicates are not removed. Here is the code:

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;

open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Replies are listed 'Best First'.
Re: How to remove duplicate IPs
by rjt (Deacon) on Aug 07, 2013 at 22:53 UTC

    You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

    I used the following test data: host_a whoops, no IP on this line host_b host_a (duplicate!) junk host_c

    And got this output:
    use strict; use warnings; omitted for brevity.
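
For reference, here is a minimal self-contained sketch of the same %seen idiom. The sample lines and addresses are made up for illustration (they are not the test data above), and a simplified dotted-quad pattern stands in for $RE{net}{IPv4} so that the snippet runs without Regexp::Common. That stand-in does not range-check octets, so treat it as an assumption rather than a replacement.

```perl
use strict;
use warnings;

# Hypothetical sample input; the addresses are illustrative only.
my @lines = (
    "192.0.2.1 host_a\n",
    "whoops, no IP on this line\n",
    "192.0.2.2 host_b\n",
    "192.0.2.1 host_a (duplicate!)\n",
    "junk 192.0.2.3 host_c\n",
);

# Simplified stand-in for $RE{net}{IPv4}: four dot-separated groups of
# 1-3 digits, with no check that each octet is <= 255.
my $ipv4 = qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/;

my %seen;      # Remember IPs we have already seen
my @unique;    # First occurrence of each IP, in input order

for my $line (@lines) {
    next unless $line =~ /($ipv4)/;        # skip lines without an IP
    push @unique, $1 unless $seen{$1}++;   # keep only the first sighting
}

print "$_\n" for @unique;
```

The post-increment is what makes this work: $seen{$1}++ returns the old count (false the first time) and then bumps it, so only the first occurrence of each address is kept.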

      Hi RJT, thank you very much for your help. That solves my problem.

Re: How to remove duplicate IPs
by jwkrahn (Monsignor) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

    That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

    But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";
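
A minimal sketch of the same three-argument open, using a lexical filehandle instead of the bareword NMAP_DATA. The lexical handle is an addition here, not part of the reply above, and the temporary file and its contents are hypothetical; they exist only to make the snippet runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file so the example is self-contained.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\nsecond line\n";
close $tmp_fh or die "Cannot close '$file': $!";

# Three-argument open: the mode is separate from the filename, so a name
# containing characters such as '>' or '|' is never treated as a mode.
open(my $in, '<', $file) or die "Cannot open '$file' because: $!";

my @lines = <$in>;
close $in or die "Cannot close '$file': $!";

print "Read ", scalar(@lines), " lines\n";
```

By contrast, "@ARGV" in the original joins every command-line argument with spaces into a single string, which only happens to work when exactly one filename is given.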

    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

    Why are you sorting the two values $1 and "\n"?    Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
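
If the sort in the original code was meant to produce ordered output, it would have to be applied once to the whole list after the loop, not to each line. The following is a sketch under that assumption; the address list is hypothetical, and comparing packed octets is one common approach rather than something proposed in this thread.

```perl
use strict;
use warnings;

# Hypothetical list of already de-duplicated addresses.
my @unique = ('192.0.2.10', '192.0.2.9', '10.0.0.1');

# pack 'C4' turns the four octets into a 4-byte string, so a plain
# string comparison orders the addresses numerically, octet by octet.
my @sorted = sort {
    pack('C4', split(/\./, $a)) cmp pack('C4', split(/\./, $b))
} @unique;

print "$_\n" for @sorted;
```

A plain sort would compare the addresses as text and place 192.0.2.10 before 192.0.2.9, which is why the octets are packed first.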

      Hi jwkrahn, thanks for your advice. Frankly, I am still learning Perl, so I don't have a clear idea yet.

Node Type: perlquestion [id://1048449]
Approved by McDarren