http://www.perlmonks.org?node_id=1048449

sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn to do things the Perl way. I have a 300,000-line text file and am trying to extract the IP addresses and remove the duplicates. So far I am getting all the IPs, but the duplicates are not removed. Here is the code:

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;

open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Replies are listed 'Best First'.
Re: How to remove duplicate IPs
by rjt (Curate) on Aug 07, 2013 at 22:53 UTC

    You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

    I used the following test data:

    127.0.0.1 host_a
    whoops, no IP on this line
    127.0.1.1 host_b
    127.0.0.1 host_a (duplicate!)
    junk
    127.0.0.2 host_c

    And got this output:

    127.0.0.1
    127.0.1.1
    127.0.0.2

    use strict; use warnings; omitted for brevity.
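
    For reference, the %seen idiom can be tried on its own, without any input file. The following is only a sketch under a few assumptions: unique_ips is a hypothetical helper name, and the hand-written $ipv4 pattern is a simplified stand-in for $RE{net}{IPv4} so that the script runs without Regexp::Common installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified IPv4 pattern: an assumption standing in for $RE{net}{IPv4}
# so this sketch has no non-core dependencies.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/;
my $ipv4  = qr/\b($octet(?:\.$octet){3})\b/;

# Return the first IPv4 address found on each line, skipping any
# address already returned, in the order of first appearance.
sub unique_ips {
    my @lines = @_;
    my %seen;      # Remember IPs we have already seen
    my @unique;
    for my $line (@lines) {
        next unless $line =~ $ipv4;            # no IP on this line
        push @unique, $1 unless $seen{$1}++;   # keep first sighting only
    }
    return @unique;
}

my @sample = (
    "127.0.0.1 host_a",
    "whoops, no IP on this line",
    "127.0.1.1 host_b",
    "127.0.0.1 host_a (duplicate!)",
    "127.0.0.2 host_c",
);

print "$_\n" for unique_ips(@sample);
```

    The post-increment is what makes this work: $seen{$1}++ returns the old count (false the first time an address appears) and then bumps it, so each address is kept exactly once.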

      Hi RJT, thank you very much for your help. That solves my problem.

Re: How to remove duplicate IPs
by jwkrahn (Abbot) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

    That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

    But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";


    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

    Why are you sorting the two values $1 and "\n"?    Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
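
    The three-argument open above can also be written with a lexical filehandle instead of the bareword NMAP_DATA. That variant is not part of the suggestion above; it is only a sketch, and open_for_reading is a hypothetical helper name chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a file for reading with three-argument open and a lexical
# filehandle. Dies with the file name and the OS error on failure.
sub open_for_reading {
    my ($path) = @_;
    die "Usage: $0 FILE\n" unless defined $path;
    open my $fh, '<', $path
        or die "Cannot open '$path' because: $!";
    return $fh;
}

# Only read when a file name was given on the command line.
if (@ARGV) {
    my $in = open_for_reading($ARGV[0]);
    while (my $line = <$in>) {
        print $line;
    }
    close $in;
}
```

    Using $ARGV[0] rather than "@ARGV" matters because "@ARGV" joins all command-line arguments with spaces into a single string, which is only a valid file name when exactly one argument was passed.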

      Hi jwkrahn, thanks for your advice. Frankly speaking, I am still learning Perl, so I don't have a clear idea yet.