PerlMonks  

How to remove duplicate IPs

by sinhass (Initiate)
on Aug 07, 2013 at 22:36 UTC (#1048449)
sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn to do things the Perl way. I have a text file of 300,000 lines, and I am trying to extract the IP addresses from it and remove the duplicates. So far I am getting all the IPs, but the duplicates are not removed. Here is the code.

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;

open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Re: How to remove duplicate IPs
by rjt (Deacon) on Aug 07, 2013 at 22:53 UTC

    You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

    I used the following test data:

    127.0.0.1 host_a
    whoops, no IP on this line
    127.0.1.1 host_b
    127.0.0.1 host_a (duplicate!)
    junk 127.0.0.2 host_c

    And got this output:

    127.0.0.1
    127.0.1.1
    127.0.0.2
    use strict; use warnings; omitted for brevity.
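For anyone who wants to run the idea above end to end, here is a minimal self-contained sketch. It is not the original poster's script: the lines come from an in-memory list instead of a file, the sub name unique_ips is hypothetical, and a simplified hand-written pattern ($ipv4) stands in for $RE{net}{IPv4} so that the script runs without installing Regexp::Common. With the module available, $RE{net}{IPv4} can be used in its place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption: four dot-separated
# groups of 1-3 digits; it does not check that each octet is <= 255).
my $ipv4 = qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/;

# Return the first IP found on each line, skipping IPs already seen.
# The order of first appearance is preserved.
sub unique_ips {
    my @lines = @_;
    my %seen;      # IP => number of times encountered so far
    my @unique;
    for my $line (@lines) {
        next unless $line =~ /($ipv4)/;
        push @unique, $1 unless $seen{$1}++;
    }
    return @unique;
}

# Same test data as in the post above.
my @test_data = (
    "127.0.0.1 host_a",
    "whoops, no IP on this line",
    "127.0.1.1 host_b",
    "127.0.0.1 host_a (duplicate!)",
    "junk 127.0.0.2 host_c",
);

print "$_\n" for unique_ips(@test_data);
```

The hash lookup costs roughly the same no matter how many addresses have been stored, so this approach should cope with a 300,000-line file in a single pass.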

      Hi RJT, thank you very much for your help. That solves my problem.

Re: How to remove duplicate IPs
by jwkrahn (Monsignor) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

    That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

    But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";
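The three-argument form also combines well with a lexical filehandle instead of a bareword one. The following is only a sketch under stated assumptions: the helper name read_lines is hypothetical, and a temporary file created with the core File::Temp module stands in for the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read all lines of a file using three-argument open and a lexical
# filehandle; die with the file name and the OS error on failure.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' because: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Demonstration with a throwaway file (stand-in for the real data file).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "127.0.0.1 host_a\n127.0.1.1 host_b\n";
close $tmp_fh or die "Cannot close '$tmp_name': $!";

my @lines = read_lines($tmp_name);
print scalar(@lines), " lines read\n";
```

With two-argument open the mode is parsed out of the filename string itself, so a name that begins with > or ends with | is interpreted rather than simply opened for reading. Passing the mode as a separate argument avoids that. Interpolating "@ARGV" has a related problem: it joins every command-line argument with spaces into one string, which is only a valid filename when exactly one argument was given.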


    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

    Why are you sorting the two values $1 and "\n"? Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
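To make the first question concrete: sort compares strings by default, and a newline sorts before any digit, so sorting an address together with "\n" puts the newline first. A small sketch, using a hard-coded sample address in place of the regex capture $1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ip = '127.0.0.1';    # sample value standing in for $1

# Default sort is a string comparison; "\n" (code point 10) sorts before
# "1" (code point 49), so the newline ends up ahead of the address.
my @pair   = ($ip, "\n");
my @sorted = sort @pair;

print $sorted[0] eq "\n" ? "newline comes first\n" : "address comes first\n";

# A scalar is all that is needed to hold one address plus its newline.
my $record = "$ip\n";
print $record;
```

In other words, the sort in the original loop only reorders those two values on each line; it never sees more than one address at a time, so it cannot order or de-duplicate the file as a whole.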

      Hi jwkrahn, thanks for your advice. Frankly speaking, I am still learning Perl, so I have no clear idea yet.
