PerlMonks  

How to remove duplicate IPs

by sinhass (Initiate)
on Aug 07, 2013 at 22:36 UTC (#1048449=perlquestion)
sinhass has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl programming and trying to learn to do things the Perl way. I have a text file of 300,000 lines and am trying to extract the IP addresses from it and remove the duplicates. So far I am getting all the IPs, but the duplicates are not removed. Here is the code:

#!/usr/bin/perl
use warnings;
use Regexp::Common qw/net/;

open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;
open (IP_DATA, ">ipdata") or die "can't write to ipdata file";
while (<NMAP_DATA>) {
    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);
}
close (IP_DATA) ;
close (NMAP_DATA);

Re: How to remove duplicate IPs
by rjt (Deacon) on Aug 07, 2013 at 22:53 UTC

    You're close; the next step is to keep track of the IP addresses you have already seen, and print to your output file only if you encounter a new one. I would structure your loop like this:

    my %seen; # Remember IPs we have already seen
    while (<NMAP_DATA>) {
        next unless /($RE{net}{IPv4})/;
        print IP_DATA "$1\n" if not $seen{$1}++;
    }

    I used the following test data:

    127.0.0.1 host_a
    whoops, no IP on this line
    127.0.1.1 host_b
    127.0.0.1 host_a (duplicate!)
    junk
    127.0.0.2 host_c

    And got this output:

    127.0.0.1
    127.0.1.1
    127.0.0.2

    use strict; use warnings; omitted for brevity.
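
The same idea as a self-contained sketch that runs without an input file. The sub name `unique_ips` and the simplified `$ipv4` pattern are assumptions made for illustration. The pattern stands in for `$RE{net}{IPv4}` so that the example needs no non-core modules, and it is looser than the real thing (it does not check that each octet is at most 255).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (assumption: no octet range check).
my $ipv4 = qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/;

# Return the first IP found on each line, in order of first appearance,
# skipping lines without an IP and IPs that were already returned.
sub unique_ips {
    my @lines = @_;
    my %seen;
    my @unique;
    for my $line (@lines) {
        next unless $line =~ /($ipv4)/;
        push @unique, $1 unless $seen{$1}++;
    }
    return @unique;
}

# In-memory copy of the test data used in this reply.
my @lines = (
    "127.0.0.1 host_a",
    "whoops, no IP on this line",
    "127.0.1.1 host_b",
    "127.0.0.1 host_a (duplicate!)",
    "junk",
    "127.0.0.2 host_c",
);
print "$_\n" for unique_ips(@lines);
```

The key is that `$seen{$1}++` evaluates to the count before the increment, so it is false only the first time a given IP turns up. Memory use grows with the number of distinct IPs, which should be no problem for a 300,000-line file.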

      Hi RJT, thank you very much for your help. That solves my problem.

Re: How to remove duplicate IPs
by jwkrahn (Monsignor) on Aug 08, 2013 at 00:41 UTC
    open (NMAP_DATA, "@ARGV") or die "Please type the filename. $!" ;

    That should probably be:

    open (NMAP_DATA, $ARGV[0]) or die "Please type the filename. $!" ;

    But would be even better as:

    open NMAP_DATA, '<', $ARGV[0] or die "Cannot open '$ARGV[0]' because: $!";
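
To see why `"@ARGV"` is fragile: interpolating an array into a double-quoted string joins its elements with a space (the default value of `$"`), so the original `open` only works when exactly one argument is given. A small sketch follows. Here `@args` is a hypothetical stand-in for `@ARGV`, the file names are made up, and `open_input` is an illustrative helper name. It also swaps the bareword `NMAP_DATA` for a lexical filehandle, which is a common alternative and not something the thread asked for.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @ARGV holding two command-line arguments.
my @args = ('scan1.txt', 'scan2.txt');

# Array interpolation joins the elements with a space.
my $joined = "@args";    # one bogus filename: 'scan1.txt scan2.txt'
my $first  = $args[0];   # the first argument only: 'scan1.txt'
print "interpolated: '$joined'\n";
print "first arg:    '$first'\n";

# Three-argument open with a lexical filehandle. Dies with the file name
# and the OS error message if the file cannot be opened.
sub open_input {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' because: $!";
    return $fh;
}
```

The three-argument form also keeps the mode separate from the file name, so a name that happens to start with `>` or end with `|` is not treated as an instruction to `open`.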


    my @ip_address = sort $1, "\n" if /($RE{net}{IPv4})/;
    print IP_DATA @ip_address if (!/^\s*$/);

    Why are you sorting the two values $1 and "\n"? Why are you using an array when you only need a scalar?

    print IP_DATA "$1\n" if /($RE{net}{IPv4})/;
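
To make the first question concrete: `sort` without a comparison block compares its arguments as strings, and a newline sorts before any digit, so `sort $1, "\n"` returns the newline first and the IP second. The sketch below shows this with a literal address in place of `$1` (the variable names are illustrative only). Separately, perlsyn documents the behaviour of `my` combined with a statement modifier such as `if` as undefined, which is another reason to prefer the one-line `print`.

```perl
use strict;
use warnings;

my $ip = '127.0.0.1';    # stands in for $1 after a successful match

# Default sort is a string comparison; "\n" (0x0A) sorts before "1" (0x31).
my @sorted = sort $ip, "\n";

# This is what printing the sorted array would emit: newline, then the IP.
my $as_printed = join '', @sorted;

print $sorted[0] eq "\n" ? "newline comes first\n" : "IP comes first\n";

# The scalar form suggested above keeps the IP and its newline in order.
my $line = "$ip\n";
print $line;
```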

      Hi jwkrahn, thanks for your advice. Frankly speaking, I am still learning Perl, so I don't have a clear idea yet.
