
Scan ARP cache dump - memory hog

by seanbo (Chaplain)
on Jul 03, 2002 at 17:48 UTC ( #179244=perlquestion )

seanbo has asked for the wisdom of the Perl Monks concerning the following question:

WARNING!! I did not use strict

With that said... the code listed below reads a file and extracts IP addresses so we can determine which addresses haven't been used in a while and can be reclaimed and handed out again.

As you will see from the script, the input has about 250K records. When I run this script, I deplete all memory on our server (which happens to be our primary DNS server). Aside from running it on a non-production box, do any of you have suggestions for improving memory use and speed?

I am having a HUGE problem with the line:
my @matches = grep {defined $_} map { /^($subnet\.\d+)/ and $1; } @lines;
Here is the whole program:
#!/usr/bin/perl -w
use Getopt::Std;
use Net::Netmask;

my $subnet;
my $ARP = '/tmp/arp.bak';

getopt('sm');
if ($opt_s ne '') {
    $subnet = $opt_s;
} else {
    die "Please supply a subnet to scan.";
}

my $block = new Net::Netmask($subnet);
my @range = $block->enumerate();

#We need to get our temp file
system `tail -250000 /tmp/arp > $ARP`;

my @lines = <DATA>;
chomp(@lines);

#Get the IPs that are preset with first 3 octets matching
my @matches = grep {defined $_} map { /^($subnet\.\d+)/ and $1; } @lines;

#Get the difference of the block and the IP's found
my @intersection = my @difference = ();
my %count = ();
foreach $element (@matches, @range) { $count{$element}++ }
foreach $element (keys %count) {
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

#make unique and then sort
undef %pre;
@pre{@difference} = ();
@clean = keys %pre;
my @sorted = sort {
    pack('C4' => $a =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
    cmp
    pack('C4' => $b =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
} @clean;
print join("\n",@sorted);

__DATA__
0x00804F429376 Vl2n22
0x00804F429466 Vl2n22
0x00804F4288E1 Vl2n22
0x00804F425776 Vl2n22
0x00804F461812 Vl2n22
0x0060B0280D05 Vl2n22
0x0001E62B36E7 Vl2n22
0x0060B030889D Vl2n22
0x0060B0521032 Vl2n22
0x00804F45F98E Vl2n22
0x00804F559E8D Vl2n22
0x00804F4627D8 Vl2n22
0x00804F58469D Vl2n22
0x0060B032E627 Vl2n22
0x00804F4627B2 Vl2n22
0x00804F462788 Vl2n22
0x00008086DEE1 Vl2n22
0x00804F41D3F7 Vl2n22
0x00804F4294F2 Vl2n22
0x00804F1E81F9 Vl2n22
0x00902784B532 Vl2n22
0x0060B012F54E Vl2n22
0x0040883FB2D1 Vl2n22
0x0001E63D35BE Vl2n22
0x0060B0F270BB Vl2n22
0x00405801279B Vl2n22
0x0040580008D6 Vl2n22

perl -e 'print reverse qw/o b n a e s/;'

Replies are listed 'Best First'.
Re: Scan ARP cache dump - memory hog
by ferrency (Deacon) on Jul 03, 2002 at 19:04 UTC
    Currently you're slurping the entire data set into memory at once; not only that, you're copying possibly huge chunks of it several more times. If you can build your code around a while() loop and process one line at a time instead of slurping the entire file, you'd be much better off, memory-wise.

    # instead of this:
    my @lines = <DATA>;

    # do something like this:
    while (my $line = <DATA>) {
        ...
    }
    Even if you only build @matches in that loop and keep the rest of the code the same, you may be much better off (assuming you have few matches compared to the size of the dataset). Deleting arrays after you're done with them (use my and arrange the code so they go out of lexical scope) will also help with memory reuse.
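A minimal sketch of that idea, assuming the dump has the IP address at the start of each line (the sample lines and the helper name `collect_matches` are made up for illustration, they are not from the original script). Only the matching addresses are kept; each line is discarded as soon as it has been tested, and the loop variable goes out of scope when the sub returns.

```perl
use strict;
use warnings;

# Hypothetical helper: stream a filehandle and keep only the addresses
# that start with the given three-octet prefix (e.g. '192.168.87').
sub collect_matches {
    my ($fh, $subnet) = @_;
    # \Q...\E escapes the dots so they match literally (an addition here;
    # the original pattern interpolates $subnet unescaped).
    my $re = qr/^(\Q$subnet\E\.\d+)/;
    my @matches;
    while (my $line = <$fh>) {
        push @matches, $1 if $line =~ $re;
    }
    return @matches;
}

# Usage with an in-memory file standing in for the ARP dump.
# The line layout below is an assumption, not the real dump format.
my $sample = "192.168.87.10 0x00804F429376 Vl2n22\n"
           . "10.1.2.3 0x00804F429466 Vl2n22\n"
           . "192.168.87.12 0x00804F4288E1 Vl2n22\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @matches = collect_matches($fh, '192.168.87');
close $fh;
print "$_\n" for @matches;
```

Passing a filehandle rather than a file name keeps the helper usable with DATA, a real file, or a pipe from tail.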

    If you can more clearly explain what this code is supposed to do, we might be able to find a much more straightforward solution. As it is, the code seems to be doing the same thing over again several times in different ways before printing its final results.


      Just to further explain what I am trying to achieve: we currently manage a couple hundred subnets at my job. Each Monday morning, we get ARP cache dumps from all of our routers sent to us. People send us requests for IP addresses and DNS names. There are network admins that are notorious for not returning IP addresses.

      What I am trying to do is take the last few months' worth of ARP information (that is the tail -250000... command; it's just an approximation). We use that to determine which IPs have had no activity for a while; we then remove the allocation from our records and notify the admin it was assigned to that we have reclaimed the address.
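Stated that way, the core of the job is a set difference: every address in the subnet minus every address seen in the ARP data. A rough sketch using a hash as the "seen" set follows; the helper name `unused_addresses` and the sample addresses are invented for illustration, and in the real script the range would come from Net::Netmask's enumerate().

```perl
use strict;
use warnings;

# Hypothetical helper: return the addresses in @$range that never
# appear in @$seen_ips, preserving the order of @$range.
sub unused_addresses {
    my ($range, $seen_ips) = @_;
    my %seen;
    @seen{@$seen_ips} = ();    # hash slice: only the keys matter
    return grep { !exists $seen{$_} } @$range;
}

# Made-up block and made-up ARP sightings (duplicates are harmless).
my @range = map { "192.168.87.$_" } 1 .. 6;
my @seen  = qw(192.168.87.2 192.168.87.5 192.168.87.2);
print "$_\n" for unused_addresses(\@range, \@seen);
```

Because the result keeps the order of the enumerated range, it may not need a separate sort afterwards.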

      Thanks for the input!

      perl -e 'print reverse qw/o b n a e s/;'
Re: Scan ARP cache dump - memory hog
by flocto (Pilgrim) on Jul 03, 2002 at 19:07 UTC

    If you're concerned about memory usage, you shouldn't read the entire file at once. Read it line by line and count the IPs in a hash, so you don't get duplicated entries. And, of course, use strict, but I guess you knew that already :) Anyhow, here's a snippet that came to my mind:

    #!/usr/bin/perl -w
    use strict;

    my $subnet = '192.168.87';
    my %data = ();
    my $debug = 0;   # set to 1 to report lines that did not match

    # precompile regex for performance..
    my $regex = qr#^($subnet\.\d+)#;

    # read file line by line
    while (my $line = <DATA>) {
        chomp($line);
        if ($line =~ $regex) {
            $data{$1}++;
        } elsif ($debug) {
            print STDERR "Didn't match: $line\n";
        }
    }

    If you really do want to stick to your own code, your line is better written as (it's not very nice either..):

    my @matches = grep { m/$regex/ } @lines;
    ($_) = m/$regex/ foreach @matches;
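One likely reason the original grep/map line misbehaves: when the match fails, `/.../ and $1` yields Perl's false value, which is an empty string and therefore still defined, so `grep { defined $_ }` filters nothing out and @matches ends up with one entry per input line. A sketch of a single-pass alternative, where a failed match contributes an empty list instead (the helper name `extract_matches` and the sample lines are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: filter and extract in one pass over @$lines.
sub extract_matches {
    my ($lines, $subnet) = @_;
    my $re = qr/^(\Q$subnet\E\.\d+)/;
    # Returning () for a non-matching line adds nothing to the result,
    # so no second grep pass is needed.
    return map { /$re/ ? $1 : () } @$lines;
}

# Made-up input lines for illustration only.
my @lines = (
    '192.168.87.4 0x00804F429376 Vl2n22',
    'not an address line',
    '192.168.87.9 0x00804F429466 Vl2n22',
);
my @matches = extract_matches(\@lines, '192.168.87');
print scalar(@matches), " matches: @matches\n";
```

This still needs @lines in memory, so it only fixes the correctness issue; reading line by line remains the better fix for memory.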


      I'll give this a try. I was trying to be elegant and do things faster than brute forcing my way line by line, but I guess you see where that got me... :-(

      Yeah, I know I really should have been using strict, and I felt like an idiot posting the code without it (thus the warning up top). Thanks for your input!

      When I clean up the code (and am using strict like I should be), I will repost the code.

      perl -e 'print reverse qw/o b n a e s/;'
Re: Scan ARP cache dump - memory hog
by seanbo (Chaplain) on Jul 04, 2002 at 14:15 UTC
    OK, here is the updated code. I modified it to read from a file instead of <DATA>. And guess what?!?! It runs under strict!! (Oh, it works too)

    ++ to ferrency and flocto for their help!!
    #!/usr/bin/perl -w
    use strict;
    use vars qw/ $opt_s /;
    use Getopt::Std;
    use Net::Netmask;

    my $subnet;
    my %data = ();
    my $ARP = '/tmp/arp.bak';

    getopt('s');
    if (defined($opt_s) && $opt_s ne '') {
        $subnet = $opt_s;
    } else {
        help();
    }

    my $block = new Net::Netmask($subnet);
    if (defined($block->{'ERROR'})) { die "Invalid subnet/mask combination."}
    my @range = $block->enumerate();

    #We need to get our temp file
    system `tail -250000 /tmp/arp > $ARP`;

    #Open the file and read it line by line
    open(FH, $ARP) || die ("Couldn't open the arp file $ARP");
    while (<FH>) {
        if (/^((?:\d{1,3}\.){3}\d{1,3})/) {
            if ($block->match($1)) {
                $data{$1}++;
            }
        }
    }
    close FH;

    #I need to specifically remove the network, gateway and broadcast
    #addresses since we don't care about those.
    #First remove from enumeration array
    shift @range; #Network Address
    shift @range; #Gateway Address
    pop @range;   #Broadcast Address

    #Now remove from the IP's found in the ARP cache
    delete $data{$block->base()};
    delete $data{$block->nth(1)};
    delete $data{$block->broadcast()};
    my @matches = keys %data;

    #Compare the array of matched IPs to the enumerated Netblock
    my @intersection = my @difference = ();
    undef %data;
    foreach my $element (@matches, @range) { $data{$element}++ }
    foreach my $element (keys %data) {
        push @{ $data{$element} > 1 ? \@intersection : \@difference }, $element;
    }

    #Now I'd like to sort the IPs (a little Schwartzian Transform action here...)
    my @sorted = map { join '.', unpack 'N*', $_ }
                 sort
                 map { pack 'N*', split /\./ }
                 @difference;

    print "Addresses that are candidates for reclaim in:\n";
    print $block->desc(), "\n\n";
    print join("\n",@sorted), "\n";

    sub help {
        print <<'HELP';
    You must supply a valid subnet.
    Acceptable formats are as follows:
     <--- The preferred form.
    syntax: -s
    HELP
        exit(1);
    }
    Update: Modified the IP sort to use a Schwartzian Transform so I can have them truly sorted like IP's should be.
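For reference, a standalone sketch of the packed-key sort idea: each address becomes a fixed-width byte string, a plain string sort orders those keys, and the keys are unpacked back to dotted quads. This form is usually called a Guttman-Rosler Transform rather than a Schwartzian Transform. The sketch uses 'C4' (one byte per octet) instead of the 'N*' in the script above, which is a swapped-in choice that gives shorter keys; the helper name `sort_ips` and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort dotted-quad IPv4 addresses numerically.
sub sort_ips {
    return map  { join '.', unpack 'C4', $_ }    # 3. key back to dotted quad
           sort                                   # 2. plain string sort on keys
           map  { pack 'C4', split /\./, $_ }     # 1. four octets to four bytes
           @_;
}

# Made-up addresses for illustration.
print "$_\n" for sort_ips(qw(10.0.0.10 10.0.0.2 10.0.0.1 9.255.0.1));
```

A default string sort would put 10.0.0.10 before 10.0.0.2; the packed keys avoid that because every octet is compared as a single byte.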

    Update: Added a little help, support for VLSM's, and stripped out un-needed addresses (network, gateway, and broadcast). **note - the gateway is specific to our organization, yours may use a different address, we use network + 1. Thanks to tye, belg4mit, and arturo for your help with my regex issue. /msg me if I left you out.

    Update: Added code to check for invalid IP address/mask combination since some joker here already tried to enter something like

    Update: Fixed the regex (read as: removed a space that I would have never seen in a million years!) ++tye

    perl -e 'print reverse qw/o b n a e s/;'
