http://www.perlmonks.org?node_id=980591

maheshkumar has asked for the wisdom of the Perl Monks concerning the following question:

I am writing code that extracts IP addresses from a text file and saves them to another text file, and then reads each line of that file into an array so it can be processed in a later step (not yet included in the code). The error that shows up is "Out of Memory". Any thoughts on this?

system("clear"); use Geo::IPfree; use Regexp::Common qw/net/; use strict; use warnings; #my $file = 'MKS_1.txt'; open my $in, '<:raw', 'Sample_1.txt' or die; open my $out, '>:raw', 'Google_1' or die; my @IPPs; while (my $line = <$in>) { #print "Working.... ... .. . \n"; #my ($ip) = $line =~ /(?: $RE{net}{IPv4}) ( \d+ [ ] ms \s+ ){3}/ms +x; my ($ip) = $line =~ /(?: \d+ [ ] ms \s+ ){3} ($RE{net}{IPv4})/msx; #$ip = @IPPs; print {$out} "$ip\n" if defined $ip; } open (my $fh, "<", "Google_1.txt"); my @file_array; while (my $line = $fh){ chomp($line); push (@file_array, $line); } print "@file_array \n";

My text file looks something like this (this is just a sample; the real file is huge):

;ADDITIONAL
Traceroute: 21 1339259115 1339259076 1339259076 LocalDNS 7e.img.v4.skyrock.net 193.93.124.172
Tracing route to 193.93.124.172 over a maximum of 20 hops
 1 * * * Request timed out.
 2 103 ms 99 ms 99 ms 10.94.1.6
 3 109 ms 89 ms 318 ms 10.71.63.31
 4 106 ms 99 ms 98 ms 10.71.64.197
 5 114 ms 109 ms 119 ms 10.71.66.32
 6 109 ms 88 ms 108 ms 41.190.1.66
 7 105 ms 98 ms 99 ms 41.190.1.36
 8 193 ms 199 ms 198 ms 79.99.198.210
 9 205 ms 199 ms 199 ms 216.66.86.25
 10 220 ms 208 ms 188 ms 195.66.224.42
 11 298 ms 278 ms 278 ms 83.167.63.231
 12 * 533 ms 288 ms 83.167.56.177
 13 271 ms 289 ms 268 ms 83.167.52.234
 14 238 ms 198 ms 209 ms 193.93.126.21
 15 271 ms 278 ms 279 ms 193.93.126.74
 16 299 ms 289 ms 318 ms 193.93.124.172
Trace complete.
;AUTHORITY
;ADDITIONAL
Query: 11 1339259131 OpenDNS 9msn.com.au 1 True 0.212331455534 0.610376458461
id 19383 opcode QUERY rcode NOERROR flags QR RD RA
;QUESTION
9msn.com.au. IN A
;ANSWER
9msn.com.au. 275 IN A 202.58.49.1
9msn.com.au. 275 IN A 202.58.48.1
;AUTHORITY
;ADDITIONAL
Traceroute: 25 1339259131 1339259063 1339259063 OpenDNS 6a.typepad.com 204.9.177.195
Tracing route to 204.9.177.195 over a maximum of 20 hops
 1 * * * Request timed out.
 2 103 ms 99 ms 129 ms 10.94.1.6
 3 110 ms 88 ms 108 ms 10.71.63.31
 4 97 ms 109 ms 98 ms 10.71.64.197
 5 124 ms 119 ms 109 ms 10.71.66.32
 6 114 ms 127 ms 108 ms 41.190.1.66
 7 93 ms 109 ms 88 ms 41.190.1.36
 8 191 ms 199 ms 208 ms 79.99.198.210
 9 199 ms 199 ms 219 ms 195.219.51.13
 10 * * * Request timed out.
 11 293 ms 189 ms 198 ms 195.219.83.102
 12 * 222 ms 198 ms 4.69.166.157
 13 298 ms 269 ms 309 ms 4.69.153.137
 14 300 ms 259 ms 269 ms 4.69.137.70
 15 369 ms 356 ms 339 ms 4.69.134.78
 16 343 ms * 352 ms 4.69.148.45
 17 443 ms 418 ms 419 ms 4.69.135.185
 18 * 342 ms 349 ms 4.69.153.14
 19 428 ms 409 ms 479 ms 4.69.153.29
 20 435 ms 438 ms 439 ms 4.69.134.37
Trace complete.

Replies are listed 'Best First'.
Re: Out of Memory in Perl
by jethro (Monsignor) on Jul 08, 2012 at 16:58 UTC

    You probably need "while (my $line = <$fh>){" instead of "while (my $line = $fh){".

    Try using "print" statements to show the contents of variables, so you know what your script is doing; then you can find such bugs yourself. (There are more advanced debugging methods available, but most people start by printing debug output.)
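    For example, a minimal sketch of the second loop with debug prints added (the file name is taken from the posted code):

    use strict;
    use warnings;

    open my $fh, '<', 'Google_1.txt' or die $!;
    my @file_array;
    while (my $line = <$fh>) {       # note the <> around $fh
        chomp $line;
        print "read: $line\n";       # debug: show each line as it is read
        push @file_array, $line;
    }
    print scalar(@file_array), " lines stored\n";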

Re: Out of Memory in Perl
by Cristoforo (Curate) on Jul 08, 2012 at 19:08 UTC
    open my $in, '<:raw', 'Sample_1.txt' or die;
    open my $out, '>:raw', 'Google_1' or die;
    I don't think using the :raw layer buys you anything here - your files are text files, and the raw layer is mainly for reading binary data (though I'm not certain; I've never used it). The common form would probably be better (unless you have a reason to use raw):
    open my $in, '<', 'Sample_1.txt' or die $!;
    open my $out, '>', 'Google_1.txt' or die $!;
Re: Out of Memory in Perl
by Kenosis (Priest) on Jul 09, 2012 at 02:11 UTC

    m{ms\s+\K(\S+)\s*$} matches:

    12 * 533 ms 288 ms 83.167.56.177
    12 * 222 ms 198 ms 4.69.166.157
    16 343 ms * 352 ms 4.69.148.45
    18 * 342 ms 349 ms 4.69.153.14

    Whereas the OP's /(?: \d+ [ ] ms \s+ ){3} ($RE{net}{IPv4})/msx does not. I'm not sure, however, whether that was the OP's intent.
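    A minimal sketch comparing the two patterns against lines taken from the sample data above:

    use strict;
    use warnings;
    use Regexp::Common qw/net/;

    my @lines = (
        ' 12   *     533 ms   288 ms  83.167.56.177',
        ' 13  271 ms  289 ms  268 ms  83.167.52.234',
    );

    for my $line (@lines) {
        my ($strict) = $line =~ /(?: \d+ [ ] ms \s+ ){3} ($RE{net}{IPv4})/msx;
        my ($loose)  = $line =~ m{ms\s+\K(\S+)\s*$};
        print "strict: ", $strict // '(no match)',
              "  loose: ", $loose // '(no match)', "\n";
    }

    (\K and the // defined-or operator both require Perl 5.10 or later.)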

      What?

        I prefer your regex, and I'm not sure whether the OP intentionally omitted traceroute lines with "*" in them. I only noticed because I was also tweaking the OP's regex (the EOL IPs were just waiting to be captured).

Re: Out of Memory in Perl
by bulk88 (Priest) on Jul 08, 2012 at 16:58 UTC
    How big is your text file? gigs?

    Have you tried using a step-through debugger to see how far you get before the out-of-memory error? If you don't know how to use a step-through debugger, toss a couple of these into your Perl code.
    system("read -p \"Press any key\"");#unix #or system("pause");#windows
Re: Out of Memory in Perl
by Anonymous Monk on Jul 08, 2012 at 19:21 UTC
     ack '--output=$1' 'ms\s+(\S+)\s*$' < infile > outfile
       ack  "ms\s+\K(\S+)\s*$" ....
        $ perl -lne " print $1 if m{ms\s+\K(\S+)\s*$}"  < infile > outfile
Re: Out of Memory in Perl
by maheshkumar (Sexton) on Jul 08, 2012 at 16:37 UTC

    Also, when I try just the following code with the new file, it causes problems: my system freezes.

    open (my $fh, "<", "Google_1.txt");
    my @file_array;
    while (my $line = $fh){
        chomp($line);
        push (@file_array, $line);
    }
    print "@file_array \n";
      Hi, try wrapping $fh with the <> operator. Otherwise, you just keep setting $line to the $fh object itself, which is always true, so the loop never ends and you run out of resources.
      # wrong
      while (my $line = $fh){

      # right
      while (my $line = <$fh>){
      Additionally, unless you need to process each line individually, it might be simpler to write
      open (my $fh, "<", "test.txt");
      my @file_array = (<$fh>);
      print @file_array,"\n";

      P.S. Since you are dealing with big files, you might want to use ARGV's special line-by-line processing magic. That way, you never pull the whole file into memory. Google for "perl ARGV magic".

      #!/usr/bin/perl
      use warnings;
      use strict;

      # use ARGV's magic
      @ARGV = "test.txt";
      my @results;
      while ( <ARGV> ){
          chomp $_;
          if ($_ =~ m/head/){
              push (@results, $_);
          }
      }
      print "@results\n";


      Your problem is (potentially) two-fold: with this snippet you are trying to read the entire file into memory (but in fact you are stuck in a never-ending while loop, because you use $fh instead of <$fh>). Are you doing something special that requires the entire file to be read in at once, or are you just printing it to STDOUT (or another file handle)? If the latter, a simple fix is:

      open my $fh, "<", "Google_1.txt";
      print $_ while <$fh>;

      One more thing... If the input file is small enough and you really do want to read it into an array for something other than printing, you can use something like this instead of your while loop:

      open my $fh, "<", "Google_1.txt";
      my @file_array = <$fh>;
      chomp @file_array;

        I used <$fh>, but the lines do not get saved in the array, and it also does not print anything.