http://www.perlmonks.org?node_id=926841

syedumairali has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am creating a Perl script which searches for a specific text in CSV text files (100000 rows & 30 KB each), and there is a huge number of files. I am using hash keys to first load one file into a hash and then search for the specific text. After the search finishes, I use the same hash to load the second CSV file of the same size and search again.

The script runs perfectly for the first 60-odd files, but after that it crashes with "Out of Memory !".

While the script is running I can also see in Task Manager that the available memory keeps decreasing (2 GB RAM).

I think I am missing a step that clears the hash variable (%data1), and the error message appears once my hash has used up all the memory.

Question: How can I erase or clear the hash before my Perl script takes the second file? Here is the sample code (only the relevant code is shown):
# @lines1 contains the lines of the csv file
my %data1;
shift(@lines1);    # remove column headings from file
shift(@lines1);    # remove column headings from file
foreach my $line (@lines1) {
    @words = split(/,/, $line);
    if ($words[6] > 90) {
        my $abstime   = $words[1];
        my $payload   = $words[5];
        $srcIPhex     = substr $payload, 24, 8;
        my $dstIPhex  = substr $payload, 32, 8;
        my $timestamp = substr $payload, 152, 12;
        my $HashKey;    # to get total number
        $HashKey = $srcIPhex . $abstime;
        $data1{$HashKey}{ID}     = $words[0];
        $data1{$HashKey}{SRC_IP} = $srcIPhex;
        $data1{$HashKey}{DST_IP} = $dstIPhex;
        MeasureFiles(\%data1);
    }
}

sub MeasureFiles {
    my ($list_a_ref) = @_;
    my %data1 = %$list_a_ref;    # Dereference lists
    ....
    ....
    foreach (keys %data1) {
        $SrcIP_captured = inet_ntoa( pack( "N", hex( $data1{$_}{SRC_IP} ) ) );
        $DstIP_captured = inet_ntoa( pack( "N", hex( $data1{$_}{DST_IP} ) ) );
        foreach ( my $i = 0; $i < $ind; $i++ ) {
            if (   $SrcIP_captured eq $SrcIP_ref[$i]
                && $DstIP_captured eq $DstIP_ref[$i] )
            {
                $pkt_received++;
            }
        }
    }
    ....
    ....
    open( R1, ">> $mainDirectory\\Results\\$file_result" )
        || die("Cannot Open File $file_result");
    my $results = "$SrcIP_ref[$i],$DstIP_ref[$i],$pkt_received";
    print R1 "$results\n";
    close(R1);
}

Replies are listed 'Best First'.
Re: Perl script end up on saying "Out of Memory !"
by moritz (Cardinal) on Sep 20, 2011 at 07:54 UTC
    Question : How can I erase or clear the hash before my perl script takes the second file ?

    The simplest way is to declare it in such a way that it goes out of scope when you stop processing the file. Something along the lines of:

    for my $filename (@files) {
        my %data1;
        # do all processing of file $filename here
    }

    Alternatively, you can use undef %data1.
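
    For example, if %data1 must live outside the loop, a minimal sketch (file handling elided) might be:

    my %data1;
    for my $filename (@files) {
        # ... fill %data1 while processing $filename ...
        undef %data1;    # release the hash's storage before the next file
    }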

    my %data1 = %$list_a_ref; # Dereference lists

    That doesn't just dereference, it also creates a copy. Do you want that?

      Thanks Moritz, for your guidance. Regarding your question: in fact I do not want a copy of the hash inside the MeasureFiles subroutine. Can you help me with how to work with only the reference, and not a copy of the hash, inside the routine? Thanks!
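
      A minimal sketch of working through the reference directly, so that no copy is made (names and the inet_ntoa/pack conversion are taken from the original code; the loop body is abbreviated):

      sub MeasureFiles {
          my ($list_a_ref) = @_;    # keep the reference; do not copy the hash
          foreach my $key ( keys %$list_a_ref ) {
              my $SrcIP_captured = inet_ntoa( pack( "N", hex( $list_a_ref->{$key}{SRC_IP} ) ) );
              my $DstIP_captured = inet_ntoa( pack( "N", hex( $list_a_ref->{$key}{DST_IP} ) ) );
              # ... compare against @SrcIP_ref/@DstIP_ref as before ...
          }
      }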
Re: Perl script end up on saying "Out of Memory !"
by armstd (Friar) on Sep 23, 2011 at 14:35 UTC

    Since it appears each file is processed independently of the others, and no state is maintained in memory between files, you might also consider forking a process to handle each file instead of doing it all directly in one process. Your parent process won't be affected by any memory consumed by the child processes.

    Also, if you 'exec "/bin/true"' or some such instead of 'exit()' at the end of each child process, you'll find that memory frees up much faster than waiting for Perl's garbage collection, helping performance too.
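
    A minimal sketch of that pattern, assuming a Unix-like system and a hypothetical process_file() that holds all of the per-file work:

    for my $file (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {           # child
            process_file($file);     # hypothetical: load the CSV, search, write results
            exec "/bin/true";        # hand the child's memory back to the OS right away
        }
        waitpid( $pid, 0 );          # parent: wait, then move on to the next file
    }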

    --Dave

Re: Perl script end up on saying "Out of Memory !"
by pvaldes (Chaplain) on Sep 23, 2011 at 19:25 UTC
    foreach my $line (@lines1) {
        @words = split(/,/, $line);
        if ($words[6] > 90) { ... }
    }

    I miss an else statement here. Or maybe:

    while ( defined( my $line = shift @lines1 ) ) {
        @words = split(/,/, $line, 8);
        next if $words[6] <= 90;
        ...
    }

    A foreach loop typically requires more memory than a while loop: foreach keeps the whole list in memory, whereas a while loop that shifts elements off the list lets Perl release each line once it has been processed (and you have several foreach loops). Use while instead unless you have a good reason to use foreach.

    I am using hash keys to first load one file into a hash and then search for the specific text.

    If you have a lot of files and you expect a lot of non-matching lines, try to discard those unwanted files/lines as soon as possible. Sounds to me like a job for grep, regexes, and next.
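
    For instance, a minimal sketch of filtering early ($search_text is hypothetical):

    # keep only the lines that can possibly match, before any further processing
    my @candidates = grep { /$search_text/ } @lines1;

    # or, inside a loop, skip non-matching lines immediately
    foreach my $line (@lines1) {
        next unless $line =~ /$search_text/;
        # ... expensive processing only for matching lines ...
    }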

    You may not care about anything after the seventh field; if that is your case, pass a maximum number of fields to split. That way split stops earlier and requires less memory.
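
    A quick worked example of the field limit (the data is made up):

    my $line  = "id0,1316505,a,b,c,payload,95,lots,of,trailing,fields";
    my @words = split /,/, $line, 8;    # stop after at most 8 fields
    # $words[6] is "95"; $words[7] holds the unsplit tail "lots,of,trailing,fields"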