http://www.perlmonks.org?node_id=926841

syedumairali has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am creating a Perl script which searches for a specific text in CSV text files (100000 rows & 30 KB each), and there is a huge number of files. I am using hash keys to first load one file into a hash and then search for the specific text. After the search finishes, I use the same hash to load the second CSV file of the same size and search again.

The script runs perfectly for the first 60-odd files, but after that it crashes with "Out of Memory !".

While the script is running I can also see in Task Manager that the available memory keeps decreasing (2 GB RAM).

I think I am missing a step that clears the hash variable (%data1), and the error message appears once my hash has used up all the memory.

Question: How can I erase or clear the hash before my Perl script takes the second file? Here is the sample code (only the relevant code is shown):
# @lines1 contains the lines of the csv file
my %data1;
shift(@lines1);    # remove column headings from file
shift(@lines1);    # remove column headings from file
foreach my $line (@lines1) {
    @words = split(/,/, $line);
    if ($words[6] > 90) {
        my $abstime   = $words[1];
        my $payload   = $words[5];
        $srcIPhex     = substr $payload, 24, 8;
        my $dstIPhex  = substr $payload, 32, 8;
        my $timestamp = substr $payload, 152, 12;
        my $HashKey;    # to get total number
        $HashKey = $srcIPhex . $abstime;
        $data1{$HashKey}{ID}     = $words[0];
        $data1{$HashKey}{SRC_IP} = $srcIPhex;
        $data1{$HashKey}{DST_IP} = $dstIPhex;
        MeasureFiles(\%data1);
    }
}

sub MeasureFiles {
    my ($list_a_ref) = @_;
    my %data1 = %$list_a_ref;    # Dereference lists
    ....
    ....
    foreach (keys %data1) {
        $SrcIP_captured = inet_ntoa( pack( "N", hex( $data1{$_}{SRC_IP} ) ) );
        $DstIP_captured = inet_ntoa( pack( "N", hex( $data1{$_}{DST_IP} ) ) );
        foreach ( my $i = 0; $i < $ind; $i++ ) {
            if (   $SrcIP_captured eq $SrcIP_ref[$i]
                && $DstIP_captured eq $DstIP_ref[$i] )
            {
                $pkt_received++;
            }
        }
    }
    ....
    ....
    open( R1, ">> $mainDirectory\\Results\\$file_result" )
        || die("Cannot Open File $file_result");
    my $results = "$SrcIP_ref[$i],$DstIP_ref[$i],$pkt_received";
    print R1 "$results\n";
    close(R1);
}

Replies are listed 'Best First'.
Re: Perl script end up on saying "Out of Memory !"
by moritz (Cardinal) on Sep 20, 2011 at 07:54 UTC
    Question : How can I erase or clear the hash before my perl script takes the second file ?

    The simplest way is to declare it in such a way that it goes out of scope when you stop processing the file. Something along the lines of:

    for my $filename (@files) {
        my %data1;
        # do all processing of file $filename here
    }

    Alternatively, you can use undef %data1.
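
    For example, if %data1 must live outside the loop, a minimal sketch (file handling elided) might be:

    my %data1;
    for my $filename (@files) {
        # ... fill %data1 while processing $filename ...
        undef %data1;    # release the hash's storage before the next file
    }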

    my %data1 = %$list_a_ref; # Dereference lists

    That doesn't just dereference, it also creates a copy. Do you want that?

      Thanks Moritz, for your guidance. Regarding your question: in fact I do not want a copy of the hash inside the MeasureFiles subroutine. Can you help me with how to work with only the reference, and not a copy of the hash, inside the routine? Thanks!
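
      A minimal sketch of working through the reference directly, so that no copy is made (names and the inet_ntoa/pack conversion are taken from the original code; the loop body is abbreviated):

      sub MeasureFiles {
          my ($list_a_ref) = @_;    # keep the reference; do not copy the hash
          foreach my $key ( keys %$list_a_ref ) {
              my $SrcIP_captured = inet_ntoa( pack( "N", hex( $list_a_ref->{$key}{SRC_IP} ) ) );
              my $DstIP_captured = inet_ntoa( pack( "N", hex( $list_a_ref->{$key}{DST_IP} ) ) );
              # ... compare against @SrcIP_ref/@DstIP_ref as before ...
          }
      }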
Re: Perl script end up on saying "Out of Memory !"
by armstd (Friar) on Sep 23, 2011 at 14:35 UTC

    Since it appears each file is processed independently of the others, and no state is maintained in memory between files, you might also consider forking a process to handle each file instead of doing it all directly in one process. Your parent process won't be affected by any memory consumed by the child processes.

    Also, if you 'exec "/bin/true"' or some such instead of 'exit()' at the end of each child process, you'll find that memory frees up much faster than waiting for Perl's garbage collection, helping performance too.
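
    A minimal sketch of that pattern, assuming a Unix-like system and a hypothetical process_file() that holds all of the per-file work:

    for my $file (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {           # child
            process_file($file);     # hypothetical: load the CSV, search, write results
            exec "/bin/true";        # hand the child's memory back to the OS right away
        }
        waitpid( $pid, 0 );          # parent: wait, then move on to the next file
    }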

    --Dave

Re: Perl script end up on saying "Out of Memory !"
by pvaldes (Chaplain) on Sep 23, 2011 at 19:25 UTC
    foreach my $line (@lines1) {
        @words = split(/,/, $line);
        if ($words[6] > 90) { ... }
    }

    I miss an else statement here. Or maybe:

    while ( defined( my $line = shift @lines1 ) ) {
        @words = split(/,/, $line, 8);
        next if $words[6] <= 90;
        ...
    }

    A foreach loop typically requires more memory than a while loop: foreach keeps the whole list in memory, whereas a while loop that shifts elements off the list lets Perl release each line once it has been processed (and you have several foreach loops). Use while instead unless you have a good reason to use foreach.

    I am using hash keys to first load one file into a hash and then search for the specific text.

    If you have a lot of files and you expect a lot of non-matching lines, try to discard those unwanted files/lines as soon as possible. Sounds to me like a job for grep, regexes, and next.
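
    For instance, a minimal sketch of filtering early ($search_text is hypothetical):

    # keep only the lines that can possibly match, before any further processing
    my @candidates = grep { /$search_text/ } @lines1;

    # or, inside a loop, skip non-matching lines immediately
    foreach my $line (@lines1) {
        next unless $line =~ /$search_text/;
        # ... expensive processing only for matching lines ...
    }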

    You may not care about anything after the seventh field; if that is your case, pass a maximum number of fields to split. That way split stops earlier and requires less memory.
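
    A quick worked example of the field limit (the data is made up):

    my $line  = "id0,1316505,a,b,c,payload,95,lots,of,trailing,fields";
    my @words = split /,/, $line, 8;    # stop after at most 8 fields
    # $words[6] is "95"; $words[7] holds the unsplit tail "lots,of,trailing,fields"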