PerlMonks  

Re: Multithreading leading to Out of Memory error

by sundialsvc4 (Abbot)
on Jun 08, 2013 at 14:31 UTC


in reply to Multithreading leading to Out of Memory error

It is also relevant to consider whether these could be processes instead of threads. In the former case, an entire separate memory-management context is created for each worker; in the latter, both Perl’s memory manager and its own particular flavor of quasi-threads implementation are in effect throughout. Threads all run in the same memory-management context (with suitable complications), which is never torn down while the process lives. (My knowledge of the perlguts of the thread implementation is minimal; others here are experts and gurus.)
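A minimal sketch of that difference, assuming a Unix-ish system where fork() is a real process (on Windows, perl emulates fork with threads, so this does not apply). Everything here — the file names, the fake "leaky" allocation — is made up for illustration; the point is only that a child's entire heap goes back to the OS at exit:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: work done in a forked child happens in a completely
# separate memory-management context, so when the child exits, the OS
# reclaims everything it allocated, including anything a leaky parse left.
sub process_in_child {
    my ($file) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                 # child
        my @big = (1) x 1_000_000;     # stands in for a leaky parse of $file
        exit 0;                        # the whole heap is torn down here
    }
    waitpid( $pid, 0 );                # parent's footprint is unaffected
    return $? >> 8;                    # child's exit status
}

process_in_child($_) for qw(a.dat b.dat c.dat);   # placeholder file names
print "all children reaped\n";
```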

It would be useful to know if the same behavior occurs when there is only one thread, and/or when the processing is done sequentially in the main thread. Does it, or does it not, foul up after processing a certain number of files? Does altering the number of threads alter the point at which it hoses up? You should also note exactly which Perl version you are using.

Yes, “committing hara-kiri” is a legitimate way to forestall memory-leak problems, especially in unknown processes. (The technique is useless for threads, as described.) FastCGI and mod_perl programs are sometimes deliberately arranged to process some n number of requests before they voluntarily terminate, at which point the parent process wakes up, reaps the child, then launches another copy until the pool of workers is restored. (Some separate provision would need to be made for the parent to be aware of end-of-job.)
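A hedged sketch of that "process n requests, then quit" pattern, done with plain processes rather than FastCGI or mod_perl machinery. Every name here ($JOBS_PER_CHILD, handle_job, @queue) is invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $JOBS_PER_CHILD = 3;              # voluntary-exit threshold (made up)
my @queue = ( 1 .. 10 );             # stand-in for the real work items

sub handle_job { my ($job) = @_; }   # placeholder: parse, upload, etc.

# The parent hands each child at most $JOBS_PER_CHILD items; the child
# exits when done, the parent reaps it and forks a replacement until the
# queue is drained. Anything a job leaks dies with its child.
while (@queue) {
    my @batch = splice @queue, 0, $JOBS_PER_CHILD;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        handle_job($_) for @batch;
        exit 0;
    }
    waitpid( $pid, 0 );              # reap, then loop to relaunch
}
print "queue drained\n";
```

The parent here is also the natural place to notice end-of-job: when @queue is empty and the last child is reaped, the loop simply falls through.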


Re^2: Multithreading leading to Out of Memory error
by Anonymous Monk on Jun 08, 2013 at 14:56 UTC

    (The technique is useless for threads, as described.)

    Can you demonstrate that?

      BrowserUK, didn’t you “forget” to log in?

      There is nothing “personal,” nor technically uninformed, about my specific comments here, most specifically including the comment about processes vs. threads. Threads in every programming system share a single process-level context, hence the same memory-management system. During the course of execution, a “leaky” procedure can, in time, accumulate an excess amount of unrecoverable memory. In a context of threads, that memory is never cleaned up, whereas by definition the entire context of a process is.

      If the “hara-kiri” approach didn’t work as a way of dealing, in a black-box fashion, with leaky faucets, then it would not be the case that Apache, nginx, and PSGI (Plack) all have specific means by which to do just that. My comments are technically valid at face value, as they were intended to be. If you have a disagreement, then (a) show yourself, and (b) comment about the technical statements, not the Monk making them.

        BrowserUK, didn’t you “forget” to log in?

        Wasn't me. I'm not the only one who's seen through your bs.

        And I've never flinched from having my calling of your bluff attributed to me. In fact, I'm kinda proud of it.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        ... i know i'm technically right ...

        So can you write a ten line perl program that shows this, yes or no?

Re^2: Multithreading leading to Out of Memory error
by joemaniaci (Sexton) on Jun 14, 2013 at 20:28 UTC

    So after a few days of intermittent network connectivity (the data is on a networked drive) and testing, I figured it out. I think.

    push(@array, $data);

    I went down the path of using only 1 thread for processing. I went down the path of parsing only a single file type. Once I started doing that, the behavior went away, and of course it wasn't until the final file type that the behavior came back. I looked through my code to see what it had that no other file type had, and it was...

     @array = sort { $a <=> $b } @array;

    So I took out that code and tested again; the problem was still present. So I kind of commented things out piecemeal until I narrowed it down to the single line of code above. With that single line of code, 3 additional MB of memory is used up as the thread leaves the parsing method for that particular file type.

    So here is the basic rundown of this file.

    sub parsefiletypeX
    {
        my $filename = shift;
        # get the directory from the filename (w/ directory)
        open(IN, $filename) or die ...;
        open(OUT, $outfile) or die ...;

        my $lineCount  = 1;
        my $nextline   = <IN>;
        my $headerlines;
        my $samplesize = 1;
        my @array;

        $nextline = trim_whitespace($nextline); # my subroutine
        ++$lineCount; # Did this before I learned about $.

        # read the header for the file (five lines)
        # Do a bunch of regex checks on the header lines

        # Read in the first record, which contains its own line of sub header
        # data as well as the two lines of actual data in pascal float format
        # I believe. Or fortran actually.

        # Regex on the first line and push a certain piece of data. This
        # line is NOT the bad line
        push( @array, $fi[4] ); # 1st line has 5 values

        # Regex checks on the next two lines

        # Then read in the rest of the 3-lined records
        until ( eof(IN) ) {
            # get the same three lines and do the regex checks

            # now the faulty call is made
            push( @array, $fi[4] );
        }

        # Do stuff to the @array, like sorting and determining certain checks.
        close IN;
        close OUT;
        # call function that uploads a record to the database.
    }


    I even tried clearing out that array after I was done using it, such as...

    @array = (); undef @array;

    But this has no effect!?!? So what is going on?
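For what it's worth, `@array = ()` and `undef @array` do release the elements, but only back to perl's own allocator; perl typically keeps that memory pooled for reuse rather than returning it to the OS, so tools like ps won't show the process shrinking. A rough, Unix-only sketch of how one might watch that (it shells out to ps, so treat it as illustrative, not portable):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read this process's resident set size in KB (assumes a Unix `ps`).
sub rss_kb { my $kb = `ps -o rss= -p $$`; $kb =~ s/\s+//g; return $kb }

my $before = rss_kb();
my @array  = map { $_ * 1.5 } 1 .. 1_000_000;   # fill a big array
my $full   = rss_kb();

@array = ();
undef @array;                                    # array is now empty...
my $after = rss_kb();

printf "before=%sKB full=%sKB after-undef=%sKB\n", $before, $full, $after;
# Typically the RSS jumps when @array fills, but does NOT drop back much
# after the undef: perl holds the freed memory for later reuse.
```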

      So what is going on?

      There is simply not enough information here to even begin to guess.

      If you want this debugged, you are going to have to find some way around your 'can't post the real code' problem and supply -- publicly or privately -- real, runnable source code + sample data that demonstrates the problem. If not, you're on your own, I'm afraid.



        To get the same behavior I would have to mock the structure of the file with false data, but even the structure of the file is classified. So if I made an attempt to approximate the structure/data of the file, that would be fine, but knowing my luck, the problem would go away. I'll just keep whacking at it.
