PerlMonks  

OK, another try with <p> tags as suggested :-)

Hi all,

Sorry if this is a dumb/redundant question, but I couldn't find a definitive answer anywhere, so I decided to ask for collective wisdom.

I am working on a data classification and machine learning project, and I have a large data set that I need to process. The job will run on a multi-core CPU, and since the data set items are independent, the set can be split into multiple units for processing in order to take advantage of all the cores.

Obviously, the first things that come to mind are threads and forking. I wrote a version based on forking and it works fine, but it is a RAM hog: when you fork, every new process is a copy of the parent, and the parent in my case is quite large because it loads an AI model that consumes about 1GB of RAM. So each child becomes a 1GB monster, and I run the risk of either thrashing the swap, which kills performance, or running out of RAM altogether if another process kicks in.
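For illustration, here is a minimal sketch of the kind of fork-based split described above. It is not the actual project code: `process_item`, `split_into_chunks`, and `run_forked` are hypothetical names, the doubling stands in for the real classification step, and the 1GB model load is left out entirely.

```perl
use strict;
use warnings;

# Flush STDOUT immediately so forked children do not re-emit buffered output.
$| = 1;

# Hypothetical stand-in for the real per-item classification work.
sub process_item {
    my ($item) = @_;
    return $item * 2;
}

# Split a list into $n chunks, dealing items out round-robin.
sub split_into_chunks {
    my ($n, @items) = @_;
    my @chunks = map { [] } 1 .. $n;
    push @{ $chunks[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @chunks;
}

# Fork one child per chunk; each child sends its results back over a pipe.
sub run_forked {
    my ($n, @items) = @_;
    my @readers;
    for my $chunk (split_into_chunks($n, @items)) {
        pipe(my $rd, my $wr) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child process
            close $rd;
            print {$wr} process_item($_), "\n" for @$chunk;
            close $wr;
            exit 0;
        }
        close $wr;                       # parent keeps only the read end
        push @readers, [ $pid, $rd ];
    }

    # Collect results from each child in turn, then reap it.
    my @results;
    for my $r (@readers) {
        my ($pid, $rd) = @$r;
        while (my $line = <$rd>) {
            chomp $line;
            push @results, $line;
        }
        close $rd;
        waitpid($pid, 0);
    }
    return @results;
}

my @out = sort { $a <=> $b } run_forked(4, 1 .. 20);
print "@out\n";
```

The parent reads the pipes one after another, so results come back grouped by chunk rather than in the original item order; the sort at the end is only there to make the output easy to check.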

With threads it seems it would be easier, since threads have access to global variables defined in the parent, so all spawned threads would share the same AI model and I wouldn't have multiple 1GB copies of the parent. In the thread case I obviously have to worry about locking, but that's not an issue as I can implement it. The bigger issue is that Perl threads seem to live INSIDE the spawning process, so they don't get scheduled on separate CPUs but simply compete for run time within the spawning process. I ran some tests, and indeed on Linux the "top" command shows only one Perl process running on one of the 8 available CPUs, even though I have 8 threads running. So with threads I am not achieving any speedup on multi-core CPUs.

Does anybody know if Perl supports kernel threads that the OS can then schedule on multiple CPUs? I read the Perl thread tutorial, and all it says is that each thread loads a new Perl interpreter. But from what I see, that doesn't result in a new runnable object, separate from the spawning process, that can be scheduled to run on a CPU other than the one used by the spawning process. That said, are there any modules on CPAN that provide true parallelization of tasks so that Perl can take advantage of multiple CPUs? Any help is appreciated. Thanks.


In reply to Perl Threads and multi-core CPUs by haidut
