PerlMonks  

Why use threads over processes, or why use processes over threads?

by pg (Canon)
on Nov 11, 2003 at 04:32 UTC ( [id://306063]=perlmeditation )

Should one use threads over processes, or processes over threads? I am sure that not just we monks but all Perl users have quite different ideas on this, and could go in totally opposite directions.

It is necessary to have a thread for this, so that everyone can have a voice. We should make this an open-ended discussion, as I strongly believe a healthy and fruitful discussion should be like that.

Multi-process and multi-thread are two concurrent programming models, and every language can choose to implement them. Although the models are not specific to Perl, Perl does have its own peculiarities here: Perl's threads are quite new and unfortunately started with a less-than-ideal design. Still, I would rather use threads than processes.

Now let me go to the main topic: why use threads over processes? (Well, I am sure lots of replies will be about why to use processes over threads ;-)

Creating a new process can be expensive. It takes time. (A call into the operating system is needed, and if the process creation triggers process rescheduling, the operating system's context-switching mechanism will become involved.) It takes memory. (The entire process must be replicated.) Add to this the cost of interprocess communication and synchronization of shared data, which may also involve calls into the operating system kernel, and threads provide an attractive alternative.

Threads can be created without replicating an entire process. Furthermore, some, if not all, of the work of creating a thread is done in user space rather than kernel space. When processes synchronize, they usually have to issue system calls, a relatively expensive operation that involves trapping into the kernel. But threads can synchronize by simply monitoring a variable - in other words, staying within the user address space of the program.

The performance difference is also a winning point for threads over processes. Testing and proving this can itself be a very big topic, and there are lots of numbers and well-accepted test results, which you can find all over the place.

Yes, multi-threaded programming is more complex, and requires a different set of skills and, more importantly, a different mindset. I realize that this is actually one of the major reasons why lots of people tried to avoid it from the beginning.

Back to Perl 5.8 threads: obviously they have lots of problems, such as unbelievably huge memory leaks and a lack of basic functionality, but those are problems of Perl's implementation, not problems of the threading model itself. I trust the Perl creators are trying their best to fix those. In fact, if you read Perl 5.10's threading doc, the memory leaks are getting tamed, if not entirely eliminated.

One reason that you can start or continue with Perl threads, even with so many bugs, is that I trust they will make every effort to fix those with minimal modification to the existing interface, as they are wise people. You can see this pattern in their design considerations for Perl 6. On one hand, people benefit from Perl's rich functions and unique strong points; at the same time, the Perl creators are trying to attract people, and there are certain conventions they have to follow.

More threading functions (interfaces towards end users) will be added, but that does not negatively affect existing multi-threaded Perl programs. To be frank, you can never avoid modifying an existing program, regardless of how well it is designed and implemented, unless it will soon be replaced by something else. This is because people always want to use more advanced (if not the most advanced) technology, which is progressing all the time, and because user requirements keep changing for active applications.

Perl 5.8 threading does function, and you can expect your multi-threaded programs to get better results with future Perl versions without much modification to your application. For example, fixing the memory leaks would most likely not affect your code at all (from a static point of view; at execution time you would obviously get better results).

When you design or implement something, you should always try to create a foundation that survives the future, not just today.

Now back to the level beyond Perl again. There is a good reason why the Perl creators wanted threading in Perl. Without threading, Perl's usage would be tightly constrained, and the Perl creators obviously realized this. Remember that Perl is no longer a traditional scripting language; to go without threading would violate its purpose.

For some applications, people and firms, it might be a good idea to wait a while before picking up threads, but picking up threads will eventually become a decision one cannot avoid.

To be frank, the trend of using threads over processes whenever possible is already an old story in computer science itself, but unfortunately in the Perl world, real threading is quite new.

While some Perl programmers are still trying to avoid threading, lots of people were avoiding Perl simply because it didn't support multi-threading. This is a real story from around me, and I am just telling the truth.

While some of us are trying to avoid threads, the Perl creators are using threading as a selling point. Is that purely commercial? I don't think so.


Replies are listed 'Best First'.
Re: Why use threads over processes, or why use processes over threads?
by chromatic (Archbishop) on Nov 11, 2003 at 06:13 UTC
    Creating a new process can be expensive.

    Just for kicks, read the code required to create a new thread in Perl. Any operating system that can't launch a new process more quickly than Perl can launch a new thread is severely broken.

    If you like threads better, that's fine, but this line of reasoning doesn't hold much water. You're probably going to hit the kernel when creating a thread — you need things like memory, at some point. You're going to hit the scheduler, at some point, at least with system threads. (Of course, that's not what Perl has.)

    Throwing out handwavy performance arguments altogether, your arguments are:

    • They're advertised as working in Perl.
    • They'll be in Perl 6.
    • Some people think they're necessary.
    • They're less buggy than they were in previous versions of Perl.

    I'd also add that sharing data between threads tends to be nicer than messing with shared memory portably, mostly because Unix shared memory never really made sense to me. That, to me, is a compelling reason. I mostly ignore the others.

      ++ This is very accurate for me as well. I've done a bit of fork programming and a bit of thread programming. Sharing data with forks has always struck me as "funky".

      I have another good reason to use threads. According to the threads doc, fork under windows is a hack using threads. Many of my programs run on Windows. I believe this sometimes led to "funky" (seemingly unforkish) behavior when I tried to use forks.

      As listed in the threads doc:

      Prior to perl 5.8 this has only been available to people embedding perl and for emulating fork() on windows.

      So I pretty much just use threads now when I program for something that will run under windows. Thankfully, the memory leaks are being weeded out :) It would be really nice to get the best out of both worlds with threads, but I suppose it will just take time to polish it.

      smiles

      Just for kicks, read the code required to create a new thread in Perl. Any operating system that can't launch a new process more quickly than Perl can launch a new thread is severely broken.

      Is that a consequence of implementation, or is it a real design flaw in Perl's threading code? I really don't know enough about threading implementation to know, but I suspect this is a problem that is fixable in future versions.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        That's a consequence of implementation. Creating a new thread means cloning a Perl interpreter. That requires a lot of work. That's where something like Thread::Pool is handy; you still pay the price for launching threads, but you do it at startup.
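A minimal sketch of the "pay at startup" idea, using the core threads and Thread::Queue modules rather than Thread::Pool itself. The worker count and the squaring job are made up for illustration, and it assumes a perl built with ithreads:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical sizes, chosen only for illustration.
my $WORKERS = 4;
my @jobs    = (1 .. 20);

my $work    = Thread::Queue->new;   # jobs going to the workers
my $results = Thread::Queue->new;   # answers coming back

# Pay the interpreter-cloning cost once, up front.
my @pool;
for (1 .. $WORKERS) {
    push @pool, threads->create(sub {
        # dequeue() blocks until an item arrives; undef is our stop marker.
        while (defined(my $n = $work->dequeue)) {
            $results->enqueue($n * $n);
        }
    });
}

$work->enqueue(@jobs);
$work->enqueue(undef) for 1 .. $WORKERS;   # one stop marker per worker
$_->join for @pool;

# All workers have finished, so a non-blocking drain sees every result.
my $sum = 0;
while (defined(my $r = $results->dequeue_nb)) {
    $sum += $r;
}
print "sum of squares: $sum\n";
```

The four clones happen before any work is queued, so the per-job cost is only the queue traffic.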

        I'd also add that sharing data between threads tends to be nicer than messing with shared memory portably, mostly because Unix shared memory never really made sense to me.
      Personally, I was never impressed with it on VMS. On Unix it's an absolute dream in comparison. (At least this is what my unreliable synapses are telling me.)

      Shared memory is not perfect (what is, come to that), but it allowed me on one project, years ago, to share "live, up-to-date" data between users at twelve different sites, and did so in a reliable, relatively stable way. I've not done much with threading, so my view may be prejudiced, but you would have to prove to me that threading would be a better solution in such a situation. I'd listen, but I'd listen with a cynical ear.

      --
      tbone1
      Ain't enough 'O's in 'stoopid' to describe that guy.
      - Dave "the King" Wilson

Re: Why use threads over processes, or why use processes over threads?
by Aristotle (Chancellor) on Nov 11, 2003 at 07:23 UTC

    Update: please note I'm talking about threads in the general sense. See my clarification on Perl threads.

    I don't see the appeal of threads. Modern kernels on CPUs with modern MMUs can fork processes with very little effort and switch them pretty quickly. I expect the performance will increase further with time.

    I don't even care that much about the performance argument - but it used to be a big problem once upon a time long past, so I thought I should get that out of the way first.

    And then there's the real argument: safety. Forked processes default to not sharing; threads default to sharing everything. With the former, you have to explicitly share what you desire to be shared while with the latter you have to explicitly make thread local copies of sensitive data.

    Every person in their right mind will tell you that the correct approach to security is to disallow by default and exempt desired interactions. Every Perl programmer worth their salt untaints data by denying anything but explicitly permitted input. The list goes on; the correct approach is always to disallow by default and explicitly permit where desired.

    Threads break this fundamental principle.

    This is hard to argue with - by using threads you inevitably expose yourself to potential for all sorts of bugs. Since correctness is the primary concern in software development, and all else is secondary, I don't really see any choice but forking.

    The current state of affairs is not perfect of course; shared memory or other forms of IPC are harder to use in practice than they ought to be.

    Makeshifts last the longest.

      And then there's the real argument: safety. Forked processes default to not sharing; threads default to sharing everything. With the former, you have to explicitly share what you desire to be shared while with the latter you have to explicitly make thread local copies of sensitive data.

      Perhaps you haven't actually looked at the docs for threads? To wit:

      It is very important to note that variables are not shared between threads, all variables are per default thread local. To use shared variables one must use threads::shared.
        Actually, I know (and knew) about this and thought about it while writing my reply; but I was talking about threads in the general sense, not the threading model found in Perl 5.6+. For all intents and purposes, "threads" in 5.6+ are userland forks. So in Perl I have the choice between kernel forks and userland forks (whose performance and memory use is as is to be expected - just what kernel forks used to be once upon a time) - now guess which ones I'll prefer. At least on sane OSes where kernel forks are available..

        Makeshifts last the longest.

      Forked processes default to not sharing; threads default to sharing everything.
      Um, no.. Unless you're talking about the 5005threads, *they* default to sharing everything. The newer ithreads default to sharing nothing (everything gets copied at the point you start the thread, after that each thread only uses its own copy of the variables).
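A small sketch of the ithreads behaviour described here, assuming a perl built with ithreads. The variable names are only illustrative:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;             # ordinary variable: each thread gets its own copy
my $shared :shared = 0;      # explicitly shared across threads

my @workers;
for (1 .. 3) {
    push @workers, threads->create(sub {
        $private++;          # changes only this thread's copy
        lock($shared);       # serialise access; released when the sub returns
        $shared++;
    });
}
$_->join for @workers;

print "private=$private shared=$shared\n";   # prints "private=0 shared=3"
```

The parent's $private never changes because each thread incremented its own clone; only the variable marked :shared is visible to everyone.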

      Personally, I found the 5005 behaviour easier to program with than the new one, but that's possibly just me..

      C.

        Personally, I found the 5005 behaviour easier to program with than the new one, but that's possibly just me..

        Nope, not just you. The 5005 way was, more or less, following the standard model of sharing found elsewhere. If I had to describe the new thread model (and I've had to on a few occasions) it is as if it were a slow, bloated fork emulation but with convenience functions for synchronizing access to shared data.

      Modern kernels on CPUs with modern MMUs can fork processes with very little effort and switch them pretty quickly.

      Win NT (any version) processes are quite expensive, and a significant amount of Perl code runs on Win2k web servers. This is a stark contrast with Linux, where processes are very cheap (they have to be, since Linux threads are almost identical to processes).

      Forked processes default to not sharing; threads default to sharing everything . . . correct approach to security is to disallow by default.

      I don't think you can make the analogy between taking user input and a process/thread model. In taking data from (for example) a CGI form, you usually have no idea where the information is coming from, so it is reckless to not validate it. In sharing data between processes, you presumably control everything that happens between the two processes. The data shared is no less trustworthy than the data you pass between subroutines. If you happen to be in a situation where you don't control what happens in one of the processes or threads, then you definitely need to do validation. However, I doubt such a situation pops up much.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        Win NT (any version) processes are quite expensive
        Windows doesn't even have forks, so the point is moot anyway. Notice how the Perl 5.6 thread model was largely an attempt to emulate fork() for platforms which don't have it (even if that wasn't its stated goal, it certainly made that impression).
        In sharing data between processes, you presumably control everything that happens between the two processes.
        "Presumably" being the keyword, because this is about the effect of a) bugs and b) security holes. With threads, both occurrences can kill off your entire application. With forked processes, they can only affect the child in question except where a resource is explicitly shared.
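To make the isolation claim concrete, here is a rough sketch for a Unix-like system with a real fork(). The "crash" is simulated by the child sending itself SIGKILL, which is an assumption made purely for the demonstration:

```perl
use strict;
use warnings;

my $state = "intact";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: scribble on its own copy of the data, then "crash".
    $state = "corrupted";
    kill 'KILL', $$;
    exit 1;                      # not reached
}

# Parent: reap the child and look at how it ended.
waitpid($pid, 0);
my $signal = $? & 127;           # low 7 bits hold the terminating signal, if any

print "child died from signal $signal; parent state is still '$state'\n";
```

The parent keeps running with its own data untouched and can decide what to do about the dead child.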

        Makeshifts last the longest.

Re: Why use threads over processes, or why use processes over threads?
by Abigail-II (Bishop) on Nov 11, 2003 at 09:49 UTC
    Let's first address some of your points.

    Threads can be created without replicating an entire process. Furthermore, some, if not all, of the work of creating a thread is done in user space rather than kernel space. When processes synchronize, they usually have to issue system calls, a relatively expensive operation that involves trapping into the kernel. But threads can synchronize by simply monitoring a variable - in other words, staying within the user address space of the program.
    It's not clear to me what kind of threads you are talking about. See, threads have been invented repeatedly, and in incompatible ways. Many modern operating systems have kernel level threads - that is, the kernel itself is aware of threads, and it may even treat them as processes when it comes to scheduling (but not all OSses do). OTOH, threads can be implemented entirely in a single process, without the OS being aware of it. I think some implementations of Java do it this way.

    Now, forking on modern OSses is *cheap*. From the Linux fork() manual page:

    Under  Linux,  fork  is  implemented  using  copy-on-write
    pages,  so  the  only penalty incurred by fork is the time
    and memory required to duplicate the parent's page tables,
    and to create a unique task structure for the child.
    

    Now, if we look at how Perl threads are implemented, you'll notice that quite a lot of work needs to be done to start a thread. Far more work than needs to be done for a kernel level fork().

    The performance difference is also a winning point for threads over processes. Testing and proving this can itself be a very big topic, and there are lots of numbers and well-accepted test results, which you can find all over the place.
    Could you please post some benchmarking results of Perl threads vs Perl forking? Or post pointers to them?
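For anyone who wants to produce such numbers, a crude timing sketch might look like the following. It is not a rigorous benchmark: it assumes a thread-capable perl on a system with a native fork(), the iteration count is arbitrary, and it measures only creation plus teardown of an empty child or thread.

```perl
use strict;
use warnings;
use threads;
use POSIX ();
use Time::HiRes qw(time);

my $N = 50;   # hypothetical iteration count; raise it for steadier numbers

# Time N fork()+waitpid() round trips with a child that exits immediately.
my $t0 = time;
for (1 .. $N) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child: leave without running END blocks
    waitpid($pid, 0);
}
my $fork_secs = time - $t0;

# Time N thread create+join round trips with an empty body.
$t0 = time;
for (1 .. $N) {
    threads->create(sub { })->join;
}
my $thread_secs = time - $t0;

printf "fork:   %.4fs for %d children\n", $fork_secs,   $N;
printf "thread: %.4fs for %d threads\n",  $thread_secs, $N;
```

The results will vary a lot with the OS, the perl version, and how many modules are loaded at the point of creation, since a new thread has to clone all of that.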
    While some of us are trying to avoid threads, the Perl creators are using threading as a selling point.
    They are? I guess that's why ./Configure -des; make creates a threaded Perl. Oh wait, it doesn't.

    Here are some reasons why I prefer to use forking over threading:

    • If you want threads, you need to build a thread capable perl. But a thread capable perl is slower than a non-thread capable perl, even if you don't use threads. You're *always* paying a price.
    • The fork()ing behaviour is cross-platform - or at least, it works the same on platforms I care about. Any mainstream UNIX knows how to fork(), and I get similar behaviour in Perl as in C. This kind of cross-platform behaviour (same in Perl as in C) is important to me (and probably others as well).
    • Sharing data between threads in Perl is off by default. You need to enable it, and each access needs to be mutexed anyway. Might as well use shared memory.
    • Forking is cheap, and done by the kernel. Perl threads are expensive and done in userland. Which for Perl means: bugs.
    • I don't need a specially built Perl.
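Whether a given perl is thread capable can be checked from the core Config module. A minimal sketch:

```perl
use strict;
use warnings;
use Config;

# $Config{useithreads} is 'define' when the interpreter was built with ithreads,
# and undefined or empty otherwise.
my $has_ithreads = $Config{useithreads} ? 1 : 0;

print "perl $Config{version}: ithreads ",
      ($has_ithreads ? "available" : "not built in"), "\n";
```

The same information shows up in the output of perl -V.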

    Abigail

Re: Why use threads over processes, or why use processes over threads?
by BrowserUk (Patriarch) on Nov 11, 2003 at 13:10 UTC

    Threading ostensibly has two main advantages over forking.

    1. Spawning a thread is, on those OS's that support this natively, cheaper in terms of memory and CPU, than forking a process.

      This is because a thread should consist of little more than a scheduler object containing a set of registers, a stack segment and some scheduler administration state. Unlike a process, there should be no need to copy large amounts of memory, as threads should be able to re-use the existing process' copy of memory.

    2. Inter-thread communication is inherently faster and cheaper than inter-process communication.

      It's the very fact that memory is shared between the threads, that make this possible.

    The problem with perl's implementation of threads is the non-reentrancy of perl's code segments, combined with the lack of a suitable ingrained model for semaphores and synchronisation. The result is that the only way to approach a 'safe' and cross-platform implementation of threads in perl is to emulate forking. Performing this emulation throws away both advantages that threading had to start with.

    Despite heroic efforts on behalf of the developers, the underlying, rather old-fashioned, non-reentrant nature of perl's core requires that everything, code and data segments alike, be copied. This completely wipes out both advantages. First, the copying is done in user mode rather than kernel mode, throwing away all the years of optimisation and testing that now exist in kernels that support forking natively. Having copied everything, each thread now has its own copy of every piece of memory, throwing away the benefits of COW on those platforms that support it, and requiring hokey and clumsy bit-at-a-time duplication and serialisation of all shared data, effectively throwing away the second advantage of threads -- direct, fast access to shared memory.

    The result is that perl's threading implementation as is, is at best utilitarian, and at worst, broken. This is insurmountable given the nature of perl's core as is, and would only be fixable with a complete re-write of the perl core. The problem runs very deep. Even the POSIX C-runtime that underlies so much of the perl core is inherently non-reentrant, and without reentrancy built-in from the lowest levels, making effective use of threads, where these are natively fast and efficient, simply isn't possible.

    Personally, I am pinning my hopes on Perl 6 (and maybe Ponie) to bring the true benefits of threading to perl, but I fear that the predominantly unix-guru based nature of the development teams, and indeed of the whole development process, means that even these will likely concentrate on the strengths of unix-based models and modes of operation, and that will prevent the kernel from being tuned (or tunable) for models and modes of operation that originate outside of the unix platform.

    Sadly, I don't see the cross-platform support at the deepest levels of perl development improving to the point where those of us that use perl on non-unix or unix-clone platforms will ever truly gain access to the kernel strengths of our platforms. The nature of 'other' platforms isn't that they are worse or inferior to unix, nor that they are better or superior to unix, they are simply different. They tackle the same set of problems and choose different solutions. It might be possible to come up with a perl 'model' of the OS that is flexible enough to encompass all underlying platforms in a transparent(ish) manner if sufficient effort were expended to do so, but while the development process continues to be "Do what works on unix and then try and force fit that mechanism onto non-unix platforms.", the lot of those who choose or have to use non-unix platforms will always be one of playing catch-up on builds, reliance upon 3rd-parties, and a mismatch between what we know is possible and what we can actually do.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      non reentrant nature of perl's core
      This non-reentrant nature of perl's core goes much deeper than many people realize. It's not just that there's a bunch of functions that are non-reentrant, so that perl now has to call the *_r equivalents. It goes deeper. Even things that on the Perl level are static are non-reentrant. Like fetching the value of a scalar. For instance, print $var can't be done in parallel, because $var might not have the pOK flag set, which means that Perl has to convert the numerical value of $var to a string value - which means that the underlying data structure is modified. (This is the reason why variables are not shared by default).
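The "a read modifies the scalar" effect can be observed from Perl space with the core B module. This is only an illustrative sketch: has_string_slot is a made-up helper name, and the exact flags set can differ between perl versions, so it checks the private SVp_POK flag, which is the pOK flag mentioned above.

```perl
use strict;
use warnings;
use B ();

# Returns 1 if the scalar currently carries a cached string value
# (private POK flag set), 0 otherwise. Helper name is hypothetical.
sub has_string_slot {
    my ($ref) = @_;
    return (B::svref_2object($ref)->FLAGS & B::SVp_POK) ? 1 : 0;
}

my $var = 42;                           # starts life as a pure integer
my $before = has_string_slot(\$var);
my $text   = "$var";                    # a "read" that needs the string form
my $after  = has_string_slot(\$var);

print "string slot before: $before, after: $after\n";
```

Nothing in the source assigns to $var after its initialisation, yet its internal flags differ before and after the stringification, which is why unsynchronised concurrent reads of one scalar would not be safe.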

      Abigail

        Agreed. I recently spent a lot of time exploring ways of trying to utilise threads more effectively (under Win32), and discovered the nature and scale of the problem. It hasn't put me off of the benefits of threads over processes in my environment, nor in the wider context where they are a natural part of the system. It has, however, put me off of trying to utilise them to any great extent from perl.

        They still have their utility under Win32, as the absence of a native fork, means that there is little or nothing to choose between forking and threading, but using threads does expose a little (very little) more control of the multi-processing facilities than the fork emulation.

        I wish I could be more enthusiastic, but the deep rooted nature of perl's non-reentrancy, as you noted, makes it near impossible to improve the situation, given the current implementation of perl.

        From what I have understood of the Parrot architecture, the objectification of the fundamental types should mean that it would (will?) be possible to cheaply and transparently serialise access to individual scalars. Embedding a semaphore within every scalar, and using it for every write operation may sound horrendous, but applied at the core level, it shouldn't be that problematic or costly. However, if it is not done at the core level, trying to retroactively add it would be disastrous.

        Time will tell, but I won't be holding my breath.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

      This is because a thread should consist of little more than a scheduler object containing a set of registers, a stack segment and some scheduler administration state. Unlike a process, there should be no need to copy large amounts of memory, as threads should be able to re-use the existing process' copy of memory.

      You could change a few words and the same would apply to forking a process on Linux and, I believe, *BSD. (I also believe it applies to most modern Unix variants these days, but I've only read the source for Linux.)

        I'm not familiar with linux or most other unixes, but there would have to be at least the replication of filesystem objects -- duping of open file handles, sockets etc. and associated state.

        I would imagine that it also requires the creation of handles to the existing memory objects in order to handle COW etc.

        In addition, every write, which on the evidence of Abigail above, can frequently mean a perl-level read, will result in a memory copy operation (though I'm not sure what the granularity is). There will also be some amount of overhead associated with detecting writes to shared memory segments. Whether this is a software or hardware interrupt, the effect upon L1 and L2 caches etc. can be expensive too.

        It's unclear to me how forking handles other shared handles like DB connections, hardware connections to tape drives, serial ports and the like, but I think that it is probably down to the user to handle this rather than fork.

        None of these things is individually expensive, but the convenience of spawning a thread, without requiring any of this is considerable. The greatest use, and the greatest benefit from threads is for performing asynchronous reads (from whatever). This use is simply not possible with forks. The select model just doesn't compare for usability, and event-driven models require you to throw away even standard structured programming techniques, never mind object-oriented models and revert to relying upon global state.

        Finally, the benefits of co-routines are totally absent from the forking model, but are almost trivial to implement using threads.

        I don't see threads and processes as an either/or proposition. In an ideal world, the programmer would have both spanners in his toolkit, and would be free to choose whichever is appropriate for the task at hand. For some tasks one is appropriate, for others, the other. In some cases a mixture of the two makes perfect sense, if the underlying system supports both efficiently. The best choice will sometimes be dictated by the underlying system.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

        Of course, the big difference between threads and processes remains the fact that the latter need their personal page tables. There is probably quite a bit of room for optimization of this process on the MMU design level left though. (Why recreate the entire page table set? COW could be applied there as well.)

        This is not going to happen overnight, but I'm certain that at some point, the effective overhead of processes over threads will be zero. It is very close already, but not there yet.

        Makeshifts last the longest.

      Does it make you more comfortable with the future of threading to realize that elian was for years the threading champion for Perl 5?

        Yes. I have no doubt as to the quality or skills of any of the people involved in the perl development -- be it P5, or P6. Would that I were half as good as most of them.

        My only fear is that a lack of non-unix platform people in that august band will result in a lack of consideration to non-unix concepts and strengths.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail

Re: Why use threads over processes, or why use processes over threads?
by liz (Monsignor) on Nov 11, 2003 at 11:48 UTC
    A lot has been said in this thread already. And in Things you need to know before programming Perl ithreads I describe the current status of threads in Perl as I see it.

    I agree with chromatic's comments. Perl threads currently provide an interface to doing threads in Perl over doing non-portable stuff in shared memory. It makes it easy and portable. It's not really production ready, especially for a mod_perl environment.

    I also agree with most of Abigail-II's comments. Fork() is very cheap on modern *nix systems.

    Which is why my forks.pm module gives you a bit of the best of both worlds: the memory savings of fork() and the ability to use a standard Perl executable (even with 5.6.x!) and the portable interface of threads.pm and its shared variables. At the expense of CPU and latency: there is always a price to pay!

    Liz

      liz, do you have any benchmarks (performance and memory usage) of threads vs. forks.pm?
        I thought I could easily adapt Benchmark::Thread::Size to use forks, but there is more to it than I can easily fix right now. This will probably need new versions of both forks and Benchmark::Thread::Size. In the meantime, I'm posting the result of using threads on Mac OS X:
        $ perl5.8.2-threaded -MBenchmark::Thread::Size
        Performing each test 10 times
        (ref) 10 100
          #   (ref)     +
          0    1730     6
          1    2085     6
          2    2374     6
          5    3235     6
         10    4664
         20    7536     4
         50   16144     2
        100   30489     8
        I'll post the result using forks as a response to this node.

        Liz

        There still seems to be a problem with processes not getting reaped when calculating sizes. The combination of Benchmark::Thread::Size and forks becomes something akin to a fork bomb. Preliminary size test shows this:
        $ perl5.8.2-unthreaded -Mforks -MBenchmark::Thread::Size=times,1
        (ref) 1 100
          #   (ref)     +
          0    3364
          1    2836
          2    2800
          5    2800
         10    2800
         20    2800
         50    2800
        100    2804
        but I'm starting to have doubts whether the size calculation is correct. I don't have time to look into it further right now...

        This test was done with my internal development versions of forks and Benchmark::Thread::Size. Don't try this at home with the release versions yet!

        Liz

Re: Why use threads over processes, or why use processes over threads?
by Anonymous Monk on Nov 11, 2003 at 06:17 UTC
    The performance difference is also a winning point for threads over processes.

    Excuse me? Perhaps you missed this post by liz discussing Perl's ithreads. Sure, in theory threading might be better than forking, but in practice (in Perl) it is anything but "lightweight" or "fast". And that's not even considering leaks or bugs, that's just considering the current design. I'll stick with fork for concurrent programming on *nix, and one of these days I'll get around to trying out liz's drop-in forks replacement for threads.

      And that's not even considering leaks or bugs, that's just considering the current design.

      To be fair, I don't believe they meant to say that a memory leak is a feature. We all know that using a large amount of memory is different from leaking memory.

      I guess what they meant was that the current thread design and implementation in Perl takes a lot of memory.

      I think what they tried to do was to make it work first, in an easier way, and to cut down memory usage at a later point. I agree it is not ideal, and not a good design, but I believe that all of us, at some point, have experienced the situation where you have to deliver your project phase by phase.

      Really, take a look at the 5.10 docs on threads: they are fixing the memory leaks, and they have also done something to cut down the memory usage of threads.

Re: Why use threads over processes, or why use processes over threads?
by castaway (Parson) on Nov 11, 2003 at 08:14 UTC
    Here's my take on it:

    I learned perl using the Camel, starting at the beginning.. Threads have more coverage there than forking; in fact, forking is mentioned only as a way of doing interprocess communication. Which would make sense to me, were I aiming to create separate processes, which I haven't wanted to do, yet.. (Actually I did once, but then I just split my program into two, and communicated using sockets, which was easier in the end..)

    I don't come from a unix/C background, so I've no history of forks, the why and how of them, and since threading works fine for me, I'll continue to use it (both 5005 and ithreads)

    None of this says whether threads are better than forks, or vice versa. The only thing I know about forks is that it seems to be more difficult to communicate between forked processes (using Thread::Queue or shared vars in threads is easy..) So, no clue there.
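    For comparison, here is a minimal sketch of what talking to a forked child can look like when a plain pipe is used. The variable names and the "squares" payload are made up for illustration; nothing here comes from castaway's own code, and real programs usually need more error handling than this.

```perl
use strict;
use warnings;

# One-way channel: the child writes, the parent reads.
pipe(my $from_child, my $to_parent) or die "pipe: $!";

defined(my $pid = fork) or die "fork: $!";

unless ($pid) {
    # Child: close the end it does not use, send one result per line, then exit.
    close $from_child;
    print {$to_parent} "$_\n" for map { $_ * $_ } 1 .. 5;
    close $to_parent;
    exit 0;
}

# Parent: close the write end so reading stops at EOF once the child is done.
close $to_parent;
chomp(my @squares = <$from_child>);
close $from_child;
waitpid $pid, 0;

print "parent received: @squares\n";
```

    The extra bookkeeping (closing the unused ends, serialising data to lines, reaping the child) is roughly the "more difficult" part; with threads, a queue or a shared variable hides most of it.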

    I'd say, threads for self-contained programs in which the threaded bits need to communicate, and forks for extra processes which do something entirely separate.. I'd assume that people use forks for things I'd use threads for just because perl threads weren't seen as any good for a long time, but that's probably just my opinion :)

    C.

    (Nodes are threaded, not forked, here, makes perfect sense to me :)

      I did notice that, many people (not all) from a unix/c background, tend to use threads over processes. I believe this is largely because unix/c threads have been very mature for quite a long time already, and by the same token, fork has been obsolete in a lot of those people's minds for quite a long time.

      I acknowledge that Perl has its unique situation, however as Perl is largely based on c, I don't see a reason why Perl will not follow the same path, although the trend is just started.

      Unless I have observed it wrong, Perl's creators are pushing the use of Perl threads, and the more mature Perl threads become, the more of this kind of push will come from them.

        I did notice that, many people (not all) from a unix/c background, tend to use threads over processes.

        I noticed the exact opposite.

        I acknowledge that Perl has its unique situation, however as Perl is largely based on c, I don't see a reason why Perl will not follow the same path, although the trend is just started.

        Perl copies the entire interpreter when a thread is created. This is slow, inefficient and memory consuming. With forking on the other hand, all the modern platforms use Copy-on-Write, meaning that most of the interpreter is never copied, but stays shared all the time. A fork is fast, efficient and by itself doesn't use a lot of memory.
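        A small sketch of the fork side of this, assuming a Unix-like system where fork() is available. The %config hash and its contents are invented for the example. It only demonstrates the visible semantics (the child's write does not reach the parent); how many pages actually stay shared depends on the operating system and on what the child touches.

```perl
use strict;
use warnings;

# Data built once in the parent; after fork() the child starts out sharing these pages.
my %config = (mode => 'parent', table => [1 .. 50_000]);

defined(my $pid = fork) or die "fork: $!";

unless ($pid) {
    # Child: this write lands in the child's private copy of the touched pages only.
    $config{mode} = 'child';
    exit($config{mode} eq 'child' ? 0 : 1);
}

# Parent: collect the child and check that our own copy was never modified.
waitpid $pid, 0;
my $child_ok = ($? >> 8) == 0;

print "child updated its own copy: ", ($child_ok ? "yes" : "no"), "\n";
print "parent still sees mode=$config{mode}\n";
```

        Nothing had to be copied up front for the child to see the 50,000-element table, which is the point being made about fork being cheap.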

        Perl's C roots have almost nothing to do with this all. Perl is VERY different from C. It just happens to have a few similar functions, and a syntax that to some looks much alike.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I was actually observing (in myself and others), the exact opposite, unix/c people tend to prefer forks over threads. (I don't belong to that group..)

        C.

Re: Why use threads over processes, or why use processes over threads?
by Abigail-II (Bishop) on Nov 11, 2003 at 15:49 UTC
    I did some benchmarking of forking vs threading. Below is a program that compares the running times of programs that fork and that use threads. Two programs are compared: first a program where the children don't do much (exit for the fork, empty sub for the threads). In the second program, the children print a single character to /dev/null.
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $perl    = "/opt/perl/bin/perl";
    my $thrperl = "/opt/perl/bin/thrperl";

    my $action = << '--';
        open my $fh => "> /dev/null" or die "open: $!";
        print $fh ".";
        close $fh;
    --

    my $prog = << '--';
        use strict;
        use warnings;

        my $times = shift;
        my @pids;

        foreach (1 .. $times) {
            defined (my $pid = fork) or die "fork: $!";
            unless ($pid) {
                ACTION;
                exit;
            }
            push @pids => $pid;
        }
        map {waitpid $_ => 0} @pids;
    --

    my $thrprog = << '--';
        use strict;
        use warnings;
        use threads;

        my $times = shift;
        my @threads;

        foreach (1 .. $times) {
            my $thread = threads -> create (sub {ACTION}) or die "threads: $!";
            push @threads => $thread;
        }
        map {$_ -> join} @threads;
    --

    my @amounts = (0, 5, 10, 25, 50, 100, 250, 500, 1000, 2500);

    print STDERR "Empty childs\n";
    (my $t_prog    = $prog)    =~ s/ACTION//;
    (my $t_thrprog = $thrprog) =~ s/ACTION//;
    for my $amount (@amounts) {
        my $format = sprintf "%5d: %%U user, %%S system, %%E elapsed" => $amount;
        system time => '-f', "(F) $format", $perl,    '-e', $t_prog,    $amount;
        system time => '-f', "(T) $format", $thrperl, '-e', $t_thrprog, $amount;
    }

    print STDERR "\nChilds writing\n";
    ($t_prog    = $prog)    =~ s/ACTION/$action/;
    ($t_thrprog = $thrprog) =~ s/ACTION/$action/;
    for my $amount (@amounts) {
        my $format = sprintf "%5d: %%U user, %%S system, %%E elapsed" => $amount;
        system time => '-f', "(F) $format", $perl,    '-e', $t_prog,    $amount;
        system time => '-f', "(T) $format", $thrperl, '-e', $t_thrprog, $amount;
    }

    __END__
    Empty childs
    (F)     0: 0.01 user, 0.00 system, 0:00.01 elapsed
    (T)     0: 0.04 user, 0.00 system, 0:00.03 elapsed
    (F)     5: 0.01 user, 0.00 system, 0:00.01 elapsed
    (T)     5: 0.09 user, 0.00 system, 0:00.09 elapsed
    (F)    10: 0.01 user, 0.01 system, 0:00.02 elapsed
    (T)    10: 0.16 user, 0.00 system, 0:00.15 elapsed
    (F)    25: 0.01 user, 0.02 system, 0:00.03 elapsed
    (T)    25: 0.32 user, 0.02 system, 0:00.34 elapsed
    (F)    50: 0.03 user, 0.04 system, 0:00.06 elapsed
    (T)    50: 0.60 user, 0.08 system, 0:00.70 elapsed
    (F)   100: 0.04 user, 0.08 system, 0:00.11 elapsed
    (T)   100: 1.25 user, 0.11 system, 0:01.42 elapsed
    (F)   250: 0.04 user, 0.12 system, 0:00.50 elapsed
    (T)   250: 3.41 user, 0.24 system, 0:03.87 elapsed
    (F)   500: 0.02 user, 0.14 system, 0:01.16 elapsed
    (T)   500: 7.37 user, 0.55 system, 0:08.42 elapsed
    (F)  1000: 0.05 user, 0.19 system, 0:02.49 elapsed
    A thread exited while 46 threads were running.
    (T)  1000: 15.02 user, 1.36 system, 0:17.42 elapsed
    (F)  2500: 0.04 user, 0.71 system, 0:06.73 elapsed
    Command terminated by signal 9
    (T)  2500: 37.39 user, 3.30 system, 2:10.56 elapsed

    Childs writing
    (F)     0: 0.06 user, 0.21 system, 0:01.60 elapsed
    (T)     0: 0.20 user, 0.23 system, 0:01.66 elapsed
    (F)     5: 0.01 user, 0.01 system, 0:00.03 elapsed
    (T)     5: 0.12 user, 0.02 system, 0:00.28 elapsed
    (F)    10: 0.03 user, 0.00 system, 0:00.07 elapsed
    (T)    10: 0.17 user, 0.04 system, 0:00.37 elapsed
    (F)    25: 0.02 user, 0.06 system, 0:00.08 elapsed
    (T)    25: 0.49 user, 0.09 system, 0:01.07 elapsed
    (F)    50: 0.04 user, 0.21 system, 0:00.46 elapsed
    (T)    50: 0.73 user, 0.12 system, 0:01.72 elapsed
    (F)   100: 0.04 user, 0.11 system, 0:00.14 elapsed
    (T)   100: 1.22 user, 0.06 system, 0:01.29 elapsed
    (F)   250: 0.08 user, 0.12 system, 0:00.52 elapsed
    (T)   250: 2.93 user, 0.30 system, 0:04.29 elapsed
    (F)   500: 0.06 user, 0.20 system, 0:01.30 elapsed
    (T)   500: 6.29 user, 0.67 system, 0:07.79 elapsed
    (F)  1000: 0.03 user, 0.24 system, 0:02.50 elapsed
    A thread exited while 46 threads were running.
    (T)  1000: 11.52 user, 1.46 system, 0:13.84 elapsed
    (F)  2500: 0.06 user, 0.72 system, 0:06.78 elapsed
    Command terminated by signal 9
    (T)  2500: 26.72 user, 13.98 system, 3:57.20 elapsed

    As we can see, forking is much faster than creating threads. Furthermore, creating more than a thousand processes is not a problem - creating a thousand threads *is*. And what you didn't see was the slow response of my system when the threads were being handled, and neither did you hear the grinding of the disk when dealing with threads (2500 processes however was hardly noticeable).

    Abigail

      Excellent point made here.

      And it shows a key element of this entire debate. The decision to fork or to thread should be based on the need, form and function of the code. In some cases it is clear that fork is the better solution; in others it is threads. A good programmer / developer will try to have both skills under their belt (though I'm not quite at that level, yet).

      Another consideration, though, is the fact that not all systems have Perl threading available. Some systems do not have a version of Perl installed that allows threads, and may not be able to install an outside module that will allow it (like the place where I work). I know that will sound unusual, but consider the fact that some places hold VERY tight control over what is installed from an outside source. If it is a secure site (e.g. DoD, NSA, etc.), it may not allow additions to the system that way because they are not 'blessed' by certain security-minded individuals.

      All in all, sometimes the 'better' choice isn't an option. Best to have all the tools you can available in your own head.

      Can you do us a favor: modify your thread program a little bit, and get another set of numbers (under the same testing environment)?

      My suggested modification is to detach threads when they are created, so you explicitly tell Perl that no join will be done later, and Perl doesn't need to waste time/space keeping return structures. (I trust this situation will change in future versions.)

      threads->create(\&blah)->detach();

      I really appreciate it.
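      To make the join/detach distinction concrete, here is a hedged sketch. It assumes a reasonably recent threads.pm (is_detached and is_running are not in the earliest releases) and it falls back to a message on a perl built without ithreads. The $have_threads guard and the trivial thread bodies are illustrative, not part of the benchmark above.

```perl
use strict;
use warnings;
use Config;

# Assumption: only try threads when this perl was built with ithreads.
my $have_threads = $Config{useithreads} && eval { require threads; 1 };

my ($joined_result, $was_detached);
if ($have_threads) {
    # Joinable thread: perl keeps its return value around until join() collects it.
    my $worker = threads->create(sub { return 6 * 7 });
    $joined_result = $worker->join;

    # Detached thread: nobody will join it, so no return value is kept for later.
    my $background = threads->create(sub { return });
    $background->detach;
    $was_detached = $background->is_detached;

    # A detached thread cannot be joined; wait for it so the program
    # does not exit while it is still running.
    threads->yield while $background->is_running;

    print "joined thread returned $joined_result\n";
}
else {
    print "this perl was built without ithreads\n";
}
```

      Whether detaching saves a measurable amount of time or memory is exactly what the requested benchmark would have to show; the sketch only illustrates the API difference.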

        Well, I gave you the source, so you could have easily done this yourself. Now I had to first compile a threaded Perl (as I'm on a different machine now). Anyway, I modified the program so it runs all four variants: fork and wait, fork with SIGCHLD being ignored, joining threads, and detached threads, with labels Fw, Fi, Tj and Td. Not surprisingly, there's not much difference between Fw and Fi, and between Tj and Td.
        Empty childs
        (Fw)     0: 0.01 user, 0.01 system, 0:00.02 elapsed
        (Fi)     0: 0.01 user, 0.00 system, 0:00.01 elapsed
        (Tj)     0: 0.05 user, 0.00 system, 0:00.06 elapsed
        (Td)     0: 0.05 user, 0.00 system, 0:00.06 elapsed
        (Fw)     5: 0.01 user, 0.00 system, 0:00.01 elapsed
        (Fi)     5: 0.02 user, 0.00 system, 0:00.02 elapsed
        (Tj)     5: 0.11 user, 0.02 system, 0:00.14 elapsed
        (Td)     5: 0.12 user, 0.00 system, 0:00.14 elapsed
        (Fw)    10: 0.01 user, 0.00 system, 0:00.03 elapsed
        (Fi)    10: 0.01 user, 0.00 system, 0:00.03 elapsed
        (Tj)    10: 0.20 user, 0.00 system, 0:00.24 elapsed
        (Td)    10: 0.21 user, 0.00 system, 0:00.24 elapsed
        (Fw)    25: 0.00 user, 0.06 system, 0:00.10 elapsed
        (Fi)    25: 0.01 user, 0.01 system, 0:00.09 elapsed
        (Tj)    25: 0.45 user, 0.47 system, 0:01.32 elapsed
        (Td)    25: 0.42 user, 0.04 system, 0:00.58 elapsed
        (Fw)    50: 0.01 user, 0.00 system, 0:00.11 elapsed
        (Fi)    50: 0.01 user, 0.00 system, 0:00.16 elapsed
        (Tj)    50: 0.86 user, 0.01 system, 0:01.07 elapsed
        (Td)    50: 0.79 user, 0.07 system, 0:01.00 elapsed
        (Fw)   100: 0.02 user, 0.00 system, 0:00.21 elapsed
        (Fi)   100: 0.01 user, 0.00 system, 0:00.35 elapsed
        (Tj)   100: 1.60 user, 0.12 system, 0:01.99 elapsed
        (Td)   100: 1.62 user, 0.08 system, 0:01.99 elapsed
        (Fw)   250: 0.01 user, 0.00 system, 0:00.50 elapsed
        (Fi)   250: 0.02 user, 0.00 system, 0:00.85 elapsed
        (Tj)   250: 4.20 user, 0.23 system, 0:05.18 elapsed
        (Td)   250: 4.28 user, 0.20 system, 0:05.22 elapsed
        (Fw)   500: 0.02 user, 0.00 system, 0:01.01 elapsed
        (Fi)   500: 0.02 user, 0.04 system, 0:01.73 elapsed
        (Tj)   500: 9.09 user, 0.56 system, 0:11.28 elapsed
        (Td)   500: 9.19 user, 0.46 system, 0:11.25 elapsed
        (Fw)  1000: 0.01 user, 0.01 system, 0:02.02 elapsed
        (Fi)  1000: 0.02 user, 0.12 system, 0:03.47 elapsed
        A thread exited while 48 threads were running.
        (Tj)  1000: 17.54 user, 14.73 system, 3:07.04 elapsed
        (Td)  1000: 22.57 user, 23.33 system, 4:38.73 elapsed

        Childs writing
        (Fw)     0: 0.03 user, 0.07 system, 0:01.50 elapsed
        (Fi)     0: 0.02 user, 0.00 system, 0:00.01 elapsed
        (Tj)     0: 0.03 user, 0.04 system, 0:00.83 elapsed
        (Td)     0: 0.05 user, 0.00 system, 0:00.06 elapsed
        (Fw)     5: 0.02 user, 0.00 system, 0:00.02 elapsed
        (Fi)     5: 0.01 user, 0.00 system, 0:00.02 elapsed
        (Tj)     5: 0.12 user, 0.02 system, 0:00.17 elapsed
        (Td)     5: 0.11 user, 0.02 system, 0:00.14 elapsed
        (Fw)    10: 0.01 user, 0.01 system, 0:00.03 elapsed
        (Fi)    10: 0.02 user, 0.00 system, 0:00.04 elapsed
        (Tj)    10: 0.19 user, 0.01 system, 0:00.23 elapsed
        (Td)    10: 0.20 user, 0.01 system, 0:00.26 elapsed
        (Fw)    25: 0.02 user, 0.00 system, 0:00.06 elapsed
        (Fi)    25: 0.01 user, 0.00 system, 0:00.09 elapsed
        (Tj)    25: 0.41 user, 0.04 system, 0:00.53 elapsed
        (Td)    25: 0.45 user, 0.01 system, 0:00.53 elapsed
        (Fw)    50: 0.01 user, 0.01 system, 0:00.11 elapsed
        (Fi)    50: 0.02 user, 0.00 system, 0:00.18 elapsed
        (Tj)    50: 0.81 user, 0.07 system, 0:01.03 elapsed
        (Td)    50: 0.79 user, 0.07 system, 0:01.01 elapsed
        (Fw)   100: 0.01 user, 0.00 system, 0:00.21 elapsed
        (Fi)   100: 0.02 user, 0.00 system, 0:00.34 elapsed
        (Tj)   100: 1.63 user, 0.11 system, 0:02.05 elapsed
        (Td)   100: 1.63 user, 0.09 system, 0:01.99 elapsed
        (Fw)   250: 0.02 user, 0.01 system, 0:00.52 elapsed
        (Fi)   250: 0.01 user, 0.00 system, 0:00.85 elapsed
        (Tj)   250: 4.22 user, 0.31 system, 0:05.29 elapsed
        (Td)   250: 4.28 user, 0.20 system, 0:05.24 elapsed
        (Fw)   500: 0.02 user, 0.00 system, 0:01.01 elapsed
        (Fi)   500: 0.01 user, 0.03 system, 0:01.71 elapsed
        (Tj)   500: 9.38 user, 0.51 system, 0:11.50 elapsed
        (Td)   500: 9.16 user, 0.54 system, 0:11.28 elapsed
        (Fw)  1000: 0.01 user, 0.00 system, 0:02.01 elapsed
        (Fi)  1000: 0.02 user, 0.13 system, 0:03.49 elapsed
        A thread exited while 51 threads were running.
        (Tj)  1000: 17.47 user, 15.01 system, 2:34.14 elapsed
        (Td)  1000: 22.57 user, 23.21 system, 5:01.54 elapsed
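        The modified benchmark source is not shown, so the following is only a guess at what the two fork variants (Fw and Fi) might look like, assuming a Unix platform where setting $SIG{CHLD} to 'IGNORE' makes the system reap children automatically, as perlipc describes for most Unix systems. The child count and variable names are illustrative.

```perl
use strict;
use warnings;

my $children = 5;

# Fw: fork, remember each pid, then wait for every child explicitly.
my @pids;
for (1 .. $children) {
    defined(my $pid = fork) or die "fork: $!";
    exit 0 unless $pid;              # child: nothing to do
    push @pids, $pid;
}
my $reaped = grep { waitpid($_, 0) == $_ } @pids;

# Fi: ignore SIGCHLD so exited children are cleaned up without being waited for.
{
    local $SIG{CHLD} = 'IGNORE';
    for (1 .. $children) {
        defined(my $pid = fork) or die "fork: $!";
        exit 0 unless $pid;          # child: nothing to do
    }
    # With SIGCHLD ignored, wait() usually returns -1 once no children remain.
    1 while wait() != -1;
}

print "reaped $reaped of $children children explicitly\n";
```

        In the Fi case the parent has no per-child exit status to inspect, which is the fork-side analogue of detaching a thread.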

        Abigail

Re: Why use threads over processes, or why use processes over threads?
by Anonymous Monk on Nov 11, 2003 at 12:08 UTC

    Creating a new process can be expensive. It takes time. (A call into the operating system is needed, and if the process creation triggers process rescheduling activities, the operating system's context-switching mechnism will become involved.) It takes memory. (The entire process must be replicated.)

    This is somewhat misleading. Whilst early versions of UNIX typically copied each writable page of the parent's address space to a corresponding page of the child's address space, modern operating systems typically implement fork() using copy-on-write. With this implementation, parent and child are able to share the same page until either the parent or the child writes to the page, thereby minimising the number of pages which need to be copied between the address spaces and reducing memory requirements.

    Additionally, there is little distinction in this article between threading implementations. The history of thread implementation has been littered with proprietary implementations from differing vendors for different platforms, and indeed, whilst the implementation of POSIX threads in C may be considered relatively cheap, the implementation of threads in Perl most certainly is not. The implementation of threads in Perl carries with it much overhead, which is carried at all times within a threaded interpreter, not merely when threading is employed. For a quick example, take a look at the additional overhead required for threading in the perl_construct and perl_destruct functions in perl.c.

    In short, this is a very poor discussion piece. The article has been presented in the guise of an unbiased overview of the differences between threaded and fork-based concurrent programming models, yet in delivery it shows both poor research and a biased presentation. A better understanding of the advantages and disadvantages of these concurrent programming models can, I believe, be attained from some of the replies in this thread and some independent research.

Node Type: perlmeditation [id://306063]
Approved by Coruscate
Front-paged by broquaint