
Threads vs Forking (Java vs Perl)

by tilly (Archbishop)
on Aug 25, 2000 at 17:47 UTC (#29620=perlmeditation)

This is an answer to "RE: RE (tilly) 2: When is a while not a while?". I figured that if I was going to put the energy into writing this up, I might as well put it on the front page.

One of the links on my home node (tilly), Linus on Unix Design, discusses it. As I say there, that node is a summary of an informative flame-fest.

Technically the difference is simple. fork() creates a new process that looks exactly like the old one, but it is a separate process with its own memory, etc. By contrast, spawning a thread keeps one process, but gives you two places in your code where execution is happening.
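To make the difference concrete, here is a minimal sketch in Python, assuming a Unix-like system where os.fork() exists. A forked child increments its own copy of a variable and the parent never sees the change; a thread increments the one variable the whole process shares. The names bump, after_fork and after_thread are purely illustrative.

```python
import os
import threading

counter = 0  # module-level variable that both experiments try to modify

def bump():
    global counter
    counter += 1

def after_fork():
    """Bump in a forked child, then return the parent's view of counter."""
    global counter
    counter = 0
    pid = os.fork()
    if pid == 0:          # child: works on its own copy of memory
        bump()
        os._exit(0)       # leave at once, skipping the parent's cleanup code
    os.waitpid(pid, 0)    # parent: wait for the child to finish
    return counter        # the child's increment never reached the parent

def after_thread():
    """Bump in a second thread, then return counter."""
    global counter
    counter = 0
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter        # the thread changed the single shared variable

if __name__ == "__main__":
    print("after fork:  ", after_fork())    # 0
    print("after thread:", after_thread())  # 1
```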

So what does this mean in real life?

The big win with threads is that the OS can switch between two threads of the same process very efficiently: they share more state, so less has to be saved and restored before the switch. (In particular they share the same address space, so the page mappings cached in the Translation Lookaside Buffer stay valid across the switch.) Creating them is also less work, since you don't have to create a whole new process. (Modern *nix systems are very efficient at forking, though, thanks to the joys of mmap() and copy-on-write.)
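If you want to see the creation-cost difference on your own box, one rough approach is to time a batch of fork()+waitpid() pairs against a batch of thread start()+join() pairs. This is a sketch, not a benchmark: the sample size N is an arbitrary assumption, interpreter overhead in Python may hide part of the kernel-level difference, and results vary by OS and hardware, so no particular numbers are implied. It assumes a Unix-like system.

```python
import os
import threading
import time

N = 200  # arbitrary sample size; raise it for steadier numbers

def noop():
    pass

def time_forks(n=N):
    """Average seconds per fork() + waitpid() of a child that exits at once."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child does nothing and exits immediately
        os.waitpid(pid, 0)    # parent reaps it before the next fork
    return (time.perf_counter() - start) / n

def time_threads(n=N):
    """Average seconds per start() + join() of a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    print(f"fork + wait:  {time_forks() * 1e6:8.1f} us each")
    print(f"thread start: {time_threads() * 1e6:8.1f} us each")
```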

The big loss with threads is that nothing naturally stops multiple threads from working on the same data at the same time, each unaware that the others are messing with it. This is called a race condition. Conceptually, a race condition is much like the familiar problems with global data: if everyone sees and works on the variables directly, sooner or later a variable gets modified in two places at once, and things go wrong.

However, where mistakes with global data lead to reproducible problems, race conditions by definition only happen when the timing between threads is bad, so they do not happen consistently. They also scale quadratically under load, which makes them hard to see in testing and hard to avoid in production. For a number of reasons, when the problem does happen it shows up in random code that is nowhere near the mistake. (For instance, a race causes a reference count to go out of whack, something then gets deallocated prematurely, the memory is assigned to someone else, and then you get a mess.) And races are conceptually hard to understand: since the 1970s and The Mythical Man-Month it has been well known that interaction bugs are the nastiest class of bugs out there.
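Here is a deliberately racy sketch of the simplest case, a lost update. Several threads each read a shared total, yield, and write back the value plus one; the time.sleep(0) is an assumption added only to widen the window so the race shows up more often. Because the outcome depends on scheduling, the unlocked total varies from run to run (it can never exceed the expected value, but how far below it lands is up to the timing). Wrapping the read-modify-write in a threading.Lock makes the result deterministic. ITERATIONS, THREADS and run are illustrative names.

```python
import threading
import time

ITERATIONS = 1000
THREADS = 4

def run(use_lock):
    """Have THREADS threads each add 1 to a shared total ITERATIONS times."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(ITERATIONS):
            if use_lock:
                with lock:              # only one thread at a time in here
                    total += 1
            else:
                current = total         # read ...
                time.sleep(0)           # ... let another thread run ...
                total = current + 1     # ... then write back a possibly stale value

    threads = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print("expected:", ITERATIONS * THREADS)
    print("unlocked:", run(use_lock=False))  # often lower; varies per run
    print("locked:  ", run(use_lock=True))
```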

None of this is to say that forked processes cannot get into race conditions. Any time you have shared resources, you have a potential problem. And there are solutions; see Simple Locking for an example. But locking solutions get into all sorts of bugs of their own: deadlocks, priority inversion, etc. (I dealt with none of those in Simple Locking; as long as you keep it simple you don't have to. Otherwise...) It is just that the more you share, the more places you have for problems to appear. And threads by default share a heck of a lot more than processes do.
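For processes, the usual tool is an advisory file lock. The following is a minimal sketch in the same spirit as Simple Locking, not a copy of it: several forked children each increment a counter stored in a temporary file, holding an exclusive flock() around every read-modify-write. It assumes a Unix-like system with fcntl.flock() on a local filesystem; add_one and run are hypothetical names. It uses a single lock, which is what keeps deadlocks out of the picture.

```python
import fcntl
import os
import tempfile

def add_one(path):
    """Increment the integer stored in `path` while holding an exclusive flock()."""
    with open(path, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        value = int(fh.read())
        fh.seek(0)
        fh.truncate()
        fh.write(str(value + 1))
        fh.flush()                      # push the new value out before unlocking
    # closing the file at the end of the `with` block releases the lock

def run(children=8, rounds=50):
    """Fork `children` processes that each call add_one() `rounds` times."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"0")
    os.close(fd)
    pids = []
    for _ in range(children):
        pid = os.fork()
        if pid == 0:                    # child
            try:
                for _ in range(rounds):
                    add_one(path)
            finally:
                os._exit(0)             # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:                    # parent: wait for every child
        os.waitpid(pid, 0)
    with open(path) as fh:
        result = int(fh.read())
    os.remove(path)
    return result

if __name__ == "__main__":
    print(run())  # 400 (8 children x 50 increments, none lost)
```

Drop the flock() call and increments can be lost, for the same reason as in the threaded case.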

Now there are a number of possible solutions.

Unix took the view that since processes are easier to understand, people should be encouraged to use processes everywhere. Perl comes out of that philosophy and was never designed with threading in mind. (Retrofitting threading onto an existing application is usually a nightmare, and Perl has proven to be a good example of this rule.)

Some functional languages address it by being designed from the ground up so that the compiler can easily prove that no side effects are possible; the compiler can then safely decide for you when to move logic into threads.

Java addresses it by letting you organize your threading logic yourself, but hampering you in subtle ways so that you are encouraged to program in a way that avoids races. For instance, your basic select() loop has a race in it, so Java offers only blocking IO constructs; that forces you to spawn a separate thread for any IO, and therefore avoids the race. Java also discourages globals (and accessing globals from multiple threads is a bad idea). And so on and so forth.
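In the Java style described above, every connection gets its own thread and plain blocking calls, so no select() loop is needed. Here is a rough Python analogue (not Java code, and it does not try to reproduce the select() race itself): socket.socketpair() stands in for accepted network connections, and each handler thread touches only its own socket, with no global state. handle, ask and demo are illustrative names.

```python
import socket
import threading

def handle(conn):
    """Serve one connection with ordinary blocking calls, in its own thread."""
    with conn:
        while True:
            data = conn.recv(1024)       # blocks this thread only
            if not data:                 # the peer has finished sending
                break
            conn.sendall(data.upper())

def ask(client, msg):
    """Send one request, signal end-of-input, and read the whole reply."""
    with client:
        client.sendall(msg)
        client.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:                # the handler closed its end
                break
            chunks.append(chunk)
        return b"".join(chunks)

def demo(messages):
    """One socketpair and one handler thread per message."""
    pairs = [socket.socketpair() for _ in messages]
    workers = [threading.Thread(target=handle, args=(server,)) for _, server in pairs]
    for t in workers:
        t.start()
    replies = [ask(client, msg) for (client, _), msg in zip(pairs, messages)]
    for t in workers:
        t.join()
    return replies

if __name__ == "__main__":
    print(demo([b"hello", b"world"]))  # [b'HELLO', b'WORLD']
```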

The upshot is that people who don't really understand the underlying issues will write reasonably threadsafe code in Java. (-:ObFlameBait: Which needs all of the performance help it can get!:-) But at the cost of a little B&D...

In Perl, today, even if you know how to write threadsafe code, you can't, because behind your back perl itself is not threadsafe. That is why threading is labelled "experimental".

RE: Threads vs Forking (Java vs Perl)
by KM (Priest) on Aug 25, 2000 at 18:01 UTC
    A similar article which explains fork() on Unix and Win32 OSs can be found here, and is also a good read.


      In fact if you are unfamiliar with some of the terms I was using, this article is a much better starting place than mine! :-)
RE: Threads vs Forking (Java vs Perl)
by gaspodethewonderdog (Monk) on Aug 25, 2000 at 18:35 UTC
    Well done, and I certainly do think you are correct that Perl threading is rather... dangerous. I have written several wrapper programs (mostly Java and C) to avoid Perl threading at all costs. The performance hit isn't so bad, and in the long run I wasn't using up anything more than a lot of memory, really.

    I've played around with Perl threading a little and found plenty of bizarre problems that I could attribute to Perl, my code, or anything else, so I just have to assume it is all down to Perl threads not being terribly safe.

RE: Threads vs Forking (Java vs Perl)
by JanneVee (Friar) on Aug 25, 2000 at 19:16 UTC
    Interesting points. To contribute to this discussion from a technological point of view, there is also the small matter of multiprocessor vs. cluster.

    I've done a few experimental runs with Mosix and MPI/PVM-type stuff. For those who don't know what I'm talking about, here is an overview:

    Mosix - An augmentation to the Linux kernel to migrate processes from node to node.

    MPI - An API for programming multiprocessor/cluster architectures.
    PVM - Almost strictly cluster architectures.

    Now, as I see it, Mosix is the best tool a Perl programmer can use to do cluster stuff: you use simple forks, and if a script runs up processor usage it gets migrated to another node. But I wouldn't call it exactly totally efficient.

    And MPI/PVM is closely related to threading. It is a "threading" API that is designed to conserve bandwidth. (The first version of MPI didn't have any support for shared variables that I could see.)

    One thing all of these have in common is that the "standards" discourage communication between processes/threads, to conserve bandwidth. So accessing global variables from different threads isn't exactly a good idea, for more than one reason...
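    MPI and PVM have their own APIs, which are not shown here. As a minimal sketch of the same message-passing discipline using nothing but fork() and pipe() (assuming a Unix-like system), each forked worker gets its share of the data, shares no variables with anyone, and sends exactly one message back: its partial result. parallel_sum and partial_sum are hypothetical names.

```python
import os

def partial_sum(numbers):
    return sum(numbers)

def parallel_sum(numbers, workers=4):
    """Split the work across forked children; each reports back over its own pipe."""
    chunks = [numbers[i::workers] for i in range(workers)]
    readers = []
    for chunk in chunks:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                       # child: compute, send one message, exit
            os.close(r)
            try:
                os.write(w, str(partial_sum(chunk)).encode())
            finally:
                os._exit(0)                # exiting also closes the write end
        os.close(w)                        # parent keeps only the read end
        readers.append((pid, r))
    total = 0
    for pid, r in readers:
        with os.fdopen(r, "rb") as fh:
            total += int(fh.read())        # read until the child's end is closed
        os.waitpid(pid, 0)
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # 5050
```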

    I won't rant any more...


      Actually Cluster may have more to do with Multiprocessor than you think.

      Larry McVoy has some interesting thoughts on this. Basically, you never worry about SMP beyond about 4 CPUs or so. Instead you turn a thousand-processor box into a virtual cluster...

      Slides for his talk may be found here.

        Actually Cluster may have more to do with Multiprocessor than you think.

        This I know... :)

        You can buy a 4-processor SMP box or a 16-processor cluster for roughly the same cost, so start worrying. But the point is... it is not a good idea to use global (shared) variables, both because of the locking problems (including the fine-grained locking that is in the slides) and for bandwidth reasons.

        But my conclusion from the slides (as far as I can make them out in a few minutes) is that they are more pro-forking than pro-threading. Am I correct?

