<?xml version="1.0" encoding="windows-1252"?>
<node id="29620" title="Threads vs Forking (Java vs Perl)" created="2000-08-25 13:47:56" updated="2005-08-13 05:44:17">
<type id="120">
perlmeditation</type>
<author id="26179">
tilly</author>
<data>
<field name="doctext">
This is an answer to [id://29610].  I figured that if I
was going to put the energy in to write this up, I might
as well put it on the front page.&lt;P&gt;

One of the links at [tilly] discusses it,
&lt;A HREF=http://kt.zork.net/kernel-traffic/kt20000103_49.html#1&gt;Linus on Unix Design&lt;/A&gt;.
As I say there, this is a summary of an informative
flame-fest.&lt;P&gt;

Technically the difference is simple.  fork() creates
a new process that looks exactly like the old.  But it
is a separate process.  It has its own memory, etc.
By contrast spawn() keeps one process, but has two spots
in your code where you are executing.&lt;P&gt;

So what does this mean in real life?&lt;P&gt;

The big win with threads is that two threads in a process
can be switched very efficiently by the OS because they
share more stuff so less has to be moved around before
you switch them.  (In particular they share the same
Translation Lookaside Buffer.)  Also creating them is
less work since you don't have to create a whole new
process.  (Modern *nix systems are very efficient at it
though thanks to the joys of mmap().)&lt;P&gt;

The big loss with threads is that there is no natural
protection from having multiple threads working on the
same data at the same time without knowing that others
are messing with it.  This is called a race condition.
Conceptually a race condition is much like the problems
with global data that people are familiar with.  If everyone
sees and works on the variables directly, you will get
it being modified in two places at once, with problems.&lt;P&gt;

However where mistakes with global data lead
to reproducable problems, race conditions by definition
only happen when you get bad timings between threads.
Therefore they do not consistently happen.  They do
scale quadratically under load (making them hard to see
in testing, and hard to avoid in production).  For a
number of reasons, the problem when it happens shows up
in random code that is nowhere near the mistake.  (For
instance a race causes a reference count to go out of
whack, something then gets deallocated prematurely, the
memory is assigned to someone else, and then you get a
mess.)  And they are conceptually hard to understand
(since the 1970's and the Mythical Man-Month it has been
well-known that interaction bugs are the nastiest class
of bugs out there).&lt;P&gt;

None of this is to say that forking cannot get into race
conditions.  Any time you have shared resources, you have
a potential problem.  And there are solutions.
See [Simple Locking] for an example.  But
locking solutions will get into all sorts of bugs of their
own - deadlocks, priority inversion, etc.  (None of which
I dealt with in [Simple Locking] - as long as you keep it
simple you don't have to.  Otherwise...)  It is
just that the more you share, the more possible places
you have to have problems in.  And threads by default
share a heck of a lot more than processes do.&lt;P&gt;

Now there are a number of possible solutions.&lt;P&gt;

Unix took the philosophy that if processes are easier
to understand, then encourage people to use processes
everywhere.  Perl comes out of that philosophy and was
never designed with threading in mind.
(Reverse-engineering threading onto an application is usually a nightmare.  Perl has proven to be a good
example of this rule.)&lt;P&gt;

Some functional languages address it by having the
language designed from the ground up in such a way that
it is easy for the compiler to prove that there are no
side-effects possible, and then the compiler can safely
decide for you when to move logic into threads.&lt;P&gt;

Java addresses it by letting you organize your threading
logic, but hampering you in subtle ways so that you are
encouraged to program in a way that avoids races.  For
instance your basic select() loop has a race in it.
Therefore Java only offers blocking IO constructs, which
forces you to spawn a separate thread for any IO, and
therefore avoids that race.  Java discourages globals
(note that accessing globals from multiple threads is a
bad idea).  So on and so forth.&lt;P&gt;

The upshot is that people who don't really understand the
underlying issues will program reasonably threadsafe code
in Java.  (-:ObFlameBait: Which needs all of the performance
help it can get!:-)  But at the cost of a little
&lt;A HREF=http://www.tuxedo.org/~esr/jargon/html/entry/bondage-and-discipline-language.html&gt;B&amp;amp;D&lt;/A&gt;...&lt;P&gt;

In Perl, today, even if &lt;B&gt;you&lt;/B&gt; know how to program
threadsafe code, you can't because behind your back perl
itself is not threadsafe.  Which is why it is labelled
"experimental".</field>
</data>
</node>
