|Keep It Simple, Stupid|
Firstly: perl 5's threads are in a pretty grim shape - they're slow, they're quirky, and lots of stuff isn't as easy as it should be. I would recommend against using them if you can find another alternative.
Second: since you asked for an explanation of how fork is efficient, here's how it works:
Processors have real and protected addressing modes. Real mode is what people would normally expect - every pointer points to an address in memory. Protected mode is what is actually used 99% of the time in practice. Under protected mode, fetching a memory address is an indirect operation - the address is translated from the virtual address to a real one by the MMU (the memory management unit).
This is sensitive to the current process, and the way the notion of the current process is defined varies from processor to processor.
Memory is handled by the MMU in chunks called pages, which are typically 4kb long. These are the smallest unit the MMU will take care of in it's virtual/real addressing scheme.
When a process forks (under true fork, not vfork which I will shortly discuss) all of it's memory is set to be read only, and the process itself is duplicated, returning different values to each process. All the At this point almost no data is updated - the pages themselves are marked read only, but this is quick and trivial, and the process handle in the kernel is duplicated, and this is not a large structure, in comparison to the amount of memory a process can actually consume.
Whenever the processor tries to store a value in a read-only memory page the MMU raises a fault, which the kernel has to handle. The kernel will ask the MMU to copy the page, to a new location in virtual memory (all of the physical memory and the swap space), but makes the mapping appear, to the process, to be at the same memory location.
When the memory page has been copied, it is safe to actually write to it. By keeping a reference count of the number of processes sharing each page you can also unlock read-only pages when they are accessible by only one process.
vfork is a version of fork that is suited for fork and exec sequences... When you call 'vfork' nothing is copied - instead the child process has complete control of the parent's resources (file descriptors, memory, etc), until it calls exec to load another process image into memory.
When the child process finally executes the new program the parent process is resumed.
The reason the parent is suspended while the child is running is that since the addressing space is shared, so is the stack. Since the stack (together with the instruction pointer) represents the state of the process, including the return value from 'vfork' (and any called function for that matter), this value cannot be both 0 for the child and the child's process ID for the parent at the same time.
Slightly off topic but nevertheless interesting is the page swap system of an OS with virtual memory - when a page is resolved by the MMU, and it isn't in physical memory, a page fault is sent to the kernel. The kernel is then responsible for loading the memory page from disk, by swapping it with a physical memory page. When the data has been loaded to physical memory the access to the memory can finish.
Update: Ven'Tatsu cought a wordo confusing protected mode with real mode.
zz zZ Z Z #!perl