Well, the trick is that you can go into and out of the STM monad from the IO monad at any time (via atomically), since STM is essentially a restricted subset of IO. So you enter STM when concurrent atomicity is required, and do the real IO (say, writing to the screen) outside it.
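A minimal sketch of that split. It assumes only the stm package that ships with GHC; the names transfer and transferDemo are hypothetical, made up for illustration. The transaction itself only touches TVars. atomically runs it from IO, and the printing happens afterwards, back in plain IO.

```haskell
import Control.Concurrent.STM

-- Hypothetical example: move `amount` between two accounts atomically.
-- Only STM actions (TVar reads/writes) are allowed here; no printing.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  modifyTVar' from (subtract amount)
  modifyTVar' to (+ amount)

-- Enter STM with `atomically`, commit, and come back out into IO.
transferDemo :: IO (Int, Int)
transferDemo = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  atomically ((,) <$> readTVar a <*> readTVar b)

main :: IO ()
main = do
  (x, y) <- transferDemo
  -- The "real IO" lives out here, outside any transaction.
  putStrLn ("a = " ++ show x ++ ", b = " ++ show y)
```

Trying to call putStrLn inside transfer is a type error, which is exactly what keeps side effects out of transactions that might be retried.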
But it is true that, although STM can automatically scale on SMP machines, it still essentially assumes a shared-memory model; that is why it's called a concurrency tool rather than a (cross-machine) parallelizing tool, which would have additional fault-tolerance factors to consider.
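To make the shared-memory point concrete, here is a rough sketch (parallelCount is a hypothetical name) where several threads in one process all update the same TVar. Nothing here crosses a process or machine boundary. Whether the threads actually run on several cores depends on compiling with -threaded and running with +RTS -N; the STM code itself doesn't change.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- Hypothetical helper: nThreads workers each bump a shared counter
-- perThread times. All of them share one address space.
parallelCount :: Int -> Int -> IO Int
parallelCount nThreads perThread = do
  counter <- newTVarIO 0
  done    <- newTVarIO 0
  replicateM_ nThreads $ forkIO $ do
    replicateM_ perThread $ atomically (modifyTVar' counter (+ 1))
    atomically (modifyTVar' done (+ 1))
  -- `check` retries until every worker has reported in.
  atomically $ readTVar done >>= check . (== nThreads)
  readTVarIO counter

main :: IO ()
main = parallelCount 8 1000 >>= print
```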
However, for its targeted use (that is, as a compelling replacement for select loops and thread locks/semaphores), STM is still damn useful.
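As a sketch of the select-loop replacement (takeEither and selectDemo are hypothetical names): takeTMVar calls retry when its TMVar is empty, and orElse falls through to the second alternative when the first retries. If both would retry, the whole transaction blocks until one of them changes, much like waiting on two file descriptors in select, but with no explicit lock in sight.

```haskell
import Control.Concurrent.STM

-- Hypothetical helper: take from whichever TMVar has a value, left first.
takeEither :: TMVar a -> TMVar b -> STM (Either a b)
takeEither l r = (Left <$> takeTMVar l) `orElse` (Right <$> takeTMVar r)

selectDemo :: IO (Either Int String)
selectDemo = do
  nums  <- newEmptyTMVarIO        -- nothing ready on this "channel"
  names <- newTMVarIO "hello"     -- this one already holds a value
  atomically (takeEither nums names)

main :: IO ()
main = selectDemo >>= print       -- prints: Right "hello"
```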