how did blocking IO become such a problem?

by zentara (Archbishop)
on Feb 17, 2012 at 09:58 UTC ( #954488=perlmeditation )

Many recent posts, such as Locked threads and tcp timeouts, confront the problem of blocking IO operations. While I'm not a kernel or low-level C expert, I would like to know how we got into a position where IO can block a whole process, without the ability to send an interrupt signal to it or to easily code around it.

My thoughts are the following:

Doesn't the socket method is_connected give you the ability to detect whether the socket is stuck? Couldn't is_connected be made to bounce a message off the other end, thereby indicating whether the line went down? Then you could do an is_connected test before each read. That is not perfect and is wasteful of time, but it would work for small data transfers, considering how fast the lines are now.
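
Something like the following is what I have in mind -- only a minimal sketch, assuming the method is actually spelled connected in IO::Socket, and using example.com as a stand-in peer. As the comments note, it only checks local state, so it isn't yet the bounce-off-the-other-end test I'm wishing for:

    use strict;
    use warnings;
    use IO::Socket::INET;

    my $sock = IO::Socket::INET->new(
        PeerAddr => 'example.com',   # placeholder peer
        PeerPort => 80,
        Timeout  => 5,
    ) or die "connect failed: $!";

    print {$sock} "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";

    while (1) {
        # connected() only consults local state (getpeername), so it can
        # keep saying "yes" long after the far end has silently gone away;
        # a real "bounce" would need an application-level ping of its own.
        defined $sock->connected
            or die "socket no longer connected\n";

        my $n = sysread( $sock, my $buf, 4096 );
        die "read error: $!" unless defined $n;
        last if $n == 0;             # orderly EOF from the peer
        print $buf;
    }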

Would it be that wasteful, in low-level IO code, to have it listen for an interrupt in its read loop, and to have that read timed out with an alarm?
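
Something along these lines is the sort of alarm-guarded read I mean (just a sketch; under safe signals the alarm may not actually break the read on every platform):

    use strict;
    use warnings;

    # Wrap a blocking read in eval/alarm; whether the ALRM actually breaks
    # the read depends on the platform and on Perl's safe-signals handling,
    # so treat this as best-effort.
    sub read_with_timeout {
        my ( $fh, $timeout ) = @_;
        my $line;
        eval {
            local $SIG{ALRM} = sub { die "timeout\n" };   # "\n" keeps the message clean
            alarm $timeout;
            $line = <$fh>;
            alarm 0;
        };
        alarm 0;                        # belt and braces: cancel any pending alarm
        if ($@) {
            die $@ unless $@ eq "timeout\n";
            return undef;               # timed out
        }
        return $line;
    }

    my $line = read_with_timeout( \*STDIN, 5 );
    print defined $line ? "got: $line" : "gave up after 5 seconds\n";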

On Linux, the only sure-fire method for avoiding the blocking IO problem is to make sure the code is forked off, and then to kill -9 its PID.
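
For the record, here is a bare-bones sketch of that fork-and-kill arrangement (the pipe and the 10-second deadline are arbitrary choices for illustration):

    use strict;
    use warnings;
    use POSIX ':sys_wait_h';              # for WNOHANG

    pipe( my $reader, my $writer ) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ( $pid == 0 ) {                    # child: do the blocking read
        close $reader;
        my $data = <STDIN>;               # stand-in for any blocking IO
        print {$writer} $data if defined $data;
        exit 0;
    }

    close $writer;                        # parent: wait up to 10s, then kill -9
    my $deadline = time + 10;
    while ( time < $deadline ) {
        last if waitpid( $pid, WNOHANG ) > 0;
        sleep 1;
    }
    if ( kill 0, $pid ) {                 # still around? it is stuck
        kill 'KILL', $pid;
        waitpid( $pid, 0 );
        warn "child killed after timeout\n";
    }
    my $result = <$reader>;
    print defined $result ? "got: $result" : "no data\n";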

Concerning its problems in threads: is this blocking IO problem the reason venerable old masters like merlyn refused to jump onto the threads bandwagon, saying that forks and shared memory segments work just fine, and you can keep your problem-prone threads? :-)

I've touched on quite a few points here, all connected in subspace by the blocking IO problem, and I've even given thought to whether it may be in there like a Stuxnet device, to give external devices the ability to lock up programs. A very useful tool for network engineers to have.

So before I put on my titanium foil hat, would someone care to explain why blocking IO should even be a problem? Is it in the processor design itself?


I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

Re: how did blocking IO become such a problem?
by BrowserUk (Pope) on Feb 17, 2012 at 10:37 UTC

    The "blocking IO problem" has been there forever. It all comes back to the fact that asynchronous signals are a fundamentally flawed concept.

    To illustrate the problem, imagine your parents/boss/significant-other could press a button on their phone at any time that would immediately, irrevocably and without warning cause your car to do a U-turn and return to your home/work regardless of where you were; the traffic conditions; your inputs to the controls; or your desire not to do so. Imagine the chaos as your car decides to take you the wrong way back up the freeway.

    That is what signals do!

    Signals can and will interrupt your code at any point and leave data structures and communications channels in broken, corrupted and irrecoverable states.

    At the C level, this can be mitigated to some extent by the careful and judicious use of signal masks to prevent interrupts at critical moments. But 1) it takes considerable effort to do this well; 2) if a signal arrives whilst a mask is in force, the signal is lost. And to solve that requires the programmer to implement some kind of signal queuing or deferral mechanism.
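
    For anyone who hasn't met signal masks, a rough Perl rendering of the C technique, via the POSIX module, might look like this (at the Perl level safe signals already defer %SIG handlers between opcodes, so this is purely illustrative, and the "critical section" is a made-up example):

        use strict;
        use warnings;
        use POSIX qw(sigprocmask SIG_BLOCK SIG_SETMASK SIGINT);

        $SIG{INT} = sub { print "SIGINT handled\n" };

        my $block_int = POSIX::SigSet->new(SIGINT);
        my $old_mask  = POSIX::SigSet->new();

        sigprocmask( SIG_BLOCK, $block_int, $old_mask ) or die "sigprocmask: $!";

        # --- critical section: SIGINT cannot fire while we keep two
        # --- related pieces of state in step with each other
        my %hash  = ( nodes => 10 );
        delete $hash{nodes};             # e.g. remove a node ...
        my $count = keys %hash;          # ... and recompute the count to match
        # ------------------------------------------------------------

        sigprocmask( SIG_SETMASK, $old_mask ) or die "sigprocmask: $!";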

    This is the reason for Perl's SAFE SIGNALS. Perl uses a lot of 'fat' internal data-structures that frequently require several different modifications to be made for them to remain in a coherent, usable state. If Perl allows signals to occur at any time, then it can result in (for example) hash structures that have had nodes removed, but the key count not correctly updated; or array size doubling that gets interrupted before all the values from the old array have been copied to the new one. And many more forms of irrecoverable internal state corruption.
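
    perlipc documents an escape hatch for the cases where you really do want a signal to break a blocking read: install the handler with POSIX::sigaction instead of through %SIG. A sketch (whether the ALRM actually interrupts the read still depends on the platform):

        use strict;
        use warnings;
        use POSIX qw(SIGALRM);

        # A handler installed via sigaction bypasses safe signals, so it can
        # fire in the middle of the blocking readline below.
        POSIX::sigaction( SIGALRM,
            POSIX::SigAction->new( sub { die "alarm\n" } ) )
            or die "sigaction: $!";

        my $line;
        eval {
            alarm 5;
            $line = <STDIN>;             # blocking read
            alarm 0;
        };
        alarm 0;
        die $@ if $@ && $@ ne "alarm\n";
        print defined $line ? "read: $line" : "read interrupted by alarm\n";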

    This problem (with signals) was there from the very outset. Hence why *nix programs will frequently start a whole new process (fork) in order to do something seemingly trivial -- e.g. perform IO. Because then they can interrupt (signal) it, safe in the knowledge that they can throw away the process and so not have to deal with the corruption caused by the interruption.

    Imagine having to clone your phone every time before making a call, so that you could throw it away afterwards if you received a second call during the first :)

    As for the resistance of venerable old masters to threading, you'll have to ask them to know for sure, but in part it may be because the early threading libraries on *nix were rubbish; in part because it means learning something new.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      As for the resistance of venerable old masters to threading, you'll have to ask them to know for sure, but in part it may be because the early threading libraries on *nix were rubbish; in part because it means learning something new.

      and part lack-of-love for other platforms :)

      Imagine having to clone your phone every time before making a call, so that you could throw it away afterwards if you received a second call during the first :)

      True, but these are not real entities like phones; they are easily created and discarded magnetic fields. With the increase in modern processor speeds, surely we could spare a few extra CPU cycles to make that happen and avoid the problem.

      By the way, what would be a good way to generate a blocked I/O condition, for testing? Download a big file, then drop your network?


        With the increase in modern processor speeds, surely we could spare a few extra CPU cycles to make that happen and avoid the problem.

        That was only a tongue-in-cheek analogy, but the problem remains: something can leap onto the stack and transfer program flow to some other place in the program at any given instant in time.

        Perhaps a better analogy is allowing the audience, or even the actors on stage, to take phone calls in the middle of a performance. The thought trains involved are even more nebulous patterns of firing neurons, but the impact is still not confined to the individual; it falls upon all the other participants, audience and players combined. And recovery is just as difficult.

        There are much better ways of dealing with the indeterminate nature of blocking IO: namely, asynchronous IO, which is available on most platforms and has been for years. But I don't think it ever made it into that dead dodo of a standard that is POSIX?
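
        (In stock Perl the nearest portable approximation is multiplexing with select via IO::Select -- not true asynchronous IO, but enough that no single handle can stall the whole process. A sketch, with example.com as a placeholder peer:)

            use strict;
            use warnings;
            use IO::Socket::INET;
            use IO::Select;

            my $sock = IO::Socket::INET->new(
                PeerAddr => 'example.com',
                PeerPort => 80,
                Timeout  => 5,
            ) or die "connect: $!";
            print {$sock} "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";

            my $sel = IO::Select->new($sock);
            while ( my @ready = $sel->can_read(5) ) {    # poll; never wait more than 5s
                for my $fh (@ready) {
                    my $n = sysread( $fh, my $buf, 4096 );
                    if    ( !defined $n ) { die "read error: $!" }
                    elsif ( $n == 0 )     { $sel->remove($fh); close $fh }
                    else                  { print $buf }
                }
                last unless $sel->count;
            }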

        By the way, what would be a good way to generate a blocked I/O condition, for testing?

        The very simplest is my $input = <STDIN>;.
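
        For a socket flavour of the same thing, a throwaway listener that accepts the connection and then never says anything will leave the client's read blocked for as long as you like (port 9999 is an arbitrary choice):

            use strict;
            use warnings;
            use IO::Socket::INET;

            my $srv = IO::Socket::INET->new(
                LocalPort => 9999,
                Listen    => 1,
                ReuseAddr => 1,
            ) or die "listen: $!";

            my $pid = fork();
            die "fork: $!" unless defined $pid;
            if ( $pid == 0 ) {               # child: the silent server
                my $client = $srv->accept;
                sleep 3600;                  # accept, then never send a byte
                exit 0;
            }

            my $sock = IO::Socket::INET->new( PeerAddr => 'localhost', PeerPort => 9999 )
                or die "connect: $!";
            print "reading (this will block until you kill the child)...\n";
            my $line = <$sock>;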



      I mostly agree with you, but...
      As for the resistance of venerable old masters to threading, you'll have to ask them to know for sure, but in part it may be because the early threading libraries on *nix were rubbish; in part because it means learning something new.
      Separate address spaces with explicit sharing are much easier to reason about than a single address space with implicit sharing. Accepting the latter in exchange for faster context switches is often a mistake.
        ... in exchange for faster context switches ...

        If that were all threading bought you, I'd agree. But it isn't. It isn't even the primary benefit.

        The primary benefit is the simplified code that results from having each logical part of your application run as a simple linear flow or loop, with only the state it needs visible to it.

        The second benefit is the ability to prioritise some of those logical flows over others, secure in the knowledge that when something is ready to be done, it will get done in a timely fashion, within the priorities specified.
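
        A minimal sketch of that shape, using threads and Thread::Queue (the specifics are invented for illustration): one thread blocks happily on the queue, another blocks on STDIN, and neither needs to know anything about the other's state.

            use strict;
            use warnings;
            use threads;
            use Thread::Queue;

            my $q = Thread::Queue->new;

            my $worker = threads->create( sub {
                # linear flow: wait for a job, handle it, repeat
                while ( defined( my $job = $q->dequeue ) ) {
                    print "worker: handling '$job'\n";
                }
            });

            # reader flow: its blocking read stalls nobody else
            while ( my $line = <STDIN> ) {
                chomp $line;
                last if $line eq 'quit';
                $q->enqueue($line);
            }

            $q->enqueue(undef);              # tell the worker to finish
            $worker->join;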

        Old timers tend to concentrate on the perceived -- usually second-hand -- problems, rather than the very real benefits.



Re: how did blocking IO become such a problem?
by sundialsvc4 (Monsignor) on Feb 21, 2012 at 18:39 UTC

    The notion of “a request equals a thread” is what I refer to as the “flaming-arrow strategy.” (Take an arrow, light it, fire it into the air and forget it.) Each thread is then supposed to fend for itself, lock what it needs to, issue its requests for data (and wait for the response). But soon there are problems, often creeping into the designer's field of vision much too late. Those problems are workflow dependencies. Certain things need to be done in a certain order. Bottlenecks develop as the various fully independent units of dispatchable work try to get things done. Where certain things must be done in a specific order, it suddenly becomes necessary to devise mutual-exclusion or counting-semaphore kludges. An “easy and intuitive” initial design does not scale up.

    Borrowing an idea from a fast-food restaurant, it is much better to model the workload as independent objects, but to manage those using a tote board of sorts. The worker bees work on those units of work according to some heuristic, but they do not “wait for” anything ... ever. All I/O operations performed are asynchronous, and they are performed against requests that are sitting in some particular stage in the workflow. The number of requests that are known to the system, which is variable and perhaps extremely large, is completely distinct from the number of workers that are pursuing them. The entire life cycle of a request, and much of the outer request-handling heuristics, is most easily described using a finite-state machine (FSM) algorithm.
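
    A toy rendering of that shape, with made-up state names, just to show the skeleton (a real system would drive the transitions from asynchronous IO completions rather than by fiat):

        use strict;
        use warnings;

        # the "tote board": requests carry a state; workers advance whatever is
        # ready by one step and put it back, and nobody waits on any one request
        my @board = map { { id => $_, state => 'new' } } 1 .. 3;

        my %step = (
            new      => sub { $_[0]{state} = 'fetching' },
            fetching => sub { $_[0]{state} = 'parsing'  },   # stand-in for an IO completion
            parsing  => sub { $_[0]{state} = 'done'     },
        );

        while ( grep { $_->{state} ne 'done' } @board ) {
            for my $req (@board) {
                next if $req->{state} eq 'done';
                $step{ $req->{state} }->($req);               # advance one state, never block
                print "request $req->{id} -> $req->{state}\n";
            }
        }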

    There are plenty of workflow management scaffolds in Perl and otherwise. With them, I have been able to turn many a recalcitrant and unstable application around, and to retrofit those systems to efficiently work in clustered environments.

      There are plenty of workflow management scaffolds in Perl ...

      Any that you particularly recommend?
