PerlMonks  

confusing fork/readline behaviour

by flipper (Beadle)
on Aug 13, 2015 at 17:37 UTC (#1138461)

flipper has asked for the wisdom of the Perl Monks concerning the following question:

perl -e ' open X, "</etc/passwd"; fork(); while (readline(X)){ print "$$: $_"; sleep 1 } print "$$: all done\n" '
I was expecting to see every line printed twice (once by each process, as they don't share a file pointer) or, at a push, every line printed roughly once (if they do).

I definitely wasn't expecting one process not to print anything:
8488: root:x:0:0:root:/root:/bin/bash
8489: all done
8488: daemon:x:1:1:daemon:/usr/sbin:/bin/sh
8488: bin:x:2:2:bin:/bin:/bin/sh
8488: <rest of the file>
8488: all done
Can someone explain what is going on please? At the start of the while loop I would expect the two processes to be identical, bar the return value of fork...
$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 89 registered patches, see perl -V for more detail)

Replies are listed 'Best First'.
Re: confusing fork/readline behaviour
by afoken (Canon) on Aug 13, 2015 at 17:49 UTC
    Can someone explain what is going on please?

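    Both processes share the same underlying open file (their file descriptors are copies that refer to one open file description, and so to one file offset), plus the buffered I/O layer (PerlIO in Perl, stdio in libc) usually reads ahead. If /etc/passwd is small enough, the first readline() in one process slurps the entire file into that process's buffer, leaving nothing for the other process.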

    Quoting from the Linux man page of fork(2):

    The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
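
A minimal sketch of the shared offset described in that man page excerpt, assuming a POSIX-like system. The helper names `kernel_offset` and `offset_after_child_read` are hypothetical, and the temporary file is only there for illustration. It uses sysread/sysseek to bypass Perl's buffering, so that only the kernel-level offset is observed:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);
use POSIX ();

# Return the kernel-level offset of a handle, bypassing Perl's read buffer.
sub kernel_offset {
    my ($fh) = @_;
    my $pos = sysseek($fh, 0, SEEK_CUR);
    die "sysseek: $!" unless defined $pos;
    return $pos + 0;    # "0 but true" becomes plain 0
}

# Open $path, fork, let the child consume $bytes with sysread, and
# return the parent's view of the offset before and after.
sub offset_after_child_read {
    my ($path, $bytes) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $before = kernel_offset($fh);

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        sysread($fh, my $buf, $bytes);
        POSIX::_exit(0);    # leave without running the parent's cleanup code
    }

    waitpid($pid, 0);       # make sure the child's read has happened
    my $after = kernel_offset($fh);
    close $fh;
    return ($before, $after);
}

# Illustration: a 100-byte scratch file, of which the child reads 10 bytes.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} 'x' x 100;
close $tmp;

my ($before, $after) = offset_after_child_read($name, 10);
print "parent offset: $before -> $after\n";
```

The parent never reads from the handle itself, yet its offset should move from 0 to 10, because the child's read advanced the one offset both processes share. With buffered readline() the same thing happens, only in buffer-sized chunks rather than line by line.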

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Ah, that makes sense, thanks. Presumably my script above is unsafe in the general case: I imagine libc could read ahead 4k and stop short of a line boundary, and then the second process would call readline and start halfway through a line...
        Presumably my script above is unsafe in the general case: I imagine libc could read ahead 4k and stop short of a line boundary, and then the second process would call readline and start halfway through a line.

        Correct.

        Why do you want two processes to read the same file?

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: confusing fork/readline behaviour
by KurtSchwind (Chaplain) on Aug 13, 2015 at 18:19 UTC
    as they don't share a file pointer

    Actually, they do share the same file descriptor, and with it the same file offset. Try moving the open after the fork(), so that each process gets its own:

    perl -e ' fork(); open X, "</etc/passwd"; while (readline(X)){ print "$$: $_"; sleep 1 } print "$$: all done\n" '
    --
    “For the Present is the point at which time touches eternity.” - CS Lewis
Re: confusing fork/readline behaviour
by (anonymized user) (Curate) on Aug 14, 2015 at 11:39 UTC
    (Updated:) In addition, parents should wait for children before exiting; otherwise children [struck: either get killed or] usually stop running and become zombie processes.
    perl -e ' my $pid = fork(); open X, "</etc/passwd"; while (readline(X)){ print "$$: $_"; sleep 1 } print "$$: all done\n"; waitpid $pid, 0 if $pid; '

    One world, one people

      In addition, parents should wait for children before exiting

      Yes, that is a good general practice. The most common problem of not doing that is that the shell that launched the command waits for the parent to exit, then displays the next prompt, then the child outputs a bit more, making a confusing display. Things get worse if a child might be reading from the same STDIN. And even if the parent process wasn't launched as a command from an interactive shell, it is often good to not have the parent exit before the children; for example, you often don't want to restart some service or daemon when some children from the last instance are still hanging around.

      otherwise children either get killed or stop running and become zombie processes

      But neither of those is a valid justification for that practice.

      The parent exiting doesn't kill the child. The closest thing to that is that the login process exiting will send SIGHUP to all processes that share that controlling tty. (And your Perl script is pretty darn unlikely to be a login process.)

      When a child process exits, it becomes a zombie process until its parent waits for it, or until the parent process exits (because then the child gets inherited by process 1 which scrupulously wait()s for any expired children). So preventing zombies is a good reason to wait() for children, but the one time that it doesn't matter is right before the parent process exits.
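
A minimal sketch of reaping a child, assuming a POSIX-like system; `run_and_reap` is a hypothetical helper name. Between the child's exit and the parent's waitpid, the child is a zombie; waitpid collects its status and removes it from the process table:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ();

# Fork a child that exits immediately with $code, then reap it.
# Returns the child's exit status, or undef if waitpid did not reap it.
sub run_and_reap {
    my ($code) = @_;

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: exit at once without running the parent's END blocks.
        POSIX::_exit($code);
    }

    # Parent: once the child has exited it stays a zombie until this call.
    my $reaped = waitpid($pid, 0);
    return unless $reaped == $pid;

    return $? >> 8;    # high byte of $? holds the exit status
}

my $status = run_and_reap(3);
print "child exit status: $status\n";
```

If the parent exited instead of calling waitpid, the child would be inherited by process 1 and reaped there, which is why the wait matters least right before the parent exits.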

      - tye        

        Yes I agree - the kill case is rather exceptional, so for clarity I'll strike through it. The case where the child doesn't zombify because the parent exits immediately is also exceptional, as you say.

        One world, one people

      Not completely true. Using fork is a natural way to perform processes in the background.

      $ cat forkit.pl
      #!/usr/bin/env perl
      use strict;
      use warnings;

      my $pid = fork();
      if ($pid) {
          print "Parent is exiting now!\n";
      }
      else {
          print "Child is waiting a bit\n";
          sleep 15;
          print "Child is done!\n";
      }
      $ perl forkit.pl
      Parent is exiting now!
      Child is waiting a bit
      $ date
      Fri Aug 14 10:22:06 EDT 2015
      $ date
      Fri Aug 14 10:22:15 EDT 2015
      $ Child is done!

      There are differences on different operating systems, to be sure. Some processes could be killed on some operating systems (though I've not experienced it myself). You can accumulate zombie processes if you don't take steps to avoid it. If you're going to use fork, you need to educate yourself on what it does and doesn't do on your platform.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        Yes that's true. Although it is more common that the parent has more work to do, it is also a well-known way to create a daemon by having a main program fork and exit immediately, leaving the child running detached.

        One world, one people
