http://www.perlmonks.org?node_id=199591

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
Please forgive me if the answer to this mystery is obvious. I've got a Perl script running on a Solaris 2.6 server, which processes some data files. At the end of the script I use a system call to sort the final output file, like so:
system(" cat $file | sort -t\\| +1 -2 > $file.out ");
The script normally works fine, and has done so for months. Suddenly this week it has begun to misbehave, depending on the number of data files I'm processing. If I process more than 3 files (about 30 MB each), it just skips over the system call, and the "sort" isn't executed. I thought there could be some memory limitation or something like that, but what's puzzling me is that I get no error messages. The script just skips the system call and goes on to the end. I've tried -w, $!, and all that to try and catch any sort of error, but nothing happens. The system call is just ignored. It's weird! I would be very, very grateful for any suggestions.
Thanks!

Replies are listed 'Best First'.
Re: vanishing system call
by dws (Chancellor) on Sep 20, 2002 at 19:30 UTC
    I've tried -w, $!, and all that to try and catch any sort of error, but nothing happens ... The system call is just ignored. It's weird!

    Show us how you're checking for errors.

    The doc for system (perldoc -f system) suggests checking $? for details of how the child process dies.
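    For instance, a minimal sketch of that kind of check (not the OP's code; $file and the sort switches are just borrowed from the question):

    # Run the command, then decode $? as perldoc -f system describes.
    my $cmd = "sort -t\\| +1 -2 $file > $file.out";
    my $rc  = system($cmd);
    if ($rc == -1) {
        warn "failed to run '$cmd': $!\n";                  # never started; $! is meaningful here
    }
    elsif ($? & 127) {
        warn "'$cmd' killed by signal ", $? & 127,
             ($? & 128 ? " (core dumped)" : ""), "\n";      # died on a signal
    }
    elsif ($? >> 8) {
        warn "'$cmd' exited with status ", $? >> 8, "\n";   # non-zero exit from sort
    }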

    Several factors could contribute to this no longer working. Over time, the dataset could have outgrown virtual memory, and over time there could be less virtual memory available for other reasons (e.g., more or bigger processes competing for space, disk space getting used up).

      Hi
      Thanks for the suggestions. There are a lot of helpful people here!
      I tried to catch errors like this:
      system(" cat $file | sort -t\\| +1 -2 > $file.out ") || die "can't sort the file: $!";
      ... and I also tried $?. That's the creepy thing - it never even gets that far. It just skips the whole statement altogether.
      I agree with your thoughts about virtual memory, and in fact this server has had some memory problems in the past. It just seems odd to me that I'm not getting some "out of memory" error (which I've seen before on this server) or even an out-and-out crash of the script. I need something to take to the Unix admins to convince them that there's a problem with the server (there are a number of production databases on there, so they won't reboot or anything like that w/o a good reason).
      Oh, I forgot to mention that if I take the "sort" statement and put it in a separate Perl script, then run that to sort the file, it works fine. Which again leads me down the virtual memory path. In the short term, maybe I'll have to do the data processing and then the sort in two separate scripts, called up by a shell script or something. That's ugly but I'll bet it will work.
      Thanks!
        system(" cat $file | sort -t\\| +1 -2 > $file.out ") || die "can't sort the file: $!";

        Generally, system will return 0 when it succeeds, not when it fails. That code would die every time the system call worked.

        Update: As ChemBoy points out in his reply, $! is not what you want to look at. As I pointed out in a reply to another post of ChemBoy's, it only makes sense to look at $! when system returns -1, meaning that the command was not executed. This doesn't change the advice above.

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: vanishing system call
by RMGir (Prior) on Sep 20, 2002 at 19:50 UTC
    Is it possible someone's put a program or script named either "cat" or "sort" somewhere on your path before /usr/bin?

    Another possibility: did you chdir somewhere else in your script? In that case, $file may not be in the current directory.

    You're also needlessly using cat. It probably doesn't matter, but why not just do this:

    system("sort -t\\| +1 -2 -o $file.out $file");
    Also, make sure sort is on the path when the script runs; if it runs from a crontab or webserver, it might not be.

    It's probably safer to do:

    system("/usr/bin/sort -t\\| +1 -2 -o $file.out $file");
    You might have to change the /usr/bin if that's not where your sort binary is.
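    If you want to see which sort the script is actually picking up, a quick diagnostic sketch (purely illustrative) is:

    # Walk the script's PATH and report the first executable called 'sort'.
    for my $dir (split /:/, $ENV{PATH}) {
        if (-x "$dir/sort") {
            print "first sort on PATH: $dir/sort\n";
            last;
        }
    }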
    --
    Mike
Re: vanishing system call
by thelenm (Vicar) on Sep 20, 2002 at 19:27 UTC
    Are you sure that the output is not just being placed somewhere you didn't expect? For example, do any of your filenames have spaces in them? If so, then you should use "> '$file.out'" instead of "> $file.out". Is there anything else that's changed recently that might have an unexpected effect? Other than some error like that, I'm stumped.
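    One way to sidestep shell quoting entirely (a sketch only, assuming the cat and the shell redirection are replaced by sort's -o switch) is the list form of system:

    # List form: no shell is involved, so spaces in $file can't be misparsed.
    system('sort', '-t|', '+1', '-2', '-o', "$file.out", $file) == 0
        or warn "sort failed: $?\n";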

    -- Mike

    --
    just,my${.02}

      Thanks for the ideas. There are no spaces in the filenames, and I've searched high and low on the server for some other place the output file could be hiding. By the way, I've also checked permissions and stuff like that -- I know this is what everyone says, but nothing has changed with the script. And it does work, until we reach a certain amount of data that we're processing. When it works, the sort normally takes a good 10 or 15 minutes to complete; when it doesn't work, the script just zips by that statement and finishes up. I was going to approach our Unix admins for help (because it "feels" like a server issue to me), but I'm sure that without some kind of error message, they'll say it's a Perl problem and tell me to get lost. It's a puzzler.... Thanks again.
Re: vanishing system call
by BrowserUk (Patriarch) on Sep 20, 2002 at 19:48 UTC

    The first thing I would add, and leave in place is

    system(" cat $file | sort -t\\| +1 -2 > $file.out ") or warn "sort fai +led $!\n";

    Also, you said that "I tried -w". Is there some reason that you don't leave -w on all the time?

    The second thing I notice, though my memory of *nix is sketchy, is that you are starting 2 processes and using a pipe when 1 process and no pipe would be fine.

    system("sort -t\\| +1 -2 <$file > $file.out ") or warn "sort failed $! +\n";

    Actually I am fairly sure that you don't even need the '<' in front of the filename, but my memory ain't what it once was.

    Some other possibilities

    • Try using the full path to sort (/usr/bin/sort?). Maybe you were picking up a copy of the sort binary through a symbolic link or environment variable and one or the other has changed or gone away?
    • Try prefixing your filenames with a relative (./file) or absolute path? Not really sure why I suggest this. I just remember doing that a lot in my *nix time when things didn't seem to work without.
    • Finally, a silly one, but you wouldn't be the first person to have been caught out by it. Are you subject to a disk quota limit? Do you have enough free space to create the output file whilst the input file still exists? (A quick check is sketched just after this list.)
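    A rough pre-flight check along those lines (just a sketch; it assumes df is on the PATH and that sort's temp area is /var/tmp or $TMPDIR):

    # Show free space in the output directory and in sort's temp area
    # before starting the big sort, so a quota/space problem shows up early.
    use File::Basename qw(dirname);
    for my $dir (dirname("$file.out"), $ENV{TMPDIR} || '/var/tmp') {
        print "--- space in $dir ---\n";
        system('df', '-k', $dir) == 0 or warn "df failed: $?\n";
    }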

    What happens if you issue the sort command as you have it immediately after the program has run? If it works as expected, that will eliminate several possibilities.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

      It seems this is a meme that just won't roll over and die...

      The construct, system(...) or die_warn_etc() is almost never what you want. The return value of system() is the exit value of the command executed. Usually, this will be 0 on success. Some alternatives:

      system(...) and die;                 # Ugly.
      if ( system(...) != 0 ) { die }      # C-ish
      if ( system(...) ) { die }           # C-ish and ugly.
      unless ( system(...) == 0 ) { die }  # Nicer

      I think it is better to explicitly compare to 0. It serves as a reminder that system() is a bit different than most functions in this respect.

      -sauoq
      "My two cents aren't worth a dime.";
      
        Or you can simply system(@LIST) == 0 or die; :-)

        Makeshifts last the longest.

Re: vanishing system call
by bluto (Curate) on Sep 20, 2002 at 19:59 UTC
    In addition to checking the return code and printing $? if the return code is non-zero, make sure you have enough temp file space.

    Sort creates temporary files under /tmp, or wherever $ENV{TMPDIR} points, depending on your version of sort. Make sure you have up to double the size of what you are sorting (e.g. 60 MB). If I'm sorting something large, I specifically set the temp sorting directory (many sorts use -T) so I know I have enough space and am not competing with someone else's space usage.
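    For example (a sketch; /bigdisk/tmp is a made-up directory standing in for wherever you know there is room):

    # Point sort's temporary files at a roomy filesystem and check the result.
    my $tmpdir = '/bigdisk/tmp';    # hypothetical
    system("sort -T $tmpdir -t\\| +1 -2 -o $file.out $file") == 0
        or warn "sort failed, exit status ", $? >> 8, "\n";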

    bluto

      Hi all,
      This reply is to the previous three posts, as some possibilities are raised more than once. Thanks for all the help...

      Temp space - checked that, and there's plenty of room. Helpfully enough, we did have some trouble with space limitations on /var/tmp on this server previously, and the error messages were fairly explicit.

      Output space - same thing, there's plenty of room in the output directory.

      Path to "sort" - a good point; I hadn't tried using the full path to the binary. But the script works perfectly until I increase the number of data files I'm processing past a certain point (3 of them). So I doubt the path is a problem.

      Unnecessary "cat" - oops. Quite right. I honestly don't know why I put that in there! I'll try it without "cat" - could that extra process be using up all our allocated memory? Again, I still don't understand why I wouldn't get some kind of error, if that were the case.

      Here's another interesting wrinkle. Before and after the "sort" system call, I just tried putting some other system calls to see if they worked. So now it looks like this:
      system("echo 'now executing sort...' "); system(" cat $file | sort -t\\| +1 -2 > $file.out"); system("ls -la");
      If I process 2 data files, all 3 system calls execute. If I process 3 data files, the first 2 system calls execute, but the third one doesn't. If I process 4 data files, none of them execute. Spooky! So, okay, if I'm running out of memory, why do the system calls die so silently? I guess I'm probably trying to determine if this is a Perl thing, or a Solaris thing. I'm baffled, though, because I've always found Perl to have exceptionally useful errors. I'll toss it by the Unix admins just to see what happens....
      Thanks!
        So, okay, if I'm running out of memory, why do the system calls die so silently?

        How do you know they are dying silently? You haven't been listening to what system may be trying to tell you. I recommend you start debugging this strictly with the system() return values, since you say it was working before. Don't complicate the problem by changing several things at once.

        if (system "your command") != 0){ my $exitval = $? >> 8; # return val from your cmd my $signum = $? & 127; # set if cmd stopped by a signal my $coredumped = $? & 128; # set if cmd dumped core }
        What perl version are you running? On what Solaris?

        What does this program do for you?

        #!/usr/bin/perl -w
        use strict;
        $| = 1;
        my $limit = 20;
        for (my $i = 0; $i < $limit; ++$i) {
            print "$i\n";
            system("/bin/echo echoing -- $i");
        }
        It works fine on the stock 5.005_03 that came with my Solaris 5.8 box. I can even increase $limit to 2000 with no problems.

        If that all works fine, all I can think of is that maybe you're using up too many filehandles as part of your other processing? What _else_ are you doing for each file, apart from the system calls?
        --
        Mike

A clue!
by Anonymous Monk on Sep 20, 2002 at 21:39 UTC
    Aha! Got something, finally.

    I tried using the unless (system(...) == 0) {die...} construct suggested by sauoq above, and finally got an error message, albeit one that doesn't make much sense to me (at least initially). If I use $!, I get a message that says "not enough space"... which suggests that "sort" doesn't have enough room to create its temporary files. But I monitored /var/tmp (which is where it normally happens on this server) as I was running the script, and it never got above 33%.

    So maybe something or someone changed "sort" so that it doesn't use /var/tmp anymore? That is, of course, a question for the Unix admins. I just tried the script with "sort -T" to direct the temporary files to a place where I know there is room, and it still failed with the same message. And then I tried a separate script that does nothing but the system call with the "sort -T", and it worked. Hmmmmm.... so the error message points to a space problem, but the behavior of the script points to a memory problem. I'm still confused, but it's a starting point, at least.

    Thanks to all for the help, especially sauoq for the method of catching errors from a system("...") call. This has been kicking me in the head for two days now. I'm pretty sure the problem lies in the server environment, and now at least I've got something more substantial to take to the Unix admins.

    Thanks....

      If I use $!, I get a message that says "not enough space"...

      This is because $! is not the right variable to use, and probably has never been set in your program (and so contains some random and unrelated value). See the above post by virtualsue for the correct debugging method for system.



      If God had meant us to fly, he would *never* have given us the railroads.
          --Michael Flanders

        This is because $! is not the right variable to use

        The only time $! is meaningful with regard to system() is when the return value in $? (the right variable to use) is -1. A -1 indicates that the program didn't start. In that case, $! should tell you why.
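        A tiny sketch of that distinction (the command name is deliberately bogus):

        if (system('/no/such/command') == -1) {
            warn "could not start the command: $!\n";   # e.g. "No such file or directory"
        }
        else {
            warn "command ran; wait status in \$? is $?\n";
        }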

        -sauoq
        "My two cents aren't worth a dime.";
        
      Talking about quotas... Have you ensured that you are allowed to use that much memory?
      Make sure that you're not exceeding your segment size. To determine this, type "ulimit -a" at your Solaris prompt. You'll get something like:
      core file size (blocks)     unlimited
      data seg size (kbytes)      131072
      file size (blocks)          unlimited
      max memory size (kbytes)    1019872
      open files                  4096
      pipe size (512 bytes)       8
      stack size (kbytes)         2048
      cpu time (seconds)          unlimited
      max user processes          64
      virtual memory (kbytes)     1048576
      Notice the "data seg size" there. If it's less than the total size of your files, you may be hitting your limit. For example, if it's set at 100MB for you, then 3 files at 30MB each will process easily, but a fourth file bringing the total up to 120MB will hit your data segment limit (and probably cause a "segmentation fault") and it'll be as if the sort never occurred.

      If you're very lucky (assuming your system administrators like you etc) you'll be able to raise the size of your data segments with "ulimit -d <larger number here>". You probably won't be able to raise it above the "max memory size" that you've been given though.
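      If you'd rather see the limits the script itself runs under (they can differ from your interactive shell, especially under cron), one sketch is to shell out to ulimit, which is a shell builtin; this assumes the shell you call supports ulimit -a (ksh on Solaris does):

      # ulimit is a shell builtin, so run it through a shell explicitly.
      system('/bin/ksh', '-c', 'ulimit -a') == 0
          or warn "could not run ulimit: $?\n";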

      Hope that helps.

      jarich

      "Hmmmmm.... so the error message points to a space problem, but the behavior of the script points to a memory problem."

      That reminds me of a Solaris 2.6 box where /var/tmp appeared to be memory mapped. Which was very awkward when a script ran amok and logged the same old error message every few seconds. So we ran out of swap rather than just ordinary disk space, all kinds of regular daemon processes died, and we weren't even able to log in anymore.

      You may want to talk to your sysadmins about this one...

      --
      Cheers, Joe

Re: vanishing system call
by mshiltonj (Sexton) on Sep 21, 2002 at 10:54 UTC
    are you trying:
    my $status = system(" cat $file | sort -t\\| +1 -2 > $file.out ");
    print STDERR $status;

    This will tell you what the system call is returning to your program. If the system call fails, your perl script will keep on going with no errors.
    You will have to detect the errors your system call returns and act accordingly.
      Hi all, Thanks for the debugging tips. You are absolutely right, of course - what I've been trying to get is the reason why it's failing. I'm trying it out with virtualsue's suggestions right now... will post the results when it's done. Thanks
Re: vanishing system call
by Dr. Mu (Hermit) on Sep 21, 2002 at 23:35 UTC
    Okay, this is a little off the wall, but it happened to me, so I'll pass it along FWIW. My Linux box got into a situation where the /tmp directory, which has its own partition, contained a file advertising a size larger than the entire partition! (I still don't know how.) But my system monitor reported that /tmp was 80% free, and most programs didn't have any problems accessing it. Netscape's mail client, however, choked every time it needed to create a temporary file, with an "out of space" error. Once I discovered and eliminated the anomalous file, things were back to normal. (An fsck is recommended, BTW.)

    If, in fact, your problem turns out to be a file space issue, you may want to give your /tmp directory a thorough perusal -- just in case.

      Dr. Mu, I know that this is off-topic, but what you ran into there is likely a 'sparse file'. This is a file system feature that lets the system skip large chunks in the middle of a file that have no content. If you open a sparse file and put a character at offset 1E9, an OS that supports sparse files won't write all the intermediate blocks, just the one with the data, and a note of the offset of that block. It's a feature.
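      A tiny demonstration of the effect (a sketch; the path is made up):

      # Make a "huge" file that uses almost no disk blocks:
      # seek far past end-of-file and write a single byte.
      open my $fh, '>', '/tmp/sparse.demo' or die "open: $!";
      seek $fh, 1_000_000_000, 0 or die "seek: $!";   # 0 == SEEK_SET
      print {$fh} 'x';
      close $fh;
      # ls -l now reports about 1 GB, while du -k shows only a block or two allocated.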
      --
      Spring: Forces, Coiled Again!
Re: vanishing system call
by bart (Canon) on Sep 21, 2002 at 15:56 UTC
    I've tried -w, $!, and all that
    You should be looking at $? instead.

    And try capturing the output of STDERR, for example into a text file, using the "2>logfile.txt" syntax. I'm sure some useful info will be found in there.

Re: vanishing system call
by grantm (Parson) on Sep 22, 2002 at 05:30 UTC

    No one seems to have suggested looking at the STDERR output. You might want to try:

    system("sort -t\\| +1 -2 $file >$file.out 2>/tmp/errors");

    Another possibility is that you have so many filenames you've exceeded your shell's maximum command length. A bit unlikely, I suspect (especially at only 30 - unless they have very long pathnames).

    I'll also put my 'pedant' hat on and point out that you're not making a 'system call', but a call to Perl's system function. These are two very different things.

Re: vanishing system call
by graff (Chancellor) on Sep 22, 2002 at 23:41 UTC
    It's been a rather long discussion already, but it seems that you haven't given any of the context that precedes the "system()" line in the original post. Are you sure that the "final output file" that results from the preceding steps really exists with the expected content when 3 or more input files are involved?

    How sure are you that your script isn't dying quietly at some point before "system" (and the other diagnostics you've tried now) when you have 3 or more input files?

Re: vanishing system call
by Anonymous Monk on Sep 23, 2002 at 05:20 UTC
    One more idea. Could you be suffering from a quota/limit on open file descriptors? Try making sure that all files you have finished processing are properly closed.
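    If you want a rough way to watch that from inside the script, one possibility on Solaris (which exposes per-process descriptors under /proc) is a sketch like:

    # Count the descriptors this process currently holds open.
    my @fds = glob "/proc/$$/fd/*";
    warn "currently holding ", scalar @fds, " open descriptors\n";
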
Re: vanishing system call
by LanceDeeply (Chaplain) on Sep 23, 2002 at 12:31 UTC
    how about...

    exiting right before the call and actually doing the command on the command line. Any system errors should pop right up. I know this doesn't solve your error handling problem, but at least this way you should be able to see the error.

    just my cheesy way of debugging.
much appreciated
by Anonymous Monk on Sep 23, 2002 at 14:46 UTC
    Hi all,
    This is the Anonymous Monk (I guess I should give myself a login) who originally posted this question. Thanks to everyone for all the very helpful suggestions. It's very gratifying to know there are so many helpful Perl folks out there (not that much of a surprise, though).

    I wanted to let you know I am not ignoring your suggestions. I am going to try them out, one by one, today, and will post results when I can get to the bottom of this question. But this will probably take all day as I can only work on this in my "spare time". In the meantime, here are some quick replies to a few of the posts.

    virtualsue - I tried your $exitval, $signum, and $coredumped suggestions - unfortunately (or maybe not) they all returned zero. ??? I'll have another shot at this... I tried it on Saturday and wasn't paying that much attention.

    sauoq - Thanks for clarifying the difference between $! and $?. I tried both - $? returned a zero, which explains why $! gave me the seemingly erroneous error "not enough space" (perhaps).

    graff - Yes, I've confirmed that the "final output file" (the one I'm trying to sort) does exist, and in fact it sorts just fine when I do it either on the command line or within a separate Perl script, using the exact same system(" ") command. That's why I'm thinking it may be some memory issue. (Oh, I guess that reply also covers LanceDeeply.)

    helgi - It's funny you should suggest I use Perl's "sort" for this, because that's what I originally wanted to do, but after reading about it in the Camel book I was afraid it might hog too much memory, if I could get it to work at all. I've never tried Perl's "sort" before, but the impression I get from the book is that it's intended to be used for arrays, not big old text files. If you have a suggestion as to how to use it, I'd be grateful.

    jarich - Thanks for the ulimit command. It looks like I've got a limit of 2gb for the "data seg" ... I'll double check this with the Unix admins.

    Okay, I'll get to work on the debugging tips and will post the results, for the curious, later today. Thanks again, to everybody, for all the help.
Re: vanishing system call
by helgi (Hermit) on Sep 23, 2002 at 10:15 UTC
    You have received a lot of answers to this, but as far as I can see, no one has given you the correct way to check the error from your system call. If someone has, I apologise.

    system returns 0 if there are no errors, the opposite of most other operations.

    $? contains the error for daughter processes.

    Anyway, here is one way:

    system(" cat $file | sort -t\\| +1 -2 > $file.out ") == 0 or die "Erro +r:$?\n";
    However, that being said, my preference would be to use Perl's sort for this.
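    For what it's worth, one way that might look (a sketch only: it slurps the whole file into memory, which is exactly the trade-off raised elsewhere in this thread, so it only pays off if RAM comfortably exceeds the file size):

    # Sort the lines of $file on the second |-delimited field, writing $file.out.
    open my $in, '<', $file or die "can't read $file: $!";
    my @lines = <$in>;
    close $in;

    my @sorted = map  { $_->[1] }
                 sort { $a->[0] cmp $b->[0] }
                 map  { [ (split /\|/)[1], $_ ] } @lines;   # Schwartzian transform

    open my $out, '>', "$file.out" or die "can't write $file.out: $!";
    print {$out} @sorted;
    close $out;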

    -- Regards,
    Helgi Briem helgi AT decode DOT is

Re: vanishing system call
by strider corinth (Friar) on Sep 23, 2002 at 19:23 UTC
    Disclaimer: I may be totally wrong about this. The actual event described was a few years ago, and may have accumulated some bit rot since then.

    Some time ago, I was running a Perl script that used a great deal of memory; so much, in fact, that it hit its ulimit. As soon as that happened, the program would die without warning or error. Only by watching my memory consumption did I realize what was going on (this was back on Linux 2.0.x).

    If your sort is indeed using too much memory and hitting the ulimit, it might die without doing the Right Thing in terms of returning information to Perl, and possibly in turn shut down your program. This doesn't explain why your previous command wouldn't execute, though.

    On a Solaris system I used to work with, one of the tmp directories was also used for swap... that seems to be a fairly common practice. I'm betting it's that, as joe++ said.

    - strider( corinth );
    --

    Love justice; desire mercy.
Re: vanishing system call (solution)
by Anonymous Monk on Sep 24, 2002 at 14:44 UTC
    Hi all
    Again, thanks for the help with this question. I tossed it by the Unix admins - there was some freaky problem with the swap space on our server. It's still a mystery to them and they're investigating, but as a temporary solution they added some more space.
    So I didn't get a chance to try out all the wonderful debugging tips for this issue, but they've been duly noted for future reference. Thanks!