
Re^2: Help with $File:Find

by roperl (Beadle)
on Feb 14, 2018 at 21:24 UTC (#1209179)

in reply to Re: Help with $File:Find
in thread Help with $File:Find

There is more to my get_files sub that does a foreach (@array) and then checks the files for certain criteria.
I left out that section for clarity here.
My $TYPES is defined like so:
my $TYPES = 'txt|gz|zip';

The only thing I can think of is that find could be reading the list of files in a directory first and filling in the names afterwards. If a file disappears in the middle of that operation, $File::Find::name would fail because the file name isn't defined.

Re^3: Help with $File:Find
by Marshall (Abbot) on Feb 15, 2018 at 01:03 UTC
    I and other Monks aren't sure what is happening here. Let's get more info:

    Add the line $|=1; at the top of your program. This will unbuffer STDOUT. Then put in some print statements as I suggested earlier. Then when the error happens, we will have an idea of what the program was doing. By default, STDOUT is buffered, meaning that output is held back until the buffer is full (or, when writing to a terminal, until a newline is printed). STDERR is unbuffered by default, meaning that its error lines print right away. When you unbuffer STDOUT, the time sequence of the normal prints and error prints is preserved. The line right before the error will show what the program was doing right before the error occurred.
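A minimal sketch of that setup follows. The trace_file sub and the file name are hypothetical stand-ins for the real "wanted" routine, not code from the original program.

```perl
use strict;
use warnings;

$| = 1;    # autoflush the currently selected handle (STDOUT by default)

# Hypothetical stand-in for the real "wanted" callback: print a trace
# line before doing any work, so the last line printed before a warning
# shows which name was being processed.
sub trace_file {
    my ($name) = @_;
    print "checking: ", ( defined $name ? $name : '<undef>' ), "\n";
    return defined $name ? 1 : 0;
}

trace_file('example.txt');
```

With autoflush on, each "checking:" line reaches the log before any warning that the following statements emit on STDERR.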

    Update: you said "There is more to my get_files sub that does a foreach (@array) and then checks the files for certain criteria. I left out that section for clarity here." It could very well be that your simplification obscures the actual problem. Can you reproduce the problem with your simplified code?

      I can't seem to reproduce the problem. It doesn't happen very often. This program runs as a daemon and the error has only occurred once in 2 weeks. I'm pretty certain it isn't something after the $File::Find::name line, because when the error occurs it references the line where push @array, $File::Find::name is. Here is my actual code.
      find( { wanted => \&get_files, preprocess => \&nodirs }, "$BASEDIR/$dir" );

      sub nodirs {
          grep !-d, @_;
      }

      sub get_files {
          my @array;
          push @array, $File::Find::name
              if ( (/^(?!\.).*\.($INTYPES)$/i)
              || ( (/^(?!\.).*\.($INTYPES)\.($ENGZTYPES)$/i)
                  && !(/\.($OUTTYPES)\.($ENGZTYPES)$/i) ) );
          foreach (@array) {
              if ( ( exists( $globalfiles{$_} ) )
                  && ( ( $globalfiles{$_} ne 'submitted' )
                      || ( $globalfiles{$_} ne 'working' ) ) )
              {
                  next;
              }
              else {
                  if ( -e $_ ) {
                      my $lckfile = getlckfile($_);
                      if ( -e $lckfile ) {
                          next;
                      }
                      push @tmparray, $_;
                  }
              }
          }
      }
        Wow! This is a class of errors like a "UFO report". This can be hard to track down, but you already know that!

        I will "think out loud" for a bit... I don't see any possible way that the wanted routine could actually be given an undef $_ value. It could be that there is something going wrong in the regex that causes an error message that is imprecise. I would look at quotemeta in the Perl docs. Using Perl variables in the regex can cause some weird things to happen, possibly related to some very unusual file name. I would use some form of quotemeta for the variables in your regex.
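One way the quotemeta suggestion might look is sketched below. The extension list and file names are assumptions for illustration, since the thread does not show the real $INTYPES. Each extension is escaped separately so that the '|' alternation itself is preserved.

```perl
use strict;
use warnings;

# Assumed example extension list; the thread's real $INTYPES is not shown.
my @types = ( 'txt', 'gz', 'zip', 'c++' );

# Escape each extension on its own, then join with '|'. Calling
# quotemeta on the already-joined string would escape the '|' too.
my $alt = join '|', map { quotemeta } @types;
my $re  = qr/^(?!\.).*\.(?:$alt)$/i;

# Hypothetical file names to try the pattern against.
my @names   = ( 'report.txt', '.hidden.txt', 'main.c++', 'notes.doc' );
my @matched = grep { $_ =~ $re } @names;

print "$_\n" for @matched;
```

Escaping this way means characters such as '+' in an extension are matched literally instead of being read as regex syntax.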

        This is pretty short code and I would simplify it.

        One issue is that I think your file test is not the best and, as I mentioned before, the nodirs() sub is not needed. There are things that are not directories, but which are also not actual files in the normal sense. Maybe something weird and unusual happens for these "strange files". I would eliminate nodirs() entirely and add return unless -f $_; at the top of get_files(). That would eliminate the directories and also "files" which are not really files, like some kinds of links.
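A small sketch of the -f test in a "wanted" callback, run against a throwaway directory tree created with File::Temp. The tree and the file names are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree for the demonstration: one subdirectory, one plain file.
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/sub" or die "mkdir: $!";
open my $fh, '>', "$dir/sub/data.txt" or die "open: $!";
close $fh;

my @found;
find(
    sub {
        # Skip directories and anything else that is not a plain file.
        return unless -f $_;
        push @found, $File::Find::name;
    },
    $dir
);

print "$_\n" for @found;
```

One difference to keep in mind: a preprocess filter that removes directories also stops find from descending into them, whereas this version still recurses into subdirectories. Whether that matters depends on the directory layout.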

        The use of my @array; in get_files() shows a fundamental misunderstanding of what the "wanted" routine gets and what it should do. The routine is called once per entry, so @array will either be empty or have exactly one entry in it. Therefore your foreach loop is unnecessary. As general advice, I would keep the "wanted" subroutine as simple as possible, perhaps even declare my @array at a higher scope and have get_files() just do the push? Find will actually cd (change directory) as it traverses the dir structure. Sometimes figuring out what Find was doing or doing some kind of error recovery can be problematic. I don't think that necessarily applies here. But my thinking would be that the code in the foreach loop doesn't need to be in get_files() - mileage varies and I have no knowledge of your actual application.
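The restructuring described above could be sketched roughly as follows. The pattern, the file names, and the final -e check are placeholders; the real type lists, %globalfiles lookup, and getlckfile() logic are not reproduced here.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Assumed pattern for illustration; the thread's real type lists differ.
my $types = qr/\.(?:txt|gz|zip)$/i;

# Throwaway directory with three hypothetical files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.zip c.log)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

my @candidates;    # declared outside the callback, filled during the walk

find(
    sub {
        # The callback sees exactly one entry per call, in $_.
        push @candidates, $File::Find::name if -f $_ && $_ =~ $types;
    },
    $dir
);

# Per-file checks (status lookups, lock files, ...) run after find()
# returns, when the current directory is no longer changing.
my @ready = sort grep { -e $_ } @candidates;

print "$_\n" for @ready;
```

Keeping the callback down to a single push makes it easier to see which statement a warning refers to.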

        One technique that can be used here is to install a handler for warnings (a $SIG{__WARN__} handler) that would just apply to this section of code. When the warning happens, this code would intercept the warning message and execute some code that spews out what it knows about. This is one technique to capture info for "once every 2 weeks" sort of problems. This is complex enough that you should start a new SOPW question about how to do that. If I post code here like that, nobody with a similar problem will ever find it.
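For orientation only, the general shape of such a handler might look like the sketch below. The risky_section sub and its argument are hypothetical; a real handler would log whatever state is useful for the application.

```perl
use strict;
use warnings;

my @captured;    # diagnostics collected by the handler

# Hypothetical section of code that occasionally warns.
sub risky_section {
    my ($value) = @_;

    # The handler applies only while this sub is running; "local"
    # restores the previous handler when the sub returns.
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        chomp $msg;
        push @captured,
          "$msg | context: value=" . ( defined $value ? $value : '<undef>' );
    };

    my @list;
    push @list, "name: " . $value;    # warns when $value is undef
    return scalar @list;
}

risky_section(undef);
print STDERR "$_\n" for @captured;
```

The handler receives the warning text as its first argument, so it can be logged together with any other state the surrounding code has in scope.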
