PerlMonks  

Re^4: Help with $File:Find

by roperl (Beadle)
on Feb 16, 2018 at 18:16 UTC ( [id://1209320]=note )


in reply to Re^3: Help with $File:Find
in thread Help with $File:Find

I can't seem to reproduce the problem. It doesn't happen very often. This program runs as a daemon and the error has only occurred once in 2 weeks. I'm pretty certain it's not after the $File::Find::name, because when the error occurs it references the line containing push @array, $File::Find::name. Here is my actual code:
find( { wanted => \&get_files, preprocess => \&nodirs }, "$BASEDIR/$dir" );

sub nodirs {
    grep !-d, @_;
}

sub get_files {
    my @array;
    push @array, $File::Find::name
      if ( (/^(?!\.).*\.($INTYPES)$/i)
        || ( (/^(?!\.).*\.($INTYPES)\.($ENGZTYPES)$/i)
            && !(/\.($OUTTYPES)\.($ENGZTYPES)$/i) ) );
    foreach (@array) {
        if ( ( exists( $globalfiles{$_} ) )
            && ( ( $globalfiles{$_} ne 'submitted' )
                || ( $globalfiles{$_} ne 'working' ) ) )
        {
            next;
        }
        else {
            if ( -e $_ ) {
                my $lckfile = getlckfile($_);
                if ( -e $lckfile ) {
                    next;
                }
                push @tmparray, $_;
            }
        }
    }
}

Re^5: Help with $File:Find
by Marshall (Canon) on Feb 16, 2018 at 23:09 UTC
    Wow! This is a class of errors like a "UFO report". This can be hard to track down, but you already know that!

    I will "think out loud" for a bit... I don't see any possible way that the wanted routine could actually be given an undef $_ value. It could be that there is something going wrong in the regex that causes an error message that is imprecise. I would look at quotemeta in the Perl docs. Using Perl variables in the regex can cause some weird things to happen, possibly related to some very unusual file name. I would use some form of quotemeta for the variables in your regex.
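A minimal sketch of the quotemeta suggestion, assuming $INTYPES is an alternation built from a list of extensions. The list and the sample file names below are hypothetical, since the thread does not show the real values. Each alternative is escaped on its own so that the `|` separators keep their regex meaning:

```perl
use strict;
use warnings;

# Hypothetical extension list; the real program's values are not shown in the thread.
my @in_exts = ( 'txt', 'c++', 'dat' );

# Escape each alternative separately, then join with '|' so that only
# the separators are treated as regex syntax.
my $INTYPES = join '|', map { quotemeta($_) } @in_exts;

# Precompile once; (?!\.) rejects names that start with a dot,
# as in the original pattern.
my $in_re = qr/^(?!\.).*\.(?:$INTYPES)$/i;

my @names   = ( 'report.txt', '.hidden.txt', 'main.c++', 'notes.md' );
my @matched = grep { $_ =~ $in_re } @names;

print "$_\n" for @matched;
```

Without the escaping, an entry such as `c++` would be read as a quantified `c` rather than as literal text, which is the kind of surprise an unusual value can cause.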

    This is pretty short code and I would simplify it.

    One issue is that I think your file test is not the best and, as I mentioned before, the nodirs() sub is not needed. There are things that are not directories but are also not actual files in the normal sense. Maybe something weird and unusual happens for these "strange files". I would eliminate nodirs() entirely and add a return unless -f $_; at the top of get_files(). That would eliminate the directories and also "files" which are not really files, like some kinds of links.

    The use of my @array; in get_files() shows a fundamental misunderstanding of what the "wanted" routine gets and what it should do. The routine is called once per directory entry, so @array will either be empty or have exactly one entry in it. Therefore your foreach loop is unnecessary. As general advice, I would keep the "wanted" subroutine as simple as possible, perhaps even declare my @array at a higher scope and have get_files() just do the push. File::Find will actually cd (change directory) as it traverses the dir structure. Sometimes figuring out what File::Find was doing, or doing some kind of error recovery, can be problematic. I don't think that necessarily applies here. But my thinking would be that the code in the foreach loop doesn't need to be in get_files(); mileage varies and I have no knowledge of your actual application.
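One way the "keep wanted simple" advice could look, as a sketch only. The sub name collect_files, the sample files, and the `.txt` test are stand-ins invented for illustration; the real extension tests and lock-file checks are not reproduced here:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway directory with two sample files (hypothetical names).
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'a.txt', 'b.log' ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my @found;    # declared outside the callback so results survive each call

# The callback runs once per entry with $_ set to that entry's name,
# so it only filters and records.
sub collect_files {
    return unless -f $_;        # skip directories and other non-regular files
    return unless /\.txt$/i;    # stand-in for the real extension tests
    push @found, $File::Find::name;
}

find( { wanted => \&collect_files }, $dir );

# Post-processing (lock-file checks, status lookups, ...) can happen here,
# after find() has returned.
print "$_\n" for @found;
```

Doing the heavier checks after find() returns keeps the callback short and means they run against full path names rather than depending on the directory File::Find has changed into.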

    One technique that can be used here is to install a signal handler for warnings that would apply only to this section of code. When the warning happens, this code would intercept the warning message and execute some code that spews out what it knows about. This is one technique to capture info for "once every 2 weeks" sorts of problems. This is complex enough that you should start a new SOPW question about how to do that. If I post code here like that, nobody with a similar problem will ever find it.
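For readers who land here anyway, a rough sketch of the scoped warning handler idea, using Perl's `local $SIG{__WARN__}`. The names scan_with_diagnostics and @captured are made up for this example, and the demo provokes a warning deliberately; inside a real File::Find callback one would likely also record $File::Find::name and $File::Find::dir:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

my @captured;    # diagnostic records collected by the handler

sub scan_with_diagnostics {
    my ($code) = @_;

    # 'local' limits the handler to this call; the previous handler
    # is restored automatically when the sub returns.
    local $SIG{__WARN__} = sub {
        my $msg = shift;
        chomp $msg;
        my $topic = $_ // '<undef>';
        my $cwd   = getcwd() // '<unknown>';
        push @captured, "warning: $msg | \$_=$topic | cwd=$cwd";
    };

    $code->();
}

# Demo: provoke an "uninitialized value" warning on purpose.
scan_with_diagnostics(
    sub {
        my $name;                         # stands in for an unexpectedly undef value
        my $joined = "file: " . $name;    # triggers the warning
    }
);

print "$_\n" for @captured;
```

In a long-running daemon the handler would more plausibly append to a log file than to an array; the array just keeps the sketch self-contained.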

      Thanks for your help on this one. Not sure why I thought the "wanted" routine would return more than one result. So I've removed the foreach(@array) and removed the nodirs() preprocessing:
      find( { wanted => \&get_files }, "$BASEDIR/$dir" );

      sub get_files {
          return unless ( -f $_ );
          my $tmpfile = $File::Find::name
            if ( ($_)
              && ( (/^(?!\.).*\.($INTYPES)$/i)
                  || ( (/^(?!\.).*\.($INTYPES)\.($ENGZTYPES)$/i)
                      && !(/\.($OUTTYPES)\.($ENGZTYPES)$/i) ) ) );
          if ( ( exists( $globalfiles{$tmpfile} ) )
              && ( ( $globalfiles{$tmpfile} ne 'submitted' )
                  || ( $globalfiles{$tmpfile} ne 'working' ) ) )
          {
              return;
          }
          else {
              if ( -e $tmpfile ) {
                  my $lckfile = getlckfile($tmpfile);
                  push @tmparray, $tmpfile unless ( -e $lckfile );
              }
          }
      }
        Actually there was a reason for nodirs(). I don't want to find recursively. I'm only looking for files in the specified directory; I'm not interested in anything in its subdirectories, if they exist.

        find( { wanted => \&get_files, preprocess => \&nodirs }, "$BASEDIR/$dir" );

        sub nodirs {
            grep !-d, @_;
        }

        I'll have to add that part back in.
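A small self-contained sketch of why the preprocess filter keeps the search non-recursive. The temporary tree and file names are invented for the demonstration; only the `preprocess`/`wanted` options come from the thread:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: one file at the top level, one inside a subdirectory.
my $top = tempdir( CLEANUP => 1 );
mkdir "$top/sub" or die "mkdir failed: $!";
for my $path ( "$top/top.txt", "$top/sub/nested.txt" ) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my @files;

find(
    {
        # preprocess receives the entries of the directory being read;
        # dropping directories here means find() never descends into them.
        preprocess => sub { grep { !-d $_ } @_ },

        # wanted is still called for the starting directory itself,
        # so the -f check remains useful.
        wanted => sub { push @files, $File::Find::name if -f $_ },
    },
    $top
);

print "$_\n" for @files;
```

Only the top-level file is reported; the nested one is never visited because its parent directory was filtered out before traversal.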
