http://www.perlmonks.org?node_id=584548

marvell has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to abort a File::Find::find command from within the wanted subroutine? I'm attempting to find files with the same inode, and I can happily stop once I've found as many links as the link count of the target file.

This is my hideous solution:

#!/usr/bin/perl

use File::Find;

my ($dir,$file) = @ARGV;
# TODO usage, file and directory check

my ($finode,$fnlinks) = (lstat($file))[1,3];
# TODO check for link count of 1

my @files;

find(sub {
    return if $fnlinks == 0; # this is rubbish

    my ($inode,$nlink) = (lstat($_))[1,3];
    return unless $inode == $finode;

    push(@files, $File::Find::name);
    $fnlinks--;
}, $dir);

print map {"$_\n"} @files;

--
Steve Marvell

Re: aborting File::Find::find
by jdporter (Paladin) on Nov 16, 2006 at 18:14 UTC

    Perhaps the easiest way is to die, and wrap the find call with eval.

    eval { find( sub { ...; --$fnlinks or die; }, $dir ); };

    Update: Another goto-like solution, if you don't like the idea of generating an exception:

    UNLINKS: { find( sub { ...; --$fnlinks or last UNLINKS; }, $dir ); }

    We're building the house of the future together.
      jdporter,
      If you want scary action at a distance that is subject to break if the implementation changes:
      find(sub { ...; --$fnlinks or last Proc_Top_Item; }, $dir);
      I found this by looking under the File::Find covers.

      Now that I think about it, this sounds like the sanest solution. Of course, I mean a patch to File::Find where you either set a flag or pass in a closure that is checked on each iteration and, when it evaluates to true, tells find to terminate early.
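      Something like the following wrapper gives a feel for what that interface could look like; this is only a sketch emulated with today's eval/die trick, and find_until(), its 'done' callback, and the collect_link sub in the usage comment are made-up names, not part of File::Find:

      use File::Find ();

      # Hypothetical interface: run $wanted as usual, but stop the whole
      # traversal as soon as $done->() returns true.
      sub find_until {
          my ( $wanted, $done, @dirs ) = @_;
          eval {
              File::Find::find( sub {
                  $wanted->();
                  die "DONE\n" if $done->();
              }, @dirs );
          };
          die $@ if $@ && $@ ne "DONE\n";   # re-throw real errors
      }

      # e.g. find_until( \&collect_link, sub { $fnlinks <= 0 }, $dir );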

      Cheers - L~R

      You would probably want to die with a specific string (maybe "FOUND_ALL_LINKS") and check $@ after the eval to make sure that you really exited because you hit your link limit, rather than because something else went wrong and died.
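      A minimal sketch of that marker-string check, reusing the setup and variable names from the root node:

      use File::Find;

      my ($dir, $file) = @ARGV;
      my ($finode, $fnlinks) = (lstat($file))[1,3];

      my $marker = "FOUND_ALL_LINKS\n";   # trailing newline stops die appending "at ... line ..."
      my @files;

      eval {
          find( sub {
              my ($inode) = (lstat($_))[1];
              return unless defined $inode && $inode == $finode;
              push @files, $File::Find::name;
              die $marker if --$fnlinks <= 0;   # hit the link limit: abort the traversal
          }, $dir );
      };
      die $@ if $@ && $@ ne $marker;            # something else went wrong

      print map { "$_\n" } @files;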

      It's certainly more efficient, but not exactly Mr Elegant. Nor is a goto. However, it is still top of the list of solutions. Thanks.

      --
      Steve Marvell

        One way to make the eval/die a little more elegant is to use the Error module and have 'real' exceptions:
        #!/usr/bin/perl

        use warnings;
        use strict;

        use Error qw/:try/;
        use File::Find;

        {
            package Exception::FoundFile;
            use base qw/Error/;

            sub new {
                my $class = shift;
                my $s = bless {}, $class;
                $s->{_file} = shift;
                return $s;
            }

            sub file { return shift->{_file}; }
        }

        try {
            find(sub {
                my $file = $File::Find::name;

                # Show our working, so you can see we stop early
                print "Examining $file\n";

                throw Exception::FoundFile->new($file)
                    if ($file =~ /f/);   # Or whatever your condition is
            }, ".");
        }
        catch Exception::FoundFile with {
            my $e = shift;
            print "Found file: ", $e->file, "\n";
        }
        otherwise {
            print "Didn't find a file\n";
        };
        But the 'real solution' would be for the File::Find interface to respect a return value from the sub as to whether it should continue or not.

        Too late for that now, sadly.

Re: aborting File::Find::find
by Sidhekin (Priest) on Nov 16, 2006 at 18:37 UTC

    In situations like these, I've been arguing for goto LABEL:

    find( sub { ...; --$fnlinks or goto DONE; }, $dir );

    DONE: ...

    It has three advantages over the eval/die pair:

    • it makes your intention more clear;

    • it is better prepared for future versions of File::Find, that just might treat die-ing callbacks differently (that the eval/die pair works does not follow from the documentation); and

    • as Fletch++ notes, eval/die is prone to hiding errors as other code dies. goto LABEL has no corresponding weakness.

    The last LABEL version is similar, but noisy under -w or use warnings, generating "Exiting subroutine via last" warnings.

    Besides, goto LABEL is more readable still.

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      Did you:

      grep DONE: File/Find.pm

      before you wrote this code? Do you do that for every label you have ever used this way each time you upgrade File::Find? I'd have more sympathy for going against the first paragraph of "perldoc -f goto" [1] if Perl supported the following syntax:

      File::Find::find( sub { ... goto My::Package::FOUND ... } ... );

      My::Package::FOUND: ....

      Does that give you a better feel why some people are not in favor of such a practice?

      I suspect that using last to exit a sub generates a warning because it is widely considered a bad practice and last is used enough that accidentally using it to exit a subroutine appeared on the radar. So the warning for last and not for goto has more to do with last being seen as useful and used while goto is not. In particular, the lack of a warning is not an indication that the practice is not seen as a bad one (sorry, I wanted one more negation in there but I grew tired).

      I shy away from using things that the documentation tells me I shouldn't. Those are the things that get "fixed" to no longer work, or that aren't well tested, so nobody notices when they break and bugs in them survive into "stable" releases.

      I shy away from practices that break just because someone else somewhere also came up with "DONE" for their label name.

      But I do see the appeal of such simple and clear code (once the reader gets over the shock of using goto to jump out of a possibly large number of subroutine calls, that is).

      BTW, my solution for this problem would probably be to spend less time rolling a dozen-line replacement for File::Find than I'd have to spend searching the File::Find docs to see if it supported a "thanks, I'm done" feature. The classic gotchas are well known to me [2] (as well as the failings of File::Find that have caused me grief repeatedly over the many years when I've been unfortunate enough to try to use it or try to use something that used it).

      Here, an example I threw together in a couple of minutes:

      BEGIN {
          my( $dot, $up )= ( File::Spec->curdir(), File::Spec->updir() );

          sub SearchDirs {
              my( $path, $subdir )= @_;
              $path= File::Spec->catdir( $path, $subdir );
              chdir $subdir    or die "Can't chdir, $path: $!\n";
              opendir D, $dot  or die "Can't opendir, $path: $!\n";
              my @files= grep $_ ne $dot && $_ ne $up, readdir D;
              closedir D;
              for( @files ) {
                  my $recurse= ! -l $_ && -d _;
                  # ...
                  SearchDirs( $path, $_ )
                      if $recurse;
              }
              chdir $up        or die "Can't chdir .. from $path: $!\n";
          }
      }

      Using the above would mean that I at least wouldn't have to worry about "goto DONE;" suddenly not working because I upgraded File::Find.

      Just for fun, let's avoid that whole problem and make return work by eliminating the recursion:

      #!/usr/bin/perl -w

      use strict;
      use File::Spec;

      # Globals:
      my( $FiNode, $FnLinks, @Found );

      Main( @ARGV );
      exit( 0 );

      BEGIN {
          my( $dot, $up )= ( File::Spec->curdir(), File::Spec->updir() );

          sub SearchDirs {
              my( $path )= @_;
              my @depth= ".";
              my @todo= ( ".", $path );
              while( @todo ) {
                  my $path= shift @todo;
                  my $subdir= shift @todo;
                  while( $path ne $depth[-1] ) {
                      chdir $up  or die "Can't chdir .. from $depth[-1]: $!\n";
                      pop @depth;
                  }
                  $path= File::Spec->catdir( $path, $subdir );
                  push @depth, $path;
                  chdir $subdir    or die "Can't chdir, $path: $!\n";
                  opendir D, $dot  or die "Can't opendir, $path: $!\n";
                  my @files= grep $_ ne $dot && $_ ne $up, readdir D;
                  closedir D;
                  for( @files ) {
                      push @todo, $path, $_
                          if ! -l $_ && -d _;
                      if( $FiNode == ( lstat _ )[1] ) {
                          push @Found, File::Spec->catfile( $path, $_ );
                          return
                              if --$FnLinks < 1;
                      }
                  }
              }
          }
      }

      sub Main {
          die "Usage: file dir\n"
              if 2 != @_;
          my( $fpath, $dir )= @_;
          ( $FiNode, $FnLinks )= ( lstat $fpath )[1,3];
          if( 1 == $FnLinks ) {
              print $fpath, $/;
          }
          else {
              SearchDirs( $dir );
              print $_, $/ for @Found;
          }
      }

      It even works. (:

      - tye        

      [1] Quote: It can be used to go almost anywhere else within the dynamic scope, including out of subroutines, but it's usually better to use some other construct such as "last" or "die".

      [2] Such as:

      1. Either use chdir and opendir on File::Spec->curdir() or prepend the path before using stat-like operations
      2. Skip File::Spec->curdir() and ->updir() or you'll loop forever
      3. Don't recurse into symbolic links (or keep a hash of where you've been, as sketched below, or just assert that circular links cause infinite loops)
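      A rough sketch of that "hash of where you've been" idea, keyed on device and inode; %seen and already_seen() are made-up names, and the call site assumes a walker shaped like SearchDirs above:

      my %seen;

      sub already_seen {
          my( $dir )= @_;
          my( $dev, $ino )= stat $dir
              or return 1;               # can't stat it: treat as seen, don't descend
          return $seen{"$dev:$ino"}++;   # false on first visit, true afterwards
      }

      # In the directory loop, descend only into directories not yet visited:
      #     SearchDirs( $path, $_ ) if -d $_ && ! already_seen( $_ );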

      Sidhekin,
      In this case, I will have to flat out disagree. The sane solution is to patch File::Find; see my comment to jdporter above. Could you remind me which module of yours uses goto LABEL? I promised some time ago that I would look at it to see if there was a sane alternative, and never did. I do remember that at first glance it was a pretty tough cookie to crack.

      Cheers - L~R

        Well, I'll agree that one sane solution would be to patch File::Find. Provided you have the time, and the patch is accepted, that is. Meanwhile, though we may disagree, I maintain that goto LABEL is another. :-)

        Test::Trap uses goto LABEL. (It's the second goto in the source; the first is a goto &function.)

        Tough? Since it calls into user code that may contain arbitrarily many levels of subroutine calls and/or eval, I see no way to do it without a LABEL, and I'll argue that goto LABEL is more readable than last LABEL where, as here, the latter would require the addition of a loop-once block and a no warnings 'exiting'.
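        For comparison, here is a small sketch of the two forms applied to the find problem from the root node (it reuses $fnlinks and $dir from there, and the two alternatives are meant to be read separately, not run as one program):

        # Alternative A: goto LABEL -- no extra block, no warning to silence.
        find( sub { goto DONE if --$fnlinks <= 0; }, $dir );
        DONE: print "all links found\n";

        # Alternative B: last LABEL -- needs a loop-once block plus a
        # 'no warnings' exemption for "Exiting subroutine via last".
        SEARCH: {
            no warnings 'exiting';
            find( sub { last SEARCH if --$fnlinks <= 0; }, $dir );
        }
        print "all links found\n";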

        print "Just another Perl ${\(trickster and hacker)},"
        The Sidhekin proves Sidhe did it!

      IMHO goto is a quite acceptable solution when:

      1. It is easy to read and understand the intent
      2. It is within several lines (say no more than 12) of the label

      YMMV

Re: aborting File::Find::find
by graff (Chancellor) on Nov 17, 2006 at 02:46 UTC
    Given that you have a specific target file (hence its inode and the number of links to that inode), why not just use the normal unix "find" utility:
    use strict;

    my ($dir,$file) = @ARGV;
    my ($finode,$fnlinks) = (lstat($file))[1,3];

    $/ = chr(0);
    my @hardlinks = `find $dir -inum $finode -print0`;
    chomp @hardlinks;   # get rid of null-byte terminations

    printf "found %d of %d links for %s (inode %d) in %s:\n",
        scalar @hardlinks, $fnlinks, $file, $finode, $dir;
    print join("\n",@hardlinks),"\n";
    I haven't tried benchmarking that, but based on prior experience I know that if you happen to be searching over any really large directory tree (thousands of files), this approach will be at least 5 or 6 times faster than any solution involving File::Find. (I have posted at least three benchmarks on PM to prove this.)

    It also seems a lot simpler. Since you're looking specifically for hard links (files with identical inodes), the issue of portability to non-unix systems is irrelevant.

    The unix "find" command is the right tool for this job (and perl just makes it easier to use "find", which is worthwhile).

    (update: simplified the "printf" statement a little; also should clarify that the "5 to 6 times faster" is in terms of wall-clock time to finish a given run.)

    (another update: after simplifying the printf, I put the args in the right order so that the output is correct.)

      Will Unix find abort when it finds the correct number of links?

      --
      ¤ Steve Marvell

        By itself, it seems that unix find will not keep track of the number of links associated with a given inode, and won't stop as soon as that number of links is found.

        And of course, if the given path to search is not the root directory of a disk volume, it's possible that one or more links to the given target file will be outside the search space, so adding a bunch of code to ditch out early won't help.

        But assuming you don't need to worry about that kind of anomaly, you can tailor the perl script to short-circuit the find task like this:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my ( $path, $file ) = @ARGV;
        die "Usage: $0 search/path data.file\n"
            unless ( -d $path and -f $file );

        my ( $inode, $nlinks ) = ( stat _ )[1,3];
        die "$file has no hard links\n" if $nlinks == 1;

        my ( $chld, $nfound, @found );
        $SIG{HUP} = sub { $nfound++; `kill $chld` if $nfound == $nlinks };

        $chld = open( FIND, "-|",
            "find $path -inum $inode -print0 -exec kill -HUP $$ \\;" )
            or die "find: $!\n";

        $/ = chr(0);
        while ( <FIND> ) {
            chomp;
            push @found, $_;
        }

        printf( "found %d of %d links for %s in %s\n",
            scalar @found, $nlinks, $inode, $path );
        print join( "\n", @found ), "\n";
        My first attempt involved just checking the size of @found inside the while (<FIND>) loop, but it turns out that the output from the find process will be buffered, and perl will just wait until the process finishes.

        The above script works because the child process sends a HUP signal to the parent each time a file is found (note the double backslash to escape the semi-colon properly). The parent kills the child as soon as the expected count is reached, the child's output buffer gets flushed, and the parent can finish up right away.

        I tested it on a path that would normally take 30 seconds to traverse with unix find. I watched the initial output from the find run, and created some hard links in one of the directories that would be found early in the process. The above code found those links, reported them correctly, and finished in less than 1 second.

        (UPDATE: Added a "die" condition if stat returns a link count of 1 -- no need to run find in this case.)

        (Another update: shuffled the code slightly so that the signal handler gets set up before the child process gets started.)

Re: aborting File::Find::find
by marvell (Pilgrim) on Nov 16, 2006 at 18:12 UTC
    I know there is File::Walker, but I'm trying to keep at least within the modules available in core Perl and Debian (stable) packages.

    --
    Steve Marvell

Re: aborting File::Find::find
by zentara (Archbishop) on Nov 17, 2006 at 13:11 UTC
    I've used File::Find::prune to stop searching. You can set a flag when you find your file, and prune if $flag.
    #!/usr/bin/perl
    # linux only
    use warnings;
    use strict;
    use File::Find;
    use File::Spec;

    if (@ARGV < 2){ print "Usage: $0 dir depth\n"; exit }
    my ($path, $depth) = @ARGV;

    my $abs_path = File::Spec->rel2abs($path);  # in case you enter . for dir
    my $m = ($abs_path) =~ tr!/!!;              # count slashes in top path

    find (\&found, $abs_path);
    exit;

    sub found{
        my $n = ($File::Find::name) =~ tr!/!!;  # count slashes in file
        return $File::Find::prune = 1 if $n > ($m + $depth);
        return unless -d;
        # do stuff here.
        #print "$_\n";                # name only
        print "$File::Find::name\n";  # name with full path
    }
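    The code above prunes by depth; a minimal sketch of the flag idea described in the first paragraph might look like this (not from the post's code; $target is a hypothetical file name to search for):

    use File::Find;

    my ($dir, $target) = @ARGV;
    my $found;

    find( sub {
        if ($found) {
            $File::Find::prune = 1;   # stop descending into further directories
            return;
        }
        $found = $File::Find::name if $_ eq $target;
    }, $dir );

    print "$found\n" if $found;

    Note that pruning only stops find from descending further; entries already queued in the directories it has reached are still visited.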

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum

Re: aborting File::Find::find
by petdance (Parson) on Nov 19, 2006 at 01:36 UTC
    Use File::Next instead. It's an iterator, so you control how and when the iterator is called.
    my $iter = File::Next->new( $dir );

    my $done = 0;
    while ( !$done && ( my $file = $iter->() ) ) {
        # do stuff
    }
    You can blow out of that loop whenever you want, with either a last; or a $done = 1;. The key is that it's YOU controlling the iteration, not File::Find and not some funky control package variables like $File::Find::prune.

    xoxo,
    Andy