Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re^4 Useful addition to Perl?

by etcshadow (Priest)
on Mar 05, 2004 at 05:39 UTC ( #334141=note: print w/replies, xml ) Need Help??

in reply to Re: Re: Re: Useful addition to Perl?
in thread Useful addition to Perl?

OK... I bothered to finish it. Or at least get it to a working state (I don't really like just grepping out the "." and ".." directories... it feels so non-portable (even though I know it's cool on windows and *nix)).
package r; use strict; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::Array'; use File::Spec; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { my ($self, $index) = @_; $self->_ReplaceDirs($index,$index); $self->[$index]; } sub FETCHSIZE { my ($self) = @_; scalar @$self; } sub STORE { my ($self, $index, $value) = @_; $self->[$index] = $value; } sub STORESIZE { my ($self, $count) = @_; $#$self = $count - 1; } sub SPLICE { my ($self,$offset,$length,@list) = @_; $self->_ReplaceDirs($offset,$offset+$length-1); splice(@$self,$offset,$length,@list); } sub POP { my ($self,$item) = @_; $self->_ReplaceDirs(-1,-1); pop(@$self); } sub _ReplaceDirs { my ($self, $fromindex, $toindex) = @_; # as long as the index range contains directories, substitute +the directory contents my $recursionguard = 0; while (my @indices = grep { -d $self->[$_] } ($fromindex..$toi +ndex) and $recursionguard++ < 10000) { my $index = $indices[0]; opendir DIR, $self->[$index] or do { warn "Cannot traverse directory $self->[$index +]: $!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple next; }; my @contents = readdir DIR or do { warn "Cannot read directory $self->[$index]: $ +!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple closedir DIR or warn "Cannot close directory $ +self->[$index] (weird): $!\n"; next; }; closedir DIR or warn "Cannot close directory $self->[$ +index] (weird): $!\n"; # if there is any portable way to do this... I'd like +to hear it! @contents = grep !/^\.{1,2}$/, @contents; # convert directory contents to paths by prepending th +e directory. # even be super nice about using catfile or catdir, ap +propriately @contents = map { my $asfile = File::Spec->catfile( $self->[$ind +ex], $_ ); -f $asfile ? $asfile : File::Spec->catdir( $se +lf->[$index], $_ ); } @contents; # replace directory with its contents splice(@$self, $index, 1, @contents); } } 1;
complete with example use:
[me@host]$ cat `find d* -type f` | wc -l 58040 [me@host]$ perl -mr -lne '$x++; END{print $x}' d* 58040 [me@host]$
I guess now I should pod this up and make it my first contribution to cpan :-D
------------ :Wq Not an editor command: Wq

Replies are listed 'Best First'.
Re: Re^4 Useful addition to Perl?
by demerphq (Chancellor) on Mar 06, 2004 at 13:53 UTC

    I think the only problem with all of this is that you arent using it as a wrapper to File::Find. Youve got a good idea here, but hand rolling a directory traversal is not in my opinion smart. Also the way that you do it worries me a touch. Its an interesting implementation of a depth first traveral, but surely its quite inefficient? Arent you repeatedly doing file system checks over the same objects?

    I think you should rewrite this as an alternate interface to File::Find. Which would get you better portability and whole host of hooks and options to add. Overall its a good idea though. And I go with calling it something long and giving it a flexible import() interface. For instance:

    use File::Find::ARGV filter=>sub { /\.txt/i }; while (<>){ ... }

    Anyway, its an interesting idea. ++ to you.


      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi

      Well, the problem, as I see it, with writing this as a wrapper for File::Find is that that would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find do really only partially overlap, in that they both traverse directories... but that's about the end of it.

      The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong, here, but I think that this is an important enough goal (both to do and to do well), that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

      Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:

      while (@ARGV) { $ARGV = shift @ARGV; open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next; while (<ARGV>) { ... } }
      Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.

      Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.

      Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

      ------------ :Wq Not an editor command: Wq

        Trouble is that the code as you posted is liable to go into a infinite loop if the directory listed contains a symlink to itself or to one of its parents. Also there are other similar problems IMO with your code... The reason I advocate using File::Find is that it's already handled these issues, as well as the other lurking in your code. Your idea is great. But IMO, you should avoid reinventing File::Find and just use it.


          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi

Re: Re^4 Useful addition to Perl?
by BrowserUk (Pope) on Mar 06, 2004 at 11:12 UTC

    Why use (open|read|close)dir and then have to concern yourself with removing '.' & '..' and use File::Spec->catfile/catdir to fix up the pathnames, when glob would take care of (most?) of that for you?

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re: Re^4 Useful addition to Perl?
by tsee (Curate) on Mar 06, 2004 at 10:59 UTC

    You really should because I'd be the first to download the module since I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks...

    If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://334141]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2018-10-18 04:54 GMT
Find Nodes?
    Voting Booth?
    When I need money for a bigger acquisition, I usually ...

    Results (99 votes). Check out past polls.