Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Re^4 Useful addition to Perl?

by etcshadow (Priest)
on Mar 05, 2004 at 05:39 UTC ( #334141=note: print w/ replies, xml ) Need Help??

in reply to Re: Re: Re: Useful addition to Perl?
in thread Useful addition to Perl?

OK... I bothered to finish it. Or at least get it to a working state (I don't really like just grepping out the "." and ".." directories... it feels so non-portable (even though I know it's cool on windows and *nix)).

package r; use strict; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::Array'; use File::Spec; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { my ($self, $index) = @_; $self->_ReplaceDirs($index,$index); $self->[$index]; } sub FETCHSIZE { my ($self) = @_; scalar @$self; } sub STORE { my ($self, $index, $value) = @_; $self->[$index] = $value; } sub STORESIZE { my ($self, $count) = @_; $#$self = $count - 1; } sub SPLICE { my ($self,$offset,$length,@list) = @_; $self->_ReplaceDirs($offset,$offset+$length-1); splice(@$self,$offset,$length,@list); } sub POP { my ($self,$item) = @_; $self->_ReplaceDirs(-1,-1); pop(@$self); } sub _ReplaceDirs { my ($self, $fromindex, $toindex) = @_; # as long as the index range contains directories, substitute +the directory contents my $recursionguard = 0; while (my @indices = grep { -d $self->[$_] } ($fromindex..$toi +ndex) and $recursionguard++ < 10000) { my $index = $indices[0]; opendir DIR, $self->[$index] or do { warn "Cannot traverse directory $self->[$index +]: $!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple next; }; my @contents = readdir DIR or do { warn "Cannot read directory $self->[$index]: $ +!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple closedir DIR or warn "Cannot close directory $ +self->[$index] (weird): $!\n"; next; }; closedir DIR or warn "Cannot close directory $self->[$ +index] (weird): $!\n"; # if there is any portable way to do this... I'd like +to hear it! @contents = grep !/^\.{1,2}$/, @contents; # convert directory contents to paths by prepending th +e directory. # even be super nice about using catfile or catdir, ap +propriately @contents = map { my $asfile = File::Spec->catfile( $self->[$ind +ex], $_ ); -f $asfile ? $asfile : File::Spec->catdir( $se +lf->[$index], $_ ); } @contents; # replace directory with its contents splice(@$self, $index, 1, @contents); } } 1;
complete with example use:
[me@host]$ cat `find d* -type f` | wc -l 58040 [me@host]$ perl -mr -lne '$x++; END{print $x}' d* 58040 [me@host]$
I guess now I should pod this up and make it my first contribution to cpan :-D
------------ :Wq Not an editor command: Wq

Comment on Re^4 Useful addition to Perl?
Select or Download Code
Re: Re^4 Useful addition to Perl?
by tsee (Curate) on Mar 06, 2004 at 10:59 UTC

    You really should because I'd be the first to download the module since I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks...

    If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.


Re: Re^4 Useful addition to Perl?
by BrowserUk (Pope) on Mar 06, 2004 at 11:12 UTC

    Why use (open|read|close)dir and then have to concern yourself with removing '.' & '..' and use File::Spec->catfile/catdir to fix up the pathnames, when glob would take care of (most?) of that for you?

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re: Re^4 Useful addition to Perl?
by demerphq (Chancellor) on Mar 06, 2004 at 13:53 UTC

    I think the only problem with all of this is that you arent using it as a wrapper to File::Find. Youve got a good idea here, but hand rolling a directory traversal is not in my opinion smart. Also the way that you do it worries me a touch. Its an interesting implementation of a depth first traveral, but surely its quite inefficient? Arent you repeatedly doing file system checks over the same objects?

    I think you should rewrite this as an alternate interface to File::Find. Which would get you better portability and whole host of hooks and options to add. Overall its a good idea though. And I go with calling it something long and giving it a flexible import() interface. For instance:

    use File::Find::ARGV filter=>sub { /\.txt/i }; while (<>){ ... }

    Anyway, its an interesting idea. ++ to you.


      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi

      Well, the problem, as I see it, with writing this as a wrapper for File::Find is that that would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find do really only partially overlap, in that they both traverse directories... but that's about the end of it.

      The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong, here, but I think that this is an important enough goal (both to do and to do well), that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

      Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:

      while (@ARGV) { $ARGV = shift @ARGV; open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next; while (<ARGV>) { ... } }
      Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.

      Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.

      Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

      ------------ :Wq Not an editor command: Wq

        Trouble is that the code as you posted is liable to go into a infinite loop if the directory listed contains a symlink to itself or to one of its parents. Also there are other similar problems IMO with your code... The reason I advocate using File::Find is that it's already handled these issues, as well as the other lurking in your code. Your idea is great. But IMO, you should avoid reinventing File::Find and just use it.


          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://334141]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (7)
As of 2014-07-31 07:26 GMT
Find Nodes?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:

    Results (245 votes), past polls