Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Re: Re: Re: Useful addition to Perl?

by etcshadow (Priest)
on Mar 04, 2004 at 23:09 UTC ( #334068=note: print w/ replies, xml ) Need Help??

in reply to Re: Re: Useful addition to Perl?
in thread Useful addition to Perl?

Actually, I started this, myself, a while ago... only (in order to avoid certain baddnesses of blowing up @ARGV to impossibly stupidly large proportions) it went a little more like this:

package r; use strict; use File::Spec; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::StdArray'; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { # magic here to explode directory contents if -d } # etc
So that @ARGV didn't actually get enormous... it just added items to the front as while (<>) { implicitly unshift'd stuff off it.

You can tell by the way that it starts that, actually,

perl -mr -e ...
was sufficient (who's got time for the shift key, anyway?). Too bad I never finished... coulda been a neat CPAN contribution... oh, well. Maybe someday, if no one runs off from reading these posts and implements it before I have time to finish it.
------------ :Wq Not an editor command: Wq

Comment on Re: Re: Re: Useful addition to Perl?
Select or Download Code
Re^4 Useful addition to Perl?
by etcshadow (Priest) on Mar 05, 2004 at 05:39 UTC
    OK... I bothered to finish it. Or at least get it to a working state (I don't really like just grepping out the "." and ".." directories... it feels so non-portable (even though I know it's cool on windows and *nix)).
    package r; use strict; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::Array'; use File::Spec; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { my ($self, $index) = @_; $self->_ReplaceDirs($index,$index); $self->[$index]; } sub FETCHSIZE { my ($self) = @_; scalar @$self; } sub STORE { my ($self, $index, $value) = @_; $self->[$index] = $value; } sub STORESIZE { my ($self, $count) = @_; $#$self = $count - 1; } sub SPLICE { my ($self,$offset,$length,@list) = @_; $self->_ReplaceDirs($offset,$offset+$length-1); splice(@$self,$offset,$length,@list); } sub POP { my ($self,$item) = @_; $self->_ReplaceDirs(-1,-1); pop(@$self); } sub _ReplaceDirs { my ($self, $fromindex, $toindex) = @_; # as long as the index range contains directories, substitute +the directory contents my $recursionguard = 0; while (my @indices = grep { -d $self->[$_] } ($fromindex..$toi +ndex) and $recursionguard++ < 10000) { my $index = $indices[0]; opendir DIR, $self->[$index] or do { warn "Cannot traverse directory $self->[$index +]: $!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple next; }; my @contents = readdir DIR or do { warn "Cannot read directory $self->[$index]: $ +!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple closedir DIR or warn "Cannot close directory $ +self->[$index] (weird): $!\n"; next; }; closedir DIR or warn "Cannot close directory $self->[$ +index] (weird): $!\n"; # if there is any portable way to do this... I'd like +to hear it! @contents = grep !/^\.{1,2}$/, @contents; # convert directory contents to paths by prepending th +e directory. # even be super nice about using catfile or catdir, ap +propriately @contents = map { my $asfile = File::Spec->catfile( $self->[$ind +ex], $_ ); -f $asfile ? $asfile : File::Spec->catdir( $se +lf->[$index], $_ ); } @contents; # replace directory with its contents splice(@$self, $index, 1, @contents); } } 1;
    complete with example use:
    [me@host]$ cat `find d* -type f` | wc -l 58040 [me@host]$ perl -mr -lne '$x++; END{print $x}' d* 58040 [me@host]$
    I guess now I should pod this up and make it my first contribution to cpan :-D
    ------------ :Wq Not an editor command: Wq

      You really should because I'd be the first to download the module since I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks...

      If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.


      Why use (open|read|close)dir and then have to concern yourself with removing '.' & '..' and use File::Spec->catfile/catdir to fix up the pathnames, when glob would take care of (most?) of that for you?

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

      I think the only problem with all of this is that you arent using it as a wrapper to File::Find. Youve got a good idea here, but hand rolling a directory traversal is not in my opinion smart. Also the way that you do it worries me a touch. Its an interesting implementation of a depth first traveral, but surely its quite inefficient? Arent you repeatedly doing file system checks over the same objects?

      I think you should rewrite this as an alternate interface to File::Find. Which would get you better portability and whole host of hooks and options to add. Overall its a good idea though. And I go with calling it something long and giving it a flexible import() interface. For instance:

      use File::Find::ARGV filter=>sub { /\.txt/i }; while (<>){ ... }

      Anyway, its an interesting idea. ++ to you.


        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

        Well, the problem, as I see it, with writing this as a wrapper for File::Find is that that would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find do really only partially overlap, in that they both traverse directories... but that's about the end of it.

        The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong, here, but I think that this is an important enough goal (both to do and to do well), that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

        Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:

        while (@ARGV) { $ARGV = shift @ARGV; open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next; while (<ARGV>) { ... } }
        Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.

        Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.

        Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

        ------------ :Wq Not an editor command: Wq

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://334068]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (11)
As of 2015-07-06 22:18 GMT
Find Nodes?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...

    Results (83 votes), past polls