Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re: Re: Re: Useful addition to Perl?

by etcshadow (Priest)
on Mar 04, 2004 at 23:09 UTC ( #334068=note: print w/replies, xml ) Need Help??

in reply to Re: Re: Useful addition to Perl?
in thread Useful addition to Perl?

Actually, I started this, myself, a while ago... only (in order to avoid certain baddnesses of blowing up @ARGV to impossibly stupidly large proportions) it went a little more like this:
package r; use strict; use File::Spec; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::StdArray'; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { # magic here to explode directory contents if -d } # etc
So that @ARGV didn't actually get enormous... it just added items to the front as while (<>) { implicitly unshift'd stuff off it.

You can tell by the way that it starts that, actually,

perl -mr -e ...
was sufficient (who's got time for the shift key, anyway?). Too bad I never finished... coulda been a neat CPAN contribution... oh, well. Maybe someday, if no one runs off from reading these posts and implements it before I have time to finish it.
------------ :Wq Not an editor command: Wq

Replies are listed 'Best First'.
Re^4 Useful addition to Perl?
by etcshadow (Priest) on Mar 05, 2004 at 05:39 UTC
    OK... I bothered to finish it. Or at least get it to a working state (I don't really like just grepping out the "." and ".." directories... it feels so non-portable (even though I know it's cool on windows and *nix)).
    package r; use strict; tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV; sub import { } package r::Tie::RecursiveARGVArray; use Tie::Array; use base 'Tie::Array'; use File::Spec; sub TIEARRAY { my ($classname,@init) = @_; bless [@init], $classname; } sub FETCH { my ($self, $index) = @_; $self->_ReplaceDirs($index,$index); $self->[$index]; } sub FETCHSIZE { my ($self) = @_; scalar @$self; } sub STORE { my ($self, $index, $value) = @_; $self->[$index] = $value; } sub STORESIZE { my ($self, $count) = @_; $#$self = $count - 1; } sub SPLICE { my ($self,$offset,$length,@list) = @_; $self->_ReplaceDirs($offset,$offset+$length-1); splice(@$self,$offset,$length,@list); } sub POP { my ($self,$item) = @_; $self->_ReplaceDirs(-1,-1); pop(@$self); } sub _ReplaceDirs { my ($self, $fromindex, $toindex) = @_; # as long as the index range contains directories, substitute +the directory contents my $recursionguard = 0; while (my @indices = grep { -d $self->[$_] } ($fromindex..$toi +ndex) and $recursionguard++ < 10000) { my $index = $indices[0]; opendir DIR, $self->[$index] or do { warn "Cannot traverse directory $self->[$index +]: $!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple next; }; my @contents = readdir DIR or do { warn "Cannot read directory $self->[$index]: $ +!\n"; splice(@$self, $index, 1, ()); # remove the ba +d-apple closedir DIR or warn "Cannot close directory $ +self->[$index] (weird): $!\n"; next; }; closedir DIR or warn "Cannot close directory $self->[$ +index] (weird): $!\n"; # if there is any portable way to do this... I'd like +to hear it! @contents = grep !/^\.{1,2}$/, @contents; # convert directory contents to paths by prepending th +e directory. # even be super nice about using catfile or catdir, ap +propriately @contents = map { my $asfile = File::Spec->catfile( $self->[$ind +ex], $_ ); -f $asfile ? $asfile : File::Spec->catdir( $se +lf->[$index], $_ ); } @contents; # replace directory with its contents splice(@$self, $index, 1, @contents); } } 1;
    complete with example use:
    [me@host]$ cat `find d* -type f` | wc -l 58040 [me@host]$ perl -mr -lne '$x++; END{print $x}' d* 58040 [me@host]$
    I guess now I should pod this up and make it my first contribution to cpan :-D
    ------------ :Wq Not an editor command: Wq

      I think the only problem with all of this is that you arent using it as a wrapper to File::Find. Youve got a good idea here, but hand rolling a directory traversal is not in my opinion smart. Also the way that you do it worries me a touch. Its an interesting implementation of a depth first traveral, but surely its quite inefficient? Arent you repeatedly doing file system checks over the same objects?

      I think you should rewrite this as an alternate interface to File::Find. Which would get you better portability and whole host of hooks and options to add. Overall its a good idea though. And I go with calling it something long and giving it a flexible import() interface. For instance:

      use File::Find::ARGV filter=>sub { /\.txt/i }; while (<>){ ... }

      Anyway, its an interesting idea. ++ to you.


        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

        Well, the problem, as I see it, with writing this as a wrapper for File::Find is that that would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find do really only partially overlap, in that they both traverse directories... but that's about the end of it.

        The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong, here, but I think that this is an important enough goal (both to do and to do well), that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

        Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:

        while (@ARGV) { $ARGV = shift @ARGV; open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next; while (<ARGV>) { ... } }
        Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.

        Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.

        Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

        ------------ :Wq Not an editor command: Wq

      Why use (open|read|close)dir and then have to concern yourself with removing '.' & '..' and use File::Spec->catfile/catdir to fix up the pathnames, when glob would take care of (most?) of that for you?

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

      You really should because I'd be the first to download the module since I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks...

      If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://334068]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (8)
As of 2020-09-30 08:42 GMT
Find Nodes?
    Voting Booth?
    If at first I donít succeed, I Ö

    Results (160 votes). Check out past polls.