Re^4 Useful addition to Perl?

in reply to Re: Re: Re: Useful addition to Perl?
in thread Useful addition to Perl?

OK... I bothered to finish it, or at least to get it to a working state. (I don't really like just grepping out the "." and ".." directories... it feels so non-portable, even though I know it works on Windows and *nix.)

package r;
use strict;

tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV;

sub import { }

package r::Tie::RecursiveARGVArray;
use Tie::Array;
use base 'Tie::Array';
use File::Spec;

sub TIEARRAY {
        my ($classname,@init) = @_;
        bless [@init], $classname;
}

sub FETCH {
        my ($self, $index) = @_;
        $self->_ReplaceDirs($index,$index);
        $self->[$index];
}

sub FETCHSIZE {
        my ($self) = @_;
        scalar @$self;
}

sub STORE {
        my ($self, $index, $value) = @_;
        $self->[$index] = $value;
}

sub STORESIZE {
        my ($self, $count) = @_;
        $#$self = $count - 1;
}

sub SPLICE {
        my ($self,$offset,$length,@list) = @_;
        $self->_ReplaceDirs($offset,$offset+$length-1);
        splice(@$self,$offset,$length,@list);
}

sub POP {
        my ($self,$item) = @_;
        $self->_ReplaceDirs(-1,-1);
        pop(@$self);
}

sub _ReplaceDirs {
        my ($self, $fromindex, $toindex) = @_;

        # as long as the index range contains directories, substitute the directory contents
        my $recursionguard = 0;
        while (my @indices = grep { -d $self->[$_] } ($fromindex..$toindex) and $recursionguard++ < 10000) {
                my $index = $indices[0];

                opendir DIR, $self->[$index] or do {
                        warn "Cannot traverse directory $self->[$index]: $!\n";
                        splice(@$self, $index, 1, ()); # remove the bad-apple
                        next;
                };
                my @contents = readdir DIR or do {
                        warn "Cannot read directory $self->[$index]: $!\n";
                        splice(@$self, $index, 1, ()); # remove the bad-apple
                        closedir DIR or warn "Cannot close directory $self->[$index] (weird): $!\n";
                        next;
                };
                closedir DIR or warn "Cannot close directory $self->[$index] (weird): $!\n";

                # if there is any portable way to do this... I'd like to hear it!
                @contents = grep !/^\.{1,2}$/, @contents;

                # convert directory contents to paths by prepending the directory.
                # even be super nice about using catfile or catdir, appropriately
                @contents = map {
                        my $asfile = File::Spec->catfile( $self->[$index], $_ );
                        -f $asfile ? $asfile : File::Spec->catdir( $self->[$index], $_ );
                } @contents;

                # replace directory with its contents
                splice(@$self, $index, 1, @contents);
        }
}

1;

complete with example use:

[me@host]$ cat `find d* -type f` | wc -l  
  58040
[me@host]$ perl -mr -lne '$x++; END{print $x}' d*
58040
[me@host]$

I guess now I should pod this up and make it my first contribution to cpan :-D

------------
:Wq
Not an editor command: Wq


Replies are listed 'Best First'.
Re: Re^4 Useful addition to Perl? by demerphq (Chancellor) on Mar 06, 2004 at 13:53 UTC
I think the only problem with all of this is that you aren't using it as a wrapper to File::Find. You've got a good idea here, but hand-rolling a directory traversal is not, in my opinion, smart. Also, the way that you do it worries me a touch. It's an interesting implementation of a depth-first traversal, but surely it's quite inefficient? Aren't you repeatedly doing file system checks over the same objects? I think you should rewrite this as an alternate interface to File::Find, which would get you better portability and a whole host of hooks and options to add.

Overall it's a good idea though. And I'd go with calling it something long and giving it a flexible import() interface. For instance: `use File::Find::ARGV filter=>sub { /\.txt/i }; while (<>){ ... }`

Anyway, it's an interesting idea. ++ to you.

--- demerphq
First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi
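For readers who want to see roughly what a File::Find-backed version could look like, here is a minimal sketch. It is not the File::Find::ARGV module suggested above (no such module is shown in this thread); `expand_paths` is a hypothetical helper name, and the choice to pass explicitly named files through unfiltered is an assumption. Note that it builds the complete file list up front.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: expand every directory in @paths into the plain
# files beneath it, keeping only files for which $filter returns true.
# Paths that are not directories are passed through untouched.
sub expand_paths {
    my ($filter, @paths) = @_;
    my @files;
    for my $path (@paths) {
        if (-d $path) {
            File::Find::find({
                no_chdir => 1,    # do not chdir; $_ is then the full path
                wanted   => sub {
                    return unless -f $_;
                    push @files, $_ if $filter->($_);
                },
            }, $path);
        }
        else {
            push @files, $path;
        }
    }
    return @files;
}
```

It could then be used before the usual loop, for example `@ARGV = expand_paths(sub { $_[0] =~ /\.txt\z/i }, @ARGV); while (<>) { ... }`.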
Re: Re: Re^4 Useful addition to Perl? by etcshadow (Priest) on Mar 07, 2004 at 21:47 UTC
Well, the problem, as I see it, with writing this as a wrapper for File::Find is that it would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find does really only partially overlap, in that they both traverse directories... but that's about the end of it. The ultimate intent of this is to DWIM when I say `perl -mr -ne 'print if /foo/' *`, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong here, but I think that this is an important enough goal (both to do and to do well) that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that `perl -ne` or `perl -pe` would need a "pull" interface. That is, they translate to `while (<>) { ... }`. Which, itself, is essentially: `while (@ARGV) { $ARGV = shift @ARGV; open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next; while (<ARGV>) { ... } }`

Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done. Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not.

Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.

Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

------------
:Wq
Not an editor command: Wq
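The "pull" idea described above can also be illustrated without tie. The following is a minimal sketch, not the module's actual code: `make_file_iterator` is a hypothetical name, and the sketch assumes it is acceptable to join every entry with `catfile` rather than choosing between `catfile` and `catdir`. It uses `File::Spec->no_upwards` to drop "." and ".." instead of a hand-written regex. A directory is read only when it reaches the front of the queue, so nothing beyond the current directory is listed ahead of time.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: returns a closure that yields one non-directory
# path per call, or undef once everything has been consumed.
sub make_file_iterator {
    my @queue = @_;
    return sub {
        while (@queue) {
            my $path = shift @queue;
            return $path unless -d $path;

            opendir my $dh, $path or do {
                warn "Cannot traverse directory $path: $!\n";
                next;    # skip this directory, keep pulling
            };
            # no_upwards strips "." and ".." in a platform-aware way
            my @entries = sort { $a cmp $b } File::Spec->no_upwards(readdir $dh);
            closedir $dh;

            # The contents take the directory's place at the front of the
            # queue, which keeps the traversal depth-first.
            unshift @queue, map { File::Spec->catfile($path, $_) } @entries;
        }
        return;    # queue exhausted
    };
}
```

A caller would pull names one at a time, for example `my $next = make_file_iterator(@ARGV); while (defined(my $file = $next->())) { ... }`.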
Re: Re: Re: Re^4 Useful addition to Perl? by demerphq (Chancellor) on Mar 07, 2004 at 22:24 UTC
Trouble is that the code as you posted is liable to go into an infinite loop if the directory listed contains a symlink to itself or to one of its parents. Also, there are other similar problems IMO with your code... The reason I advocate using File::Find is that it has already handled these issues, as well as the others lurking in your code. Your idea is great. But IMO, you should avoid reinventing File::Find and just use it.

--- demerphq
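One common way to defend a hand-rolled traversal against the symlink cycles mentioned here is to remember each directory's device and inode numbers and refuse to expand a directory twice. This is only a sketch of that general technique, not something taken from the posted module or from File::Find; `make_dir_guard` is a hypothetical name, and the approach assumes a filesystem where `stat` reports meaningful inode numbers (typically true on *nix, often not on Windows).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that answers "has this directory
# been seen before?", identifying directories by device and inode.
sub make_dir_guard {
    my %seen;
    return sub {
        my ($dir) = @_;
        # stat follows symlinks, so a link back to a parent directory
        # resolves to the same device:inode pair as the parent itself.
        my ($dev, $ino) = stat $dir
            or return 1;    # cannot stat: report "seen" so the caller skips it
        return $seen{"$dev:$ino"}++ ? 1 : 0;
    };
}
```

A traversal would create one guard per run and consult it before expanding a directory, for example `next if $already_seen->($path);`.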
Re: Re: Re: Re: Re^4 Useful addition to Perl? by etcshadow (Priest) on Mar 08, 2004 at 06:32 UTC
Re: Re: Re: Re: Re: Re^4 Useful addition to Perl? by demerphq (Chancellor) on Mar 08, 2004 at 11:23 UTC
Re: Re^4 Useful addition to Perl? by BrowserUk (Patriarch) on Mar 06, 2004 at 11:12 UTC
Why use (open|read|close)dir and then have to concern yourself with removing '.' & '..' and use File::Spec->catfile/catdir to fix up the pathnames, when glob would take care of (most?) of that for you?

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
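As a rough illustration of the glob suggestion, the sketch below lists a directory with `bsd_glob` from File::Glob, which (unlike the core `glob`) does not split its pattern on whitespace. `dir_entries` is a hypothetical helper name. The "(most?)" caveat applies: the `*` pattern skips dotfiles entirely, not only "." and "..", and glob metacharacters in the directory name itself would be interpreted rather than matched literally.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: return the entries of $dir as ready-to-use paths.
# Entries whose names begin with "." are not returned.
sub dir_entries {
    my ($dir) = @_;
    return bsd_glob("$dir/*");
}
```

The paths come back already prefixed with the directory, so no separate catfile/catdir step is needed, for example `my @contents = dir_entries($path);`.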
Re: Re^4 Useful addition to Perl? by tsee (Curate) on Mar 06, 2004 at 10:59 UTC
You really should, because I'd be the first to download the module, since I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks... If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.

Steffen

In Section Meditations