Actually, I started on this myself a while ago... only (in order to avoid the badness of blowing up @ARGV to impossibly, stupidly large proportions) it went a little more like this:
package r;
use strict;
use File::Spec;
tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV;
sub import {
}
package r::Tie::RecursiveARGVArray;
use Tie::Array;
use base 'Tie::StdArray';
sub TIEARRAY {
    my ($classname,@init) = @_;
    bless [@init], $classname;
}
sub FETCH {
    # magic here to explode directory contents if -d
}
# etc
So that @ARGV didn't actually get enormous... it just added items to the front as while (<>) { ... } implicitly shift'd stuff off it.
You can tell from the way it starts that, actually, perl -mr -e ... was sufficient (who's got time for the shift key, anyway?). Too bad I never finished... coulda been a neat CPAN contribution... oh, well. Maybe someday, if no one reads these posts and runs off to implement it before I have time to finish it.
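The "add items to the front as they are shifted off" idea can be illustrated without a tie at all. This is a minimal sketch, not the module's code: `make_file_iterator` is a hypothetical helper that keeps a queue of pending paths and reads a directory only once it reaches the front of that queue, so the whole tree is never listed up front. It assumes Unix or Windows path semantics (it uses `catfile` for every entry) and uses `File::Spec->no_upwards` to drop `.` and `..`.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of any module): returns a closure that
# yields one non-directory path per call, depth-first, and undef when done.
# A directory is read only when it reaches the front of the queue.
sub make_file_iterator {
    my @pending = @_;    # paths not yet examined, front of queue first
    return sub {
        while (@pending) {
            my $path = shift @pending;
            return $path unless -d $path;    # non-directories go straight out
            opendir(my $dh, $path) or do {
                warn "Cannot traverse directory $path: $!\n";
                next;
            };
            my @names = File::Spec->no_upwards(readdir $dh);    # drop '.' and '..'
            closedir $dh;
            # Put the directory's contents where the directory was (depth-first).
            unshift @pending, map { File::Spec->catfile($path, $_) } sort @names;
        }
        return;    # queue exhausted
    };
}
```

Usage would look like `my $next = make_file_iterator(@ARGV); while (defined(my $f = $next->())) { ... }`. The memory held at any moment is the unexpanded queue, not the full file list.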
------------
:Wq
Not an editor command: Wq
Re^4 Useful addition to Perl?
by etcshadow (Priest) on Mar 05, 2004 at 05:39 UTC
OK... I bothered to finish it. Or at least got it to a working state (I don't really like just grepping out the "." and ".." directories... it feels so non-portable, even though I know it's cool on Windows and *nix).
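On the portability worry: File::Spec has a documented class method, `no_upwards`, that strips the names referring to the current or parent directory from a list of file names. A small sketch of what it keeps and drops; on Unix it removes exactly `.` and `..` (other platforms are handled by File::Spec's platform-specific subclass, which is not verified here):

```perl
use strict;
use warnings;
use File::Spec;

# Names as readdir might return them; only '.' and '..' should be dropped.
my @entries = ('.', '..', 'notes.txt', '...', '.hidden', 'src');
my @kept    = File::Spec->no_upwards(@entries);
print "@kept\n";    # on Unix, prints "notes.txt ... .hidden src"
```

In the module, this could stand in for the hand-written `grep !/^\.{1,2}$/` filter.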
package r;
use strict;
tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV;
sub import { }

package r::Tie::RecursiveARGVArray;
use Tie::Array;
use base 'Tie::Array';
use File::Spec;

sub TIEARRAY {
    my ($classname, @init) = @_;
    bless [@init], $classname;
}

sub FETCH {
    my ($self, $index) = @_;
    $self->_ReplaceDirs($index, $index);
    $self->[$index];
}

sub FETCHSIZE {
    my ($self) = @_;
    scalar @$self;
}

sub STORE {
    my ($self, $index, $value) = @_;
    $self->[$index] = $value;
}

sub STORESIZE {
    my ($self, $count) = @_;
    $#$self = $count - 1;
}
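
sub SPLICE {
    my ($self, $offset, $length, @list) = @_;
    # splice() may omit OFFSET/LENGTH or pass a negative OFFSET;
    # normalize those common cases before expanding directories
    $offset = 0 unless defined $offset;
    $offset += @$self if $offset < 0;
    $length = @$self - $offset unless defined $length;
    $self->_ReplaceDirs($offset, $offset + $length - 1);
    splice(@$self, $offset, $length, @list);
}

sub POP {
    my ($self) = @_;
    $self->_ReplaceDirs(-1, -1);
    pop(@$self);
}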

sub _ReplaceDirs {
    my ($self, $fromindex, $toindex) = @_;
    # as long as the index range contains directories, substitute
    # the directory contents
    my $recursionguard = 0;
    while (my @indices = grep { -d $self->[$_] } ($fromindex..$toindex)
           and $recursionguard++ < 10000) {
        my $index = $indices[0];
        opendir DIR, $self->[$index] or do {
            warn "Cannot traverse directory $self->[$index]: $!\n";
            splice(@$self, $index, 1, ()); # remove the bad apple
            next;
        };
        my @contents = readdir DIR or do {
            warn "Cannot read directory $self->[$index]: $!\n";
            # close before removing the entry, so the message names the right directory
            closedir DIR or warn "Cannot close directory $self->[$index] (weird): $!\n";
            splice(@$self, $index, 1, ()); # remove the bad apple
            next;
        };
        closedir DIR or warn "Cannot close directory $self->[$index] (weird): $!\n";
        # if there is any portable way to do this... I'd like to hear it!
        @contents = grep !/^\.{1,2}$/, @contents;
        # convert directory contents to paths by prepending the directory.
        # even be super nice about using catfile or catdir, appropriately
        @contents = map {
            my $asfile = File::Spec->catfile( $self->[$index], $_ );
            -f $asfile ? $asfile : File::Spec->catdir( $self->[$index], $_ );
        } @contents;
        # replace directory with its contents
        splice(@$self, $index, 1, @contents);
    }
}
1;
Complete with an example of its use:
[me@host]$ cat `find d* -type f` | wc -l
58040
[me@host]$ perl -mr -lne '$x++; END{print $x}' d*
58040
[me@host]$
I guess now I should pod this up and make it my first contribution to CPAN :-D
I think the only problem with all of this is that you aren't using it as a wrapper around File::Find. You've got a good idea here, but hand-rolling a directory traversal is not, in my opinion, smart. Also, the way that you do it worries me a touch. It's an interesting implementation of a depth-first traversal, but surely it's quite inefficient? Aren't you repeatedly doing file-system checks on the same objects?
I think you should rewrite this as an alternate interface to File::Find. That would get you better portability and a whole host of hooks and options to add. Overall it's a good idea, though. And I'd go with calling it something long and giving it a flexible import() interface. For instance:
use File::Find::ARGV filter=>sub { /\.txt/i };
while (<>){
    ...
}
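For comparison, here is roughly what the File::Find route looks like with only the core module. This is a sketch under assumptions: `collect_files` is a hypothetical helper, no File::Find::ARGV module is assumed to exist, and the approach is eager, walking every tree completely and building the whole file list before `<>` reads a single line.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: walk @roots with File::Find and return every plain
# file whose path passes $filter. Eager: the whole list is built up front.
sub collect_files {
    my ($filter, @roots) = @_;
    my @found;
    find({
        no_chdir => 1,    # leave the cwd alone; $_ is then the full path
        wanted   => sub {
            push @found, $_ if -f $_ && $filter->($_);
        },
    }, @roots);
    return sort @found;
}

# Usage sketch: expand @ARGV, then let <> read the files as usual.
# @ARGV = collect_files(sub { $_[0] =~ /\.txt\z/i }, @ARGV ? @ARGV : '.');
# while (<>) { ... }
```

The filter receives the full path, which is close to the `filter => sub { ... }` shape proposed above, but the cost is a complete traversal before any input is read.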
Anyway, it's an interesting idea. ++ to you.
---
demerphq
First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi
Well, the problem, as I see it, with writing this as a wrapper for File::Find is that it would be suboptimal for the most important use case: perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find does really only partially overlap, in that they both traverse directories... but that's about the end of it.
The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong, here, but I think that this is an important enough goal (both to do and to do well), that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.
Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:
while (@ARGV) {
    $ARGV = shift @ARGV;
    open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next;
    while (<ARGV>) {
        ...
    }
}
Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.
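A quick way to check that `<>` really pulls file names through a tied @ARGV: the sketch below (the class name `Logged::ARGV` is made up for illustration) ties @ARGV to a subclass of Tie::StdArray that records every SHIFT, writes two temporary files, and reads them with an ordinary `while (<>)` loop. If the pull goes through the tie, both file names show up in the log.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical demo class: an ordinary array tie that logs each SHIFT.
package Logged::ARGV;
use Tie::Array;                 # this file also defines Tie::StdArray
our @ISA = ('Tie::StdArray');
our @shifted;                   # names handed to <>, in order
sub SHIFT {
    my $self  = shift;
    my $value = shift @$self;
    push @shifted, $value if defined $value;
    return $value;
}

package main;

# Two small input files, removed automatically at exit.
my @files;
for my $text ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh;
    push @files, $name;
}

tie @ARGV, 'Logged::ARGV';
@ARGV = @files;                 # stored through the tie (CLEAR/STORE)

my @lines;
while (<>) {                    # <> shifts file names off the tied @ARGV
    chomp;
    push @lines, $_;
}
print "read @lines via @Logged::ARGV::shifted\n";
```

Because the tie class decides what each SHIFT returns, it can expand a directory at that moment instead of beforehand, which is the hook the module relies on.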
Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior which underlies the read-event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse sub directories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdir's into the directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.
Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.
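For reference, File::Find does document two knobs that bear on the "depth => 1" and chdir points: the `no_chdir` option (find() then leaves the working directory alone and `$_` holds the full path) and the `$File::Find::prune` variable, which stops find() from descending into the directory just visited. A sketch of a one-level listing built from them follows; `list_one_level` is a hypothetical name, it assumes `$dir` is a directory, and whether this is simpler than opendir/readdir/closedir is a judgment call.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: the immediate contents of $dir (files and
# subdirectories), without descending further and without chdir().
sub list_one_level {
    my ($dir) = @_;
    my @entries;
    my $top_seen = 0;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless $top_seen++;          # first call is $dir itself; skip it
            push @entries, $_;
            $File::Find::prune = 1 if -d $_;    # list subdirectories but don't enter them
        },
    }, $dir);
    return sort @entries;
}
```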
You really should, because I'd be the first to download the module: I've been writing one-liners using File::Find far too often. As you all know, File::Find's interface sucks...
If you need any support with packaging the module correctly for CPAN, feel free to contact me via email and I'll try to help.
Steffen