
Re: Re^4 Useful addition to Perl?

by demerphq (Chancellor)
on Mar 06, 2004 at 13:53 UTC (#334491)

in reply to Re^4 Useful addition to Perl?
in thread Useful addition to Perl?

I think the only problem with all of this is that you aren't using it as a wrapper around File::Find. You've got a good idea here, but hand-rolling a directory traversal is not, in my opinion, smart. Also, the way that you do it worries me a touch. It's an interesting implementation of a depth-first traversal, but surely it's quite inefficient? Aren't you repeatedly doing file system checks on the same objects?

I think you should rewrite this as an alternate interface to File::Find, which would get you better portability and a whole host of hooks and options to add. Overall it's a good idea, though. And I'd go with calling it something long and giving it a flexible import() interface. For instance:

use File::Find::ARGV filter => sub { /\.txt/i };
while (<>) { ... }
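To make the suggested import() interface concrete, here is a minimal sketch of how a module could accept and validate a `filter => sub { ... }` option. File::Find::ARGV is only the name proposed above, not an existing module, and the `%OPTIONS` hash and `wanted()` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical module: File::Find::ARGV is only the name suggested above.
# This sketch shows one way its import() could validate and store options.
package File::Find::ARGV;

use Carp qw(croak);

our %OPTIONS = (filter => sub { 1 });    # default filter accepts every name

sub import {
    my ($class, %args) = @_;
    for my $key (sort keys %args) {
        croak "Unknown option '$key'" unless exists $OPTIONS{$key};
        croak "Option '$key' must be a code reference"
            unless ref $args{$key} eq 'CODE';
        $OPTIONS{$key} = $args{$key};
    }
    return;
}

# Run the configured filter with the candidate name in $_, as in the example.
sub wanted {
    my ($name) = @_;
    local $_ = $name;
    return $OPTIONS{filter}->();
}

package main;

# Same effect as: use File::Find::ARGV filter => sub { /\.txt/i };
# (called at run time here only so the sketch fits in one file)
File::Find::ARGV->import(filter => sub { /\.txt/i });

for my $name (qw(notes.TXT image.png readme.txt)) {
    print "$name\n" if File::Find::ARGV::wanted($name);
}
```

Rejecting unknown keys in import() means a typo such as `fitler => ...` fails loudly at compile time instead of being silently ignored.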

Anyway, it's an interesting idea. ++ to you.


    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi

Re: Re: Re^4 Useful addition to Perl?
by etcshadow (Priest) on Mar 07, 2004 at 21:47 UTC
    Well, the problem, as I see it, with writing this as a wrapper for File::Find is that it would be suboptimal for the most important use case, and that is perl one-liners (-pe and -ne). Also, for that matter, what this does and what File::Find does really only partially overlap, in that they both traverse directories... but that's about the end of it.

    The ultimate intent of this is to DWIM when I say perl -mr -ne 'print if /foo/' *, and to not do anything silly in the process, like creating a list of every file on the file-system. Maybe I'm wrong here, but I think that this is an important enough goal (both to do and to do well) that it outweighs the importance of reusing File::Find. Granted, I'm not saying that reuse shouldn't be involved... I sure as heck wouldn't want to reimplement File::Spec.

    Really, what it comes down to is that File::Find implements a "push" interface from the file-system... that is, File::Find pushes file names into your code (because you give it a code-ref as an entry-point for your code). The thing is, though, that perl -ne or perl -pe would need a "pull" interface. That is, they translate to while (<>) { ... }. Which, itself, is essentially:

    while (@ARGV) {
        $ARGV = shift @ARGV;
        open ARGV, $ARGV or warn("Couldn't open $ARGV: $!\n"), next;
        while (<ARGV>) { ... }
    }

    Now, to look at that code, you can see that it is definitely trying to pull filenames out of @ARGV... so the easiest way to implement an interface on that is to tie a behavior to reading from @ARGV... which is exactly what I've done.
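    A minimal sketch of the tied-@ARGV idea, assuming that the FETCHSIZE and SHIFT methods of a tied array are what `<>` ends up calling when it pulls the next file name from @ARGV. LazyArgv is a hypothetical class name and this is not the poster's actual code. Directories are expanded one level at a time only when they reach the front of the queue, and symlinked directories are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical class: a sketch of a lazily expanding @ARGV, not the posted code.
package LazyArgv;

use File::Spec;
use Tie::Array;
our @ISA = ('Tie::Array');

sub TIEARRAY {
    my ($class, @paths) = @_;
    return bless { pending => [@paths] }, $class;
}

# Expand directories at the head of the queue until the head is not a
# directory (or the queue is empty). One directory is read per expansion,
# so the whole tree is never listed up front.
sub _advance {
    my ($self) = @_;
    my $pending = $self->{pending};
    while (@$pending && -d $pending->[0]) {
        my $dir = shift @$pending;
        next if -l $dir;    # skip symlinked directories so cycles can't occur
        opendir(my $dh, $dir) or do { warn "Can't opendir $dir: $!\n"; next };
        my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        # Depth-first: a directory's children are queued ahead of its siblings.
        unshift @$pending, map { File::Spec->catfile($dir, $_) } sort @names;
    }
    return;
}

# <> asks for the size and then shifts, so both advance the queue first.
sub FETCHSIZE { my ($self) = @_; $self->_advance; return scalar @{ $self->{pending} } }
sub SHIFT     { my ($self) = @_; $self->_advance; return shift @{ $self->{pending} } }
sub FETCH     { my ($self, $i) = @_; $self->_advance; return $self->{pending}[$i] }
sub STORE     { my ($self, $i, $value) = @_; $self->{pending}[$i] = $value }
sub STORESIZE { my ($self, $n) = @_; $#{ $self->{pending} } = $n - 1 }
sub PUSH      { my ($self, @items) = @_; push @{ $self->{pending} }, @items }
sub CLEAR     { my ($self) = @_; @{ $self->{pending} } = () }

package main;

use File::Temp qw(tempdir);

# Demo: build a small tree, point @ARGV at its root, and let <> walk it.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    print {$fh} "foo\nbar\n";
    close $fh;
}

tie @ARGV, 'LazyArgv', $root;
my @hits;
while (<>) {
    push @hits, $ARGV if /foo/;
}
print "$_\n" for @hits;
```

    In a real module the tie would presumably happen at load time, so that a one-liner could pick it up with -M; that wiring is left out of this sketch.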

    Now, it's true that I could make this pulling from @ARGV use File::Find as the behavior underlying the read event... but if I did that, then I'd end up reading in the whole file-system tree (or the whole sub-tree that is being accessed)... and if there's no good reason to do it that way, then I'd rather not. Granted, if File::Find offered a means to essentially say "depth => 1" (that is, give me all the contents of this directory, but don't traverse subdirectories), then that might be worthwhile... as it would save the effort of opendir; readdir; closedir; grep; fix-file-names.... but that's just not what File::Find does. Moreover, I've never been happy with the fact that File::Find actually chdirs into each directory as it goes... that's just ugly. It should use File::Spec to prepend the leading path... but I digress.
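    For what it's worth, File::Find does document a `no_chdir` option and a `$File::Find::prune` variable. A minimal sketch of using them to approximate the "depth => 1" listing described above follows; `children_of` is a hypothetical helper name. It is still File::Find's push interface underneath, just limited to one directory level per call, so on its own it does not solve the pull problem.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);

# List the immediate children of $dir with File::Find, using the documented
# no_chdir option and $File::Find::prune to stop below the first level.
sub children_of {
    my ($dir) = @_;
    my @kids;
    find({
        no_chdir => 1,                        # $_ is the full path; cwd is untouched
        wanted   => sub {
            return if $_ eq $dir;             # the starting directory itself
            push @kids, $File::Find::name;
            $File::Find::prune = 1 if -d $_;  # report subdirectories, don't enter them
        },
    }, $dir);
    return sort @kids;
}

# Demo on a small tree: sub/b.txt is one level too deep and is not listed.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}
my @kids = children_of($root);
print "$_\n" for @kids;
```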

    Anyway, I hope that explains why I didn't want to use File::Find for this. I did give it serious consideration... but ultimately, I think that the method I arrived at in the end is the best one that I considered. It is simple, elegant, efficient, and useful. And doing it with File::Find just couldn't make it be all of those at once.

    ------------ :Wq Not an editor command: Wq

      Trouble is that the code as you posted it is liable to go into an infinite loop if the directory listed contains a symlink to itself or to one of its parents. There are also other, similar problems with your code, IMO... The reason I advocate using File::Find is that it has already handled these issues, as well as the others lurking in your code. Your idea is great. But IMO, you should avoid reinventing File::Find and just use it.
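      One common way to break symlink cycles in a hand-rolled traversal is to remember each directory's (device, inode) pair and never enter the same pair twice. The sketch below is an illustration of that technique, not the poster's code and not a description of File::Find's internals; `walk` is a hypothetical name, and it assumes a Unix-like system where stat() returns meaningful device and inode numbers.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Depth-first walk that follows symlinks but enters each directory only once.
# Directories are keyed by the (device, inode) pair from stat(), which looks
# through symlinks, so a link back to a parent is recognized as already seen.
sub walk {
    my ($start, $visit) = @_;
    my %seen;
    my @stack = ($start);
    while (@stack) {
        my $path = pop @stack;
        if (-d $path) {
            my ($dev, $ino) = stat _;       # reuse the stat buffer from -d
            next if $seen{"$dev:$ino"}++;   # already entered: a cycle or a second link
            opendir(my $dh, $path) or do { warn "Can't opendir $path: $!\n"; next };
            my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
            closedir $dh;
            push @stack, map { File::Spec->catfile($path, $_) } reverse sort @names;
        }
        else {
            $visit->($path);
        }
    }
    return;
}

# Demo: sub/loop points back at the root. Without %seen the walk would keep
# descending through sub/loop/sub/loop/... instead of stopping.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}
symlink('..', "$root/sub/loop") or warn "Could not create symlink: $!\n";

my @files;
walk($root, sub { push @files, $_[0] });
print "$_\n" for @files;
```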


        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

        Fair criticisms. The particular point that the posted code doesn't handle symlinks is valid, and will be fixed before I submit this. The meta-point that there will be other issues such as this that will inevitably come up, and that this will create wasted effort as similar parallel issues are fixed over time in File::Find is also very true.

        The issue, though, from my standpoint, is that File::Find doesn't offer any means (at least as far as I can think of... please tell me if I'm wrong) to turn its use inside-out... that is, if you will, to ask File::Find for a file, rather than be told by File::Find that there is a file.

        Ideally, I'd be able to use File::Find::Iterator, but to look at it, it also doesn't reuse File::Find... it's just yet another implementation of directory-tree traversal, and so using it would bring in all the same issues of duplicating the work of File::Find (not me duplicating the work, but the maintainers of File::Find::Iterator). Also, File::Find::Iterator appears to be not very complete (version .3), and not file-system-portable (it assumes a directory separator, and it, too, doesn't handle symlinks).

        In a truly ideal world, File::Find would offer some kind of interface that allowed it to operate in this manner, but I just don't see it / can't think of how to do it. Sadly, it would be easy to build File::Find's interface out of a thin wrapper around File::Find::Iterator's, but not vice versa, and File::Find is the one that (currently) works :-(

        So, I suppose I can reframe the question as: is there a way to build an iterator-like interface out of an event-generator interface? Even doing ugly stuff with gotos, I can't think of how to get past the fact that I'd have to be leapfrogging backward and forward over a couple of stack frames (or, more precisely, saving a couple of stack frames off to the side, and then deleting them... then restoring them later).
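        The thread doesn't raise it, but one known way around the stack-frame problem is to run the event generator in its own process, so each side keeps its own stack. In this sketch (assuming a Unix-like system where a forking `open($fh, '-|')` is available; `find_iterator` is a hypothetical name), a child runs File::Find and writes each file name into a pipe, and the parent reads one name per call. Because a full pipe blocks the writer, the child can only run ahead of the reader by about one pipe buffer of names.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);
use POSIX ();

# Pull-style iterator over File::Find: find() runs in a forked child and
# writes each file name into a pipe; the parent reads one name per call.
# Assumes a Unix-like system where open($fh, '-|') forks.
sub find_iterator {
    my (@dirs) = @_;
    my $pid = open(my $fh, '-|');
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: names are NUL-terminated so embedded newlines survive.
        find({ no_chdir => 1, wanted => sub { print "$_\0" if -f $_ } }, @dirs);
        close STDOUT;       # flush the pipe before exiting
        POSIX::_exit(0);    # skip END blocks (e.g. temp-dir cleanup) in the child
    }
    # Parent: hand out names one at a time as they arrive through the pipe.
    my $done = 0;
    return sub {
        return if $done;
        local $/ = "\0";
        my $name = <$fh>;
        if (!defined $name) {
            close $fh;      # also waits for the child to exit
            $done = 1;
            return;
        }
        chomp $name;
        return $name;
    };
}

# Demo on a small tree.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}
my $next = find_iterator($root);
my @got;
while (defined(my $name = $next->())) {
    push @got, $name;
}
print "$_\n" for sort @got;
```

        The cost is a process per traversal, and a caller that stops early leaves the child blocked on its next write until the read end is closed.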

        What would make me really want to use File::Find for this is if the maintainers of File::Find decided to flip things around a little bit so that File::Find was just a thin wrapper around an underlying iterator class that did the real work... and then I could piggy-back off of that same underlying iterator.
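        The asymmetry described above (a callback interface is easy to build on top of an iterator, but not the other way round) can be seen in a minimal sketch. `dir_iterator` and `find_files` are hypothetical names; this is neither File::Find's internals nor File::Find::Iterator's API, and it sidesteps symlinks by returning symlinked directories as plain entries instead of entering them.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical names throughout. dir_iterator() is the pull-style core;
# each call returns the next non-directory path, or nothing when exhausted.
sub dir_iterator {
    my @stack = reverse @_;    # reversed so the first argument is visited first
    return sub {
        while (@stack) {
            my $path = pop @stack;
            if (-d $path && !-l $path) {
                opendir(my $dh, $path) or do { warn "Can't opendir $path: $!\n"; next };
                my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
                push @stack, map { File::Spec->catfile($path, $_) } reverse sort @names;
                next;
            }
            return $path;    # a file, or a symlinked directory (returned, not entered)
        }
        return;
    };
}

# find_files() rebuilds a find()-style callback interface on top of the
# iterator in a few lines; the reverse direction is the hard one.
sub find_files {
    my ($wanted, @dirs) = @_;
    my $next = dir_iterator(@dirs);
    while (defined(my $path = $next->())) {
        local $_ = $path;
        $wanted->();
    }
    return;
}

# Demo on a small tree.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}
my @seen;
find_files(sub { push @seen, $_ }, $root);
print "$_\n" for @seen;
```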

        I don't mean to sound closed-minded... I'm not. I'm just trying to figure out a solution to a problem with certain constraints, and to the best I've been able to figure out so far, File::Find won't work within those constraints. I would actually love to figure out how to use File::Find for this, but to the best of my knowledge, it won't. I'd love to hear suggestions about how to fit File::Find into this problem, without violating the two primary constraints that:

        • @ARGV not be blown up to include the entire file tree
        • perl -ne '...' (and any similar looping over <>) works in a completely DWIM fashion.
        Thanks for any ideas.
        ------------ :Wq Not an editor command: Wq
