Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Seeking your opinions on proposed patch to File::Util

by Tommy (Chaplain)
on Sep 29, 2012 at 23:36 UTC ( #996437=perlquestion: print w/replies, xml ) Need Help??
Tommy has asked for the wisdom of the Perl Monks concerning the following question:

Perl 5.17 was released and "broke" my CPAN module File::Util. A patch was submitted through which I applied and will be uploading shortly, but this got things moving on some other bug reports that I've left alone for far too long... the issues weren't critical and I hadn't enough time I suppose. Either way, I'm recommitting myself to the maintenance and betterment of this module and that translates to resolving all the bugs.

This brings me to the question at hand. In CPAN bug report number 64775, someone reported that a certain behavior he/she expected was not being enforced by File::Util. I've looked at a patch that was submitted, but I wanted to ask the monastery what they think about the matter before I make a change that could have potentially far-reaching effects on lots of code out there which is using File::Util (a stable module since 2007).

The full details regarding the patch are visible here:

A summary of the proposed patch, and a further proposal by myself is as follows: File::Util::list_dir() provides a means to list directory contents, and it accepts a regex pattern against which only files with names satisfying a successful match are returned. This works. Mostly. The problem identified in the original bug report is that the pattern isn't enforced on subdirectory contents when the list_dir() method is asked to recurse through subdirectories. Obviously this was an oversight on my part.

The proposed patch attempts to fix this by providing the user with means to supply patterns in a different and potentially ambiguous manner. The logic behind the necessity of the patch is sound. However I'm not wholly confident that the patch offers a complete solution to the problem.

In response to the bug report I included correspondence to submitter of the patch expressing my thoughts, which are open to the public view at the link I provided above. I respectfully ask for anyone who has interest in this to please take a look at the link to the bug report and my proposed solution, and kindly provide your own opinions.

Your opinions are warmly appreciated!

$ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'

Replies are listed 'Best First'.
Re: Seeking your opinions on proposed patch to File::Util
by BrowserUk (Pope) on Sep 30, 2012 at 00:16 UTC

    First pass opinion: When you have both --file-pattern and --dir-pattern, --pattern is redundant.

    So, make --pattern & ( --file-pattern &| --dir-pattern ) mutually exclusive.

    Existing code using --pattern continues to work as is.

    New code can choose to stick with the old -- subject to the caveat of the known bug -- or switch to using the new mechanism.

    Existing code continues to work; new code can choose to works better.

    Potentially; pre-deprecate --pattern. That is, warn that it will(may) go away at some point in the future.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong


      First thanks for your excellent work. I'm using it regularly.

      I personally encountered the 'bug' and I worked around it by performing a list without a pattern and then grepping the whole list myself. Will you break my code with changing the behaviour of --pattern? I think not, but I have some testing to do.

      Some other thoughts about the above proposal: I see a small issue with the above proposition of --file-pattern and --dir-pattern.

      In list dir, you already have the switches --dirs-only and --files-only. This makes the above switches a bit confusing. I suppose the combination of the switches --files-only --dir-pattern will give me all files in the matching directories. But do I get also the files of the root dir where the search was started? Because this directory isn't matched against the --dir-pattern. And I honestly don't known what the default expectation will be.

      So, either document the switches really well and watch out for unexpected behaviour like this borderline case above, or keep it simple. And I personally like simple best. So for me, you can change the --pattern behaviour and even leave the out --file-pattern && --dir-pattern. Those switches aren't crucial. Keep focussing on cross OS functionality of your toolbox, because that's the reason why I'm using your module. And if really needed, it is a simple exercise to perform the filtering ourselves.

      My humble 2 cents


        Having evaluated all the options on the table now for several hours, I am leaning toward the more 'simple' approach you mentioned. My rationale is that I don't want to complicate things, and martel I believe you are right that the proposed new switches would make things confusing given the already existent --dirs-only and --files-only switches. With any extra switches, it's anyone's guess what the expected behavior would be from end users.

        So for now I am going to plan on throwing a warning out when list_dir() is used with the --pattern switch, for at least one release (ver 3.29), expressing to the user what new behavior to expect with the following release (ver 3.30+). I don't see too much harm in this, given that if users out there are already grepping through the results returned from list_dir() against a pattern they already provided to the --pattern switch, it's not going to upset the output they are currently getting. The end-result output would be the same.

        If I did go with the other option (to add more switches), then browserUK's suggestion would be the correct approach. For the present however, I'm not leaning in that direction so much as I once was, given that I feel more confident I can start enforcing the originally intended behavior without disrupting the downstream results in current user code (in what I believe will be the majority of cases).


        $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'
      Love these opinions! Keep 'em coming!
      $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'
Re: Seeking your opinions on proposed patch to File::Util
by toolic (Bishop) on Sep 30, 2012 at 00:23 UTC
    Since you are mainly concerned about not breaking existing code, another possibility is to leave the list_dir behavior as-is and create a new function with a different name, such as list_dir_recurse. The new function will apply the pattern to all files as originally intended.

    Make sure to add plenty of explanations in the POD to distinguish the new function from the old one.

Re: Seeking your opinions on proposed patch to File::Util
by pvaldes (Chaplain) on Sep 30, 2012 at 23:47 UTC


    my @recurse_ignored = $f->list_dir('/dir/', '--recurse','--pattern=\.t +xt$'); my @pattern_and_recurse_ignored = $f->list_dir('/dir/', '--pattern=\.t +xt$ --recurse'); my @no_files = $f->list_dir('/dir/', qw/--pattern=\.txt$ --recurse /);

    I would suggest a different notation instead, to use bash "--longname-option" notation inside perl looks ugly (in my opinion)

    my @array = $f->list_dir('/home/moo/','--pattern=\.txt$', '--recurse', '--files-only');

    I propose this form instead for the same line:

    my @array = $f->ls(files, '/home/moo/','\.txt$', 0);

    Much easier to write, read and understand, isn't?

    list_dir is not a bad name at all, but if you want to obtain a list of files, to use a function named list_dir can sound a little strange

    the third argument optional (an integer, or even a range like 1..3) could be equivalent to find -maxdepth num. Recurse should be the default probably, cause you don't need to load a full perl module to do the same job of opendir, readdir in more lines (or when you can simply could wrote `ls *.txt` for this). You expect to be rewarded with the possibility to do something more sophisticated (like to search only for four levels of subdirectories, or in the third subdir) when you take the extra effort to load a module.

    I would suggest also to avoid to write "my ($var) = x" in the documentation when you can write simply "my $var = x". The first idea suggests a list context for the vars, for both $scalars and @arrays, and again seems a little more difficult to read.

      pvaldes, your ideas are good, in hindsight. However this module and its interfaces were originally written in 2002. Many people and businesses are using the code (they even use it on the NASDAQ). I can't completely change the way calls work. However it might be possible to redesign several interfaces so that they work both in the current way and also in a way that is more consistent with "modern" interface styles, for example, the styles we see in DBIx::Class. This just requires more logical branching in order to handle different inputs, and that means more overhead...a tradeoff I would be forcing on users when some of them might not want it. Decisions, decisions.

      I've considered rewriting the whole thing in Moose, and consulted Ovid about it. The end conclusion was to leave it alone and not expose implementation by creating another namespace such as "File::Util::Moosed"

      If I was designing it brand-new today, I would still keep the name of list_dir, and provide an "ls" alias (which I may still do), and the syntax would be something more like the following:

      $f->ls( dirs => [ $dir, $dir2, $dir3 ], recurse => 3, # go three directories deep files-only => 1, # return only file names (no dirs) match => [ qr/txt$/, qr/!^\./ ] # all text files that don't be +gin with a "." )

      ...and File::Util would have to examine more carefully what each argument was, and intelligently handle the things it gets, regardless of what order they come in or whether or not they are necessarily a balanced key-value pairing. One example would be to automatically recognize listrefs of pre-compiled regular expressions as a valid argument, and just apply each regex as a pattern to match against.

      Such things might materialize in the future as I come closer to figuring out how to implement such features without breaking existing ones. I'm always open to suggestions regarding the backward-compatible introduction of new features, and patches of course.

      $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://996437]
Approved by kcott
Front-paged by toolic
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2018-04-24 15:07 GMT
Find Nodes?
    Voting Booth?