PerlMonks  

RFC - File::Util 4.x Series Pre-Release

by Tommy (Chaplain)
on Jan 30, 2013 at 20:25 UTC

What's Up

File::Util has undergone some major changes in v4.x, some of which have been discussed here since late December. I've preserved complete backward compatibility while performing the overhaul.

The 4.x series is both a response to community complaints/requests, and a big push to bring it into step with "modern" best practices and interface styles.

I'm looking for people to kindly let me know what they think...good/bad/otherwise. Why? I'd like to get as much community feedback as possible in the way of "social review" of the new interface before publishing this distribution of major changes, features, and bug fixes. I value what you have to say.

The git repository is here: https://github.com/tommybutler/file-util

A packaged dist is available here: http://www.atrixnet.com/File-Util-4.130300.tar.gz

What's New

Other than a slew of bug fixes and feature additions, a quick look at some key differences in the interface is succinctly presented here. See also the NEWS file in the dist.

What's Left

Things left before actual 4.x release would be to correct any grammar/spelling issues in the docs that I haven't already caught, to add more to the cookbook (and revisit recipes in the cookbook that are old and could be improved), and to add even more to the test suite (which currently runs over 500 tests in developer release test mode). See also the TODO file in the dist.

My gratitude goes out to those who provide feedback, even if all you do is read over the Manual on github and point out anything you find good/bad/otherwise. For those who try out the dist itself (maybe with perlbrew?) and play with File::Util a bit, I thank you in the most emphatic terms possible. It's so important to me to put forth the best code I can for the community, for those who use the module commercially, for the CPAN, and for Perl.

My thanks already goes out to MST and RJBS who have provided valuable help via IRC and CPAN RT. Also to SirSpammenot and Nick Perez who helped via email and Google+, and to anyone who ever filed a bug report or smoked File::Util.

Tommy
A mistake can be valuable or costly, depending on how faithfully you pursue correction

Replies are listed 'Best First'.
Re: RFC - File::Util 4.x Series Pre-Release
by vsespb (Chaplain) on Jan 30, 2013 at 23:43 UTC
    Some comments:
    
    
    1. A filename like :a:b:c:d:e:f:g.txt - I think Mac OS Classic support was dropped from Perl!
    
    
    
    2. It's very sad that you target 5.006 and seem not to support Unicode filename processing. Or do you?
    
    
    
    3. A bigger problem with cross-platform filenames is case insensitivity and Unicode normalization (on Mac OS X). You need Unicode for this.
    It would be great to implement a way to convert filenames to a canonical form with this in mind.
    
    4.
    > escape_filename
    > Illegal characters (i.e.- any type of newline character, tab, vtab, and the following / | * " ? < : > \),
    
    It's not really clear to me what escape you are talking about and what characters are illegal. Escape for shell command line? What shell/what OS?
    
    
    
    5. file_type, existent, can_write etc.
    I don't see any good reason to create wrappers around Perl's -X operators.
    
    
    6.
    > Returns alphabetically sorted all file names in the directory specified if it exists
    Do you mean ASCII-sorted? To sort alphabetically you need Unicode AND you need to know which locale to use.
    
    
    7.
    > list_dir
    > Recurse subdirectories
    
    Hm, is there no option to follow or not follow symlinks to directories? What about recursive symlinks?
    
    8.
    > needs_binmode
    I think any OS needs binmode:
    
    > http://search.cpan.org/~dom/perl-5.12.5/pod/perlfunc.pod#binmode
    > . Note that, despite what may be implied in "Programming Perl" (the Camel, 3rd edition) or elsewhere, :raw is not simply the inverse of :crlf. Other layers that would affect the binary nature of the stream are also disabled. See PerlIO, perlrun, and the discussion about the PERLIO environment variable.
    
    9.
    > $ILLEGAL_CHR = qr/\\$NL\r\n\t\013\*\"\?\<\:\>/;
    a) On Linux, any character except NUL and '/' is legal.
    b) NUL is illegal.
    

      Brilliant, vsespb. Thank you for all your points! I'll respond below...

      point #1 - I've never encountered a situation where someone needed classic macos support, but I tried to support it anyway.

      point #2 - I've been considering lifting the minimum Perl to 5.6. Thoughts?

      point #3 - still thinking about that one
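
      For illustration, a minimal sketch of the kind of canonical form point #3 asks for. It assumes filenames have already been decoded to character strings and that Unicode::Normalize is available (it ships with Perl 5.8 and later). The helper name canonical_name is hypothetical and is not part of File::Util; it uses lc rather than full case folding, so it is only an approximation of what a case-insensitive filesystem does.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper, not part of File::Util: reduce a decoded (character)
# filename to a form that is meant for comparison only, never for display.
sub canonical_name {
    my ($name) = @_;
    return lc NFC($name);   # compose first, then lowercase
}

my $composed   = "caf\x{E9}.txt";     # precomposed e-acute
my $decomposed = "cafe\x{301}.TXT";   # 'e' + combining acute accent, upper-case extension

print canonical_name($composed) eq canonical_name($decomposed)
    ? "same file name\n"
    : "different file names\n";
```

      The two raw strings differ both in normalization form and in case, yet compare equal once canonicalized.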

      point #4 - in addition to claiming to be cross platform, File::Util guides you to use filenames and characters that can port between FAT32, EXT2 and upwards. Is it bad to enforce that? Hmmm. Nobody ever brought it up before. This could become much more complicated if I get unicode involved --- or --- I could just not attempt to trap nasty characters. The entire point of trying to do so was to make sure nobody tried to name a file with an embedded directory separator in it. It grew out from there (by request, from people who wanted me to trap *potential* dangers and provide diagnostic and "helpful" error messages.) Perhaps it's time to leave that behind...

      point #5 - Agree to disagree there. While I personally can relate to what you're saying, I've had a lot of people ask for methods that are "easier to remember" than -X. For the sake of those people, those methods will remain.

      point #6 - Yes, they are sorted a la sort { $a cmp $b } OR sort { uc $a cmp uc $b } depending on what was requested by the caller. That's "asciibetical" sorting for the most part. I should either advertise that up front, or use a unicode sorting mechanism. I wonder if the latter is overkill.
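
      To make the difference concrete, here is a small sketch comparing the two comparators mentioned above with Unicode::Collate (bundled with Perl 5.8 and later). The file names are made up, and the collator is used with its defaults, so it applies the generic Unicode collation order rather than any locale-specific tailoring.

```perl
use strict;
use warnings;
use Unicode::Collate;

my @names = ('banana.txt', 'Cherry.txt', 'apple.txt');

my @asciibetical = sort { $a cmp $b } @names;            # code point order
my @ignore_case  = sort { uc $a cmp uc $b } @names;      # case-insensitive
my @collated     = Unicode::Collate->new->sort(@names);  # default collation, no locale

print "cmp:     @asciibetical\n";
print "uc cmp:  @ignore_case\n";
print "collate: @collated\n";
```

      With plain cmp, Cherry.txt sorts first because upper-case letters precede lower-case ones in ASCII; the other two orderings start with apple.txt.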

      point #7 - I haven't written a way to detect looping symlinks, so I don't follow them in the code. It could be an option, but I'd have to keep track of actual inodes I think (lstat). Is there a preferred way to do this without memory bloat and performance degradation while constantly adding to and comparing entries in the %inodes_seen lookup table?
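
      One possible approach, sketched under assumptions: only directories need an entry in the lookup table, keyed on device and inode as reported by stat (which follows symlinks), so memory grows with the number of directories rather than the number of files. The walk function below is hypothetical and is not File::Util's list_dir. It also assumes a filesystem that reports meaningful inode numbers, which is not the case on every platform.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical walker: follows symlinked directories, but visits each real
# directory (identified by device + inode) only once.
sub walk {
    my ($dir, $seen, $found) = @_;
    my ($dev, $ino) = stat $dir or return;   # stat follows symlinks
    return if $seen->{"$dev:$ino"}++;        # already visited: loop or duplicate

    opendir my $dh, $dir or return;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $entry (sort @entries) {
        my $path = File::Spec->catfile($dir, $entry);
        push @$found, $path;
        walk($path, $seen, $found) if -d $path;   # -d follows symlinks too
    }
    return;
}

# Build a tiny tree containing a cycle: root/a/loop points back at root.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/a" or die "mkdir: $!";
eval { symlink $root, "$root/a/loop" };   # skipped where symlinks are unsupported

my (%inodes_seen, @found);
walk($root, \%inodes_seen, \@found);
print scalar(@found), " entries found, ",
      scalar(keys %inodes_seen), " directories visited\n";
```

      The traversal terminates because the symlink resolves to a device and inode pair that is already in %inodes_seen.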

      point #8 - That very well may be deprecated and silently removed from the documentation. On the backend, everything is done with syswrite anyway, for THAT EXACT reason.
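
      As a sketch of the alternative raised in point #8, the snippet below calls binmode unconditionally on both the writing and the reading handle instead of asking whether the platform needs it. This is an illustration with ordinary buffered I/O and a temporary file, not a description of how File::Util writes files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $bytes = "\x0D\x0A\x00\xFF\x0A";   # CR, LF, NUL, a high byte, LF

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # no translation of line endings
print {$out} $bytes;
close $out or die "close: $!";

open my $in, '<', $path or die "open: $!";
binmode $in;
my $read = do { local $/; <$in> };     # slurp the whole file
close $in;

print $read eq $bytes ? "round trip intact\n" : "bytes were altered\n";
```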

      point #9 - same reasoning and response as point #4. Open to suggestions and criticisms on this.
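
      The distinction discussed in points #4 and #9 can be sketched as two separate checks: what a Linux filesystem itself rejects in a single name component (NUL and '/'), and a stricter portable set taken from the escape_filename description quoted earlier in the thread. Both function names are hypothetical and neither is part of File::Util.

```perl
use strict;
use warnings;

# What Linux itself rejects in one path component: NUL and the separator.
sub legal_on_linux {
    my ($name) = @_;
    return (length($name) && $name !~ m{[\0/]}) ? 1 : 0;
}

# Stricter check: also reject newlines, tab, vertical tab and | * " ? < : > \
sub portable_name {
    my ($name) = @_;
    return (legal_on_linux($name) && $name !~ m{[\r\n\t\x0B|*"?<:>\\]}) ? 1 : 0;
}

for my $name ('report.txt', 'a:b.txt', 'dir/file') {
    printf "%-12s linux=%d portable=%d\n",
        $name, legal_on_linux($name), portable_name($name);
}
```

      A name such as a:b.txt passes the first check and fails the second, which is exactly the case where rejecting or merely warning becomes a policy decision rather than a technical one.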

      Tommy
      A mistake can be valuable or costly, depending on how faithfully you pursue correction
        > point #2 
        It was about Unicode. I think proper Unicode support will require 5.8. No other objections about the version.
        
        > point #4
        It's pretty uncommon to _escape_ illegal characters to make filenames portable. I think it's better to reject them. By the way, here is what Dropbox thinks about filename portability: https://www.dropbox.com/help/145/en
        
        > point #7
        Yep, that's the thing. When one provides a method to traverse a directory, it usually has an option to follow symlinks (with this option on, it takes much more memory), OR at least
        the method should not hang: it should detect symlinks and stop crawling them.
        
        Why 5.6 or even older versions? Are you supporting legacy systems? Unicode support seems far more important these days.

        Elda Taluta; Sarks Sark; Ark Arks
        My deviantART gallery

Re: RFC - File::Util 4.x Series Pre-Release
by BrowserUk (Patriarch) on Jan 30, 2013 at 21:14 UTC

    A comment -- which may apply to the original interface as much as the new one (I'm not familiar with either) -- but can_read and can_write seem particularly vulnerable to misinterpretation.

    Those two are usually associated with the immediate ability to read from or write to a handle; rather than the "effective permissions" usage; which might better be described by can_be_read and can_be_written.

    Beyond that, this is not a module I see myself ever using (in its old or new form). After paging down through the "succinct" doc about 5 times, my eyes glazed over. Personally, I much preferred the brevity of the old "scripting" interface to the verbosity of the "modern" [sic] interface; though I never felt the need for even that. I'll be sticking with glob.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks for the suggestion. Noted. I'm definitely taking it into account. It wouldn't be hard to, say, *can_be_read = *can_read; # NOOOO! ;-)
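
      For readers unfamiliar with the idiom, the glob assignment above makes a second name point at the same subroutine. A self-contained sketch follows, using a hypothetical package My::FileChecks as a stand-in for File::Util and assuming the methods are thin wrappers around -r and -w.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package My::FileChecks;   # hypothetical stand-in, not the real module

# Effective-permission checks, thin wrappers over the -r and -w file tests.
sub can_read  { my ($self, $path) = @_; return -r $path ? 1 : 0 }
sub can_write { my ($self, $path) = @_; return -w $path ? 1 : 0 }

# Typeglob aliases: the new names share the original subroutines.
{
    no warnings 'once';
    *can_be_read    = *can_read;
    *can_be_written = *can_write;
}

package main;

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

print "can_read:    ", My::FileChecks->can_read($path),    "\n";
print "can_be_read: ", My::FileChecks->can_be_read($path), "\n";
```

      Because both names resolve to one subroutine, existing callers keep working and the two spellings cannot drift apart.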

      As for "succinct" -- that was in reference to the SYNTAX section of the File::Util::Manual. The actual front-facing man page is much shorter.

      Tommy
      A mistake can be valuable or costly, depending on how faithfully you pursue correction

Front-paged by Arunbear