File::Find seems grossly inefficient for performing simple file tasks

by taint (Chaplain)
on Apr 26, 2013 at 06:27 UTC (#1030774=perlquestion)
taint has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
In an effort to figure out how to perform the equivalent of this shell script:
find ./iso/ -type f -cmin '+11' -uid web -exec rm '{}' \;
using Perl, I made use of one of the Perl utilities, find2perl. After reading its syntax, I used what I understood to be the equivalent:
find2perl ./iso/ -iname '*.xz' -user web -ctime 1 -exec rm '{}' \;
-- with 2 exceptions;
1) I added -iname '*.xz'
2) I was unable to define time in minutes, as only -ctime is available, which == day(s).
I had expected a similarly short equivalent to be returned upon execution. But much to my surprise, I received the following:
#! /usr/local/bin/perl -w
    eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted;

my (%uid, %user);
while (my ($name, $pw, $uid) = getpwent) {
    $uid{$name} = $uid{$uid} = $uid;
}

# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, './iso/');
exit;

sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
    /^.*\.xz\z/si &&
    ($uid == $uid{'web'}) &&
    (int(-C _) == 1) &&
    (unlink($_) || warn "$name: $!\n");
}
Is this right?!
I'm not going to pretend to be a Perl GURU -- far from it. But even after removing the comments, this seems too inefficient -- no?
Anyway, if this is really the best option for performing such a short task with Perl, it looks to me as though "shelling out" within Perl is more efficient -- minus Taint, of course.

Thank you for any consideration in this matter.

--chris

#!/usr/bin/perl -Tw
use perl::always;
my $perl_version = "5.12.4";
print $perl_version;

Re: File::Find seems grossly inefficient for performing simple file tasks
by BrowserUk (Pope) on Apr 26, 2013 at 07:17 UTC

    The two command lines are pretty much the same size. And it is only the command line that you are seeing of the shell code.

    If you were to look at the source code of the find command, it runs to several hundred lines.

    Try comparing like with like, and do not confuse length of source with efficiency.

    (It can be a factor, especially in Perl, but not in this case.)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: File::Find seems grossly inefficient for performing simple file tasks
by DrHyde (Prior) on Apr 26, 2013 at 10:38 UTC
    What about it seems inefficient?
Re: File::Find seems grossly inefficient for performing simple file tasks
by RichardK (Priest) on Apr 26, 2013 at 12:11 UTC

    Did you mean programmer efficiency?

    File::Find::Rule has a nicer interface and is easier to use (IMHO).

    So your query could look something like this :-

    use File::Find::Rule;
    my $uid = getpwnam('web');
    my @files = File::Find::Rule->file->name('*.iso')->uid($uid)->in('/');
    unlink $_ for @files;

    There are methods for the stat tests, ctime,size etc, so you can add more stages as you need them. I'm not sure what 'find -cmin' actually does so I didn't attempt that bit.

      Greetings RichardK, and thank you for your reply.
      Yes. This is exactly the sort of return I had anticipated (as you provided).
      With the exception of:
      file->name('*.iso')
      which should have read:
      file->name('*.xz')
      as ./iso/ was a reference I used to a directory.
      As to find(1); -cmin refers to:
      -cmin n True if the difference between the time of last change of file status information and the time find was started, rounded up to the next full minute, is n minutes.
      referring to *BSD UNIX' version of FIND(1).

      Which is actually the most important part of my reason for trying this;
      I need to clobber (perldoc -f unlink) symlinks (perldoc -f symlink) older than 11 minutes. Unfortunately, Perl's find2perl only provides:

      -atime N
          True if last-access time of file matches N (measured in days) (see below).
      -ctime N
          True if last-changed time of file's inode matches N (measured in days, see below).
      -mtime N
          True if last-modified time of file matches N (measured in days, see below).
      -newer FILE
          True if last-modified time of file matches N.

      # # # See below: # # #
      1. * N is prefixed with a +: match values greater than N
      2. * N is prefixed with a -: match values less than N
      3. * N is not prefixed with either + or -: match only values equal to N
      NOTE: (measured in days, see below), which differs from the find the system provides, as the system's find works in minutes. While I'm sure it must be possible to feed the string some math to make it more granular, I'm not clever enough to figure out how. :(
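One way the finer granularity might be had, offered as a sketch rather than as what find2perl itself would emit: Perl's -C file test reports the age of the inode change time in days (relative to script start), so in the generated wanted() the test (int(-C _) == 1) could be swapped for something like (-C _ > 11 / (24 * 60)). The same idea can also be written with epoch seconds from lstat, using only the core File::Find module. The helper name older_than_minutes and its parameters are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper (not part of find2perl's output): return the paths under
# $dir that end in .xz, are owned by $uid, are not directories, and whose inode
# change time (ctime) is more than $minutes minutes in the past.
sub older_than_minutes {
    my ($dir, $minutes, $uid) = @_;
    my $cutoff = time - $minutes * 60;    # epoch seconds
    my @found;

    File::Find::find({
        no_chdir => 1,                    # $_ holds the full path
        wanted   => sub {
            # lstat, as in the find2perl output, so symlinks are examined themselves
            my ($owner, $ctime) = (lstat $_)[4, 10];
            return unless defined $ctime; # lstat failed
            return if -d _;               # skip directories
            return unless /\.xz\z/i;
            return unless $owner == $uid;
            push @found, $_ if $ctime < $cutoff;
        },
    }, $dir);

    return @found;
}

# Example use (assumes a user named 'web' exists):
# my $uid = getpwnam('web');
# for my $path (older_than_minutes('./iso/', 11, $uid)) {
#     unlink $path or warn "$path: $!\n";
# }
```

Collecting the paths first and unlinking afterwards keeps the traversal separate from the deletion, which makes a dry run (just print the list) easy before anything is removed.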

      Thank you again, for taking the time to respond.

      --chris


        I read that find man page too and I'm still not sure exactly what it does!

        But, I'd interpret that to mean -cmin 7 is changed exactly 7 minutes ago ( which seems a bit odd ).

        Anyway, back to the point :), if you look at stat you'll see that atime,mtime,ctime are in seconds since the epoch and the pod for File::Find::Rule says :-

        stat tests
            The following "stat" based methods are provided: "dev", "ino",
            "mode", "nlink", "uid", "gid", "rdev", "size", "atime", "mtime",
            "ctime", "blksize", and "blocks". See "stat" in perlfunc for details.

            Each of these can take a number of targets, which will follow
            Number::Compare semantics.

                $rule->size( 7 );         # exactly 7
                $rule->size( ">7Ki" );    # larger than 7 * 1024 * 1024 bytes
                $rule->size( ">=7" )
                     ->size( "<=90" );    # between 7 and 90, inclusive
                $rule->size( 7, 9, 42 );  # 7, 9 or 42

        So, you can write a mtime rule to do whatever you need.

        I'm guessing mtime, but these things are specific to your OS and file system, so you'll have to play around a bit to see what works for you.

Re: File::Find seems grossly inefficient for performing simple file tasks
by chrestomanci (Priest) on Apr 26, 2013 at 15:42 UTC

    There were a couple of blog posts on the speed of the many CPAN modules available for file finding.

    There looks to be quite a selection available. File::Find looks to be the fastest. There are alternatives with different APIs that you may find easier to use, but some are rather slow.

    The author looks to be the same person as rjbs here, but that is not certain as there are no links from one account to the other.

      @DrHyde, @BrowserUk, @chrestomanci.
      Greetings to all of you, and thank you for your replies!
      While I can completely understand the conclusion all of you arrived at;
      Please let me take the time to clarify "grossly inefficient", as to my intended assessment.
      I was referring to the volume of code (source) that find2perl emitted as an alternative to the shell's find && rm script.
      So to be clear: I have not found Perl's File::Find to be "grossly inefficient"; rather, it appeared (to me) that the code/source required to achieve the same results with Perl's File::Find was much greater than the shell's counterpart.
      I definitely meant no disrespect to Perl || Perl's File::Find. :)

      @chrestomanci
      The links you provided made for some interesting reading -- thanks!

      Best wishes.

      --chris


        Hmmm, that type of comparison does not really make sense to me.

        If I want to find all the lines that contain the letters "ab" in a file, using a shell script, I can just write:

        grep ab file.txt

        If I want to do that in a Perl one-liner, there is just no way I can do that in just 8 characters plus the name of the file. Just the "perl -e " sequence has 8 characters, and I haven't even started to give the code of my Perl script. The script could be as short as something like this:

        perl -ne 'print if /ab/' file.txt

        (Maybe someone will find some way of doing it shorter, but that's not the point.)

        There is just no way a Perl program could be as simple (as concise, as short) as a simple shell command, but it makes no sense to compare them. A shell command may be backed by hundreds or thousands of lines of source code. If you go this way, I could also put all the code that I want or need in an x.pl file and then say that:

        grep ab file.txt

        can be replaced by the following:

        x.pl

        which shows that Perl is far more concise than the shell or pretty much anything else.

        I should add that I don't know many languages where something like this:

        perl -ne 'print if /ab/' file.txt

        can be coded so concisely (of course, sed and awk could do that, but we are again comparing things that are not really comparable).

        So to be clear: I have not found Perl's File::Find to be "grossly inefficient"; rather, it appeared (to me) that the code/source required to achieve the same results with Perl's File::Find was much greater than the shell's counterpart.

        Yes, we understand, you run screaming at sunset because you think the sky is burning

        Sure, it takes more code. That's what happens when you try to use a general purpose language instead of a task-specific mini-language.
