PerlMonks  

Trouble with File::Find::Rule

by sowais (Sexton)
on Dec 12, 2013 at 21:08 UTC (#1066919)
sowais has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I am a novice at Perl and need help with some code I have written. I am trying to grab all files with certain extensions using File::Find::Rule. The problem is that I am searching two specific directories using a 'while' loop with a counter, but for some reason $directory does not seem to be set to the 'Archive' path when $count = 0. After finding the files, I write them to a log for output. Please see my code below. Any help would be greatly appreciated. Thanks!

    use strict;
    use warnings;
    use File::Find::Rule;

    my $directory   = 'C:\Test\WMA_calls';
    my $output_file = 'C:\Test\Results\output.txt';
    my $count = 0;
    my @files;

    while ($count < 2) {
        print $count;
        $directory = 'C:\Test\Calls\Archive' if $count eq 0;
        $directory = 'C:\Test\Calls\History' if $count eq 1;
        @files = File::Find::Rule->file()
                                 ->name( "*.wma", "*.wmv" )
                                 ->in( $directory );
        $count++;
    }

    if (@files) {
        open FILE, ">", $output_file or die "Can't open file, $output_file: $!";
        foreach (@files) {
            print FILE "$_\n";
        }
        close FILE;
    }
    else {
        print "\n\nError!! No files found";
    }
    exit 0;

Update: I was able to fix the above code and get it to work, but the issue now is that I have more than 4000 files in the two folders and it's taking a really long time for the script to complete. Any advice on a more efficient, less memory-hungry way? Thanks again for the responses!

Update: Thank you for all the responses! I believe I was not clear in my original description of what I am trying to accomplish. I am trying to read all files of certain types from a given directory and all its subdirectories, and eventually log all these filenames. From the previous responses I have been able to get working code, but when I did a dry run in Production, where I will be logging over a million files, it's taking over 2 hrs for just one directory read. Is there a more efficient way to do this that uses less memory? The dry run has used over 100 MB of memory and counting...
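
For the memory concern in the update above, one option is to avoid building a list at all and write each match to the log as soon as it is found. The sketch below assumes that approach and swaps in the core File::Find module for File::Find::Rule. The function name log_media_files and its parameters are illustrative, not part of the original script. Whether this is faster on a million files depends mostly on the disk or network share, so treat it as something to measure rather than a guaranteed fix.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk each directory recursively and print every
# .wma/.wmv path to the given filehandle as soon as it is seen, so no
# array of a million names is ever held in memory.
sub log_media_files {
    my ($out_fh, @dirs) = @_;
    my $count = 0;
    find(
        sub {
            # $_ is the bare file name here; File::Find has already
            # changed into the directory that contains it.
            return unless /\.wm[av]\z/i && -f $_;
            print {$out_fh} "$File::Find::name\n";
            $count++;
        },
        @dirs
    );
    return $count;    # number of files logged
}

# Possible usage (paths taken from the question):
#   open my $log, '>', 'C:/Test/Results/output.txt' or die "Can't open log: $!";
#   my $n = log_media_files($log, 'C:/Test/Calls/Archive', 'C:/Test/Calls/History');
#   close $log;
```

Returning the count keeps the original "No files found" check possible without keeping the file names themselves.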

Re: Trouble with File::Find::Rule
by wazat (Scribe) on Dec 12, 2013 at 21:22 UTC

    It looks like the contents of @files will be overwritten the second time through the loop.

    Do you even need a loop? Would the following not work as well?

        @files = File::Find::Rule->file()
                                 ->name( "*.wma", "*.wmv" )
                                 ->in( 'C:\Test\Calls\Archive', 'C:\Test\Calls\History' );
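
If the loop is kept instead, the fix for the overwriting is to append to @files rather than assign to it. A minimal sketch of that difference follows. The helpers find_media and collect_media are hypothetical, and find_media uses core File::Find purely as a stand-in for the File::Find::Rule call in the question.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical stand-in for the File::Find::Rule call: recursively list
# the .wma/.wmv files under a single directory.
sub find_media {
    my ($dir) = @_;
    my @found;
    find( sub { push @found, $File::Find::name if /\.wm[av]\z/i && -f $_ }, $dir );
    return @found;
}

# Keeps the loop, but appends each directory's results with push.
# Writing "@files = find_media($dir)" here would discard the earlier
# directories and leave only the last one's files.
sub collect_media {
    my @dirs = @_;
    my @files;
    for my $dir (@dirs) {
        push @files, find_media($dir);
    }
    return @files;
}
```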

      wazat, that did work, and thanks for pointing that out (embarrassingly obvious). A second set of eyes does help!

      In addition to wazat's advice, I wonder if File::List might be helpful too.

      --Chris

      Yes. What say about me, is true.
      

        In addition to wazat's advice, I wonder if File::List might be helpful too.

        And hammer and pancakes

Re: Trouble with File::Find::Rule
by 2teez (Priest) on Dec 12, 2013 at 21:42 UTC

    Also note that it would be good practice not to hardcode your directories or file names into your script. It may be counterproductive later.
    Imagine that you later have to check more than two directories, with names not as straightforward as the ones you have now.
    Just saying.. :)

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      2teez, I agree with you; I was only using the hardcoded directories for testing purposes. In the actual implementation I will be passing the directories to my script via the application that will execute it.
Re: Trouble with File::Find::Rule
by Laurent_R (Parson) on Dec 12, 2013 at 23:14 UTC
    I definitely agree with 2teez that you should probably not hard code the directories and the number of such directories. Or, if you do nonetheless, do it in a way that is easy to change. For example, this loop:
        while ($count < 2) {
            print $count;
            $directory = 'C:\Test\Calls\Archive' if $count eq 0;
            $directory = 'C:\Test\Calls\History' if $count eq 1;
            @files = File::Find::Rule->file()
                                     ->name( "*.wma", "*.wmv" )
                                     ->in( $directory );
            $count++;
        }
    could possibly be rewritten as follows:
        # Forward slashes: a trailing backslash would escape the closing quote,
        # and glob treats backslashes specially.
        my $base_dir    = 'C:/Test/Calls/';
        my @directories = map { $base_dir . $_ . '/' } qw/History Archive/;
        my @files;
        for my $dir (@directories) {
            push @files, glob($dir . "*.wma"), glob($dir . "*.wmv");
        }
    This might still not be the best way to do it, but the advantage, at least, is that if you need to add another subdirectory, you only need to add it to the @directories array and don't need to change anything else. The last line in the code above is not very satisfactory, because the extensions are hard coded. So you could change it to nested loops:
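        # Forward slashes: a trailing backslash would escape the closing quote,
        # and glob treats backslashes specially.
        my $base_dir    = 'C:/Test/Calls/';
        my @directories = map { $base_dir . $_ . '/' } qw/History Archive/;
        my @extensions  = qw/wma wmv/;
        my @files;
        for my $dir (@directories) {
            for my $ext (@extensions) {
                push @files, glob($dir . "*.$ext");    # e.g. C:/Test/Calls/History/*.wma
            }
        }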
    Now, if you need an additional extension, there is only one place where it needs to be modified. This is untested, as I do not have Perl under Windows available right now.
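
Since glob only looks at one directory level, the same "change it in one place" idea can also be expressed as a single pattern built from the extension list, which then works with any recursive traversal. A minimal sketch follows; the helper names extension_regex and filter_by_extension are made up for illustration.

```perl
use strict;
use warnings;

# Build one case-insensitive regex that matches any of the given
# extensions at the end of a path. quotemeta guards against
# extensions containing regex metacharacters.
sub extension_regex {
    my @extensions   = @_;
    my $alternatives = join '|', map { quotemeta($_) } @extensions;
    return qr/\.(?:$alternatives)\z/i;
}

# Return only the paths whose extension is in the list.
sub filter_by_extension {
    my ($extensions, @paths) = @_;
    my $re = extension_regex(@$extensions);
    return grep { $_ =~ $re } @paths;
}

# Possible usage:
#   my @media = filter_by_extension( [qw/wma wmv/], @all_paths );
```

Adding an extension then means editing only the list passed in, whichever module produces the paths.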

      I use this code chunk quite a bit to get directory contents without extra modules. The nice thing is that you only change one line to redirect it to another location, or you pass the location in as a variable...


      This particular version sorts the list and reads the contents of the files into an array, if that is of any value. The @dots array is a list of all the files in the directory, which you can grep for a pattern and work with.

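          use strict;
          use warnings;

          my $sd = "../data";
          opendir( my $dh, $sd ) or die "Can't open directory $sd: $!";
          my @dots;
          while ( defined( my $filename = readdir($dh) ) ) {
              next if $filename =~ /^\./;    # skip ., .. and hidden files
              push @dots, "$sd/$filename";
          }
          closedir($dh);
          @dots = sort @dots;

          my @foo;
          for my $path (@dots) {
              open( my $fh, '<', $path ) or next;    # skip anything unreadable
              push @foo, <$fh>;                      # append every line of the file
              close $fh;
          }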
        tbone654, thanks for the code! I have used something similar in the past as well, but in this case I need to go into all the subfolders of the root folder; that's why I was using File::Find::Rule.
Re: Trouble with File::Find::Rule
by Jim (Curate) on Dec 14, 2013 at 04:08 UTC

    Unfortunately, you can't use File::Find and friends on Microsoft Windows. They don't handle Unicode file or folder names, long paths (i.e., paths that exceed MAX_PATH), or junction points. In general, ordinary Perl isn't useful for handling arbitrary files and folders on Windows. I suspect no modern scripting language is, including PowerShell. But I could be wrong.

    Jim

      "...you can't use File::Find and friends on Microsoft Windows."

      Outrageously inaccurate. Works fine on WinXP and up (and IIRC, should work fine even on W98).

      Likewise "isn't useful" and "no modern scripting language is...."

      Yes, you "could be" and, in fact, are wrong; way wrong!


      Quis custodiet ipsos custodes. Juvenal, Satires

        Oh, I wish I were wrong, ww—believe me. And if you can demonstrate with working Perl code examples and accompanying test cases that I'm wrong, no one will be more thrilled than I will be.

        Perl's built-in functions don't use the Windows API that they must use if they are to work correctly with Unicode file names, paths that exceed MAX_PATH, and junction points. At least this is how it has always been until the last time I researched the topic. If Perl's limitations with respect to file and folder handling on Windows have been fixed very recently, that's terrific. But I don't think they have or I'd know about it already.

        Jim

        You can't use chdir to change the current working directory to a Japanese folder name (Unicode) on Windows. It fails.

        Running this trivial Perl script…

            #!perl
        
            use strict;
            use warnings;
            use utf8;
            use autodie qw( chdir );
        
            binmode STDERR, ':encoding(UTF-8)';
        
            chdir 'C:/日本/';
        

        …fails with this error message…

            Can't chdir('C:/日本/'): No such file or directory at JapanFolder.pl line 8
        

        File::Find uses chdir.

        I'm running Strawberry Perl version 5.16.2.

        Jim

        UPDATE:

        Running this Perl script…

            #!perl

            use strict;
            use warnings;
            use utf8;
            use autodie qw( chdir );

            binmode STDERR, ':encoding(UTF-8)';

            chdir 'C:/Doesn’t Work/';

        …fails with this error message…

            Can't chdir('C:/Doesn’t Work/'): No such file or directory at DoesntWork.pl line 8
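
File::Find does chdir into each directory by default, but its documented no_chdir option turns that off, in which case $_ holds the full path inside the wanted callback. The sketch below only illustrates that option. It makes no claim about fixing the Unicode file-name failures shown above, which concern how Perl's built-ins call the Windows API. The helper name list_files_no_chdir is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list plain files under @dirs without File::Find
# ever changing the current working directory.
sub list_files_no_chdir {
    my @dirs = @_;
    my @found;
    find(
        {
            no_chdir => 1,
            # With no_chdir, $_ is the same as $File::Find::name (full path).
            wanted => sub { push @found, $_ if -f $_ },
        },
        @dirs
    );
    return @found;
}
```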
        
