PerlMonks  

How to read files in all subfolders?

by coltman (Acolyte)
on Oct 21, 2012 at 06:39 UTC ( #1000174=perlquestion )
coltman has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I have several hundred txt files to read. Unfortunately, they are stored under multiple sub-folders (possibly several layers deep). How can I get Perl to automatically search for and read these files?

I have the following code. However, (1) it does not read through sub-folders and (2) it works only when I place the Perl script in the same folder as the txt files. How can I modify my code to (1) read both the root folder and its sub-folders and (2) specify the root folder instead of relying on the default that "dir" uses (the current folder)?

Any wisdom will be greatly appreciated!

#=====file reading=============;
open(FIND, "dir *.txt /B |") || die "couldn't open: $!\n";

#============Text reading In Each File====================;
FILE: while (<FIND>) {
    m/(20\d\d)/;
    $filename=$1;
    print "$filename\n";
    if (!open(TEXTFILE, $_)) {
        #print $_;
        print "Can't open $_--continuing...\n";
        next FILE;
    }
    #============Article reading======================;
    while(<TEXTFILE>) {
        ......................................

Replies are listed 'Best First'.
Re: How to read files in all subfolders?
by davido (Archbishop) on Oct 21, 2012 at 06:51 UTC

    If you have the suspicion that recursing through directory structures to work on files within them is something that has been done before with Perl, you would be right. It's been done often enough that with every complete distribution of Perl, there's a module included with the name of File::Find. This is where you might want to start; File::Find solves the hardest part of the problem you're facing.


    Dave
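
    As a rough illustration of the File::Find suggestion above (a minimal sketch, not code from any poster in this thread): the helper name find_txt_files, the .txt filter, and the example path in the comment are assumptions made for the example. The root folder is passed on the command line, which covers both parts of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the paths of all regular files under $root whose names end in .txt.
# (find_txt_files is a made-up helper name for this sketch.)
sub find_txt_files {
    my ($root) = @_;
    my @found;
    find(
        sub {
            # In the callback, $_ holds the bare file name and
            # $File::Find::name holds the path including the start folder.
            push @found, $File::Find::name if -f $_ && /\.txt\z/i;
        },
        $root
    );
    my @sorted = sort @found;
    return @sorted;
}

# The root folder is taken from the command line, for example
# (hypothetical path):  perl read_txt.pl C:/data/articles
if (@ARGV) {
    for my $path ( find_txt_files( $ARGV[0] ) ) {
        my $fh;
        unless ( open $fh, '<', $path ) {
            warn "Can't open $path--continuing...\n";
            next;
        }
        my $lines = 0;
        while ( my $line = <$fh> ) {
            $lines++;    # process $line here
        }
        close $fh;
        print "$path: $lines lines\n";
    }
}
```

    Collecting the paths first and reading the files afterwards keeps the traversal separate from the per-file work, which may make the article-reading loop easier to drop in.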

Re: How to read files in all subfolders?
by zentara (Archbishop) on Oct 21, 2012 at 11:58 UTC
    Here is a simple usage of File::Find, which you can modify to suit your needs.
    #!/usr/bin/perl
    use File::Find;
    $|++;

    my $path = '.';
    my $cmd  = 'file';

    finddepth (\&wanted,$path);

    sub wanted {
        return unless -f;  #-d for dir ops or comment out for both
        # system("d2u -U -b -v $_") or warn "$!\n";
        # system() returns 0 on success, so compare against 0 before warning
        system($cmd ,$_) == 0 or warn "$cmd $_ failed: $?\n";
    }
    __END__

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: How to read files in all subfolders?
by Anonymous Monk on Oct 21, 2012 at 06:51 UTC
Re: How to read files in all subfolders?
by 2teez (Priest) on Oct 21, 2012 at 11:22 UTC

    Hi coltman,
    To open a directory, use opendir, not open, which is used for files and is associated with filehandles.
    The code below uses a recursive call to go through all the folders in a given directory.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Cwd qw(abs_path);

    die "no directory provided " unless defined $ARGV[0];
    my $path = abs_path $ARGV[0];
    search_all_folder($path);

    sub search_all_folder {
        my ($folder) = @_;
        if ( -d $folder ) {
            opendir my $dh, $folder or die "can't open the directory: $!";
            while ( defined( my $file = readdir($dh) ) ) {
                chomp $file;
                next if $file eq '.' or $file eq '..';
                # Test and open the full path, so the result does not
                # depend on the current working directory.
                my $full = "$folder/$file";
                search_all_folder($full);    ## recursive call
                read_files($full) if ( -f $full );
            }
            closedir $dh or die "can't close directory: $!";
        }
    }

    sub read_files {
        my ($filename) = @_;
        open my $fh, '<', $filename or die "can't open file: $!";
        while (<$fh>) {
            print $_, $/;
        }
    }
    Hope this helps.
    UPDATE:
    As others have already said, it is half the work and a lot easier to use a module like File::Find, like so:
    use warnings;
    use strict;
    use Cwd qw(abs_path);
    use File::Find qw(find);

    die "no directory provided " unless defined $ARGV[0];
    my $path = abs_path $ARGV[0];
    find( \&search_all_folder, $path );

    sub search_all_folder {
        chomp $_;
        return if $_ eq '.' or $_ eq '..';
        read_files($_) if (-f);
    }

    sub read_files {
        my ($filename) = @_;
        open my $fh, '<', $filename or die "can't open file: $!";
        while (<$fh>) {
            print $_, $/;
        }
    }

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me

      2teez,

      I noticed a few things that could help the performance of your File::Find solution above.

      In the search sub:

      find( \&search_all_folder, $path );

      sub search_all_folder {
          chomp $_;
          return if $_ eq '.' or $_ eq '..';
          read_files($_) if (-f);
      }
      • The chomp isn't needed here since File::Find changes to the subdirectory and returns just the filename in $_.
      • The return if line isn't needed since in the next line the read_files() sub is only called for regular files. '.' and '..' are directories so they won't be included.
      • Since the OP specified only to print text files why not use the file test -T? The -f test will allow the program to attempt to print binary files which is pretty annoying if there are any lurking in a subdirectory.

      In this function:

      sub read_files {
          my ($filename) = @_;
          open my $fh, '<', $filename or die "can't open file: $!";
          while (<$fh>) {
              print $_, $/;
          }
      }
      • Printing $/ when printing each line will cause an extra blank line to appear since you did not use chomp when the line was read.
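
      Taken together, the points above might look something like the sketch below. It is only an illustration: the name print_text_files and the output-filehandle parameter are assumptions added for the example, not part of 2teez's code. One small addition to the list: perlfunc recommends running -f before -T, because the -T heuristic has to read the file, so both tests are kept here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Print the contents of every text file under $root to the filehandle $out.
# (print_text_files and the $out parameter are made up for this sketch.)
sub print_text_files {
    my ( $root, $out ) = @_;
    find(
        sub {
            # No chomp and no '.'/'..' check: $_ is already a bare name,
            # and directories fail the -f test. -T is a heuristic guess.
            return unless -f $_ && -T $_;

            my $fh;
            unless ( open $fh, '<', $_ ) {
                warn "can't open $File::Find::name: $!\n";
                return;
            }
            # Each line still ends with its newline, so print it unchanged.
            while ( my $line = <$fh> ) {
                print {$out} $line;
            }
            close $fh;
        },
        $root
    );
    return;
}

print_text_files( $ARGV[0], \*STDOUT ) if @ARGV;
```

      Warning and moving on when a file cannot be opened, instead of dying, is a design choice for this sketch; with several hundred files it may be preferable to one unreadable file stopping the whole run.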

      And one question: what is the reason for using abs_path? I find that File::Find works fine with relative pathnames. Does it help the performance if an absolute pathname is given?
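
      For anyone who wants to see what abs_path changes, here is a small sketch (list_files is a hypothetical helper, not from the code above). It only compares the names File::Find reports for a relative and an absolute start folder; it does not measure performance, so that part of the question stays open.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Find;

# Collect $File::Find::name for every regular file under $root.
# (list_files is a made-up helper name for this sketch.)
sub list_files {
    my ($root) = @_;
    my @names;
    find( sub { push @names, $File::Find::name if -f $_ }, $root );
    return @names;
}

if (@ARGV) {
    # Names start with the folder exactly as it was given ...
    print "$_\n" for list_files( $ARGV[0] );
    # ... or with its absolute form when abs_path is applied first.
    print "$_\n" for list_files( abs_path( $ARGV[0] ) );
}
```

      In both cases $_ inside the callback is the bare file name, because File::Find changes into each folder by default; only $File::Find::name and $File::Find::dir differ.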
