Sofie has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am a beginner at Perl, trying to do some really simple things, but nothing seems simple. I have a folder with a bunch of tab-separated .txt files. All the files have the same type of information, and I need to extract the information from one of the columns in each file and put it in a new file. I have previously managed to do this with one file by reading the file into an array, iterating over each row of the array, splitting it on tab, taking the second column (where the data I want is), and printing this to a new file. But now I have several files. I started by using a while loop to open the directory and count the number of files in the directory, but I am struggling to get to reading each file and extracting my data. After lots of googling there seem to be lots of different answers, but I can't really understand them. I need something very simple so I can understand how it works.
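The single-file approach described above can be sketched like this (a minimal sketch; the sample rows in the DATA section are made up, and it prints to STDOUT rather than a file to keep it self-contained):

```perl
use strict;
use warnings;

# read tab-separated rows (sample data below) and print the second column
while (my $row = <DATA>) {
    chomp $row;
    my @columns = split /\t/, $row;    # split each row on tab
    print "$columns[1]\n";             # second column holds the wanted data
}

__DATA__
one	two	three
four	five	six
```

Running it prints `two` and `five`, one per line.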

#!/usr/bin/perl -w
use strict;

# create file to write to
open (LABNR, ">>Labnr_all.txt") or die "Could not open file";

# open the directory where files are located
my $dirname = 'Filer';
opendir (DIR, $dirname) or die "Could not open $dirname\n";

# count files
my $nrfiles = 0;
while (my $filename = readdir(DIR)) {
    $nrfiles++ if $filename =~ /\.txt/;
    print "$filename\n" if -f $filename;
    # this is where I want to look into each file and extract the info...
}
print "The number of files in the folder: $nrfiles\n";
closedir(DIR);

Replies are listed 'Best First'.
Re: Extract information from several files in directory
by hippo (Chancellor) on Nov 27, 2020 at 15:38 UTC
    I have previously managed to do this with one file

    That's a good place to start. Might I suggest the following approach to help extend this to several files?

    1. Take your working code and move it into a subroutine (see perlsub) which takes the filename to be read as the only argument. Confirm the script still works as before.
    2. Now simply call the subroutine twice in the script with a different filename each time. Confirm that the script produces the output you expect for two input files.
    3. Now write your loop to find all the files in the directory and just call the subroutine from within the loop. You should now have output from all the files.
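The three steps above can be sketched as follows (a minimal sketch, not hippo's actual code; the subroutine name and the `Filer` directory are taken from the original post, everything else is an assumption):

```perl
use strict;
use warnings;

# Step 1: the working single-file logic, moved into a subroutine (perlsub)
# that takes the filename to be read as its only argument.
sub extract_second_column {
    my ($filename) = @_;
    open my $in, '<', $filename or die "Could not open $filename: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line;
        print "$fields[1]\n";          # the wanted second column
    }
    close $in;
}

# Step 2: call it twice with different filenames and confirm the output, e.g.
# extract_second_column('Filer/one.txt');
# extract_second_column('Filer/two.txt');

# Step 3: loop over the directory and call the subroutine for every .txt file.
my $dirname = 'Filer';
if (-d $dirname) {                     # guard so the sketch runs anywhere
    opendir my $dh, $dirname or die "Could not open $dirname: $!";
    for my $filename (readdir $dh) {
        next unless $filename =~ /\.txt\z/;
        extract_second_column("$dirname/$filename");  # prepend the directory
    }
    closedir $dh;
}
```

Note that the directory name is prepended before opening, since readdir returns bare file names.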

    Portioning up the code into smaller blocks (such as using subroutines) helps you to work on small parts of a larger problem in isolation. If you have specific problems with any particular part feel free to ask. Remember that an SSCCE is best. Good luck.


    🦛

Re: Extract information from several files in directory
by tybalt89 (Prior) on Nov 27, 2020 at 16:06 UTC

    In before everyone else suggests using a module for this. Path::Tiny is my current favorite module for general file bashing.

#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11124289
use warnings;

use Path::Tiny;

my $dirname = 'Filer';
my $outputfile = path('output.second.column');
$outputfile->spew(''); # empty file

for my $file ( path($dirname)->children( qr/\.txt\z/ ) ) {
    $outputfile->append( map { (split /\t|\n/)[1] . "\n" } $file->lines );
}

# following for debug only :)
system 'more Filer/* output.second.column | cat';
# because 'more' shows filename and contents when piped

    Outputs:

::::::::::::::
Filer/one.txt
::::::::::::::
one	two	three
four	five	six
::::::::::::::
Filer/three.txt
::::::::::::::
thirteen	fourteen	fifteen
sixteen	seventeen	eighteen
::::::::::::::
Filer/two.txt
::::::::::::::
even	eight	nine
ten	eleven	twelve
::::::::::::::
output.second.column
::::::::::::::
two
five
fourteen
seventeen
eight
eleven

    Is this the kind of thing you were looking for ?

    P.S. See, it really is simple...

Re: Extract information from several files in directory
by AnomalousMonk (Bishop) on Nov 27, 2020 at 21:23 UTC

    This won't help you at the moment, but for future reference, keep in the back of your mind the modules Text::CSV, which is pure Perl, and the closely related Text::CSV_XS, which does the same things only faster, but requires local compilation that can be problematic.

    These modules do lots of stuff and can be a great help when processing (reading and writing) CSV files of any kind. However, their learning curve, while quite reasonable, is steep enough that you may wish to simplify your current situation by sticking close to what you know.


    Give a man a fish:  <%-{-{-{-<

      If you start simple, Text::CSV_XS' interface does not have a steep learning curve at all :)

      If all your text files have a header, and you want column "fooble" extracted from all TAB-separated .txt files:

      use strict;
      use warnings;
      use feature qw( say );
      use Text::CSV_XS qw( csv );

      my @result;
      foreach my $f (sort glob "*.txt") { # Use File::Find for recursive actions
          csv (in => $f, headers => "auto", sep => "\t",
               on_in => sub { push @result => $_{fooble}; });
      }

      open my $fh, ">", "results.txt" or die $!;
      say $fh $_ for grep { length } @result;
      close $fh;

      Enjoy, Have FUN! H.Merijn
Re: Extract information from several files in directory
by LanX (Cardinal) on Nov 27, 2020 at 14:49 UTC
Re: Extract information from several files in directory
by wazat (Monk) on Nov 29, 2020 at 18:32 UTC

    One thing to be aware of is that readdir() returns file names without the directory part. If using readdir() you will need to prepend the directory part of the path to each file name in order to open the file for reading.

    You could just build the path yourself, which is not necessarily portable.

    my $path = "Filer/$filename";

    or you could use File::Spec->catfile() for portability

    use File::Spec;
    my $path = File::Spec->catfile('Filer', $filename);

    This is an old-fashioned approach.

    If you use glob() it will return file paths. Then again, for portability, File::Spec->catfile() could be used to build the argument to glob().
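A minimal sketch of the glob() approach (the directory name Filer comes from the original post; the rest is an assumption):

```perl
use strict;
use warnings;
use File::Spec;

# build the glob pattern portably; glob() then returns full relative
# paths such as Filer/one.txt, so nothing needs to be prepended
my $pattern = File::Spec->catfile('Filer', '*.txt');
for my $path (glob $pattern) {
    print "$path\n";
}
```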

    As others have pointed out, using Path::Tiny is simple. It is a more modern approach. For example:

    use Path::Tiny;
    my @paths = path("Filer")->children( qr/\.txt\z/ );
    # now do something with each element of @paths