PerlMonks

Multiple file input into a perl script

by kelder (Novice)
on Sep 30, 2008 at 18:36 UTC [id://714601]

kelder has asked for the wisdom of the Perl Monks concerning the following question:

I recently posted a question on how I could pipe many files into a single perl script, and received answers such as "use @ARGV", which worked great for small projects. The command line input worked just like I needed it to. The problem, however, is that it turns out I actually have far more files to deal with than I thought, on the order of 100,000 at a time. Due to the size of the list in the command line, even when I use a wildcard, I get error output like this:

    -bash: /usr/bin/perl: Argument list too long

Thus I have a problem. My input list is too long for my program to pipe in. What other solutions do I have? Is using @ARGV still a good idea, or do I have to look at another avenue?

UPDATE: I got my code to work! Thanks especially to Corion for recommending tye's sort function. Basically, this is how it played out:
    my $pattern = '/home/kelder/files/*.txt';
    @ARGV = glob $pattern;
    my @files = @ARGV;
    my @sorted = @files[
        map  { unpack "N", substr($_, -4) }
        sort
        map  {
            my $key = $files[$_];
            $key =~ s[(\d+)][ pack "N", $1 ]ge;
            $key . pack "N", $_
        } 0 .. $#files
    ];
    @ARGV = @sorted;
    while (<>) {
        # Do my function
        if (eof(ARGV)) {
            # Do end of file cleanup
        }
    }
Using this format, I was able to still use the <> operator, while piping in a sorted ARGV, so my output came out like this:
file1 file2 ... file10 file11
This is exactly what I wanted. Thanks for all the help!
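For readers landing here later: the sort in the update is a Guttman-Rosler-style transform. Every digit run in each filename is replaced by a fixed-width big-endian integer (pack "N") so that a plain string sort compares the numbers numerically, and the element's index is appended to each key so the sorted order can be mapped back onto the original array. A minimal self-contained sketch (the filenames are made up for illustration):

```perl
use strict;
use warnings;

my @files = map { "file$_.txt" } 1, 10, 2, 11, 3;

# Replace each digit run with a 4-byte big-endian integer so a plain
# lexical sort compares the numbers numerically, then tack the original
# array index onto the end of each key so it can be recovered afterwards.
my @sorted = @files[
    map  { unpack "N", substr($_, -4) }       # recover the index
    sort                                      # plain string sort
    map  {
        my $key = $files[$_];
        $key =~ s/(\d+)/pack "N", $1/ge;      # digits -> fixed-width binary
        $key . pack "N", $_                   # append the index
    } 0 .. $#files
];

print "@sorted\n";   # file1.txt file2.txt file3.txt file10.txt file11.txt
```

The appeal of this idiom over a sort block with a comparison sub is that each key is transformed once, not once per comparison, which matters when sorting 100,000 names.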

Replies are listed 'Best First'.
Re: Multiple file input into a perl script
by Corion (Patriarch) on Sep 30, 2008 at 18:46 UTC

    It depends. How do you find all those files? If you have them in a text file, read them from that text file:

    use strict;
    my $filename = 'my_textfile.txt';
    open my $fh, '<', $filename
        or die "Couldn't read '$filename': $!";
    chomp( @ARGV = <$fh> );
    print "Processing $_\n" for @ARGV;

    If they all are below a certain directory, use File::Find:

    use strict;
    use File::Find;
    my $directory = '/home/kelder/files';
    find( sub { push @ARGV, $File::Find::name; }, $directory );
    print "Processing $_\n" for @ARGV;

    If you want to use a shell glob pattern, you can prevent the shell from expanding it and do the expansion in Perl:

    use strict;
    use File::Glob qw(bsd_glob);   # to get sane whitespace semantics
    my $pattern = '/home/kelder/files/*.txt';
    @ARGV = bsd_glob $pattern;
    print "Processing $_\n" for @ARGV;

    If you have the list in some other fashion, you'll have to tell us, but the basic pattern remains.

      I thought about using the glob function, since all of my files are in the same directory, but because I'm new to perl I might be misusing the format. My code:

      @files = <ABi1*>;
      foreach $file (@files) {
          while (<>) {
              # Do my function
          }
          if (eof($file)) {
              # Do my end of file cleanup
          }
      }
      Does the way I set this up work, or am I completely screwing up the way you are supposed to use this function?

        The magic diamond-operator <> only works if you stuff the filenames into @ARGV. But you surely have tried that yourself and merely forgot to tell me that you found your code didn't work the way you wrote it.

        I recommend you read up on open to learn how to open and read a single file and process that, and then proceed to do that in the loop:

        use strict;
        my @files = glob 'ABi1*';
        foreach my $file (@files) {
            open my $fh, '<', $file
                or die "Couldn't read '$file': $!";
            while (<$fh>) {
                ... # do your function
            }
            # EOF, do end of file cleanup
        }
      This worked out great:
      use File::Glob qw(bsd_glob);   # to get sane whitespace semantics
      my $pattern = '/home/kelder/files/*.txt';
      @ARGV = bsd_glob $pattern;
      Only there is a new problem; my output file list reads like this:
      Ai0.txt Ai1.txt Ai2.txt ... Ai10.txt Ai11.txt
      What ends up happening is that in my "results" file, where I print my counts, the order of the inputs is messed up.
      Output: File1 File10 File11 File2
      How do I fix this ordering? Can I input the files in the order they are in the directory with another method? Thanks for all of the help so far!!
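When, as here, the names differ only in a single numeric field, one simpler alternative is a Schwartzian transform that extracts the number once per name and sorts on it numerically. A sketch under that assumption (this is not the solution the OP eventually used, which appears in the update above):

```perl
use strict;
use warnings;

my @files = ('Ai10.txt', 'Ai2.txt', 'Ai0.txt', 'Ai11.txt', 'Ai1.txt');

# Pull the first digit run out of each name once, sort on it as a
# number, then throw the extracted keys away again.
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ /(\d+)/, $_ ] } @files;

print "@sorted\n";   # Ai0.txt Ai1.txt Ai2.txt Ai10.txt Ai11.txt
```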
Re: Multiple file input into a perl script
by toolic (Bishop) on Sep 30, 2008 at 18:49 UTC
    If all the files are in a single directory, you could try to use opendir and readdir to get a list of filenames in the directory. I have found this to be simpler than trying to fight with the Unix xargs command.
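    A minimal sketch of the opendir/readdir approach (the current directory is used here so the snippet runs anywhere; substitute your own path):

```perl
use strict;
use warnings;

my $dir = '.';   # illustrative; use the directory holding your files

opendir my $dh, $dir or die "Can't open '$dir': $!";
# readdir returns '.' and '..' too, so keep only plain files.
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

print "Found ", scalar @files, " files\n";
```

    Because the list never passes through the shell's command line, the ARG_MAX limit that produced "Argument list too long" does not apply.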

    If the files are in a directory tree, you could try to use File::Find::Rule.
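    File::Find::Rule is a CPAN module, not core, so it must be installed first; a minimal sketch (the path and pattern are illustrative):

```perl
use strict;
use warnings;
use File::Find::Rule;   # CPAN module; install before use

# Collect every .txt file anywhere under the tree.
my @files = File::Find::Rule->file
                            ->name('*.txt')
                            ->in('.');

print "Found ", scalar @files, " files\n";
```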

    Update: I thought this sounded familiar... I already gave this answer in Re: file$name.class find - regexp ?.

Re: Multiple file input into a perl script
by Perlbotics (Archbishop) on Sep 30, 2008 at 19:11 UTC

    You could also read the filenames from STDIN while allowing to pass extra arguments via @ARGV like so:

    #!/usr/bin/perl
    use strict;

    print "Info: ARGV is: @ARGV\n";
    while (<STDIN>) {
        chomp;
        next if /^\s*$/;    # skip empty lines
        if (-e $_) {        # a regular file (might be suited to your needs)
            # do something with $_ as if it were shifted from @ARGV
            print "handling file: $_\n";
        }
        else {
            warn "no such file: $_ \n";
        }
    }
    __END__
    usr@host:tmp> ls -1 file*.pl | fileabove.pl arg1 arg2 arg3
    Info: ARGV is: arg1 arg2 arg3
    handling file: fileabove.pl
    It could be used e.g. this way:
    ls -1 *.dat | fileabove.pl
    or
    fileabove.pl < filenames.txt
    But ultimately it depends on your needs, which I might not have fully understood...
    Update: Added the shebang line to get rid of extra calls to the perl executable.

Re: Multiple file input into a perl script
by graff (Chancellor) on Oct 01, 2008 at 04:03 UTC
    You said:
    Due to the size of the list in the command line, even when I use a wildcard, I get error output like this:
    -bash: /usr/bin/perl: Argument list too long
    Thus I have a problem. My input list is too long for my program to pipe in.

    Maybe you did not see (or understand) the reply I made on your previous thread? (it was here: Re^3: Piping many individual files into a single perl script) An array of 100,000 file names, read from STDIN (or from a named file that contains the list), should not pose any problem, unless you have an indecently small amount of RAM (or you are wasting too much of it on other stuff). Just use a "find" or "ls" command to generate the list of file names, read that list in your perl script, and loop over it.
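    The workflow described above can be sketched like this (the find invocation and the per-line work are illustrative, not the OP's actual function):

```perl
use strict;
use warnings;

# Read one filename per line from STDIN (e.g. piped in from find or ls);
# 100,000 short names occupy only a few megabytes of RAM.
chomp( my @files = <STDIN> );

for my $file (@files) {
    open my $fh, '<', $file
        or do { warn "skipping '$file': $!\n"; next };
    while (my $line = <$fh>) {
        # ... per-line work here ...
    }
    # per-file cleanup here
}
```

    Invoked, for example, as: find /home/kelder/files -name '*.txt' | perl script.pl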

    Your terminology in the bit that I quoted above seems confused. Using @ARGV to hold the list of file names to process (using a wildcard on the command line for your script) is not what we call "piping". Piping refers to reading from STDIN. Are you having some sort of problem with that?

Node Type: perlquestion [id://714601]
Approved by Corion