PerlMonks

Multiple file input into a perl script

by kelder (Novice)
on Sep 30, 2008 at 18:36 UTC [id://714601]

kelder has asked for the wisdom of the Perl Monks concerning the following question:

I recently posted a question on how I could pipe many files into a single perl script, and received answers such as "use @ARGV", which worked great for small projects. The command line input worked just like I needed it to. The problem, however, is that it turns out I actually have far more files to deal with than I thought, on the order of 100,000 at a time. Due to the size of the list in the command line, even when I use a wildcard, I get error output like this:

    -bash: /usr/bin/perl: Argument list too long

Thus I have a problem. My input list is too long for my program to pipe in. What other solutions do I have? Is using @ARGV still a good idea, or do I have to look at another avenue?

UPDATE: I got my code to work! Thanks especially to Corion for recommending tye's sort function. Basically, this is how it played out:
    my $pattern = '/home/kelder/files/*.txt';
    @ARGV = glob $pattern;
    my @files = @ARGV;
    my @sorted = @files[
        map  { unpack "N", substr($_, -4) }
        sort
        map  {
            my $key = $files[$_];
            $key =~ s[(\d+)][ pack "N", $1 ]ge;
            $key . pack "N", $_
        } 0 .. $#files
    ];
    @ARGV = @sorted;
    while (<>) {
        # Do my function
        if (eof(ARGV)) {
            # Do end of file cleanup
        }
    }
Using this format, I was able to still use the <> operator, while piping in a sorted ARGV, so my output came out like this:
file1 file2 ... file10 file11
This is exactly what I wanted. Thanks for all the help!
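For readers landing here later: the sort in the update is a Guttman-Rosler-style transform. Every digit run in each filename is replaced by a fixed-width big-endian integer (pack "N") so that a plain string sort compares the numbers numerically, and the element's index is appended to each key so the sorted order can be mapped back onto the original array. A minimal self-contained sketch (the filenames are made up for illustration):

```perl
use strict;
use warnings;

my @files = map { "file$_.txt" } 1, 10, 2, 11, 3;

# Replace each digit run with a 4-byte big-endian integer so a plain
# lexical sort compares the numbers numerically, then tack the original
# array index onto the end of each key so it can be recovered afterwards.
my @sorted = @files[
    map  { unpack "N", substr($_, -4) }       # recover the index
    sort                                      # plain string sort
    map  {
        my $key = $files[$_];
        $key =~ s/(\d+)/pack "N", $1/ge;      # digits -> fixed-width binary
        $key . pack "N", $_                   # append the index
    } 0 .. $#files
];

print "@sorted\n";   # file1.txt file2.txt file3.txt file10.txt file11.txt
```

The appeal of this idiom over a sort block with a comparison sub is that each key is transformed once, not once per comparison, which matters when sorting 100,000 names.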

Replies are listed 'Best First'.
Re: Multiple file input into a perl script
by Corion (Patriarch) on Sep 30, 2008 at 18:46 UTC

    It depends. How do you find all those files? If you have them in a text file, read them from that text file:

    use strict;
    my $filename = 'my_textfile.txt';
    open my $fh, '<', $filename
        or die "Couldn't read '$filename': $!";
    chomp( @ARGV = <$fh> );
    print "Processing $_\n" for @ARGV;

    If they all are below a certain directory, use File::Find:

    use strict;
    use File::Find;
    my $directory = '/home/kelder/files';
    find( sub { push @ARGV, $File::Find::name; }, $directory );
    print "Processing $_\n" for @ARGV;

    If you want to use a shell glob pattern, you can prevent the shell from expanding it and do the expansion in Perl:

    use strict;
    use File::Glob qw(bsd_glob);   # to get sane whitespace semantics
    my $pattern = '/home/kelder/files/*.txt';
    @ARGV = bsd_glob $pattern;
    print "Processing $_\n" for @ARGV;

    If you have the list in some other fashion, you'll have to tell us, but the basic pattern remains.

      I thought about using the glob function, since all of my files are in the same directory, but because I'm new to perl I might be misusing the format. My code:

      @files = <ABi1*>;
      foreach $file (@files) {
          while (<>) {
              # Do my function
          }
          if (eof($file)) {
              # Do my end of file cleanup
          }
      }
      Does the way I set this up work, or am I completely screwing up the way you are supposed to use this function?

        The magic diamond-operator <> only works if you stuff the filenames into @ARGV. But you surely have tried that yourself and merely forgot to tell me that you found your code didn't work the way you wrote it.

        I recommend you read up on open to learn how to open and read a single file and process that, and then proceed to do that in the loop:

        use strict;
        my @files = glob 'ABi1*';
        foreach my $file (@files) {
            open my $fh, '<', $file
                or die "Couldn't read '$file': $!";
            while (<$fh>) {
                ... # do your function
            }
            # EOF, do end of file cleanup
        }
      This worked out great:
      use File::Glob qw(bsd_glob);   # to get sane whitespace semantics
      my $pattern = '/home/kelder/files/*.txt';
      @ARGV = bsd_glob $pattern;
      Only there is a new problem; my output file list reads like this:
      Ai0.txt Ai1.txt Ai2.txt ... Ai10.txt Ai11.txt
      What ends up happening is that in my "results" file, where I print my counts, the order of the inputs is messed up.
      Output: File1 File10 File11 File2
      How do I fix this ordering? Can I input the files in the order they are in the directory with another method? Thanks for all of the help so far!!
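When, as here, the names differ only in a single numeric field, one simpler alternative is a Schwartzian transform that extracts the number once per name and sorts on it numerically. A sketch under that assumption (this is not the solution the OP eventually used, which appears in the update above):

```perl
use strict;
use warnings;

my @files = ('Ai10.txt', 'Ai2.txt', 'Ai0.txt', 'Ai11.txt', 'Ai1.txt');

# Pull the first digit run out of each name once, sort on it as a
# number, then throw the extracted keys away again.
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ /(\d+)/, $_ ] } @files;

print "@sorted\n";   # Ai0.txt Ai1.txt Ai2.txt Ai10.txt Ai11.txt
```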
Re: Multiple file input into a perl script
by toolic (Bishop) on Sep 30, 2008 at 18:49 UTC
    If all the files are in a single directory, you could try to use opendir and readdir to get a list of filenames in the directory. I have found this to be simpler than trying to fight with the Unix xargs command.
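    A minimal sketch of the opendir/readdir approach (the current directory is used here so the snippet runs anywhere; substitute your own path):

```perl
use strict;
use warnings;

my $dir = '.';   # illustrative; use the directory holding your files

opendir my $dh, $dir or die "Can't open '$dir': $!";
# readdir returns '.' and '..' too, so keep only plain files.
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

print "Found ", scalar @files, " files\n";
```

    Because the list never passes through the shell's command line, the ARG_MAX limit that produced "Argument list too long" does not apply.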

    If the files are in a directory tree, you could try to use File::Find::Rule.
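    File::Find::Rule is a CPAN module, not core, so it must be installed first; a minimal sketch (the path and pattern are illustrative):

```perl
use strict;
use warnings;
use File::Find::Rule;   # CPAN module; install before use

# Collect every .txt file anywhere under the tree.
my @files = File::Find::Rule->file
                            ->name('*.txt')
                            ->in('.');

print "Found ", scalar @files, " files\n";
```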

    Update: I thought this sounded familiar... I already gave this answer in Re: file$name.class find - regexp ?.

Re: Multiple file input into a perl script
by Perlbotics (Archbishop) on Sep 30, 2008 at 19:11 UTC

    You could also read the filenames from STDIN while allowing to pass extra arguments via @ARGV like so:

    #!/usr/bin/perl
    use strict;

    print "Info: ARGV is: @ARGV\n";
    while (<STDIN>) {
        chomp;
        next if /^\s*$/;    # skip empty lines
        if (-e $_) {        # a regular file (might be suited to your needs)
            # do something with $_ as if it were shifted from @ARGV
            print "handling file: $_\n";
        }
        else {
            warn "no such file: $_ \n";
        }
    }
    __END__
    usr@host:tmp> ls -1 file*.pl | fileabove.pl arg1 arg2 arg3
    Info: ARGV is: arg1 arg2 arg3
    handling file: fileabove.pl
    It could be used e.g. this way:
    ls -1 *.dat | fileabove.pl
    or
    fileabove.pl < filenames.txt
    But ultimately it depends on your needs, which I might not have fully understood...
    Update: Added the shebang line to get rid of extra calls to the perl executable.

Re: Multiple file input into a perl script
by graff (Chancellor) on Oct 01, 2008 at 04:03 UTC
    You said:
    Due to the size of the list in the command line, even when I use a wildcard, I get error output like this:
    -bash: /usr/bin/perl: Argument list too long
    Thus I have a problem. My input list is too long for my program to pipe in.

    Maybe you did not see (or understand) the reply I made on your previous thread? (it was here: Re^3: Piping many individual files into a single perl script) An array of 100,000 file names, read from STDIN (or from a named file that contains the list), should not pose any problem, unless you have an indecently small amount of RAM (or you are wasting too much of it on other stuff). Just use a "find" or "ls" command to generate the list of file names, read that list in your perl script, and loop over it.
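    The workflow described above can be sketched like this (the find invocation and the per-line work are illustrative, not the OP's actual function):

```perl
use strict;
use warnings;

# Read one filename per line from STDIN (e.g. piped in from find or ls);
# 100,000 short names occupy only a few megabytes of RAM.
chomp( my @files = <STDIN> );

for my $file (@files) {
    open my $fh, '<', $file
        or do { warn "skipping '$file': $!\n"; next };
    while (my $line = <$fh>) {
        # ... per-line work here ...
    }
    # per-file cleanup here
}
```

    Invoked, for example, as: find /home/kelder/files -name '*.txt' | perl script.pl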

    Your terminology in the bit that I quoted above seems confused. Using @ARGV to hold the list of file names to process (using a wildcard on the command line for your script) is not what we call "piping". Piping refers to reading from STDIN. Are you having some sort of problem with that?

Node Type: perlquestion [id://714601]
Approved by Corion