PerlMonks  

Loading all .txt files within current directory

by TJCooper (Beadle)
on Jan 31, 2016 at 12:13 UTC

TJCooper has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to deparse the following one-liner (using -MO=Deparse) in order to turn it into a .pl script:

perl -i.tmp -lane '@ar = split /([ABC])/, $F[5]; $s = 0; $s += $n * ("A" eq $op ? 0 : 1) while ($n, $op) = splice @ar, 0, 2; $w = "g"; $l = length($F[9]); print "$w\t$F[2]\t$F[3]\t$l\t$F[5]\t0" if $F[1] =~ 50; $w = "y"; $l = length($F[9]); $p = $F[3]+$s; print "$w\t$F[2]\t$p\t$l\t$F[5]\t$s" if $F[1] =~ 10; print "Head1\tHead2\tHead3\tHead4\tVar1\tVar2" if $.==1; close ARGV if eof;' *.txt;

-MO=Deparse Output:

BEGIN { $^I = ".tmp"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(' ', $_, 0);
    @ar = split(/([ABC])/, $F[5], 0);
    $s = 0;
    $s += $n * ('A' eq $op ? 0 : 1) while ($n, $op) = splice(@ar, 0, 2);
    $w = 'g';
    $l = length $F[9];
    print "$w\t$F[2]\t$F[3]\t$l\t$F[5]\t0" if $F[1] =~ /50/;
    $w = 'y';
    $l = length $F[9];
    $p = $F[3] + $s;
    print "$w\t$F[2]\t$p\t$l\t$F[5]\t$s" if $F[1] =~ /10/;
    print "Head1\tHead2\tHead3\tHead4\tVar1\tVar2" if $. == 1;
    close ARGV if eof;
}
-e syntax OK

The deparsed output does not appear to include any way to read all .txt files in the current directory without specifying them on the command line so that they feed into ARGV. I was under the impression I could simply add the following as line 1:

my @ARGV = glob("*.txt");

And each file would be passed to <ARGV> for processing, however when running:

perl script.pl

Whilst in a current directory containing .txt files, nothing happens. No error is returned, but it does not process anything nor complete. I have also tried to wrap the code in a for-loop using:

my @ARGV = glob("*.txt"); foreach my $ARGV {

And alternatively getting the directory itself:

use cwd my $dir = cwd() foreach my $files (global("$dir/*.txt")) {

But this produces the same issue.

What am I missing? Also, given that I was originally creating backups using -i within the one-liner, how is this now handled within a script? Could this also cause issues?

Replies are listed 'Best First'.
Re: Loading all .txt files within current directory
by choroba (Cardinal) on Jan 31, 2016 at 12:44 UTC
    @ARGV is not a lexical variable. Remove the my. See perlvar:

    Perl identifiers that begin with digits, control characters, or punctuation characters are exempt from the effects of the package declaration and are always forced to be in package main; they are also exempt from strict vars errors. A few other names are also exempt in these ways:
    ENV STDIN INC STDOUT ARGV STDERR ARGVOUT SIG
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Thanks. So I can load in all .txt files simply using @ARGV = glob('*.SAM');? I have tried this, but as before - the script does not execute the commands (besides printing the headers) and I can't figure out why.
Re: Loading all .txt files within current directory
by Laurent_R (Canon) on Jan 31, 2016 at 16:12 UTC
    Please note that your script will probably not do the same thing under Linux and Windows.

    As an example, this:

    $ perl -E 'say $_ for @ARGV' m*.txt
    mail_cmc.txt
    modules.txt
    mots.txt
    ...
    prints the file names matching the "m*.txt" pattern under Linux, Unix or Cygwin. But it doesn't work under Windows:
    C:\Users\Laurent>perl -E "say $_ for @ARGV" t*.*
    t*.*
    The difference is that the Unix or Linux shell will expand "m*.txt" into a list of files matching the pattern and pass the list to the Perl script, whereas Windows is too lazy to do that and just passes the pattern as you entered it.

    In the latter case (i.e. with Windows), you need glob or something equivalent to expand this pattern into a list of files.

    perl -E "my @a = glob(shift); say $_ for @a" t*.*
    test_hash.pl
    test_parl10_1.pl
    test_perl10.pl
    test_perl11.pl
    ...
      Because of this I use
      BEGIN { @ARGV = map glob, @ARGV}
      or some variation of it in one-liners, and occasionally in scripts too. The result is more portable because it uses Perl's glob instead of the one offered by the shell. That said, Windows cmd is totally unreliable here: in fact, * expansion works as expected, but only within a few commands:
      Wildcards are supported by the following commands: ATTRIB, CACLS, CIPHER, COMPACT, COPY, DEL, DIR, EXPAND, EXTRACT, FIND, FINDSTR, FOR, FORFILES, FTP, ICACLS, IF EXIST, MORE, MOVE, MV, NET (*=Any Drive), PERMS, PRINT, QGREP, REN, REPLACE, ROBOCOPY, ROUTE, TAKEOWN, TYPE, WHERE, XCACLS, XCOPY

      as explained in detail on this external site
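
The expansion idea above can be tried in isolation. The following is a minimal sketch, not anyone's production code: the scratch directory, the file names and the @raw_args list are made up for the demonstration, and @raw_args stands in for an @ARGV that arrived unexpanded, as it would from cmd.exe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with a few hypothetical files (removed on exit).
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";
for my $name (qw(test_a.pl test_b.pl notes.txt)) {
    open my $fh, '>', $name or die "Cannot create $name: $!";
    close $fh or die "Cannot close $name: $!";
}

# Pretend the shell handed the pattern over unexpanded.
my @raw_args = ('t*.*', 'notes.txt');

# Let Perl's own glob expand every argument; a plain file name
# without wildcards comes back unchanged.
my @expanded = map { glob($_) } @raw_args;
print "$_\n" for @expanded;

chdir $start or die "Cannot chdir back to $start: $!";
```

One caveat with this approach: the built-in glob splits its argument on whitespace, so a file name containing spaces is treated as several patterns. File::Glob's bsd_glob does not split on whitespace and may be a better fit in that situation.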

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Loading all .txt files within current directory
by Anonymous Monk on Jan 31, 2016 at 13:33 UTC

    As they stand, both the one-liner and the emitted script iterate over the command-line arguments as they appear after file name expansion.

    perldoc -v ARGV says that ARGV is a special filehandle that iterates over the file names in @ARGV when specified inside angle brackets, and that is where the iteration takes place.

    Your attempted modification of the emitted script to always iterate over *.txt is almost correct. What I think is needed is to remove the "my"; that is, the line should simply be

    @ARGV = glob( '*.txt' )

    The thing is, Perl has two completely different places for variables to live. Lexical variables are created using 'my', and are accessible only within the lexical scope of the 'my'. Global variables are created when 'my' is not specified, live in a name space ('main' by default), and are accessible from anywhere (including other name spaces if you fully-qualify the name). "Magic" variables like @ARGV are global, and in fact are typically forced into the 'main' name space.

    Novice Perl programmers are encouraged to specify "my" because lexical variables minimize the chance for unplanned and unexpected interactions between different parts of the code. But in this case you want to modify the global @ARGV, because that is the one with the magic connection to ARGV. By specifying "my" you create a lexical @ARGV, which hides the global one and has no magic attached.
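
The difference between the lexical and the global array can be observed directly. The sketch below is only an illustration under assumed conditions: it creates two throwaway files (a.txt and b.txt, names invented here) in a scratch directory, shows that "my @ARGV" leaves the global array empty, and then reads the files through <> after assigning to the global. With an empty global @ARGV, <> falls back to reading STDIN, which is consistent with the script appearing to do nothing and never finishing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work inside a scratch directory (hypothetical demo files, removed on exit).
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', $name or die "Cannot write $name: $!";
    print {$fh} "line from $name\n";
    close $fh or die "Cannot close $name: $!";
}

@ARGV = ();    # make sure the global starts out empty for the demo

my ($lexical_count, $global_count);
{
    my @ARGV = glob('*.txt');        # lexical: hides the global inside this block
    $lexical_count = @ARGV;          # counts the lexical array
    $global_count  = @main::ARGV;    # the array <> reads from is still empty
}
print "lexical: $lexical_count file(s), global: $global_count file(s)\n";

@ARGV = glob('*.txt');               # no "my": this is the array <> iterates over
my @seen;
while (my $line = <>) {
    chomp $line;
    push @seen, $line;
}
print "$_\n" for @seen;

chdir $start or die "Cannot chdir back to $start: $!";
```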

      Thank you. That clears a lot up. Currently the script using @ARGV = glob('*.SAM'); will only run to completion if the .txt files are specified on the command line along with it rather than simply running it within a directory containing .txt files. I'm not sure why this is occurring. When I do specify .txt files, I get the error: sh: -c: line 1: syntax error: unexpected end of file.

        Do you really need that <> magic ?

        #!perl
        use strict;
        use File::Copy;

        my $extension = '.tmp';
        my @files = glob("*.txt");

        for my $file (@files) {
            print "\n--------- $file-----------\n";
            my $backup = $file.$extension;
            rename($file,$backup); # keep as backup
            open IN, '<', $backup or die "$!";
            open OUT, '>',$file or die "$!"; # overwrite existing
            while (<IN>) {
                # process lines
                print OUT $_;
            }
        }
        poj
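
On the original question of how -i carries over into a script: the Deparse output shows it becoming an assignment to $^I, so the <> approach can keep its backups by setting that variable alongside the global @ARGV. The following is a rough sketch under assumed conditions, not a drop-in replacement for the one-liner: sample.txt, its contents, the slurp helper and the upper-casing step are all invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;                          # slurp mode
    my $content = <$in>;
    close $in or die "Cannot close $path: $!";
    return $content;
}

# Scratch directory with one made-up input file (removed on exit).
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";
open my $fh, '>', 'sample.txt' or die "Cannot write sample.txt: $!";
print {$fh} "one\ntwo\n";
close $fh or die "Cannot close sample.txt: $!";

{
    local $^I   = '.tmp';              # same effect as -i.tmp on the command line
    local @ARGV = glob('*.txt');       # global @ARGV, localized to this block
    while (my $line = <>) {
        chomp $line;
        print uc($line), "\n";         # while editing in place, print writes to the file
    }
}

my $new = slurp('sample.txt');         # rewritten content
my $old = slurp('sample.txt.tmp');     # backup of the original
print "rewritten:\n$new", "backup:\n$old";

chdir $start or die "Cannot chdir back to $start: $!";
```

Running such a script a second time would overwrite the earlier .tmp backups with the already processed files, so the backups only protect the most recent run.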
