http://www.perlmonks.org?node_id=568521

tamaguchi has asked for the wisdom of the Perl Monks concerning the following question:

In my folder C:/dataset/
I have the following files:

20051026ng3_Well A1_21593.pkl
20051026ng3_Well A2_21593.pkl
20051026ng3_Well A3_21593.pkl
20051026ng3_Well A4_21593.pkl
20051026ng3_Well A5_21593.pkl
20051026ng3_Well A6_21593.pkl
20051026ng3_Well A7_21593.pkl
20051026ng3_Well A8_21593.pkl
20051026ng3_Well A9_21593.pkl
To search for filenames containing alphanumeric signatures like A1, A2 etc. I have written the following:
#!/usr/bin/perl -w
use strict;

my $path = 'C:/dataset/';
my @sign_arr = ('A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9');

foreach (@sign_arr) {
    if (defined(my $filename = (glob("$path*$_*.pkl")))) {
        print "$filename $_\n";
    }
    else {
        print "not found $_\n";
    }
}
The output I get is the following:

C:/dataset/20051026ng3_Well A1_21593.pkl A1

not found A2

C:/dataset/20051026ng3_Well A3_21593.pkl A3

not found A4

C:/dataset/20051026ng3_Well A5_21593.pkl A5

not found A6

C:/dataset/20051026ng3_Well A7_21593.pkl A7

not found A8

C:/dataset/20051026ng3_Well A9_21593.pkl A9

I wonder how such a small piece of code can produce these results. Why are not all filenames containing the signatures A1, A2, etc. found? Thank you for your help.

2006-08-21 Retitled by holli, as per Monastery guidelines
Original title: 'Glob strange bahavor'

Replies are listed 'Best First'.
Re: Glob strange behavior
by jhourcle (Prior) on Aug 21, 2006 at 12:50 UTC

    It's a side effect from calling glob in scalar context:

    glob EXPR
    glob
        In list context, returns a (possibly empty) list of filename
        expansions on the value of EXPR such as the standard Unix shell
        /bin/csh would do. In scalar context, glob iterates through such
        filename expansions, returning undef when the list is exhausted.
        This is the internal function implementing the "<*.c>" operator,
        but you can use it directly. If EXPR is omitted, $_ is used. The
        "<*.c>" operator is discussed in more detail in "I/O Operators"
        in perlop.

    Try instead:

    ...
    my ($filename) = glob("$path*$_*.pkl");
    if (defined($filename))
    ...

    Update: See graff's explanation for more specifics

Re: Glob strange bahavor
by Sidhekin (Priest) on Aug 21, 2006 at 12:57 UTC

    if(defined (my $filename=(glob("$path*$_*.pkl"))))

    Based on those extra parentheses, it looks like you're trying to call glob in list context (which would make more sense, too). But parentheses on the right hand side of assignment do not a list assignment make. Put them on the left hand side instead.

    Also, an empty list assignment in scalar context is actually defined (zero, to be precise), so you likely want this:

    if(my ($filename)=glob("$path*$_*.pkl"))

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!
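The list-assignment fix suggested in the replies above can be sketched as a complete script. This is only an illustration under stated assumptions: `first_match` is a hypothetical helper name, and a scratch directory from File::Temp stands in for C:/dataset/ so the script can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the first file in $dir whose name contains
# $sign, or undef. The parentheses on the LEFT of the assignment make it a
# list assignment, so glob runs in list context and expands the pattern
# afresh on every call.
sub first_match {
    my ($dir, $sign) = @_;
    my ($filename) = glob("$dir/*$sign*.pkl");
    return $filename;
}

# Scratch data standing in for C:/dataset/ (an assumption for this sketch).
# The directory path is assumed to contain no whitespace, because glob
# splits its pattern on whitespace.
my $dir = tempdir(CLEANUP => 1);
for my $sign ('A1' .. 'A9') {
    open my $fh, '>', "$dir/20051026ng3_Well ${sign}_21593.pkl"
        or die "cannot create file: $!";
    close $fh;
}

# B1 is included to show the "not found" branch.
for my $sign ('A1' .. 'A9', 'B1') {
    my $filename = first_match($dir, $sign);
    print defined $filename ? "$filename $sign\n" : "not found $sign\n";
}
```

With this change every signature from A1 to A9 is reported, and only B1 falls through to "not found".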

      He's checking if a file exists, so a simpler fix would be to use the command designed to check just that: -e (or -f).
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $path = 'C:/dataset/';
      my @sign_arr = ('A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9');

      foreach (@sign_arr) {
          my $filename = "$path*$_*.pkl";
          if (-e $filename) {
              print "$filename $_\n";
          }
          else {
              print "not found $_\n";
          }
      }

      Update: Ack, nevermind, I didn't notice his use of *. First post of the morning, sorry.

Re: Glob strange behavior
by un-chomp (Scribe) on Aug 21, 2006 at 13:40 UTC
    You might also consider using File::Find::Rule for this type of task:
    use File::Find::Rule;

    my @wanted = File::Find::Rule->file()
                                 ->name( qr/A[0-9].*\.pkl$/ )
                                 ->in('c:/dataset/');
Re: Glob strange behavior
by cdarke (Prior) on Aug 21, 2006 at 12:46 UTC
    Which version of perl? Early versions did not necessarily return filenames in alphabetical order on all platforms. You might be better off using:
    @files = glob("$path*A[0-9]*.pkl");
    or iterate through that from a foreach loop, rather than using glob in scalar context.
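A single list-context glob, as suggested above, could be combined with a lookup table so that missing signatures are still reported. This is a rough sketch, not the poster's code: `index_by_signature` is a hypothetical name, the `_Well A<digit>_` filename layout is taken from the original question, and the scratch directory (with A4 deliberately left out) is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: glob once in list context, then map each signature
# (A0..A9) found in a filename to that filename.
sub index_by_signature {
    my ($dir) = @_;
    my %file_for;
    for my $file (glob("$dir/*A[0-9]*.pkl")) {
        # Anchor on the "_Well A1_" layout so that a stray "A5" elsewhere
        # in the path cannot be mistaken for the signature.
        $file_for{$1} = $file if $file =~ /_Well (A[0-9])_/;
    }
    return %file_for;
}

# Scratch data (assumed; path must not contain whitespace). A4 is omitted
# on purpose so the "not found" branch is exercised.
my $dir = tempdir(CLEANUP => 1);
for my $sign (grep { $_ ne 'A4' } 'A1' .. 'A9') {
    open my $fh, '>', "$dir/20051026ng3_Well ${sign}_21593.pkl"
        or die "cannot create file: $!";
    close $fh;
}

my %file_for = index_by_signature($dir);
for my $sign ('A1' .. 'A9') {
    print exists $file_for{$sign}
        ? "$file_for{$sign} $sign\n"
        : "not found $sign\n";
}
```

Because glob is called only once and in list context, there is no iterator state to trip over, and the directory is scanned once rather than nine times.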
Re: Glob strange behavior
by calin (Deacon) on Aug 21, 2006 at 15:41 UTC

    People typing lines like this:

    my @sign_arr=('A1', 'A2', 'A3','A4', 'A5', 'A6', 'A7', 'A8', 'A9');

    make the lazy man in me cringe :)

    Using the range operator in list context to produce lists of automagically incremented values was one of the first things I learned to reduce typing... :

    my @sign_arr = 'A1'..'A9';
      Hi.

      calin++

      Just to illustrate what is available...

      my @sign_arr = qw(A1 A2 A3 A4 A5 A6 A7 A8 A9);
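For comparison, here is a small sketch of three ways to build the same list. The variable names are made up for the illustration; the range form relies on Perl's magical string increment, which steps 'A1' through 'A9'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three equivalent ways to build the list of signatures.
my @by_range = ('A1' .. 'A9');                       # magical string increment
my @by_qw    = qw(A1 A2 A3 A4 A5 A6 A7 A8 A9);       # explicit word list
my @by_map   = map { "A$_" } 1 .. 9;                 # prefix added to a numeric range

print "range: @by_range\n";
print "qw:    @by_qw\n";
print "map:   @by_map\n";
```

The `map` form is the easiest to adapt if the prefix or the numeric bounds later come from variables.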
Re: Glob strange behavior
by mobby_6kl (Novice) on Aug 21, 2006 at 22:22 UTC
    I don't usually consider myself qualified to give out Perl advice, and after reading your post again I realized your question was why your code doesn't work, not a request for alternatives. Still, others have already responded to that, so I'll just offer my alternative using a regex:
    while (glob '*.pkl') {print qq(Found $1 in $_\n) if /(A[1-9])/}
    Do you need it to print which alphanumeric signatures it didn't find? This could be easily added, but it would compromise the cute one-liner aspect. ;-)
Re: Glob strange behavior
by johnnywang (Priest) on Aug 21, 2006 at 23:37 UTC
    Can someone actually explain this? Why does glob in scalar context return undef for every other file? The documentation says that in scalar context glob returns an iterator (which is OK, so one can use a while loop to go through all of the files). But why does it fail for every other file? I thought it might be some side effect on $_, but that's not it. Thanks.
      The explanation was alluded to above, but not actually spelled out in the thread. It lies in the "I/O Operators" section of the perlop man page:
      A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn't important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it's called, or "undef" when the list has run out. As with filehandle reads, an automatic "defined" is generated when the glob occurs in the test part of a "while", because legal glob returns (e.g. a file called 0) would otherwise terminate the loop. Again, "undef" is returned only once. So if you're expecting a single value from a glob, it is much better to say
      ($file) = <blurch*>;
      than
      $file = <blurch*>;
      because the latter will alternate between returning a filename and returning false.

      That's some pretty arcane stuff, but I think it gets the point across, which is: even though you would expect the file glob to "start fresh" each time you use it with a new string, it won't, unless it has already reached the end of a list that it built on a previous call.

      As used in the OP (globbed string matches one file, glob is used in a scalar context), it takes two iterations to reach the end of the first list, and thereafter only the odd-numbered glob strings are actually processed.
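The alternation described above can be reproduced with a few lines. This is a minimal sketch under assumptions: the `file_A1_.pkl` style names and the scratch directory are invented for the demonstration, and `@results` is just a convenience for inspecting what happened.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch data (assumed; path must not contain whitespace): one file per
# signature, so every pattern below matches exactly one file.
my $dir = tempdir(CLEANUP => 1);
for my $sign ('A1' .. 'A4') {
    open my $fh, '>', "$dir/file_${sign}_.pkl" or die "cannot create file: $!";
    close $fh;
}

# ONE glob operator, evaluated four times in scalar context.
my @results;
for my $sign ('A1' .. 'A4') {
    my $filename = glob("$dir/*$sign*.pkl");    # scalar context
    my $status   = defined $filename ? 'found' : 'undef';
    push @results, $status;
    print "$sign: $status\n";
}
# Prints:
#   A1: found    (new list built from *A1*, first entry returned)
#   A2: undef    (still iterating the A1 list, which is now exhausted)
#   A3: found    (new list built from *A3*)
#   A4: undef    (A3 list exhausted)
```

The patterns for A2 and A4 are never expanded at all: on those iterations the operator is only reporting that the previous list has run out.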

        Hi fellows, I would like to point out that glob behaves differently in scalar context depending on whether the pattern contains a variable or is a literal.
        # WRONG
        my @sign_arr = ('A1' .. 'A9');
        for my $sign (@sign_arr) {
            my $filename = <*$sign*.pkl>;
            print "$filename\n" if defined($filename);
        }
        # OUTPUT:
        # file_A1_.pkl
        # file_A3_.pkl
        # file_A5_.pkl
        # file_A7_.pkl
        # file_A9_.pkl

        # RIGHT
        my $filename;
        $filename = <*A1*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A2*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A3*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A4*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A5*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A6*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A7*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A8*.pkl>; print "$filename\n" if defined($filename);
        $filename = <*A9*.pkl>; print "$filename\n" if defined($filename);
        # OUTPUT:
        # file_A1_.pkl
        # file_A2_.pkl
        # file_A3_.pkl
        # file_A4_.pkl
        # file_A5_.pkl
        # file_A6_.pkl
        # file_A7_.pkl
        # file_A8_.pkl
        # file_A9_.pkl
        A bug?
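As far as I understand it, this is not a bug and has nothing to do with variable versus literal: each glob operator written in the source keeps its own iterator. Nine separate statements are nine operators, each starting a fresh list, while a loop re-runs the same operator. The sketch below tries to show this; the file names and scratch directory are assumptions, and both cases interpolate a variable on purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch data (assumed; path must not contain whitespace).
my $dir = tempdir(CLEANUP => 1);
for my $sign ('A1' .. 'A4') {
    open my $fh, '>', "$dir/file_${sign}_.pkl" or die "cannot create file: $!";
    close $fh;
}

# Two SEPARATE glob operators, each called once in scalar context.
# Both patterns interpolate $dir, yet both succeed, because each operator
# starts with its own fresh iterator.
my $first  = glob("$dir/*A1*.pkl");
my $second = glob("$dir/*A2*.pkl");
print 'first:  ', (defined $first  ? 'found' : 'undef'), "\n";
print 'second: ', (defined $second ? 'found' : 'undef'), "\n";

# ONE glob operator inside a loop, but drained to undef by the inner while
# before the pattern changes, so every signature gets a fresh expansion.
my @found;
for my $sign ('A1' .. 'A4') {
    while (defined(my $file = glob("$dir/*$sign*.pkl"))) {
        push @found, $sign;
    }
}
print "drained loop found: @found\n";
```

Draining the iterator works, but assigning in list context, `my ($filename) = glob(...)`, is shorter and harder to get wrong.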