bmcquill has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to get better at regex and I'm starting with Perl. I want to go through a directory and find all the files whose names begin with messages and MAY have a "." and a digit after it, but it should not match names that end in, say, .txt or .pl. Any assistance is greatly appreciated. I want to find all the files named messages, messages., messages.1, etc., but NOT messages.txt or messages.pl. Does that help?

Replies are listed 'Best First'.
Re: Regex for files
by Athanasius (Bishop) on Apr 23, 2015 at 03:24 UTC
Re: Regex for files
by aaron_baugher (Curate) on Apr 23, 2015 at 02:31 UTC

    If I understand your requirements, you mean:

    # Should match:
    messages
    messages.1
    messages.345
    # should not match:
    messages.txt
    messages.
    messages123
    

    If that's correct, I'll assume you know how to open a directory and read the files, so inside that loop, just check whether the filename matches the pattern you're looking for. This pattern looks for a filename starting with "messages", optionally followed by a . and one or more digits. The grouping parentheses around the dot and digits ensure that it will have both or neither.

    if ($filename =~ /^messages(\.\d+)?/) {
        # this one matches
    }

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Interoperability note: Windows inherits from DOS -- filenames that end in dot are indistinguishable from filenames with no ending dot:

      D:\PerlMonks>echo>message. This is a test.

      D:\PerlMonks>dir
       Directory of D:\PerlMonks

      04/22/2015  11:12 PM    <DIR>          .
      04/22/2015  11:12 PM    <DIR>          ..
      04/22/2015  11:12 PM                17 message

      D:\PerlMonks>echo>message This is a test.

      D:\PerlMonks>dir
       Directory of D:\PerlMonks

      04/22/2015  11:12 PM    <DIR>          .
      04/22/2015  11:12 PM    <DIR>          ..
      04/22/2015  11:12 PM                17 message

      D:\PerlMonks>echo>"message." This is a test.

      D:\PerlMonks>dir
       Directory of D:\PerlMonks

      04/22/2015  11:12 PM    <DIR>          .
      04/22/2015  11:12 PM    <DIR>          ..
      04/22/2015  11:12 PM                17 message

      D:\PerlMonks>

      The nit: Under Windows, message. and message need to be in the same category.

      Hi Aaron, thanks. That regex still returns everything. Any other ideas?

        Add a $ to the end of aaron_baugher's expression, like this: /^messages(\.\d*)?$/. Also note the change of the + to a *, which lets a bare trailing dot (messages.) match as well.
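
To see why the trailing anchor matters, here is a minimal sketch that runs both patterns over a small list of names. The list and the helper name `matches` are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample names, for illustration only.
my @names = qw(messages messages. messages.1 messages.345
               messages.txt messages.pl messages123);

# Return the names from @list that match the compiled pattern $re.
sub matches {
    my ($re, @list) = @_;
    return grep { $_ =~ $re } @list;
}

# Not anchored at the end: the optional group may match nothing,
# so every name that merely starts with "messages" passes.
my @loose = matches(qr/^messages(\.\d+)?/, @names);

# Anchored with $, and \d* so that a bare trailing dot is accepted.
my @strict = matches(qr/^messages(\.\d*)?$/, @names);

print "loose:  @loose\n";
print "strict: @strict\n";
```

Without the $, the optional group is free to match nothing, so the pattern only checks the prefix. If a name could ever end in a newline, \z would be a stricter end anchor than $.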

Re: Regex for files
by GotToBTru (Prior) on Apr 23, 2015 at 02:47 UTC
Re: Regex for files
by shmem (Chancellor) on Apr 23, 2015 at 10:42 UTC
    all the files that begin with messages and MAY have a "." and a digit behind it, but it should not match something that has say .txt, .pl, etc.

    Having a directory populated with these files

    messages
    messages.
    messages.1
    messages1.2.3
    messages.1.gz
    messages.2
    messages.3
    messages.pl
    messages.txt

    the following code

    opendir D, '.';
    while(readdir D) {
        /^messages(?:\.\d?)?$/ and print $_,"\n";
    }

    will output those:

    messages.
    messages.2
    messages
    messages.1
    messages.3

    The elements of that regular expression can be best explained using the x modifier (see perlre and perlfunc):

    opendir D, '.';
    while(readdir D) {
        m/              # begin of match
            ^           # match at the beginning of the element, i.e. the current entry
            messages    # match 'messages' literally
            (?:         # begin of match group; ?: makes it subject to * ? + without creating a capture
                \.      # match a period
                \d?     # optionally (?) match a digit
            )           # end of match group
            ?           # which may or may not occur
            $           # before the end of the string
        /x              # tell m// that we use extended regex w/comments
        and             # if found
        print "$_\n";   # print it out.
    }
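
For anyone who wants to try this without touching a real log directory, here is a hedged, self-contained sketch. It builds a throwaway directory with File::Temp, creates the file names listed in this post, and scans it with the same pattern written under /x. The lexical directory handle, the error checks, the sorting and the helper name `matching_entries` are additions for illustration, not part of the code above. It assumes a Unix-like filesystem, since Windows treats messages. and messages as the same file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# The pattern from the post, compiled once with /x so it can carry comments.
my $wanted = qr/
    ^ messages      # name starts with the literal word
    (?: \. \d? )?   # optional group: a dot, then at most one digit
    $               # nothing else before the end of the name
/x;

# Return the sorted entries of directory $dir whose names match $re.
sub matching_entries {
    my ($dir, $re) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @found = sort grep { $_ =~ $re } readdir $dh;
    closedir $dh;
    return @found;
}

# Build a throwaway directory holding the file names from the post.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(messages messages. messages.1 messages1.2.3 messages.1.gz
                 messages.2 messages.3 messages.pl messages.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

print "$_\n" for matching_entries($dir, $wanted);
```

Compiling the pattern with qr// keeps the regex separate from the directory handling, so the same pattern can be reused on names from any source.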
    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

      readdir is low level :) Path::Tiny is a whole nother level :P

      use Path::Tiny qw/ path /;
      print "$_\n" for path('.')->children( qr/^messages(?:\.\d?)?$/ );
        readdir is low level :) Path::Tiny is a whole nother level :P

        That may well be, but is irrelevant for Regex for files.


        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Regex for files
by Marshall (Abbot) on Apr 23, 2015 at 02:24 UTC