PerlMonks  

Reg Expression on file name

by Anonymous Monk
on Jan 06, 2003 at 19:10 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to check if a file begins with AB DAT and ends with .doc

Here is my attempt:
next if($file !~ /^AB\sDAT*\.doc$/i);

Replies are listed 'Best First'.
Re: Reg Expression on file name
by sauoq (Abbot) on Jan 06, 2003 at 19:28 UTC

    The T* part of that pattern restricts the match to file names beginning with "AB DA" followed by zero or more T's followed by ".doc", and that's probably not what you meant. You are missing a dot. Try /^AB DAT.*\.doc$/i instead. Note that I replaced the \s with a literal space, because \s will also match a tab or a newline.

    I don't know if you really want the /i modifier. That will make the expression case insensitive so it will also match files beginning with "ab dat" or "Ab DaT" and so on.
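A small sketch of the two points above, using made-up file names (they are assumptions for illustration, not taken from the question): \s accepts a tab where a literal space does not, and /i accepts any mix of upper and lower case.

```perl
use strict;
use warnings;

# Hypothetical file names, invented for this illustration.
my @names = ("AB DAT report.doc", "AB\tDAT report.doc", "ab dat report.DOC");

my $strict = qr/^AB DAT.*\.doc$/;      # literal space, case sensitive
my $loose  = qr/^AB\sDAT.*\.doc$/i;    # \s and /i, as in the original attempt (plus the dot)

my %result;
for my $name (@names) {
    $result{$name} = {
        strict => ($name =~ $strict ? 1 : 0),
        loose  => ($name =~ $loose  ? 1 : 0),
    };
    (my $shown = $name) =~ s/\t/\\t/g;   # make the tab visible when printing
    printf "%-22s strict=%d loose=%d\n",
        $shown, $result{$name}{strict}, $result{$name}{loose};
}
```

Only the first name passes the strict pattern; the loose pattern accepts all three, including the one with a tab.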

    You might also consider using unless rather than if ... !~ too. I'd write the whole thing as

    next unless $file =~ /^AB DAT.*\.doc$/;

    By the way, filenames with spaces in them are yucky. If you have control over the filenames, I suggest renaming them and replacing spaces with underscores.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Reg Expression on file name
by fruiture (Curate) on Jan 06, 2003 at 19:29 UTC

    I guess the error was "it doesn't work" ;-) [1]

    The '*' in regular expressions is a quantifier (0 or more) for whatever stood before it. It's not like '*' in shell globbing, where it stands for any number of characters.
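To make the difference concrete, here is a rough sketch that translates a simple shell-style glob into the equivalent regex. The helper name glob_to_regex is hypothetical, and it only handles '*' and '?'; real shells have more rules (character classes, leading dots, path separators) that this ignores.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple shell glob into an anchored regex.
# '*' becomes '.*', '?' becomes '.', everything else is matched literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $char (split //, $glob) {
        if    ($char eq '*') { $re .= '.*' }
        elsif ($char eq '?') { $re .= '.'  }
        else                 { $re .= quotemeta $char }
    }
    return qr/^$re$/;
}

my $re = glob_to_regex('AB DAT*.doc');
for my $name ('AB DAT 2003.doc', 'AB DATTT.doc', 'AB DA.doc') {
    printf "%-16s %s\n", $name, ($name =~ $re ? 'matches' : 'no match');
}
```

The glob 'AB DAT*.doc' needs the full "AB DAT" prefix, so 'AB DA.doc' is rejected, whereas the regex DAT* from the question would accept it.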

    next unless $file =~ m/^ AB \s DAT .* \.doc $/x;

    A regexp is overhead here, because you only check for constant strings: 'AB DAT' (that space is probaly not going to be a newline or tab someday) and '.doc', so it's more effective to say:

    next unless substr($file,0,6) eq 'AB DAT' and substr($file,-4) eq '.doc' ;

    [1] Whenever you have a problem, describe what went wrong. In this case it was clear, but for the future: describe the unwanted behaviour.

    --
    http://fruiture.de
      so it's more effective to say:
      next unless substr($file,0,6) eq 'AB DAT' and substr($file,-4) eq '.doc' ;

      A. That's not more effective. It's just as effective. It may be ever so slightly more efficient but the micro-optimization probably isn't really all that important.

      B. It's verbose and harder to read at a glance than a simple regular expression is. This is probably more important than whatever small optimization you might get by using substr.
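Anyone who wants to measure the difference rather than guess could use the core Benchmark module. This is only a sketch: the sample names and the wrapper subs by_regex and by_substr are invented here, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample names, invented for this comparison.
my @names = ('AB DAT 2003.doc', 'AB DAT.doc', 'AB DA.doc', 'AB DAT notes.txt', 'x.doc');

sub by_regex {
    my ($f) = @_;
    return $f =~ /^AB DAT.*\.doc$/ ? 1 : 0;
}

sub by_substr {
    my ($f) = @_;
    return (substr($f, 0, 6) eq 'AB DAT' and substr($f, -4) eq '.doc') ? 1 : 0;
}

# Both checks should agree before their speed is compared.
for my $f (@names) {
    die "disagreement on '$f'" unless by_regex($f) == by_substr($f);
}

# Run each check for about one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { by_regex($_)  for @names },
    substr => sub { by_substr($_) for @names },
});
```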

      If the filename were in $_ I'd suggest next unless /^AB DAT/ and /\.doc$/; which would be a good compromise.

      I just noticed your first suggestion:

      next unless $file =~ m/^ AB \s DAT .* \.doc $/x;
      I think this case in particular would really be a good time to use a literal space instead of \s and avoid /x. In general, I think /x makes short regular expressions harder to read. Do you often use /x on short expressions and if so, what's your reasoning?

      -sauoq
      "My two cents aren't worth a dime.";
      

        True, it's more readable to have an expression and the kind-of-best way is the one with two expressions. To make this quite efficient and readable, a prepost($name,$prefix,$postfix) function using substr inside would be the solution if this problem occurs more than once. This is hypothetical, it's perhaps only 30 lines of code where neither speed nor readability matter :)
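A minimal sketch of what such a prepost() function might look like. Only the name and argument order come from the post; the body, including the length check that stops prefix and postfix from overlapping, is an assumption.

```perl
use strict;
use warnings;

# Sketch of the prepost() idea: true when $name starts with $prefix and
# ends with $postfix. The length check keeps the two from overlapping.
sub prepost {
    my ($name, $prefix, $postfix) = @_;
    return 0 if length($name) < length($prefix) + length($postfix);
    return 0 unless substr($name, 0, length $prefix) eq $prefix;
    # substr($name, -0) would return the whole string, so an empty
    # postfix is handled separately.
    return 0 unless length($postfix) == 0
                 || substr($name, -length($postfix)) eq $postfix;
    return 1;
}

for my $file ('AB DAT 2003.doc', 'AB DAT.doc', 'AB DA.doc', 'notes.doc') {
    next unless prepost($file, 'AB DAT', '.doc');
    print "$file\n";
}
```

With these made-up names the loop prints 'AB DAT 2003.doc' and 'AB DAT.doc' and skips the other two.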

        About /x: I'm starting to use it nearly everywhere, and I don't think it makes anything harder to read. Perl 6 regular expressions (which are somewhat comparable to what Parse::RecDescent offers now) are /x'ed by default, so getting used to this is probably helpful. IMHO it makes any expression more readable, at least once you're used to it. You see this differently, probably because you're used to the compact expressions that are common in Perl 5. Try forcing yourself to lay everything out visually via /x and you'll change your mind in a week.
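One detail worth keeping in mind with /x: unescaped whitespace in the pattern is ignored, so a literal space has to be written as [ ] or "\ " (or as \s, as in the earlier example). The sketch below uses made-up file names and variable names to show a commented /x pattern next to a version where the space was forgotten.

```perl
use strict;
use warnings;

# A commented /x version of the pattern, with the space made explicit.
my $spaced = qr/
    ^ AB [ ] DAT    # fixed prefix, with an explicit literal space
    .*              # anything in between
    \.doc $         # literal .doc at the end of the name
/x;

# Here the spaces are plain layout and vanish: this means ^ABDAT.*\.doc$
my $broken = qr/^ AB DAT .* \.doc $/x;

# Hypothetical file names, invented for this illustration.
for my $name ('AB DAT 2003.doc', 'ABDAT 2003.doc') {
    printf "%-16s spaced=%d broken=%d\n", $name,
        ($name =~ $spaced ? 1 : 0),
        ($name =~ $broken ? 1 : 0);
}
```

The first name matches only $spaced and the second only $broken, which is the kind of slip a literal space without /x avoids.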

        --
        http://fruiture.de
      Thanks for all responses!
Re: Reg Expression on file name
by thezip (Vicar) on Jan 06, 2003 at 19:37 UTC

    I believe that you'll need to add a '.' just before the asterisk, as in:

    next if($file !~ /^AB\sDAT.*\.doc$/i);
                              ^
    Otherwise, you'll only match strings like:
    AB DA.doc
    AB DAT.doc
    AB DATTTTTTTTTTTTTT.doc

    The way you have coded it, the asterisk refers to zero or more occurrences of the previous character, 'T', instead of "any character", as I think you intended.
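The difference can be checked with a short sketch that runs both patterns over the three names listed above plus one extra, hypothetical name with text after "DAT".

```perl
use strict;
use warnings;

my $original = qr/^AB\sDAT*\.doc$/i;    # as posted: "T*" means zero or more T's
my $fixed    = qr/^AB\sDAT.*\.doc$/i;   # with the dot: ".*" means any run of characters

# The three names from the post, plus one hypothetical name with extra text.
my @names = ('AB DA.doc', 'AB DAT.doc', 'AB DATTTTTTTTTTTTTT.doc', 'AB DAT summary.doc');

my (@orig_hits, @fixed_hits);
for my $name (@names) {
    push @orig_hits,  $name if $name =~ $original;
    push @fixed_hits, $name if $name =~ $fixed;
}

print "original: ", join(', ', @orig_hits),  "\n";
print "fixed:    ", join(', ', @fixed_hits), "\n";
```

The original pattern accepts the first three names but not 'AB DAT summary.doc'; the fixed pattern accepts the last three and rejects 'AB DA.doc', since the T is now required.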

    Where do you want *them* to go today?
Re: Reg Expression on file name
by Zaxo (Archbishop) on Jan 06, 2003 at 19:29 UTC

    You're mixing shell glob patterns with regex patterns. next if($file !~ /^AB\sDAT.*\.doc$/i); will do the job.

    After Compline,
    Zaxo

Re: Reg Expression on file name
by Anonymous Monk on Jan 06, 2003 at 19:28 UTC
    next unless /^AB\sDAT.*\.doc$/i;
Re: Reg Expression on file name
by foxops (Monk) on Jan 06, 2003 at 19:22 UTC
    How about:
    use strict;
    use File::Find;

    my(@directories_to_seach);
    @directories_to_seach = "C:/Documents and Settings/Desktop";

    find(\&wanted, @directories_to_seach);

    sub wanted {
        if ("$_" =~ /^AB\sDAT*\.doc$/i) {
            print "$_\n";
        }
    }
      Is my regular expression correct??
      /^AB\sDAT*\.doc$/i
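As the other replies point out, that pattern is missing a dot before the asterisk. A sketch of the File::Find approach with the dot added might look like the following; it builds a temporary directory with made-up file names so that it does not depend on any real path.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway directory with hypothetical file names.
my $dir = tempdir(CLEANUP => 1);
for my $name ('AB DAT 2003.doc', 'AB DATTT.doc', 'AB DA.doc', 'other.doc') {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

my @found;
find(sub {
    # Inside the callback, $_ holds the file name without its directory.
    return unless -f $_;
    push @found, $_ if /^AB\sDAT.*\.doc$/i;
}, $dir);

print "$_\n" for sort @found;
```

With these names, only 'AB DAT 2003.doc' and 'AB DATTT.doc' are collected.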
