PerlMonks  

Re: Best way to match a file.

by Athanasius (Archbishop)
on Jul 25, 2013 at 16:07 UTC ( [id://1046376]=note )


in reply to Best way to match a file.

Three quick observations:

  1. You can make the match more efficient by anchoring it to the end of the string: m/(.{10,}XYZQW.*\.csv$)/i

  2. .{10,}.* says “match at least 10 characters, followed by 0 or more characters”. It is equivalent to .{10,} by itself, i.e., the additional .* is redundant.

  3. The /g modifier is also redundant here.
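The three observations can be checked with a short sketch. The original pattern is not quoted in this reply, so `$verbose` below is an assumed reconstruction of it; the helper `check` and the second and third file names are hypothetical, added only for illustration.

```perl
use strict;
use warnings;

# Assumed reconstruction of the original pattern (not quoted in this reply).
my $verbose  = qr/(.{10,}.*XYZQW.*\.csv)/i;

# Simplified form: redundant .* dropped, match anchored to the end of the string.
my $anchored = qr/(.{10,}XYZQW.*\.csv$)/i;

# Hypothetical helper: report whether a pattern matches a file name.
sub check {
    my ($re, $file) = @_;
    return $file =~ $re ? 'yes' : 'no';
}

my @files = (
    'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv',      # from this thread
    'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv.bak',  # hypothetical
    'short.XYZQW.csv',                               # hypothetical
);

for my $file (@files) {
    printf "%-46s verbose=%-3s anchored=%s\n",
        $file, check($verbose, $file), check($anchored, $file);
}
```

Both patterns accept the first name and reject the third, since fewer than 10 characters precede XYZQW. Only the unanchored pattern accepts the `.csv.bak` name, so the `$` also tightens what counts as a match. A plain true/false test like this needs no /g.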

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Best way to match a file.
by Anonymous Monk on Jul 25, 2013 at 17:02 UTC
    That’s good, but under a different condition, could it match both file names, with different extensions, as in:
    print "\n REMOVED $file\n\n" if $file =~ m/(.{10,}XYZQW|KMHYT.*\.csv| +\.txt)/i;
    Thanks

      Yes, but you need to group the alternations, and for efficiency, the grouping should be non-capturing:

      #! perl
      use strict;
      use warnings;

      for ( 'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv',
            'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt',
            'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',
            'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv',
            'ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv',
          )
      {
          if (/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix)
          {
              print "Matched $_\n";
          }
          else
          {
              print "Ignoring $_\n";
          }
      }

      Output:

      11:57 >perl 673_SoPW.pl
      Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.csv
      Matched ASQWERFD.YYxxxx.W12345.XYZQW.D072413.txt
      Ignoring ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat
      Matched ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv
      Ignoring ASQWERFD.YYxxxx.W12345.XYZQA.D072413.csv

      11:57 >
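To see why the grouping matters, it may help to compare the pattern from the question with the grouped one side by side. This is only a sketch: the helper `match_status` and the file name `KMHYT.csv` are hypothetical, introduced for illustration.

```perl
use strict;
use warnings;

# Pattern from the question. Without inner grouping, each '|' splits the
# whole expression into three independent alternatives:
#   .{10,}XYZQW   or   KMHYT.*\.csv   or   ' +\.txt'
my $ungrouped = qr/(.{10,}XYZQW|KMHYT.*\.csv| +\.txt)/i;

# Grouped form used in the reply above.
my $grouped = qr/ .{10,} (?: XYZQW | KMHYT) .* \. (?: csv | txt) $ /ix;

# Hypothetical helper: describe whether a pattern matches a file name.
sub match_status {
    my ($re, $file) = @_;
    return $file =~ $re ? 'match' : 'no match';
}

for my $file (
    'ASQWERFD.YYxxxx.W12345.XYZQW.D072413.dat',  # wrong extension
    'KMHYT.csv',                                 # hypothetical: no 10-character prefix
) {
    printf "%-42s ungrouped: %-9s grouped: %s\n",
        $file, match_status($ungrouped, $file), match_status($grouped, $file);
}
```

The ungrouped pattern accepts both names: the first alternative never looks at the extension, and the second never requires the 10-character prefix. The grouped pattern rejects both.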

      On grouping, see Regular Expressions:

      WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.)
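The difference between the two kinds of parentheses is easy to observe. A minimal sketch, assuming a hypothetical helper `first_capture` that returns `$1` after a successful match; it shows only the behaviour, not the cost, which is not measured here.

```perl
use strict;
use warnings;

# Hypothetical helper: return $1 after a successful match, or undef.
sub first_capture {
    my ($re, $file) = @_;
    return $file =~ $re ? $1 : undef;
}

my $file = 'ASQWERFD.YYxxxx.W12345.KMHYT.D072413.csv';

# Capturing parentheses: the alternative that matched is saved in $1.
my $with = first_capture(qr/\.(XYZQW|KMHYT)\./i, $file);

# Non-capturing (?: ... ): same grouping, but nothing is saved.
my $without = first_capture(qr/\.(?:XYZQW|KMHYT)\./i, $file);

print "capturing:     ", (defined $with    ? $with    : 'undef'), "\n";
print "non-capturing: ", (defined $without ? $without : 'undef'), "\n";
```

Both patterns match the same file name; only the capturing version populates `$1`.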

      Note that I’ve also used /x for improved readability.

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
