PerlMonks  

File name regex

by dev2dev (Initiate)
on Jan 20, 2006 at 06:18 UTC

dev2dev has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need your expertise. I have a regex to match a file name. I am interested in either ^\d{8}(_(\d)+)*\.ilf or ^\d{8}(\s(\d)+)*\.sht. Is there any shortcut for this?
if (($file =~ m/^\d{8}(_(\d)+)*\.ilf/i) || ($file =~ /^\d{8}(\s(\d)+)*\.sht/i)) {
    # blah blah
}
Thanks in advance

Replies are listed 'Best First'.
Re: File name regex
by ikegami (Patriarch) on Jan 20, 2006 at 06:52 UTC

    It's not possible to combine those regexps if those are truly meant to be captures ((...)) and not groupings ((?:...)). At least, it wouldn't be possible without changing the variables in which the data is captured.

    Let's assume that you only used the parens for grouping (and that you forgot $ at the end of each regexp), then you'd be starting with the following:

    if ($file =~ /^\d{8}(?:_\d+)*\.ilf$/i || $file =~ /^\d{8}(?:\s\d+)*\.sht$/i ) { ... }
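    To make the capture vs. grouping distinction concrete, here is a minimal sketch. The file name is a made-up sample, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample name, not taken from the thread.
my $file = '20060120_1_22.ilf';

# Capturing parens: in list context the match returns ($1, $2).
# $1 holds the last repetition of the outer group,
# $2 the last single digit matched by the inner group.
my @captured = $file =~ /^\d{8}(_(\d)+)*\.ilf$/i;

# Non-capturing groups: the same names match, but nothing is stored,
# so a successful match in list context returns just (1).
my @grouped = $file =~ /^\d{8}(?:_\d+)*\.ilf$/i;

print "captured: @captured\n";
print "grouped:  @grouped\n";
```

    With capturing parens, code further down may rely on $1 and $2, which is why merging the two regexps can change which variables hold the data.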

    The possibilities for joining are still quite limited since there is a mixture of common and not common elements. The following illustrates this:

     vvvvvvvvv  vvvvvvv   vvvv  common
    /^\d{8}(?:_ \d+)*\.ilf$/ix
    /^\d{8}(?:\s\d+)*\.sht$/ix
              ^^       ^^^      not common

    The best that can be done, as far as I can tell, is to join the common beginning and the common ending. The following illustrates this:

               (?:_ \d+)*\.ilf
    /^\d{8}               $/ix
               (?:\s\d+)*\.sht

    We get the following regexp:

    if ($file =~ /^\d{8}(?:(?:_\d+)*\.ilf|(?:\s\d+)*\.sht)$/i) { ... }

    This is definitely less readable.
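    One way to gain some confidence in the merged regexp is to run both forms over a handful of names and check that they agree. This is only a sketch: the sub names and the sample file names are invented for illustration.

```perl
use strict;
use warnings;

# The two-regexp test (non-capturing, anchored with $).
sub matches_separately {
    my ($name) = @_;
    return ( $name =~ /^\d{8}(?:_\d+)*\.ilf$/i
          || $name =~ /^\d{8}(?:\s\d+)*\.sht$/i ) ? 1 : 0;
}

# The merged regexp with the common prefix and suffix factored out.
sub matches_combined {
    my ($name) = @_;
    return $name =~ /^\d{8}(?:(?:_\d+)*\.ilf|(?:\s\d+)*\.sht)$/i ? 1 : 0;
}

# Hypothetical names chosen to hit both branches plus some rejects.
my @names = (
    '12345678.ilf',
    '12345678_1_2.ILF',
    '12345678 7.sht',
    '12345678_1.sht',   # wrong separator for .sht
    '12345678 1.ilf',   # wrong separator for .ilf
    '1234567.sht',      # only seven leading digits
);

my @disagree = grep { matches_separately($_) != matches_combined($_) } @names;
my @accepted = grep { matches_combined($_) } @names;

print "disagreements: ", scalar(@disagree), "\n";
print "accepted: $_\n" for @accepted;
```

    A small table of names like this is cheap to keep next to the regexp, which offsets some of the lost readability.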

Re: File name regex
by duff (Parson) on Jan 20, 2006 at 06:59 UTC

    This should do for you:

    if ($file =~ /^\d{8}((_\d+)*\.ilf|(\s\d+)*\.sht)/i) { ... }

    You don't need parens around \d.

    Update: What ikegami said about captures vs. grouping. I assumed you were only using the parens for grouping.
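    A short sketch of two things to watch for with this form: without a trailing $ it accepts names with extra text after the extension, and the plain parens do capture. The anchored variant and the sample names below are assumptions added for illustration.

```perl
use strict;
use warnings;

# The pattern as posted (no end anchor) and a variant with $ added.
my $unanchored = qr/^\d{8}((_\d+)*\.ilf|(\s\d+)*\.sht)/i;
my $anchored   = qr/^\d{8}((_\d+)*\.ilf|(\s\d+)*\.sht)$/i;

# Hypothetical name with trailing junk after the extension.
my $stray = '12345678_1.ilf.bak';
my $loose_hit  = $stray =~ $unanchored ? 1 : 0;
my $strict_hit = $stray =~ $anchored   ? 1 : 0;

# The plain parens capture: $1 is the whole alternative, $2 the last
# "_nnn" part, $3 the last " nnn" part (undef when the .ilf branch matched).
my @groups = '12345678_1.ilf' =~ $anchored;
my @shown  = map { defined $_ ? "'$_'" : 'undef' } @groups;

print "unanchored accepts stray name: $loose_hit\n";
print "anchored accepts stray name:   $strict_hit\n";
print "captures: @shown\n";
```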

Re: File name regex
by BrowserUk (Patriarch) on Jan 20, 2006 at 07:53 UTC

    I believe this will do what you ask, where if the option '_nnn' is present then the extension should be '.ilf', or if it is ' nnn' then the extension should be '.sht'; whilst capturing the entire optional part to $1 and the last digit of that optional part to $2:

    Update: tightened slightly. Update2: Tightening removed; unnecessary.

    m[
        ^ \d{8}
        (
            (?(?=.* \. ilf) _ | \s )
            (\d)+
        )*
        \. (?:ilf|sht)
    ]x

    Whether you would call that a 'shortcut' is debatable.
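    For anyone who wants to try the conditional approach, here is a sketch that assumes the conditional is meant to have exactly two branches ("_" when ".ilf" lies ahead, whitespace otherwise). The helper sub and the sample names are made up for illustration.

```perl
use strict;
use warnings;

my $re = qr[
    ^ \d{8}
    (                             # capture 1: last optional part
        (?(?=.* \. ilf) _ | \s )  # ".ilf" ahead: require "_", else whitespace
        (\d)+                     # capture 2: last digit of that part
    )*
    \. (?:ilf|sht)
]x;

# Hypothetical helper: returns "no match" or the two captures as text.
sub describe {
    my ($name) = @_;
    return 'no match' unless $name =~ $re;
    my ($part, $digit) = ($1, $2);
    return 'match, no optional part' unless defined $part;
    return "match, part='$part', last digit='$digit'";
}

for my $name ('12345678_12.ilf', '12345678 3 45.sht',
              '12345678.ilf', '12345678_1.sht', '12345678 1.ilf') {
    print "$name: ", describe($name), "\n";
}
```

    Note that this sketch is case-sensitive and has no end anchor, matching the pattern as written above.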


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: File name regex
by grinder (Bishop) on Jan 20, 2006 at 08:56 UTC

    (Warning: more Regexp::Assemble pimping ahead).

    Assuming you do not really care about those captures, if I "unroll" the patterns, it looks like you're interested in matching against the following patterns:

    ^\d{8}\.ilf
    ^\d{8}_\d+\.ilf
    ^\d{8}\.sht
    ^\d{8}\s\d+\.sht

    Using the assemble script from the above module, or something like

    my $re = Regexp::Assemble->new(flags=>'i')->add(@list)

    it produces the following pattern:

    ^\d{8}(?:\.(?:ilf|sht)|\s\d+\.sht|_\d+\.ilf)

    ... which looks roughly like what the other people came up with manually. If you do need the captures, the pattern becomes

    ^\d{8}(?:(\s(\d+))\.sht|(_(\d+))\.ilf|\.(?:ilf|sht))

    in which case you can perform the match and get the captures with something like

    my @result = grep {defined} ($file =~ /$re/);
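    As a sketch of how the captures come out in practice, the pattern with captures can be written out literally (so Regexp::Assemble is not needed to run this) and the undefined slots filtered with grep. The sub name and the sample file names are assumptions for illustration.

```perl
use strict;
use warnings;

# The capturing pattern from above, written out by hand.
my $re = qr/^\d{8}(?:(\s(\d+))\.sht|(_(\d+))\.ilf|\.(?:ilf|sht))/i;

# Hypothetical helper: return only the captures that took part in the match.
sub captures_for {
    my ($file) = @_;
    # In list context the match returns all four groups; the groups of
    # the alternative that did not match are undef, so drop those.
    return grep { defined } ($file =~ $re);
}

for my $file ('12345678_42.ilf', '12345678 7.sht', '12345678.ilf') {
    my @result = captures_for($file);
    print "$file: ", scalar(@result), " capture(s): ",
          join(', ', map { "'$_'" } @result), "\n";
}
```

    One caveat of this approach: a name with no optional part matches but yields an empty list, so the captures alone cannot tell that case apart from a failed match.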

    • another intruder with the mooring in the heart of the Perl

Re: File name regex
by parv (Parson) on Jan 20, 2006 at 07:02 UTC
    never mind what i wrote earlier; i was quite wrong as i failed to note what ikegami noted.

      That would work if he can live with the false positives, i.e., your pattern will match files named 12345678_1.sht and 12345678 1.ilf

      Erm, and now that you've deleted the previous contents we'll never know what was wrong. It's better form to add an update (possibly bracketing the incorrect parts with <strike> tags) stating that something's wrong. Now the existing reply to your node makes no sense because it has no context.

Node Type: perlquestion [id://524407]
Approved by davido