PerlMonks  

What does this regex do?

by Zubinix (Acolyte)
on Sep 22, 2006 at 06:18 UTC ( [id://574325]=perlquestion )

Zubinix has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm having trouble understanding the following:

my @cvs = $file =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs;

Can you help me understand what the above line of code does?


Thanks!

Replies are listed 'Best First'.
Re: What does this regex do?
by davido (Cardinal) on Sep 22, 2006 at 06:56 UTC

    It should (assuming a match) capture a series of values from $file into the array @cvs. The matching occurs like this (explanation brought to you by YAPE::Regex::Explain):

    The regular expression:

    (?s-imx:((?:(?:[^\n@]+|@[^@]*@)\n?)+))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?s-imx:                 group, but do not capture (with . matching
                             \n) (case-sensitive) (with ^ and $ matching
                             normally) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                      group and capture to \1:
    ----------------------------------------------------------------------
        (?:                  group, but do not capture (1 or more times
                             (matching the most amount possible)):
    ----------------------------------------------------------------------
          (?:                group, but do not capture:
    ----------------------------------------------------------------------
            [^\n@]+          any character except: '\n' (newline), '@'
                             (1 or more times (matching the most amount
                             possible))
    ----------------------------------------------------------------------
           |                 OR
    ----------------------------------------------------------------------
            @                '@'
    ----------------------------------------------------------------------
            [^@]*            any character except: '@' (0 or more times
                             (matching the most amount possible))
    ----------------------------------------------------------------------
            @                '@'
    ----------------------------------------------------------------------
          )                  end of grouping
    ----------------------------------------------------------------------
          \n?                '\n' (newline) (optional (matching the most
                             amount possible))
    ----------------------------------------------------------------------
        )+                   end of grouping
    ----------------------------------------------------------------------
      )                      end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    The preceding explanation is the output of the following test code:

    use warnings;
    use strict;
    use YAPE::Regex::Explain;

    my $REx = qr/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/s;
    print YAPE::Regex::Explain->new($REx)->explain;

    When deciphering a regular expression, it's often helpful to use the /x modifier so you can lay the regular expression out in smaller chunks that are easier to digest.

    my $REx = qr/
        (
            (?:
                (?: [^\n@]+ | @ [^@]* @ )
                \n?
            )+
        )
    /sx;

    Dave
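
    As a rough check of the /x advice above, the sketch below compiles the pattern once in its compact form and once spread out with comments, then runs both over the same text. The sample string and the variable names are made up for illustration. Note that the comments inside the pattern deliberately avoid the at-sign and the slash, since Perl still scans a qr// body for delimiters and interpolation before the regex engine gets to skip the comments.

```perl
use strict;
use warnings;

# Compact form, as posted in the question.
my $compact = qr/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/s;

# Same pattern spread out under the x modifier.
my $spaced = qr/
    (                     # capture one run of consecutive lines
        (?:
            (?:
                [^\n@]+   # plain text up to a newline or an at-sign
              |
                @ [^@]* @ # a quoted chunk, which may contain newlines
            )
            \n?           # optional newline after the chunk
        )+                # repeat for as long as chunks keep coming
    )
/sx;

# Hypothetical sample input: two blocks separated by a blank line.
my $sample = "head 1.1;\ndesc\n\@two\nlines\@\n\nnext block\n";

my @compact_hits = $sample =~ /$compact/g;
my @spaced_hits  = $sample =~ /$spaced/g;

printf "%d blocks, layouts agree: %s\n",
    scalar @compact_hits,
    join("\0", @compact_hits) eq join("\0", @spaced_hits) ? "yes" : "no";
```

    The /g flag goes on the match operator, not on qr//, which only accepts the pattern modifiers.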

Re: What does this regex do?
by bart (Canon) on Sep 22, 2006 at 08:35 UTC
    /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs
    Just a bit of optimization: I think the inner noncapturing parens would be better replaced with a "cut" (non-backtracking) pattern, (?>...). You have nested plusses, and that could occasionally go very awry. See Jeffrey Friedl's book on regular expressions for a detailed discussion.

    Oh, as there are no dots in the pattern, the /s modifier is useless.

    /((?:(?>[^\n@]+|\@[^@]*\@)\n?)+)/g

    There. That should match the same things, but without the danger of near-endless backtracking.

    Never mind the warning in perlre, I see no reason why this particular feature would have to change as it has been in use for many years, and unlike other extensions like (?{ CODE }) and (??{ EXPR }), it is very simple, and effective. For the latter extensions, I think they'll stay too, although the API may still change. (They are not simple.)
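
    A small sanity check of the two claims in this reply, as a sketch only: it runs the original pattern, the same pattern without /s, and the (?>...) version over one made-up sample string and compares the captures. Agreement on a single input is not a proof of equivalence, and the script does not measure backtracking.

```perl
use strict;
use warnings;

# Hypothetical sample: a quoted chunk spanning two lines, then a blank
# line, then one more line.
my $text = "line 1\nline 2 \@quoted\nacross lines\@ tail\n\nline 4\n";

my @original = $text =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs;   # as posted
my @no_s     = $text =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/g;    # /s dropped
my @atomic   = $text =~ /((?:(?>[^\n@]+|\@[^@]*\@)\n?)+)/g;  # bart's version

my %variants = ('without /s' => \@no_s, 'with (?>...)' => \@atomic);
for my $name (sort keys %variants) {
    my $same = join("\0", @{ $variants{$name} }) eq join("\0", @original);
    printf "%-13s %s\n", $name, $same ? "same captures" : "different captures";
}
```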

Re: What does this regex do?
by Anonymous Monk on Sep 22, 2006 at 06:54 UTC
    YAPE::Regex::Explain
    D:\>perl -MYAPE::Regex::Explain -e"print YAPE::Regex::Explain->new(shift)->explain" "/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs"
    The regular expression:

    (?-imsx:/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      /                      '/'
    ----------------------------------------------------------------------
      (                      group and capture to \1:
    ----------------------------------------------------------------------
        (?:                  group, but do not capture (1 or more times
                             (matching the most amount possible)):
    ----------------------------------------------------------------------
          (?:                group, but do not capture:
    ----------------------------------------------------------------------
            [^\n@]+          any character except: '\n' (newline), '@'
                             (1 or more times (matching the most amount
                             possible))
    ----------------------------------------------------------------------
           |                 OR
    ----------------------------------------------------------------------
            @                '@'
    ----------------------------------------------------------------------
            [^@]*            any character except: '@' (0 or more times
                             (matching the most amount possible))
    ----------------------------------------------------------------------
            @                '@'
    ----------------------------------------------------------------------
          )                  end of grouping
    ----------------------------------------------------------------------
          \n?                '\n' (newline) (optional (matching the most
                             amount possible))
    ----------------------------------------------------------------------
        )+                   end of grouping
    ----------------------------------------------------------------------
      )                      end of \1
    ----------------------------------------------------------------------
      /gs                    '/gs'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    D:\>
      That "explanation" didn't go quite right -- the OP's regex is not trying to match a literal "/" at the beginning or a literal "/gs" at the end.
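
      The effect can be reproduced without YAPE::Regex::Explain. In this sketch (variable names are illustrative), a string that still carries the delimiters and flags is compiled as a pattern, so the slashes and the trailing "gs" become literal characters that the input must contain:

```perl
use strict;
use warnings;

# The argument as passed on the command line above: delimiters and flags
# are part of the string, so they become part of the pattern.
my $with_delims = '/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs';
my $as_pattern  = qr/$with_delims/;

# The pattern alone, which is what the OP's match operator actually uses.
my $bare = qr/((?:(?:[^\n@]+|@[^@]*@)\n?)+)/;

my $plain_hit   = "line 1\n"   =~ $as_pattern ? 1 : 0;
my $slashed_hit = "/line 1/gs" =~ $as_pattern ? 1 : 0;
my $bare_hit    = "line 1\n"   =~ $bare       ? 1 : 0;

print "delimited pattern vs plain text:   $plain_hit\n";
print "delimited pattern vs slashed text: $slashed_hit\n";
print "bare pattern vs plain text:        $bare_hit\n";
```

      Passing only the part between the slashes, or a qr// object as in davido's reply, avoids the problem.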
Re: What does this regex do?
by pbeckingham (Parson) on Sep 22, 2006 at 13:16 UTC

    Approximation: it captures single lines that do not contain @, and multiple lines (if possible) containing patterns that start and end with @.

    Not that it helps much, but here it is in action. Anyone have ideas on what it's trying to find?

    #! /usr/bin/perl
    use strict;
    use warnings;

    my $file = do {local $/, <DATA>};
    my @cvs = $file =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs;
    print "matched [[$_]]\n" for @cvs;

    __DATA__
    line 1
    line 2
    line 3
    longer @line@ with @some stuff@
    line with nothing on it

    line 4
    and the output is:
    matched [[line 1
    line 2
    line 3
    longer @line@ with @some stuff@
    line with nothing on it
    ]]
    matched [[line 4
    ]]



    pbeckingham - typist, perishable vertebrate.
