PerlMonks  

Iterations in regex

by vitoco (Pilgrim)
on May 12, 2014 at 15:29 UTC (#1085808=perlquestion)
vitoco has asked for the wisdom of the Perl Monks concerning the following question:

I'm quite embarrassed, but I cannot figure out what's going on here:

#!perl
use strict;
use warnings;

my $data = <DATA>;
chomp $data;
my @f = ($data =~ m!((\w+),+)+!g);
print join("\t", @f) . "\n";

__DATA__
qwerty,asd,zxcvbnm,fgh,jkl,uiop,

Output:

uiop, uiop

I was expecting to receive many elements in the array: every word twice (one with and one without the comma), not just the last one.

What am I missing?

Re: Iterations in regex
by toolic (Chancellor) on May 12, 2014 at 15:44 UTC
    Don't use the last +
    my @f = ($data =~ m!((\w+),+)!g);

    YAPE::Regex::Explain

    The regular expression:

    (?-imsx:((\w+),+)+)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1 (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          \w+                      word characters (a-z, A-Z, 0-9, _) (1
                                   or more times (matching the most
                                   amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        ,+                       ',' (1 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
      )+                       end of \1 (NOTE: because you are using a
                               quantifier on this capture, only the LAST
                               repetition of the captured pattern will be
                               stored in \1)
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

Re: Iterations in regex
by LanX (Canon) on May 12, 2014 at 16:08 UTC
    toolic is right, with the last plus you ARE matching all words already in the first "iteration" (so /g is useless), but only the last matches can be returned (the ones before are overwritten).

    Without the + the /g will produce multiple attempts and return matches for each one.
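To make the difference concrete, here is a minimal sketch that runs both patterns against the sample string from the question. The variable names @last and @all are only illustrative and do not come from the original post.

```perl
#!perl
use strict;
use warnings;

my $data = 'qwerty,asd,zxcvbnm,fgh,jkl,uiop,';

# Quantified group: a single match consumes the whole string, and every
# repetition overwrites $1 and $2, so only the last pair survives.
my @last = ($data =~ m!((\w+),+)+!g);
print scalar(@last), " elements: @last\n";

# No outer quantifier: in list context /g restarts the match after each
# hit and returns both captures from every match.
my @all = ($data =~ m!((\w+),+)!g);
print scalar(@all), " elements: @all\n";
```

The first line printed reports 2 elements (uiop, and uiop), the second reports 12: each word once with and once without its trailing comma.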

    BTW: did you really mean ,+ ??? Looks weird...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Thanks to both of you. I've tried adding and removing the g modifier, but never tried without the last operator.

      The sample code was a simplification of my real problem, where I'm trying to capture one specific record from one kind of table in a set of HTML documents, where each field has its own line in the source.

      Doing it that way, I had to split the original regex in two:

      1. one to identify the required record by the value of the first field
      2. another to capture the data fields

      BTW, the original regex was something like this:

      my ($k, @f) = ($h =~ m!<td.*?>(required_\d+_\d+.txt)</td>\s+(<td.*?> +(.+?)</td>\s+)+</tr>!m);

      Then, the ",+" actually meant whitespace "\s+", but I wanted to make them visible in the output. ;-)
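The two-step split described above might look roughly like the sketch below. The HTML fragment, the file name required_12_34.txt and the variable $rest are invented for illustration; the real documents are not shown in the thread. The sketch also uses [^>]* instead of .*? inside the tags and escapes the dot in .txt, which is a choice made here rather than something taken from the original regex.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample: one table row, each cell on its own line.
my $h = <<'HTML';
<tr>
<td class="k">required_12_34.txt</td>
<td> alpha</td>
<td> beta</td>
<td> gamma</td>
</tr>
HTML

# Step 1: locate the record by its first field, capturing the key and
# the rest of the row up to the closing </tr>.
my ($k, $rest) = $h =~ m!<td[^>]*>(required_\d+_\d+\.txt)</td>\s+(.*?)</tr>!s
    or die "record not found\n";

# Step 2: an unquantified group with /g returns one capture per cell.
my @f = $rest =~ m!<td[^>]*> +(.+?)</td>!g;

print join("\t", $k, @f), "\n";
```

With this sample input, $k holds the file name and @f holds the three cell values, so no capture is overwritten by a later repetition.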

Node Type: perlquestion [id://1085808]
Front-paged by Corion