PerlMonks  

Iterations in regex

by vitoco (Friar)
on May 12, 2014 at 15:29 UTC (#1085808=perlquestion)
vitoco has asked for the wisdom of the Perl Monks concerning the following question:

I'm quite embarrassed, but I cannot figure out what's going on here:

#!perl
use strict;
use warnings;

my $data = <DATA>;
chomp $data;
my @f = ($data =~ m!((\w+),+)+!g);
print join("\t", @f) . "\n";

__DATA__
qwerty,asd,zxcvbnm,fgh,jkl,uiop,

Output:

uiop, uiop

I was expecting to receive many elements in the array: every word twice (one with and one without the comma), not just the last one.

What am I missing?

Re: Iterations in regex
by toolic (Bishop) on May 12, 2014 at 15:44 UTC
    Don't use the last +
    my @f = ($data =~ m!((\w+),+)!g);

    YAPE::Regex::Explain

    The regular expression:

    (?-imsx:((\w+),+)+)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1 (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          \w+                      word characters (a-z, A-Z, 0-9, _) (1
                                   or more times (matching the most
                                   amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        ,+                       ',' (1 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
      )+                       end of \1 (NOTE: because you are using a
                               quantifier on this capture, only the LAST
                               repetition of the captured pattern will be
                               stored in \1)
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: Iterations in regex
by LanX (Canon) on May 12, 2014 at 16:08 UTC
    toolic is right, with the last plus you ARE matching all words already in the first "iteration" (so /g is useless), but only the last matches can be returned (the ones before are overwritten).

    Without the + the /g will produce multiple attempts and return matches for each one.
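
To make the difference concrete, here is a minimal sketch that runs both patterns against the sample string from the question. The sub names with_outer_plus and without_outer_plus are made up for this illustration; the patterns themselves are the ones quoted in this thread.

```perl
#!perl
use strict;
use warnings;

my $data = 'qwerty,asd,zxcvbnm,fgh,jkl,uiop,';

# Quantified group: a single match consumes the whole string, and each
# capture buffer keeps only the text of its last repetition.
sub with_outer_plus {
    my ($s) = @_;
    return $s =~ m!((\w+),+)+!g;
}

# No outer quantifier: /g in list context restarts the match after each
# word and returns both captures from every successful attempt.
sub without_outer_plus {
    my ($s) = @_;
    return $s =~ m!((\w+),+)!g;
}

print join("\t", with_outer_plus($data)),    "\n";
print join("\t", without_outer_plus($data)), "\n";
```

The first line printed is the same two-element result shown in the question; the second contains twelve elements, each word once with its comma and once without.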

    BTW: did you really mean ,+ ??? Looks weird...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Thanks to both of you. I've tried adding and removing the g modifier, but never tried without the last operator.

      The sample code was a simplification of my real problem, where I'm trying to capture one specific record from one kind of table in a set of HTML documents, where each field has its own line in the source.

      Doing it that way, I had to split the original regex in two:

      1. one to identify the required record by the value of the first field
      2. another to capture the data fields

      BTW, the original regex was something like this:

      my ($k, @f) = ($h =~ m!<td.*?>(required_\d+_\d+.txt)</td>\s+(<td.*?> +(.+?)</td>\s+)+</tr>!m);

      Then, the ",+" actually meant whitespace "\s+", but I wanted to make them visible in the output. ;-)
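
The two-step split described above might look roughly like the following sketch. The sample HTML, the sub name record_fields, and the exact patterns (including `[^>]*` in place of `.*?` and the escaped dot in `\.txt`) are assumptions made for illustration; the real documents and the poster's final code are not shown in the thread.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample input: one table cell per source line.
my $h = <<'HTML';
<tr>
<td>other_1_2.txt</td>
<td> skipped</td>
</tr>
<tr>
<td class="name">required_10_20.txt</td>
<td> alpha</td>
<td> beta</td>
<td> gamma</td>
</tr>
HTML

sub record_fields {
    my ($html) = @_;

    # Step 1: identify the record by its first field. Capture the key
    # and the raw block of remaining cells; the repeated group is
    # non-capturing, so nothing gets overwritten.
    my ($key, $cells) = $html =~ m{
        <td[^>]*>(required_\d+_\d+\.txt)</td>\s+
        ((?:<td[^>]*>\ +.+?</td>\s+)+)
        </tr>
    }x or return;

    # Step 2: /g in list context returns one capture per cell.
    my @fields = $cells =~ m!<td[^>]*> +(.+?)</td>!g;
    return ($key, @fields);
}

my ($k, @f) = record_fields($h);
print join("\t", $k, @f), "\n";
```

With the sample above this prints the key followed by alpha, beta and gamma, separated by tabs. Regex matching on HTML is fragile against markup changes, so this only suits input whose layout is as regular as the one described.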
