Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Iterations in regex

by vitoco (Friar)
on May 12, 2014 at 15:29 UTC ( #1085808=perlquestion: print w/replies, xml ) Need Help??
vitoco has asked for the wisdom of the Perl Monks concerning the following question:

Im quite embarrassed, but I cannot figure out what's going on here:

#!perl use strict; use warnings; my $data = <DATA>; chomp $data; my @f = ($data =~ m!((\w+),+)+!g); print join("\t", @f) . "\n"; __DATA__ qwerty,asd,zxcvbnm,fgh,jkl,uiop,


uiop, uiop

I was expecting to receive many elements in the array: every word twice (one with and one without the comma), not just the last one.

What am I missing?

Replies are listed 'Best First'.
Re: Iterations in regex
by toolic (Bishop) on May 12, 2014 at 15:44 UTC
    Don't use the last +
    my @f = ($data =~ m!((\w+),+)!g);


    The regular expression: (?-imsx:((\w+),+)+) matches as follows: NODE EXPLANATION ---------------------------------------------------------------------- (?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- ( group and capture to \1 (1 or more times (matching the most amount possible)): ---------------------------------------------------------------------- ( group and capture to \2: ---------------------------------------------------------------------- \w+ word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible)) ---------------------------------------------------------------------- ) end of \2 ---------------------------------------------------------------------- ,+ ',' (1 or more times (matching the most amount possible)) ---------------------------------------------------------------------- )+ end of \1 (NOTE: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \1) ---------------------------------------------------------------------- ) end of grouping ----------------------------------------------------------------------
Re: Iterations in regex
by LanX (Chancellor) on May 12, 2014 at 16:08 UTC
    toolic is right, with the last plus you ARE matching all words already in the first "iteration" (so /g is useless), but only the last matches can be returned (the ones before are overwritten).

    Without the + the /g will produce multiple attempts and return matches for each one.

    BTW: did you really mean ,+ ??? Looks weird...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Thanks to both of you. I've tried adding and removing the g modifier, but never tried without the last operator.

      The sample code was a simplification of my real problem, where I'm trying to capture one specific record from one kind of table from a set of html documents, where each field has it's own line in the source.

      Doing that way, I had to split the original regex in two:

      1. one to identify the required record by the value of the first field
      2. another to the capture of the data fields

      BTW, the original regex was something like this:

      my ($k, @f) = ($h =~ m!<td.*?>(required_\d+_\d+.txt)</td>\s+(<td.*?> +(.+?)</td>\s+)+</tr>!m);

      Then, the ",+" actually meant whitespace "\s+", but I wanted to make them visible in the output. ;-)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1085808]
Front-paged by Corion
[Corion]: ambrus: AnyEvent(::HTTP) doesn't integrate well with Prima, that's my main problem
[Corion]: There is a weirdo shim because there is a POE integration for Prima, and if you use that, you can use the POE adapter of AnyEvent. What I'd want is something transport agnostic that parses HTTP or produces HTTP output, so that the communication with ...
[Corion]: ... the socket is done by my code. Ideally that module would not be based on callbacks ;)
[Corion]: Basically, something that decouples the HTTP parsing (+ determining whether to redirect, etc) from the IO
[Corion]: All clients I'm aware of don't do that but issue all IO themselves

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (12)
As of 2016-12-07 15:50 GMT
Find Nodes?
    Voting Booth?
    On a regular basis, I'm most likely to spy upon:

    Results (130 votes). Check out past polls.