
Iterations in regex

by vitoco (Friar)
on May 12, 2014 at 15:29 UTC (#1085808)
vitoco has asked for the wisdom of the Perl Monks concerning the following question:

I'm quite embarrassed, but I cannot figure out what's going on here:

#!perl
use strict;
use warnings;

my $data = <DATA>;
chomp $data;
my @f = ($data =~ m!((\w+),+)+!g);
print join("\t", @f) . "\n";

__DATA__
qwerty,asd,zxcvbnm,fgh,jkl,uiop,


uiop, uiop

I was expecting to receive many elements in the array: every word twice (one with and one without the comma), not just the last one.

What am I missing?

Replies are listed 'Best First'.
Re: Iterations in regex
by toolic (Bishop) on May 12, 2014 at 15:44 UTC
    Don't use the last +
    my @f = ($data =~ m!((\w+),+)!g);


    The regular expression:

    (?-imsx:((\w+),+)+)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1 (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          \w+                      word characters (a-z, A-Z, 0-9, _) (1
                                   or more times (matching the most
                                   amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        ,+                       ',' (1 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
      )+                       end of \1 (NOTE: because you are using a
                               quantifier on this capture, only the LAST
                               repetition of the captured pattern will be
                               stored in \1)
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
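As a quick check of the suggestion above, here is a minimal sketch that applies the regex without the trailing + to the sample line from the question. The data is inlined in a string instead of read from DATA, which is my own simplification. In list context, each /g iteration matches one word and contributes both of its captures.

```perl
use strict;
use warnings;

# Sample line from the question, inlined for brevity (assumption:
# same content as the __DATA__ section in the original script).
my $data = 'qwerty,asd,zxcvbnm,fgh,jkl,uiop,';

# No outer quantifier: every /g iteration matches a single "word,"
# and returns capture 1 (with comma) and capture 2 (without).
my @f = ($data =~ m!((\w+),+)!g);

print scalar(@f), " elements\n";    # 12 elements
print join("\t", @f) . "\n";        # qwerty,  qwerty  asd,  asd  ...  uiop,  uiop
```

Six words times two capture groups gives the twelve elements the original poster was expecting.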
Re: Iterations in regex
by LanX (Bishop) on May 12, 2014 at 16:08 UTC
    toolic is right, with the last plus you ARE matching all words already in the first "iteration" (so /g is useless), but only the last matches can be returned (the ones before are overwritten).

    Without the + the /g will produce multiple attempts and return matches for each one.
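The difference can be made visible with a small sketch, again using the sample line from the question. The variable names $whole, @last and @words are my own and purely illustrative. The first match shows that the quantified group swallows the whole line in one go while the capture variables keep only the final repetition; the loop shows /g in scalar context advancing one word per attempt.

```perl
use strict;
use warnings;

my $data = 'qwerty,asd,zxcvbnm,fgh,jkl,uiop,';

# With the outer +, a single match consumes the entire line. Each
# repetition overwrites $1 and $2, so only the last one survives.
my ($whole, @last);
if ($data =~ m!((\w+),+)+!) {
    @last  = ($1, $2);
    $whole = substr($data, $-[0], $+[0] - $-[0]);   # text of the full match
    print "whole match: $whole\n";
    print "captures:    $last[0] / $last[1]\n";     # uiop, / uiop
}

# Without the +, /g in scalar context resumes where the previous
# match ended, so the loop body runs once per word.
my @words;
while ($data =~ m!((\w+),+)!g) {
    push @words, $2;
    printf "iteration %d: %s (match ends at offset %d)\n",
        scalar(@words), $2, pos($data);
}
```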

    BTW: did you really mean ,+ ??? Looks weird...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Thanks to both of you. I tried adding and removing the g modifier, but never tried removing the last quantifier.

      The sample code was a simplification of my real problem, where I'm trying to capture one specific record from one kind of table in a set of HTML documents, where each field has its own line in the source.

      Doing it that way, I had to split the original regex in two:

      1. one to identify the required record by the value of the first field
      2. another to capture the data fields

      BTW, the original regex was something like this:

      my ($k, @f) = ($h =~ m!<td.*?>(required_\d+_\d+.txt)</td>\s+(<td.*?> +(.+?)</td>\s+)+</tr>!m);

      Then, the ",+" actually meant whitespace "\s+", but I wanted to make them visible in the output. ;-)
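The two-step split described above might look roughly like the following sketch. The HTML sample is hypothetical, since the real documents are not shown in the thread, and the patterns are my own variations on the original regex (`<td[^>]*>` instead of `<td.*?>`, and an escaped dot in `.txt`). It assumes each cell sits on its own line and that rows contain no nested tables.

```perl
use strict;
use warnings;

# Hypothetical input; only the general shape is taken from the thread.
my $h = <<'HTML';
<table>
<tr>
<td class="k">other_1_2.txt</td>
<td> foo</td>
</tr>
<tr>
<td class="k">required_12_34.txt</td>
<td> alpha</td>
<td> beta gamma</td>
<td> delta</td>
</tr>
</table>
HTML

# Step 1: identify the record by its first field and keep the rest
# of that row (everything up to the closing </tr>) for step 2.
my ($k, $rest) = $h =~ m!<td[^>]*>(required_\d+_\d+\.txt)</td>\s+(.*?)</tr>!s;

# Step 2: walk the remaining cells with /g, one capture per cell.
my @f = defined $rest ? ($rest =~ m!<td[^>]*> +(.+?)</td>!g) : ();

print "$k\n" if defined $k;
print join("\t", @f) . "\n";
```

Keeping the quantifier out of the capturing pattern and letting /g do the repetition is what makes every field available, at the cost of a second match.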

Node Type: perlquestion [id://1085808]
Front-paged by Corion