
pattern matching

by perltux (Monk)
on Oct 12, 2012 at 14:38 UTC (#998704=perlquestion)
perltux has asked for the wisdom of the Perl Monks concerning the following question:

Hi, given the same input string, why does the following line return an array with 2 elements:

my @parts=($sysex_dump =~ /\xF0C.[~z](..LM  0087[A-Z][A-Z].+?)(..LM  0087[A-Z][A-Z].+?)\xF7/gs);

While this line only returns an array with the first of the above two elements:

my @parts=($sysex_dump =~ /\xF0C.[~z](..LM  0087[A-Z][A-Z].+?)+\xF7/gs);

I would have thought the '+' after the bracket should match more than once and assign all matches to the array?

How do I get the second pattern match to assign all matches as multiple elements to the array?

Edited to add a self contained code example (the print and the for loop are just to check the array I get):

my @parts=("\xF0C.z..LM 0087AAaaa..LM 0087BBbbb\xF7"=~ /\xF0C.[~z](..LM 0087[A-Z][A-Z].+?)+\xF7/s);
print STDERR @parts." parts\n";
for($a = 0; $a < @parts; $a++) {
    print STDERR ($a+1).": ".length($parts[$a])." length\n";
}

Replies are listed 'Best First'.
Re: pattern matching
by choroba (Bishop) on Oct 12, 2012 at 14:57 UTC
    Captures with quantifiers only return the last match. I cannot find it documented, but try
    perl -E 'say for "ab" =~ /(.)+/g'
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      How about this?

      #! perl
      use strict;
      use warnings;
      use YAPE::Regex::Explain;

      my $re = qr/(.)+/;
      print YAPE::Regex::Explain->new($re)->explain;


      The regular expression:

      (?-imsx:(.)+)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        (                        group and capture to \1 (1 or more times
                                 (matching the most amount possible)):
      ----------------------------------------------------------------------
          .                        any character except \n
      ----------------------------------------------------------------------
        )+                       end of \1 (NOTE: because you are using a
                                 quantifier on this capture, only the LAST
                                 repetition of the captured pattern will be
                                 stored in \1)
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      Not strictly documentation, but certainly confirmation that “Captures with quantifiers only return the last match.”

      Update: Here is the official documentation:

      /(\d)(\d)/  # Match two digits, capturing them into $1 and $2
      /(\d+)/     # Match one or more digits, capturing them all into $1
      /(\d)+/     # Match a digit one or more times, capturing the last into $1

      Note the difference between the second and third patterns. The second form is usually what you want. The third form does not create multiple variables for multiple digits. Parentheses are numbered when the pattern is compiled, not when it is matched.

      The Camel Book, 4th edition, pages 221–222.
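
The three quoted forms can be checked directly. The following is a minimal sketch, assuming a made-up input string "42"; the variable names are illustrative and do not come from the thread. It also adds a fourth case, an unquantified group with /g in list context, which returns one element per match.

```perl
use strict;
use warnings;

my $digits = "42";    # assumed sample input, not from the thread

# Two capture groups: one list element per group.
my @two_groups = $digits =~ /(\d)(\d)/;

# Quantifier inside the group: the whole run lands in one capture.
my @inside = $digits =~ /(\d+)/;

# Quantifier on the group itself: only the last repetition is kept.
my @outside = $digits =~ /(\d)+/;

# Unquantified group with /g in list context: one element per match.
my @global = $digits =~ /(\d)/g;

print "two groups: @two_groups\n";
print "inside:     @inside\n";
print "outside:    @outside\n";
print "global:     @global\n";
```

The number of capture variables is fixed by the number of parentheses in the pattern, so a quantified group can never yield more than one list element per match.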

      Athanasius <°(((><contra mundum

        Thanks, I believe you are right, but how do I then get all matches into the array as individual elements?
Re: pattern matching
by nemesdani (Friar) on Oct 12, 2012 at 14:56 UTC
    I think the problem is the .+? part of the pattern. If what it matches differs between the two supposed matches, the + won't find the second one; it looks for exactly the same thing it found first. I could be wrong.

    I'm too lazy to be proud of being impatient.
Re: pattern matching
by Athanasius (Chancellor) on Oct 12, 2012 at 15:42 UTC

    Hello perltux, and welcome to the Monastery!

    How do I get the second pattern match to assign all matches as multiple elements to the array?

    I think you need to capture the multiple matches first, then separate them using a loop:

    my @chunks = $sysex_dump =~ /\xF0C.[~z]((?:..LM 0087[A-Z][A-Z].+?)+)\xF7/gs;
    my @parts;
    for (@chunks) {
        push @parts, $1 while /(..LM 0087[A-Z][A-Z].+?)/gs;
    }

    This code is untested (you provided no sample data), but it should give you a workable approach. Note that in the first regex, the second + quantifier needs to be within the capturing parentheses, so I have added non-capturing parentheses (?:) to define this quantifier’s scope.
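
To see what the non-capturing parentheses change, here is a small sketch that runs both groupings against the sample string from the question's self-contained example. The variable names ($dump, $last, $run) are illustrative only.

```perl
use strict;
use warnings;

# Sample string taken from the question's self-contained example.
my $dump = "\xF0C.z..LM 0087AAaaa..LM 0087BBbbb\xF7";

# Quantifier applied to the capturing group:
# only the last repetition survives in the capture.
my ($last) = $dump =~ /\xF0C.[~z](..LM 0087[A-Z][A-Z].+?)+\xF7/s;

# Quantifier applied to a non-capturing group inside the capture:
# the capture now spans every repetition.
my ($run) = $dump =~ /\xF0C.[~z]((?:..LM 0087[A-Z][A-Z].+?)+)\xF7/s;

print "last repetition: $last\n";
print "whole run:       $run\n";
```

The second form only collects the repeated blocks into one string; splitting that string into individual elements is a separate step.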

    Update: I see you have added a “self contained code example” to your original post. From this, it appears that the data to be matched contains literal . characters (periods, full stops.) But your regex matches these with:

    (..LM 0087[A-Z][A-Z].+?)
    #^^ <= here they are

    which matches any two characters. To match periods only, backslash them in the regex:

    /\xF0C.[~z](\.\.LM 0087[A-Z][A-Z].+?)+\xF7/s
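
The difference between the unescaped and escaped forms can be sketched as follows. The two input strings are hypothetical, and the thread does not confirm whether the real dump holds literal periods or arbitrary bytes in those two positions, so treat this as an illustration of the regex behaviour only.

```perl
use strict;
use warnings;

# Hypothetical inputs: one with literal periods, one with other characters.
my @inputs = ("..LM 0087AA", "abLM 0087AA");

# Unescaped dots match any two characters.
my @any = map { /^..LM 0087[A-Z][A-Z]/ ? 1 : 0 } @inputs;

# Escaped dots match only literal periods.
my @literal = map { /^\.\.LM 0087[A-Z][A-Z]/ ? 1 : 0 } @inputs;

for my $i (0 .. $#inputs) {
    print "$inputs[$i]: any-two-chars=$any[$i] literal-periods=$literal[$i]\n";
}
```

If those two positions can hold arbitrary bytes in a real dump, the unescaped form with the /s modifier is the one that keeps matching.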

    Hope that helps,

    Athanasius <°(((><contra mundum

      Thanks, it looks like this works, but it's very contrived. Surely there must be an easier way to do it all in one go?
      I have tried this now, but unfortunately it still doesn't work: the second match, in the line starting with 'push', matches too few bytes. It needs to match all bytes up to the next occurrence of (..LM 0087[A-Z][A-Z].+?). If I remove the '?' at the end, it matches all occurrences at once. How do I get this match to cover all bytes up to the next occurrence of (..LM 0087[A-Z][A-Z].+?)?
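
One possible approach to this follow-up, which was not posted in the thread and is offered only as a sketch: give the lazy .+? something to stop at by adding a lookahead for either the next block header or the terminating \xF7. The example below assumes the sample string from the question; $dump is an illustrative name.

```perl
use strict;
use warnings;

# Sample string taken from the question's self-contained example.
my $dump = "\xF0C.z..LM 0087AAaaa..LM 0087BBbbb\xF7";

# The lazy .+? on its own stops after one character. The lookahead
# (?=...) forces it to extend until the next block header or the
# terminating \xF7 follows, without consuming either of them.
my @parts = $dump =~ /(..LM 0087[A-Z][A-Z].+?)(?=..LM 0087[A-Z][A-Z]|\xF7)/gs;

print scalar(@parts), " parts\n";
for my $i (0 .. $#parts) {
    print $i + 1, ": ", length($parts[$i]), " length\n";
}
```

On the sample string this yields two elements of 14 characters each. Note that this single pass does not check the leading \xF0C.[~z] header; if that validation matters, the same lookahead pattern can be applied to each chunk extracted by the first regex instead.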
