http://www.perlmonks.org?node_id=1200743
NetWallah has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed RegEx-Monkers:

In parsing log file lines that contain XML-ish content with a single regex, I'm having trouble understanding the subtleties of optional capture.

The string I'm parsing is like:

<blah1 phase="2" type="MyType" more_keys="Values" <Unwanted/> <SomeTagIwant><k1="v1"></SomeTagIwant>
And I'm trying to extract the content of the "type", and the tag name of a tag that ends with "TagIwant".
The Tag may or may not be present.

I'm able to capture both pieces with the RE:

\btype="([^"]+)".+<(\w+TagIwant\b)
but the second capture is lost if I append a "?" to the expression in an attempt to make the tag optional.
I.e., this fails:
perl -E '$x=q|<blah1 phase="2" type="MyType" more_keys="Values" <Unwanted/> <SomeTagIwant><k1="v1"></SomeTagIwant>|; say for $x=~/\btype="([^"]+)".+<(\w+TagIwant\b)?/'
Which returns only "MyType", and not the second expected capture of "SomeTagIwant".
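
The behavior can be reproduced in a short script. This is only a minimal sketch using the sample line from above. The variable names ($line, $required, $optional, $wrapped) are illustrative, and the third pattern is just one possible rewrite (an assumption, not necessarily the best fix) that moves ".+<" inside an optional non-capturing group.

```perl
use strict;
use warnings;
use feature 'say';

# Sample line from the post.
my $line = q|<blah1 phase="2" type="MyType" more_keys="Values" <Unwanted/> <SomeTagIwant><k1="v1"></SomeTagIwant>|;

# Original pattern: the tag is required, so the greedy .+ keeps backtracking
# until a "<" is followed by a name ending in "TagIwant".
my $required = qr/\btype="([^"]+)".+<(\w+TagIwant\b)/;

# With a trailing "?", only the capture group is optional; ".+<" is still
# required. The greedy .+ runs to the end of the line and backs up only as
# far as the last "<" (the one in "</SomeTagIwant>"). There the group is
# allowed to match zero times, so the match succeeds with $2 undefined.
my $optional = qr/\btype="([^"]+)".+<(\w+TagIwant\b)?/;

# One possible rewrite (an assumption): make the whole ".+<tag" part
# optional, so the engine looks for the tag before giving up on it.
my $wrapped = qr/\btype="([^"]+)"(?:.+<(\w+TagIwant\b))?/;

for my $re ($required, $optional, $wrapped) {
    if (my ($type, $tag) = $line =~ $re) {
        say "type=$type tag=", $tag // '(undef)';
    }
}
```

Under these assumptions, the first and third patterns report both "MyType" and "SomeTagIwant", while the second reports "MyType" with an undefined second capture.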

The "\b" is an attempt to deal with variations like <SomeTagIwant/> and <SomeTagIwant k3="v3" /> .
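
A small sketch of what the "\b" does in the non-optional pattern. The three input lines are hypothetical, shortened variations made up for illustration. The last one shows a tag name that merely contains "TagIwant" and is rejected by the word boundary.

```perl
use strict;
use warnings;
use feature 'say';

my $re = qr/\btype="([^"]+)".+<(\w+TagIwant\b)/;

# Hypothetical, shortened log lines (not taken from a real log).
my @lines = (
    q|<blah1 type="MyType" <SomeTagIwant/>|,
    q|<blah1 type="MyType" <SomeTagIwant k3="v3" />|,
    q|<blah1 type="MyType" <SomeTagIwanted>|,
);

# Keep only the second capture (the tag name) for each line.
my @tags = map { my (undef, $t) = $_ =~ $re; $t // 'no match' } @lines;
say for @tags;
```

The first two lines yield "SomeTagIwant"; the third yields "no match", because "\b" cannot match between the "t" and the "e" of "SomeTagIwanted".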

I'm hoping for (1) Explanations for why the "?" fails, and (2) Suggestions on how to fix it.

                All power corrupts, but we need electricity.