i.e. where are my expectations wrong?

You seem to assume that, for an alternation $a|$b, the regex engine does the following:

  1. It searches for alternative $a in the string
  2. If it doesn't find a match, it tries alternative $b
However, that's not the case. It does this:
  1. anchor pattern at start of string
  2. try to match alternative $a
  3. if it fails, try to match alternative $b
  4. if there's still no match, anchor pattern at the second character in the string, and start again from No. 2
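
The steps above can be seen in a small sketch. The sample string is made up for illustration; the point is only that the leftmost match wins, even when it belongs to the alternative that is listed second:

```perl
use strict;
use warnings;

# Hypothetical sample string: 'b' occurs earlier than 'a'.
my $str = "xx b then a";

# 'a' is listed first in the alternation, but the engine tries both
# alternatives at each position before moving one character to the right,
# so it stops at the first position where either alternative matches.
my ($matched, $offset);
if ($str =~ /(a|b)/) {
    ($matched, $offset) = ($1, $-[0]);   # $-[0] is the offset where the match starts
}
print "matched '$matched' at offset $offset\n";
```

If the engine searched the whole string for $a first, this would report 'a'; it reports 'b' instead.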

Perhaps you want something along these lines:

m/(?s:.*)Remediation Report\n\n(.+?)\n|^(.+?)\n/;

That tries the Remediation Report\n\n(.+?)\n part of the regex at every place in your string (the leading (?s:.*) can skip any amount of text, newlines included), and only if that fails does it try the second alternative.
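
Here is a minimal sketch of how that regex behaves. The sub name vuln_name and both sample records are invented for the example, since I don't have your real data:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line after "Remediation Report\n\n"
# if the record has one, otherwise the first line of the record.
sub vuln_name {
    my ($record) = @_;
    return unless $record =~ /(?s:.*)Remediation Report\n\n(.+?)\n|^(.+?)\n/;
    # Only one of the two capture groups is set, depending on which
    # alternative matched.
    return defined $1 ? $1 : $2;
}

# Made-up records, one with the heading and one without.
my $with_report    = "First line\nother text\nRemediation Report\n\nFix the thing\n";
my $without_report = "Only line\nmore text\n";

print vuln_name($with_report), "\n";
print vuln_name($without_report), "\n";
```

The first record yields the line after the heading even though the heading is not at the start of the string; the second falls back to its first line.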

The record in question is one large string with newlines inside it

In the example script I posted, yes.

I updated the example data above to explain how each record is broken up better. I think the Data::Dumper output was a bit confusing. There is only one vulnerability name in each record, so /g shouldn't apply (I believe).

In scalar context the /g modifier doesn't mean "match as often as you can", but rather "start your match at pos $str, and set pos $str after the match". That means you can say stuff like this:

while ($str =~ m/($regex)/g) { print $1, "\n"; }

But it's not the only application. You can use it to preserve the pos $str value, and then apply a different regex against it.
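
A small sketch of that second use, with a made-up string and variable names. The \G assertion anchors the second regex at the position where the first one stopped:

```perl
use strict;
use warnings;

# Hypothetical input, just for illustration.
my $str = "name=foo;rest of the record";

# First regex: in scalar context, /g leaves pos($str) right after the match.
my $key;
if ($str =~ /(\w+)=/g) {
    $key = $1;
}
my $after_key = pos $str;

# Second, different regex: \G makes it start exactly at pos($str)
# instead of searching from the beginning of the string.
my $value;
if ($str =~ /\G(\w+);/g) {
    $value = $1;
}
print "$key => $value (second match resumed at offset $after_key)\n";
```

Note that a failed /g match resets pos $str unless you also add the /c modifier.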


In reply to Re^7: Regex problems using '|' by moritz
in thread Regex problems using '|' by romandas
