Unable to constrain the effect of a negative lookahead
by fireblood (Scribe)
on Apr 14, 2022 at 14:38 UTC
fireblood has asked for the wisdom of the Perl Monks concerning the following question:
Dear wise ones,
I’m unable to constrain the effect of a negative lookahead to just a specific scope. What I want to do is identify in a text value called SYSPBUFF all instances of the pattern “A = B” that occur prior to the first instance of the pattern “batch =”. There are instances of the pattern “A = B” that occur after the first instance of the pattern “batch =” which I don’t want to match.
An example of a typical value of SYSPBUFF is:
I want my regex to capture “run_type = dev”, “max_monitor_time = 0.25”, and “verbosity_level = 2”, because they precede the string “batch =”, but not to capture “source = sample_document_collection_1”, “files = Confucius.docx”, or “dest = Enterprise:Department” because they do NOT precede the string “batch =”.
Each of the patterns “A = B” may be followed by an optional comma.
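For comparison, here is a minimal sketch that avoids lookaheads entirely. It cuts SYSPBUFF at the first "batch =" and then collects every "A = B" pair from the text before it. The sample value is an assumption, assembled from the pairs named above with a made-up batch value (`nightly`). The value pattern `[^\s,]+` also assumes that values contain no spaces or commas.

```perl
use strict;
use warnings;

# Hypothetical sample value: assembled from the pairs named in the question;
# "nightly" is an invented batch value.
my $syspbuff = 'run_type = dev, max_monitor_time = 0.25, verbosity_level = 2, '
             . 'batch = nightly, source = sample_document_collection_1, '
             . 'files = Confucius.docx, dest = Enterprise:Department';

# Keep only the text before the first "batch =" (or the whole string if absent).
my ($head) = $syspbuff =~ /^(.*?)(?:\bbatch\s*=|\z)/s;

# Collect every "A = B" pair in that prefix; the trailing comma is optional.
my @pairs;
while ($head =~ /(\w+)\s*=\s*([^\s,]+),?/g) {
    push @pairs, "$1 = $2";
}

print "$_\n" for @pairs;
```

This should print the three pairs for run_type, max_monitor_time and verbosity_level, and nothing from after "batch =".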
My regex is applied repeatedly to the value of SYSPBUFF. Every time it finds a match on the pattern “A = B” which precedes the value “batch =” in the value of SYSPBUFF, it saves the captured information, then reconstructs the value of SYSPBUFF as being everything EXCEPT for the string “A = B” that it just matched, and then tries to match again on the revised value of SYSPBUFF. Being able to do this reconstruction is the reason why all parts of the value of SYSPBUFF are captured into capture buffers.
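The match-save-rebuild loop described above might look roughly like the following sketch. The actual regex from the post is not shown, so this pattern is an assumption. It always takes the leftmost pair and captures the text before it, the key, the value and the rest. It then stops in Perl code, rather than inside the regex, once the leftmost remaining pair is "batch = ..." itself. The sample value is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample value (the example in the post was not preserved).
my $syspbuff = 'run_type = dev, max_monitor_time = 0.25, verbosity_level = 2, '
             . 'batch = nightly, source = sample_document_collection_1, '
             . 'files = Confucius.docx, dest = Enterprise:Department';

my @captured;
# Each pass captures: $1 = text before the leftmost pair, $2 = key,
# $3 = value, $4 = everything after the pair (and its optional comma).
while ($syspbuff =~ /^(.*?)\b(\w+)\s*=\s*([^\s,]+),?\s*(.*)\z/s) {
    my ($before, $key, $value, $after) = ($1, $2, $3, $4);

    # Stop once the leftmost remaining pair is "batch = ..." itself, or
    # if a "batch =" sits in the text before that pair.
    last if $key eq 'batch' || $before =~ /\bbatch\s*=/;

    push @captured, "$key = $value";
    # Rebuild the buffer without the pair just matched, then try again.
    $syspbuff = $before . $after;
}

print "$_\n" for @captured;
print "remaining: $syspbuff\n";
```

Because the "stop at batch" decision is made outside the regex, no lookahead is needed to keep the loop from reaching pairs after "batch =".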
My regex is the following:
What actually happens is that the regex fails to match even once. I suspect the problem is that I don’t understand how the scope of the negative lookahead (?!.*batch\s*=) can be limited. I had thought that its scope would be confined to the parentheses labeled “non-capturing group to limit the effect of the following negative lookahead”. But I think what is really happening is that when the pattern (?!.*batch\s*=) is encountered, its effect extends beyond those parentheses even though it sits inside them. This effectively imposes the condition that, from that point to the end of the regex, no “batch =” may be present. So when the latter part of the regex makes “batch =” a mandatory component of that part of SYSPBUFF, no match is possible. First (?!.*batch\s*=) stipulates that “batch =” must not be present, and then a later part of the regex stipulates that it must be present.
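A small experiment can show where such a lookahead is tested. This is a sketch with a hypothetical sample string, not the regex from the post. It evaluates `(?!.*batch\s*=)` on its own at each offset before "batch". A lookahead consumes nothing and is checked at the position where the engine reaches it. However, the `.*` inside it can scan all the way to the end of the string, whatever group surrounds it. So the negative form fails at every offset that still has "batch =" ahead of it. The positive form `(?=.*batch\s*=)` succeeds at every such offset.

```perl
use strict;
use warnings;

my $text = 'run_type = dev, batch = nightly, source = x';   # hypothetical

# Offset of "batch" in the sample; every earlier offset has "batch =" ahead.
my $batch_at = index($text, 'batch');

# Test each lookahead by itself at every offset before "batch".
# Anchoring a substring with ^ is equivalent here, because a lookahead only
# inspects text to the right of the position where it is tested.
my @neg_ok = grep { substr($text, $_) =~ /^(?!.*batch\s*=)/s } 0 .. $batch_at - 1;
my @pos_ok = grep { substr($text, $_) =~ /^(?=.*batch\s*=)/s } 0 .. $batch_at - 1;

# prints "negative lookahead succeeds at 0 of 16 offsets before 'batch'"
printf "negative lookahead succeeds at %d of %d offsets before 'batch'\n",
    scalar @neg_ok, $batch_at;
# prints "positive lookahead succeeds at 16 of 16 offsets before 'batch'"
printf "positive lookahead succeeds at %d of %d offsets before 'batch'\n",
    scalar @pos_ok, $batch_at;
```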
Is there a way to specify that a lookahead pattern that begins with the pattern .* applies only to a particular part of a larger pattern after which it is no longer in effect?
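One common way to bound such a check is sketched below. It is offered as an assumption about what might fit this case, since the original regex is not shown, and the sample value is hypothetical. Instead of a single lookahead containing `.*`, place the negative lookahead inside a repeated one-character group, `(?:(?!\bbatch\s*=).)*?`. This is sometimes called a tempered greedy token. The lookahead is then re-tested before every character that group consumes, so it covers exactly the stretch of text the group spans and nothing beyond it. Adding `\G` anchors each `/g` match where the previous one ended, so the scan cannot step over the first "batch =".

```perl
use strict;
use warnings;

# Hypothetical sample value (the example in the post was not preserved).
my $syspbuff = 'run_type = dev, max_monitor_time = 0.25, verbosity_level = 2, '
             . 'batch = nightly, source = sample_document_collection_1, '
             . 'files = Confucius.docx, dest = Enterprise:Department';

# (?:(?!\bbatch\s*=).)*? checks "not at batch =" before each character it
# consumes, so the check applies only to the text that group matches.
# (?!batch\b) stops the "batch = ..." pair itself from being taken as a key.
# \G makes each match start where the previous one stopped.
my @pairs;
while ($syspbuff =~ /\G(?:(?!\bbatch\s*=).)*?\b(?!batch\b)(\w+)\s*=\s*([^\s,]+),?/gs) {
    push @pairs, "$1 = $2";
}

print "$_\n" for @pairs;
```

A narrower class inside the lookahead, such as `(?![^,]*batch\s*=)`, is another way to keep it from looking past the next comma, provided values never contain commas.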