Combining Regex

by neversaint (Deacon)
Dear Masters,
I want a single regex that matches these lines except the last one, which contains <EXP-N-\d+>.
I am stuck with this code (it also reflects the core pattern and the ordering I want for the match):

What's the right way to do it?

Re: Combining Regex
by BrowserUk (Pope) on Jul 23, 2013 at 09:30 UTC

    @l = ...;

    m[
        <MIR-\d+>
        (?:<EXP-V-\d+>)?
        (?:<ASSC-PHRASE-\d+>|<ART-\d+>)?
        (?:<BE-V>)?
        <VACCVIRUS-PROP-\d+>
        (?:<PATTERN-\d+>)?
    ]x and print for @l;;

    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>

    Or more simply:

    $_ !~ m[<EXP-N-\d+>] and print for @l;;

    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
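If it really has to be one regex, a common way to express "matches the core pattern, but contains no <EXP-N-\d+> anywhere" is a negative lookahead anchored at the start of the string. This is a minimal sketch against the thread's five sample lines. `$single` and `@lines` are illustrative names, and the `.*?` is an assumption standing in for whatever optional tokens may sit between the core pieces:

```perl
use strict;
use warnings;

# Sample lines from this thread; the last one contains <EXP-N-...>.
my @lines = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Illustrative combined regex: the lookahead rejects any string that
# contains <EXP-N-digits>, then the core pattern has to match.
my $single = qr{
    \A
    (?! .* <EXP-N-\d+> )            # veto: no <EXP-N-n> anywhere
    <MIR-\d+>
    (?: <EXP-V-\d+> | <ASSC-PHRASE-\d+> )
    .*?                              # assumed: optional middle tokens
    <VACCVIRUS-PROP-\d+>
}xs;

my @matched = grep { /$single/ } @lines;
print "$_\n" for @matched;
```

The lookahead does the exclusion and the rest of the pattern does the structural match. The two-step `!~` version above is usually easier to read.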

Re: Combining Regex
by tobyink (Canon) on Jul 23, 2013 at 09:30 UTC

    Is there something wrong with the answer you received here?

Re: Combining Regex
by Happy-the-monk (Canon) on Jul 23, 2013 at 09:32 UTC


    Both of the ".+" parts of your regex require at least one character at their position, but in your example data there is nothing there. Take 'em out.
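The original code isn't preserved in this thread, so this sketch assumes a `.+` sat between two tokens that are adjacent in the data, such as `<EXP-V-\d+>` and `<VACCVIRUS-PROP-\d+>`. It shows that `.+` needs at least one character there, while `.*` (or nothing at all) does not:

```perl
use strict;
use warnings;

my $str = '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>';

# '.+' demands at least one character between the two tokens...
my $plus_matches = $str =~ /<EXP-V-\d+>.+<VACCVIRUS-PROP-\d+>/ ? 1 : 0;

# ...whereas '.*' also allows the tokens to be directly adjacent.
my $star_matches = $str =~ /<EXP-V-\d+>.*<VACCVIRUS-PROP-\d+>/ ? 1 : 0;

print "with .+ : $plus_matches\n";   # prints "with .+ : 0"
print "with .* : $star_matches\n";   # prints "with .* : 1"
```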

Re: Combining Regex
by Loops (Curate) on Jul 23, 2013 at 09:33 UTC

    Okay, you changed the question a few times while I was composing this reply ;o). I have to say I'm still left guessing what the ordering rules are for the angle-bracket segments. I picked one ordering that gives the results you're requesting, but you may still have to tweak it a bit. The main idea is to use the /x modifier, which makes the regex ignore whitespace, so that you can format it for readability:

        use strict;
        use warnings;

        while (<DATA>) {
            print if /
                <MIR-\d+>
                (
                    <EXP-V-\d+> (<ART-\d+>)* (<BE-V>)*
                  |
                    <ASSC-PHRASE-\d+>
                )
                <VACCVIRUS-PROP-\d+>/x;
        }

        __DATA__
        <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
        <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
        <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
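To check that /x changes only the layout and not the meaning, this small sketch runs Loops' pattern both compactly and spread out under /x against the five sample lines. The variable names are illustrative:

```perl
use strict;
use warnings;

my @lines = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Loops' pattern written on one line, without /x...
my $compact = qr{<MIR-\d+>(<EXP-V-\d+>(<ART-\d+>)*(<BE-V>)*|<ASSC-PHRASE-\d+>)<VACCVIRUS-PROP-\d+>};

# ...and the same pattern laid out under /x; the whitespace is ignored.
my $spaced = qr{
    <MIR-\d+>
    (
        <EXP-V-\d+> (<ART-\d+>)* (<BE-V>)*
      | <ASSC-PHRASE-\d+>
    )
    <VACCVIRUS-PROP-\d+>
}x;

my @compact_hits = grep { /$compact/ } @lines;
my @spaced_hits  = grep { /$spaced/ }  @lines;

print "compact: ", scalar(@compact_hits), ", /x: ", scalar(@spaced_hits), "\n";
```

Under /x a literal space has to be written as `\ ` or `[ ]`, because bare spaces in the pattern are ignored.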
      Hi. Thanks. The pattern <ART-\d+> <BE-V> is optional, and it can be anything. The core pattern is:

      <MIR-\d+> <EXP-V-\d+>|<ASSC-PHRASE-\d+> <VACCVIRUS-PROP-\d+>

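If the core pattern is typed literally as a regex (with /x so the spaces are ignored), the `|` applies to everything on either side of it, not just the middle token. This sketch contrasts the literal form with a grouped `(?: ... )` form. The grouping is an assumption about the intended meaning, and the optional <ART-...>/<BE-V> tokens are left out to keep it short:

```perl
use strict;
use warnings;

my $str = '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>';

# Without grouping, '|' splits the WHOLE pattern in two:
#   <MIR-\d+> <EXP-V-\d+>   OR   <ASSC-PHRASE-\d+> <VACCVIRUS-PROP-\d+>
my $ungrouped = qr{ <MIR-\d+> <EXP-V-\d+> | <ASSC-PHRASE-\d+> <VACCVIRUS-PROP-\d+> }x;

# Grouping restricts the alternation to the middle token only.
my $grouped = qr{ <MIR-\d+> (?: <EXP-V-\d+> | <ASSC-PHRASE-\d+> ) <VACCVIRUS-PROP-\d+> }x;

my $ungrouped_hit = $str =~ $ungrouped ? 1 : 0;   # 1: first alternative matches a prefix
my $grouped_hit   = $str =~ $grouped   ? 1 : 0;   # 0: <EXP-N-0> is where VACCVIRUS-PROP must be

print "ungrouped: $ungrouped_hit, grouped: $grouped_hit\n";
```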
Re: Combining Regex
by AnomalousMonk (Bishop) on Jul 23, 2013 at 14:33 UTC

    Your OP node title specifically refers to combining regexes, so here's an approach that decomposes what seem to be the essential elements of your regex and re-combines them to form the final matching regex. I find a decompositional approach makes it easier to think about a regex (especially a complex one) when writing it, and to maintain it later. (Note: Some of your StackOverflow examples have leading characters before the $mir pattern. If this is really the case, eliminate the \A absolute-beginning-of-string anchor from the matching regex.)

    Another note: If it's just a matter of excluding anything matching <EXP-N-\d+>, then BrowserUk's 'simpler' solution here is by far the best.

    >perl -wMstrict -le
    "my @strs = qw(
       <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
       <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
       <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
       <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
       <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
       );
    ;;
    my $tail = qr{ \d+ > }xms;
    ;;
    my $mir      = qr{ < MIR-            $tail }xms;
    my $exp_v    = qr{ < EXP-V-          $tail }xms;
    my $exp_n    = qr{ < EXP-N-          $tail }xms;
    my $assc_phr = qr{ < ASSC-PHRASE-    $tail }xms;
    my $vaccvir  = qr{ < VACCVIRUS-PROP- $tail }xms;
    ;;
    for my $str (@strs) {
      print qq{'$str'} if $str =~ m{
        \A $mir (?: $exp_v (?! $exp_n) | $assc_phr ) .*? $vaccvir
        }xms;
      }
    "
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>'
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>'
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>'
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>'
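The `(?! $exp_n)` lookahead only vetoes an <EXP-N-...> token that sits immediately after <EXP-V-...>. This minimal sketch reuses the building blocks from AnomalousMonk's reply on a made-up input string that is not in the thread's data. It shows that the combined regex still accepts a later <EXP-N-...>, while adding BrowserUk's separate `!~` check rejects it:

```perl
use strict;
use warnings;

# Building blocks copied from AnomalousMonk's reply.
my $tail     = qr{ \d+ > }xms;
my $mir      = qr{ < MIR-            $tail }xms;
my $exp_v    = qr{ < EXP-V-          $tail }xms;
my $exp_n    = qr{ < EXP-N-          $tail }xms;
my $assc_phr = qr{ < ASSC-PHRASE-    $tail }xms;
my $vaccvir  = qr{ < VACCVIRUS-PROP- $tail }xms;

my $combined = qr{ \A $mir (?: $exp_v (?! $exp_n) | $assc_phr ) .*? $vaccvir }xms;

# Hypothetical input (not from the thread): <EXP-N-0> follows <ART-0>,
# so it is not directly after <EXP-V-0>.
my $str = '<MIR-1><EXP-V-0><ART-0><EXP-N-0><VACCVIRUS-PROP-1>';

# The lookahead sees <ART-0> and passes; .*? then skips <ART-0><EXP-N-0>.
my $structural = $str =~ $combined ? 1 : 0;

# A separate negated match rejects <EXP-N-...> anywhere in the string.
my $both = ( $str =~ $combined && $str !~ $exp_n ) ? 1 : 0;

print "combined regex: $structural, combined + !~: $both\n";
```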

