PerlMonks  

Combining Regex

by neversaint (Deacon)
on Jul 23, 2013 at 09:05 UTC ( #1045801=perlquestion )

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
I want a single regex that matches all of these lines except the last one, which contains <EXP-N-\d+>.

    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>

I am stuck with this code (it also reflects the core pattern and the ordering desired for the match):

    <MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).+<VACCVIRUS-PROP-\d+>.+
http://rubular.com/r/Z5sZ0nv7n1

What's the right way to do it?

---
neversaint and everlastingly indebted.......

Replies are listed 'Best First'.
Re: Combining Regex
by BrowserUk (Pope) on Jul 23, 2013 at 09:30 UTC

    @l = ...;
    m[
        <MIR-\d+>
        (?:<EXP-V-\d+>)?
        (?:<ASSC-PHRASE-\d+>|<ART-\d+>)?
        (?:<BE-V>)?
        <VACCVIRUS-PROP-\d+>
        (?:<PATTERN-\d+>)?
    ]x and print for @l;;
    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>

    Or more simply:

    $_ !~ m[<EXP-N-\d+>] and print for @l;;
    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Combining Regex
by tobyink (Canon) on Jul 23, 2013 at 09:30 UTC

    Is there something wrong with the answer you received here?

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Combining Regex
by Happy-the-monk (Canon) on Jul 23, 2013 at 09:32 UTC

    .+

    Both your ".+" parts of the regex ask to match something where in your example data there isn't anything. Take 'em out.

    Cheers, Sören

    Créateur des bugs mobiles - let loose once, run everywhere.
    (hooked on the Perl Programming language)
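
    To make the point about ".+" concrete, here is a minimal sketch. It runs the regex from the question against the first sample line, then a relaxed variant. The variable names are made up for illustration, and keeping ".*" in the middle (instead of removing it entirely) is an assumption so that lines with extra tags such as <ART-0> can still match. Note that the relaxed variant does not yet exclude <EXP-N-\d+>.

```perl
use strict;
use warnings;

# The regex from the original question, with both ".+" parts intact.
my $with_plus = qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).+<VACCVIRUS-PROP-\d+>.+/;

# Assumed variant: ".+" relaxed to ".*" in the middle, trailing ".+" removed.
my $relaxed = qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).*<VACCVIRUS-PROP-\d+>/;

# First sample line: nothing sits between the tags, and nothing follows the last one.
my $line = '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>';

# ".+" must consume at least one character, so the first pattern cannot match here.
my $plus_matches    = $line =~ $with_plus ? 1 : 0;
my $relaxed_matches = $line =~ $relaxed   ? 1 : 0;

print "with .+ : ", ($plus_matches    ? "match" : "no match"), "\n";
print "relaxed : ", ($relaxed_matches ? "match" : "no match"), "\n";
```

    The first pattern fails because each ".+" demands at least one character where the sample line has none.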

Re: Combining Regex
by Loops (Curate) on Jul 23, 2013 at 09:33 UTC

    Okay, you changed the question a few times while I was composing this reply ;o). I have to say I'm still left guessing what the ordering rules are for the angle-bracket segments. I picked one ordering that gives the results you're requesting, but you may still have to tweak it a bit. The main idea is to ignore whitespace in the regex using the /x modifier so that you can format the regex for readability:

    use strict;
    use warnings;

    while (<DATA>) {
        print if /
            <MIR-\d+>
            (
                <EXP-V-\d+> (<ART-\d+>)* (<BE-V>)*
              |
                <ASSC-PHRASE-\d+>
            )
            <VACCVIRUS-PROP-\d+>/x;
    }

    __DATA__
    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>

      Hi. Thanks. The pattern
      <ART-\d+> <BE-V>
      is optional, and it can be anything. The core pattern is
      <MIR-\d+> <EXP-V-\d+>|<ASSC-PHRASE-\d+> <VACCVIRUS-PROP-\d+>
      ---
      neversaint and everlastingly indebted.......
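
      One possible reading of that requirement (core pattern fixed, optional middle part "anything" except an <EXP-N-\d+> tag) can be sketched as a single regex. This is only a sketch under that assumption, not a confirmed answer from the thread. The names $single, @lines and @kept are invented for the example, and the sample lines are the ones from the question.

```perl
use strict;
use warnings;

# Sample lines from the question; only the last one contains an <EXP-N-n> tag.
my @lines = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Assumption: the middle part may be anything, as long as no <EXP-N-n> tag starts in it.
my $single = qr{
    <MIR-\d+>
    (?: <EXP-V-\d+> | <ASSC-PHRASE-\d+> )
    (?: (?! <EXP-N-\d+> ) . )*?      # optional filler that never steps onto an EXP-N tag
    <VACCVIRUS-PROP-\d+>
}x;

# Keep only the lines that the single regex accepts.
my @kept = grep { $_ =~ $single } @lines;
print "$_\n" for @kept;
```

      The negative lookahead is checked before every filler character, so the match cannot pass over an <EXP-N-\d+> tag on its way to <VACCVIRUS-PROP-\d+>. If the only real rule is "no <EXP-N-\d+> anywhere", a plain negated match is simpler.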
Re: Combining Regex
by AnomalousMonk (Bishop) on Jul 23, 2013 at 14:33 UTC

    Your OP node title specifically refers to combining regexes, so here's an approach that decomposes what seem to be the essential elements of your regex and re-combines them to form the final matching regex. I find a decompositional approach makes it easier to think about a regex (especially a complex one) when writing it, and to maintain it later. (Note: Some of your StackOverflow examples have leading characters before the $mir pattern. If this is really the case, eliminate the \A absolute-beginning-of-string anchor from the matching regex.)

    Another Note: If it's just a matter of excluding anything matching <EXP-N-\d+> then BrowserUk's 'simpler' solution here is by far the best.

    >perl -wMstrict -le
    "my @strs = qw(
       <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
       <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
       <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
       <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
       <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
       );
     ;;
     my $tail = qr{ \d+ > }xms;
     ;;
     my $mir      = qr{ < MIR-            $tail }xms;
     my $exp_v    = qr{ < EXP-V-          $tail }xms;
     my $exp_n    = qr{ < EXP-N-          $tail }xms;
     my $assc_phr = qr{ < ASSC-PHRASE-    $tail }xms;
     my $vaccvir  = qr{ < VACCVIRUS-PROP- $tail }xms;
     ;;
     for my $str (@strs) {
       print qq{'$str'}
         if $str =~ m{ \A $mir (?: $exp_v (?! $exp_n) | $assc_phr ) .*? $vaccvir }xms;
       }
    "
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>'
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>'
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>'
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>'
