PerlMonks  

Re: Regex to match range of characters broken by dashes

by AnomalousMonk (Archbishop)
on Jul 16, 2016 at 06:30 UTC ( [id://1167872]=note )


in reply to Regex to match range of characters broken by dashes

Like choroba, I'm wondering: What's supposed to happen to the dash in the 4th position in the second string?
    A-C-G--CTGGC
       ^ dash in 4th position

Assuming it should be replaced by $tag because it falls between the quantified groups of bases, here's a multi-regex solution. (Warning: needs Perl 5.10+ for the \K regex operator, but I can get around that fairly easily if needed.)

    c:\@Work\Perl>perl -wMstrict -le
    "use 5.010;
     ;;
     use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     my $tag = '___';
     ;;
     VECTOR:
     for my $ar_vector (
       [ qw(ATCGGATCTGGC  AT___CGGA___TCTGGC) ],
       [ qw(A-C-G--CTGGC  A-C___G--CTG___GC) ],
       ) {
       if (! ref $ar_vector) {
         note $ar_vector;
         next VECTOR;
         }
     ;;
       my ($seq, $expected) = @$ar_vector;
       my $got = xform($seq);
       is $got, $expected, qq{'$seq' -> '$expected'};
       }
     ;;
     done_testing;
     ;;
     sub xform {
       my ($s) = @_;
     ;;
       my $u = qr{ [ATGC] -*? }xms;
     ;;
       $s =~ s{ $u{2} \K -* }{$tag}xms;
       $s =~ s{ $u{4} \K -* }{$tag}xms;
       return $s;
       }
     "
    ok 1 - 'ATCGGATCTGGC' -> 'AT___CGGA___TCTGGC'
    ok 2 - 'A-C-G--CTGGC' -> 'A-C___G--CTG___GC'
    1..2
    ok 3 - no warnings
    1..3
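To see why the two substitutions give that result, it may help to look at the string after each pass. The following is only a sketch: xform_steps and $unit are names made up for this illustration, and it uses the (?:...){$n} spelling so the count can come from a loop variable. Each unit is one base followed by as few dashes as possible, \K keeps the matched units out of the replaced text, and the trailing -* swallows any dashes sitting right after the group.

```perl
use 5.010;    # \K needs Perl 5.10 or later
use strict;
use warnings;

my $tag = '___';

# One "unit": a single base, then as few dashes as possible (lazy).
my $unit = qr{ [ATGC] -*? }xms;

# Hypothetical helper: returns the input plus the string after each pass.
sub xform_steps {
    my ($s) = @_;
    my @steps = ($s);
    for my $n (2, 4) {
        # Match $n units, keep them (\K), then replace any dashes that
        # follow (possibly none) with the tag.
        $s =~ s{ (?:$unit){$n} \K -* }{$tag}xms;
        push @steps, $s;
    }
    return @steps;
}

print join(' -> ', xform_steps('A-C-G--CTGGC')), "\n";
```

This prints A-C-G--CTGGC -> A-C___G--CTGGC -> A-C___G--CTG___GC. The second pass cannot start inside the first group because the underscores of the already inserted tag are not bases, so it starts matching at the G after the tag.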
Of course, more test cases are highly encouraged!
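As a starting point, here is a sketch with a few extra inputs. The "expected" strings are simply what the substitutions above produce, not behavior confirmed by the original poster, so treat them as assumptions to check against the real requirement. The inputs themselves are invented for this example.

```perl
use 5.010;    # \K needs Perl 5.10 or later
use strict;
use warnings;

my $tag = '___';

sub xform {
    my ($s) = @_;
    my $u = qr{ [ATGC] -*? }xms;
    $s =~ s{ (?:$u){$_} \K -* }{$tag}xms for 2, 4;
    return $s;
}

# [ input, what the code above yields ]  (assumed, not from the OP)
my @cases = (
    [ 'AT--CGGA-TCTGGC', 'AT___CGGA___TCTGGC'   ],  # dashes exactly at the group boundaries
    [ '--ATCGGATCTGGC',  '--AT___CGGA___TCTGGC' ],  # leading dashes are left alone
    [ 'ATCG',            'AT___CG'              ],  # too short for the second group
    [ 'A',               'A'                    ],  # too short for either group
);

for my $case (@cases) {
    my ($seq, $expected) = @$case;
    my $got = xform($seq);
    printf "%s  %s -> %s\n", ($got eq $expected ? 'ok    ' : 'not ok'), $seq, $got;
}
```

The short inputs are the interesting ones: when a substitution cannot find enough bases it silently does nothing, which may or may not be what is wanted.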

Update: And yes, this does seem like an XY Problem.

Update 2: Here's the pre-5.10 (no \K) version of the code (tested):
    $s =~ s{ ($bu{2}) -* }{$1$tag}xms;
    $s =~ s{ ($bu{4}) -* }{$1$tag}xms;
And versions, also tested, consolidating the two substitutions in a for-loop:
    $s =~ s{  (?:$bu){$_} \K -* }   {$tag}xms for 2, 4;  # 5.10+
    $s =~ s{ ((?:$bu){$_})   -* } {$1$tag}xms for 2, 4;  # pre-5.10
In all these variations, $bu is the same base-unit pattern that the original code calls $u:
    my $bu = qr{ [ATGC] -*? }xms;
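For anyone who would rather run this from a file than from a one-liner, the fragments above can be put together roughly as follows. This is a sketch only: the sub names xform_keep and xform_capture are invented here, and the script just checks that the \K and capture variants agree on the two strings from the thread.

```perl
use strict;
use warnings;

my $tag = '___';
my $bu  = qr{ [ATGC] -*? }xms;    # one base, then as few dashes as possible

# 5.10+ variant: \K keeps the matched bases out of the replaced text.
sub xform_keep {
    my ($s) = @_;
    $s =~ s{ (?:$bu){$_} \K -* }{$tag}xms for 2, 4;
    return $s;
}

# pre-5.10 variant: capture the bases and put them back via $1.
sub xform_capture {
    my ($s) = @_;
    $s =~ s{ ((?:$bu){$_}) -* }{$1$tag}xms for 2, 4;
    return $s;
}

for my $seq (qw(ATCGGATCTGGC A-C-G--CTGGC)) {
    my $keep    = xform_keep($seq);
    my $capture = xform_capture($seq);
    my $note    = $keep eq $capture ? '' : " (capture variant gives $capture)";
    print "$seq -> $keep$note\n";
}
```

Both variants replace the same span; the only difference is whether the leading bases are excluded from the match (\K) or matched and written back ($1).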


Give a man a fish:  <%-{-{-{-<
