PerlMonks  

Find Number in String then Ignore Characters proceeding

by BenPen95 (Initiate)
on May 29, 2012 at 20:13 UTC ( #973113=perlquestion )
BenPen95 has asked for the wisdom of the Perl Monks concerning the following question:

I have a string such as: .,a..A,,C..+4ACGTG.,-2TG,,...,a.

I want to find each + or -, then remove the sign, the number after it, and as many of the following characters as that number specifies.

The output should be .,a..A,,C..G.,,,...,a.

Replies are listed 'Best First'.
Re: Find Number in String then Ignore Characters proceeding
by Eliya (Vicar) on May 29, 2012 at 21:05 UTC
    $str =~ s/[+-](\d+)(??{".{$1}"})//g;

    (Unfortunately, the straightforward attempt $str =~ s/[+-](\d+).{\1}//g; doesn't work.)

    See (??{ code }) in perlre.

    Upd: changed (??{"\\w{$1}"}) to (??{".{$1}"}), just in case you need to remove any character, not just alphanumeric.
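    A minimal sketch of the substitution above, run against the two sample strings that appear in this thread. The sub name strip_counted is hypothetical and exists only for illustration; the regex itself is the one from the post.

```perl
use strict;
use warnings;

# Hypothetical helper name; it only wraps the one-line substitution above.
sub strip_counted {
    my ($str) = @_;
    # [+-]     a literal plus or minus sign
    # (\d+)    the count, captured in $1
    # (??{..}) builds the fragment ".{N}" at match time from that count
    $str =~ s/[+-](\d+)(??{".{$1}"})//g;
    return $str;
}

print strip_counted('.,a..A,,C..+4ACGTG.,-2TG,,...,a.'), "\n";  # .,a..A,,C..G.,,,...,a.
print strip_counted('..,+3AGCT.,.'), "\n";                      # ..,T.,.
```

    Both expected results are the ones the original poster gives for these inputs.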

      Thank you. This was my first time posting, and you guys are awesome!

      I was trying the straightforward way after the first response.

      How come the straightforward way doesn't work? And how come \\w rather than \w works?

        (Unfortunately, the straightforward attempt $str =~ s/[+-](\d+).{\1}//g; doesn't work.)
        How come the straightforward way doesn't work?

        Because the regex compiler tries to compile the entire [+-](\d+).{\1} regex (the 'search' regex of the substitution) up front, and a counted quantifier needs a literal number at that point. The \1 backreference inside the .{\1} sub-expression has no value until run time, when something has actually been captured for it to refer back to. OTOH, the (??{".{$1}"}) 'postponed' extended pattern is specifically designed to be both compiled and run at run time.

        BTW: The use of $^N is, IMHO, 'safer' than the use of $1 in the sub-expression (??{ ".{$1}" }) (making it (??{ ".{$^N}" }) instead). $^N holds the contents of the most recently closed capturing group, so its meaning does not change as long as the relative position of that capture group and the use of $^N stays the same. Adding another capture group anywhere before the (\d+) group, on the other hand, changes the meaning of $1 because the capture groups are renumbered.
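        A small sketch of the $^N point, assuming an extra capture group is added around the sign so that the digits end up in $2 instead of $1. The sub name strip_with_caret_n is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper name, for illustration only.
sub strip_with_caret_n {
    my ($str) = @_;
    # ([+-]) is now group 1 and (\d+) is group 2, so $1 would hold the sign.
    # $^N still refers to the most recently closed group, which is (\d+).
    $str =~ s/([+-])(\d+)(??{ ".{$^N}" })//g;
    return $str;
}

print strip_with_caret_n('.,a..A,,C..+4ACGTG.,-2TG,,...,a.'), "\n";  # .,a..A,,C..G.,,,...,a.
```

        With $1 in place of $^N, the same pattern would try to build a quantifier from the sign rather than from the digits.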

        And how come \\w rather than \w works?

        The double backslash is just because it's in a double-quoted string, so a literal \w remains in the runtime constructed regex pattern fragment.
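        To make the quoting visible, here is a short sketch that builds the pattern fragment as a plain string and prints it. The sub name word_fragment and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the regex fragment for "exactly $n word characters".
sub word_fragment {
    my ($n) = @_;
    # In a double-quoted string, "\\" yields one literal backslash,
    # so the resulting text is \w{N}.
    return "\\w{$n}";
}

my $fragment = word_fragment(4);
print "$fragment\n";                                   # \w{4}

# Single quotes leave the backslash alone, so this builds the same text.
my $same = '\w{' . 4 . '}';
print $fragment eq $same ? "same\n" : "different\n";   # same

# The fragment behaves as a regex once interpolated into a pattern.
print "ACGT" =~ /^$fragment$/ ? "match\n" : "no match\n";  # match
```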

Re: Find Number in String then Ignore Characters proceeding
by aaron_baugher (Curate) on May 29, 2012 at 21:33 UTC

    If you know that the characters in question will always be uppercase letters (or some other particular character set that doesn't include the next + or -), it's fairly easy: capture the digits and letters that follow a + or -, and use substr to drop the correct number of letters off the beginning:

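    #!/usr/bin/env perl
    use Modern::Perl;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";

    # $1 is the count, $2 the run of word characters that follows;
    # substr keeps whatever is left of $2 after the first $1 characters.
    $str =~ s/[+-](\d+)(\w+)/substr $2, $1/ge;
    say $str;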

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: Find Number in String then Ignore Characters proceeding
by snape (Pilgrim) on May 29, 2012 at 20:32 UTC

    This should work. Adjust the number of characters to suit your needs.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";
    $str =~ s/[+-]?\d\w{4}//g;
    $str =~ s/[+-]?\d\w+//g;
    print $str;

    Update 1: Eliya's method works awesomely well. I learnt something today. Thanks Eliya

    Update 2: After several tries, I got this regex and it should also work

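    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";

    # [+-] matches only a plus or minus sign; $1 is the count and
    # $2 the word characters that follow it.
    $str =~ s/[+-](\d+)(\w*)/(substr $2, $1)/ge;
    print $str;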

      The number after the + or - is the number of characters I would like to remove.

      For example, given ..,+3AGCT.,. it should remove +3AGC and leave ..,T.,.

Re: Find Number in String then Ignore Characters proceeding
by temporal (Pilgrim) on May 29, 2012 at 21:33 UTC
    A little more legible (if not as elegant) version of what the previous post is doing:
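    #! perl
    my $str = '.,a..A,,C..+4ACGTG.,-2TG,,...,a.';
    print replace($str);

    # Remove one +N/-N run per call, then recurse until none are left.
    sub replace {
        my $str = shift;
        if ($str =~ m/([+-])(\d+)/) {
            # $1 (sign) and $2 (count) from the match above are
            # interpolated into the pattern of the substitution.
            $str =~ s/\Q$1\E$2.{$2}//;
            return replace($str);
        }
        return $str;
    }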

    Strange things are afoot at the Circle-K.
