PerlMonks  

Find Number in String then Ignore Characters proceeding

by BenPen95 (Initiate)
on May 29, 2012 at 20:13 UTC ( [id://973113]=perlquestion )

BenPen95 has asked for the wisdom of the Perl Monks concerning the following question:

I have a string such as: .,a..A,,C..+4ACGTG.,-2TG,,...,a.

I want to find each + or - and then skip the number of characters given by the number that follows it, removing the sign, the number, and the skipped characters.

The output should be .,a..A,,C..G.,,,...,a.

Replies are listed 'Best First'.
Re: Find Number in String then Ignore Characters proceeding
by Eliya (Vicar) on May 29, 2012 at 21:05 UTC
    $str =~ s/[+-](\d+)(??{".{$1}"})//g;

    (Unfortunately, the straightforward attempt $str =~ s/[+-](\d+).{\1}//g; doesn't work.)

    See (??{ code }) in perlre.

    Upd: changed (??{"\\w{$1}"}) to (??{".{$1}"}), just in case you need to remove any character, not just alphanumeric.

      Thank you. This was my first time posting and you guys are awesome!

      I was trying the straightforward way after the first response.

      How come the straightforward way doesn't work? And how come \\w rather than \w works?

        (Unfortunately, the straightforward attempt  $str =~ s/[+-](\d+).{\1}//g; doesn't work.)
        How come the straightforward way doesn't work?

        Because the regex compiler will attempt to compile the entire  [+-](\d+).{\1} regex (the 'search' regex of the substitution) at compile time, but the  \1 backreference of the  .{\1} counted quantifier sub-expression is not known until run time, when something may be captured that it can actually refer back to. OTOH, the  (??{".{$1}"}) 'postponed' extended pattern is specifically designed to both compile and run at run-time.

        BTW: Using  $^N is, IMHO, 'safer' than using  $1 in the sub-expression  (??{ ".{$1}" }) (making it  (??{ ".{$^N}" }) instead).  $^N holds the contents of the most recently closed capturing group, so its meaning does not change as long as the relative position of that capture group and the use of  $^N stays the same. Adding another capture group anywhere before the  (\d+) group, on the other hand, changes the meaning of  $1 because the capture groups are renumbered.

        And how come \\w rather than \w works?

        The double backslash is just because it's in a double-quoted string, so a literal \w remains in the runtime constructed regex pattern fragment.
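The points above can be tried out with a small script. This is only a sketch: the sub names (strip_counted, strip_with_sign_group, strip_word_chars) are made up for illustration and do not come from the thread. It uses $^N so that adding a capture group for the sign leaves the behaviour unchanged, and it shows the \\w form inside the double-quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Remove a sign, the count after it, and that many characters of any kind.
sub strip_counted {
    my ($str) = @_;
    $str =~ s/[+-](\d+)(??{ ".{$^N}" })//g;
    return $str;
}

# Same pattern with an extra capture group in front. The count is now $2,
# but $^N still refers to the most recently closed group, (\d+).
sub strip_with_sign_group {
    my ($str) = @_;
    $str =~ s/([+-])(\d+)(??{ ".{$^N}" })//g;
    return $str;
}

# Word characters only. "\\w" in the double-quoted string yields a literal
# \w in the pattern fragment that is built at run time.
sub strip_word_chars {
    my ($str) = @_;
    $str =~ s/[+-](\d+)(??{ "\\w{$^N}" })//g;
    return $str;
}

print strip_counted('.,a..A,,C..+4ACGTG.,-2TG,,...,a.'), "\n";   # .,a..A,,C..G.,,,...,a.
```

With $1 instead of $^N, the second sub would interpolate the sign rather than the count, which is the renumbering problem described above.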

Re: Find Number in String then Ignore Characters proceeding
by aaron_baugher (Curate) on May 29, 2012 at 21:33 UTC

    If you know that the characters in question will always be uppercase letters (or some other particular character set that doesn't include the next + or -), it's fairly easy: capture the digits and letters that follow a + or -, and use substr to drop the correct number of letters off the beginning:

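    #!/usr/bin/env perl
    use Modern::Perl;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";
    # Capture the count and the run of word characters after + or -,
    # then keep only what is left of the run after skipping $1 characters.
    $str =~ s/[+-](\d+)(\w+)/substr $2, $1/ge;
    say $str;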

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: Find Number in String then Ignore Characters proceeding
by snape (Pilgrim) on May 29, 2012 at 20:32 UTC

    This should work. Change the number of characters (the 4 in \w{4}) to suit your needs.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";
    $str =~ s/[+-]?\d\w{4}//g;
    $str =~ s/[+-]?\d\w+//g;
    print $str;

    Update 1: Eliya's method works awesomely well. I learnt something today. Thanks Eliya

    Update 2: After several tries, I got this regex and it should also work

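    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = ".,a..A,,C..+4ACGTG.,-2TG,,...,a";
    # Match + or -, capture the count and the word characters that follow,
    # and replace the match with the run minus its first $1 characters.
    $str =~ s/[+-](\d+)(\w*)/(substr $2, $1)/ge;
    print $str;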

      The number after the + or - is the number of characters I would like to remove.

      If ..,+3AGCT.,. it should remove +3AGC but leaves the ..,T.,.
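The clarified requirement can also be met without a run-time regex construct, by locating each marker and cutting it out with four-argument substr. A minimal sketch, assuming the count never runs past the end of the string; strip_by_offset is a hypothetical name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: remove each +N or -N marker and the N characters after it.
sub strip_by_offset {
    my ($str) = @_;
    while ($str =~ /[+-](\d+)/g) {
        my $start = $-[0];                  # offset where the marker begins
        my $len   = ($+[0] - $-[0]) + $1;   # sign and digits, plus the counted characters
        substr($str, $start, $len, '');     # cut the whole span out of the string
        pos($str) = $start;                 # resume scanning where the cut was made
    }
    return $str;
}

print strip_by_offset('..,+3AGCT.,.'), "\n";   # ..,T.,.
```

Because the length comes from the captured number rather than from a character class, the removed characters can be anything, not only letters.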

Re: Find Number in String then Ignore Characters proceeding
by temporal (Pilgrim) on May 29, 2012 at 21:33 UTC
    A little more legible (if not as elegant) version of what the previous post is doing:
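    #! perl
    my $str = '.,a..A,,C..+4ACGTG.,-2TG,,...,a.';
    print replace($str);

    sub replace {
        my $str = shift;
        # Find the first +N or -N marker, if any.
        if ($str =~ m/([+-])(\d+)/) {
            # $1 and $2 from the match above are interpolated into this pattern,
            # which removes the sign, the count, and the next $2 characters.
            # Recurse only if something was actually removed.
            return replace($str) if $str =~ s/\Q$1\E$2.{$2}//;
        }
        return $str;
    }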

    Strange things are afoot at the Circle-K.
