http://www.perlmonks.org?node_id=813002


in reply to Re^3: Regex fun
in thread Regex fun

It still didn't work as needed: it didn't remove the bases after the +N. However, it helped me a lot, and I managed to get a version that does work the way I want (the printf calls are only there so I can check that what happens is what should happen). I also added the || last to prevent the endless loop you mentioned.

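while (m/[+-]([0-9]+)/g) {
    printf "pos ($1): %d -> ", pos($_) - 1;
    pos($_) -= length($1) + 1;    # step back from after the digits to the +/- sign
    printf "pos ($1): %d\n", pos($_) - 1;
    # remove sign, count and exactly $1 bases; leave the loop if that fails
    s/\G[+-]$1[ACGTNacgtn]{$1}// || last;
}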
So thanks for helping me out on this :).

On the whole, though, I think (??{...}) would be the best choice, as repositioning pos() is probably not a good idea in general. Note that I included the negative (-N) as well as the positive (+N) match, as that makes the next regex I have unnecessary :).
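For comparison, a minimal sketch of the (??{...}) route, assuming the goal is to drop each +N or -N together with exactly N following bases. The sub name strip_indels_postponed and the sample strings are hypothetical. The code block runs at match time, so $1 inside it is the count that was just captured, and the generated pattern demands exactly that many bases:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every +N/-N indel plus the N bases after it.
sub strip_indels_postponed {
    my ($s) = @_;
    # (??{ ... }) builds a sub-pattern at match time; $1 is the count just
    # captured, so "[ACGTNacgtn]{$1}" requires exactly that many bases.
    $s =~ s/[+-]([0-9]+)(??{ "[ACGTNacgtn]{$1}" })//g;
    return $s;
}

print strip_indels_postponed('..+2AC..-1g,,'), "\n";   # ....,,
print strip_indels_postponed('..+2A.,-1g,,'), "\n";    # ..+2A.,,,
```

Unlike the loop with || last, a malformed indel such as "+2A." is simply left in place here and the rest of the string is still cleaned.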

Re^5: Regex fun
by JavaFan (Canon) on Dec 16, 2009 at 10:54 UTC
    I think that the (??{...}) would be best choice as repositioning the pos() is probably not a good idea in general.
    What's the problem with setting pos()? (??{...}) and (?{...}) have their share of problems (many related to the scoping of variables), and I avoid them as much as possible; if I do use them, I keep the code in them as simple as possible.
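    One way to sidestep both pos() and regex code blocks, sketched as an illustration rather than anything proposed in the thread: let the pattern take the whole run of base letters after the count, and have an /e replacement keep only the letters past the first N, since those are not part of the indel. The sub name and sample strings are made up. A run shorter than N is returned unchanged, so a later check can still catch it.

```perl
use strict;
use warnings;

# Hypothetical helper: no pos() juggling and no (?{...})/(??{...}) blocks.
# $1 = whole match, $2 = count N, $3 = every base letter after the count.
sub strip_indels_substr {
    my ($s) = @_;
    $s =~ s{([+-]([0-9]+)([ACGTNacgtn]+))}
           { length($3) >= $2 ? substr($3, $2) : $1 }ge;  # keep bases past N
    return $s;
}

print strip_indels_substr('..+2ACG..-1g,,'), "\n";   # ..G..,,
print strip_indels_substr('..+2A.,-1g,,'), "\n";     # ..+2A.,,,
```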

    What I don't understand is your use of 'last'. It means that if you have "+2A." in your string, nothing following it will change.
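    If you would rather skip a malformed indel such as "+2A." and keep cleaning the rest of the string, the loop can remember where the match ended and jump there when the substitution fails. This is a hypothetical variant, not the poster's code. It matches the count as [0-9]+ instead of interpolating $1 right before a [...] class, to avoid any question of $1[...] being read as an array element.

```perl
use strict;
use warnings;

# Hypothetical variant: skip a bad indel instead of leaving the loop.
sub strip_indels_skip_bad {
    local $_ = shift;
    while (/[+-]([0-9]+)/g) {
        my $n   = $1;
        my $end = pos($_);              # just after the digits
        pos($_) -= length($n) + 1;      # back to the +/- sign
        next if s/\G[+-][0-9]+[ACGTNacgtn]{$n}//;
        # Substitution failed: set pos() past this indel explicitly so the
        # next m//g does not find the same one again forever.
        pos($_) = $end;
    }
    return $_;
}

print strip_indels_skip_bad('..+2A.,-1g,,'), "\n";   # ..+2A.,,,
```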

      What's the problem with setting pos()?
      Yes, pos() can be set; however, I "feel" I'm changing things that I shouldn't change if I can avoid it. In my opinion the (??{...}) construct is better.

      What I don't understand is your use of 'last'. It means that if you have "+2A." in your string, nothing following it will change.
      I have a later test that tells me when the cleaning wasn't complete: a string-length comparison against another string that should be the same length after this step (the base quality column in the input). Input like that is an error and shouldn't be there, so I need to know about it rather than try to fix it in the script.
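      A sketch of that kind of length check, with a made-up sub name and sample data, assuming the cleaned bases string should end up with one character per quality character:

```perl
use strict;
use warnings;

# Hypothetical check: cleaning is complete when the bases string and the
# base quality string have the same length.
sub cleaning_complete {
    my ($bases, $quals) = @_;
    return length($bases) == length($quals);
}

# Made-up sample rows; the second bases string still contains "+2A".
for my $row (['....,,', 'IIIIII'], ['..+2A.,,,', 'IIIIII']) {
    my ($bases, $quals) = @$row;
    if (cleaning_complete($bases, $quals)) {
        print "ok: $bases\n";
    }
    else {
        print "incomplete: $bases\n";
    }
}
```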
Re^5: Regex fun
by JadeNB (Chaplain) on Dec 16, 2009 at 15:57 UTC
    It still didn't work as needed. That didn't remove the bases after the +N.

    I guess you mean the s/// while s/// code in Re^3: Regex fun? Yes, I know it doesn't work; I was just posting it as an example of what would work if Perl changed to fit my whims. :-) I should probably make that clearer.

    The actual fixed-but-inelegant code is in Re: Regex fun, in an update at the bottom. I tested it on your input, and it seemed to give the desired output. (It doesn't handle -N escapes, which you seem to want, because I didn't see anything about that in your original specification; but it would be easy enough to adapt it.)

    pos($_) -= length($1)+1;

    You may want to try pos = $-[0] instead. See pos (with no argument it acts on $_) and @- ($-[0] is the offset at which the last successful match started).
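    Applied to the loop from the thread, the suggestion might look like this minimal sketch. The sub name is hypothetical, and the count is matched as [0-9]+ instead of interpolating $1 in front of a character class:

```perl
use strict;
use warnings;

# Hypothetical helper: the poster's loop, stepping back with @- instead of
# computing length($1) + 1.
sub strip_indels_at_minus {
    local $_ = shift;
    while (/[+-]([0-9]+)/g) {
        my $n = $1;
        # $-[0] is where the last successful match began, i.e. the +/- sign.
        # pos($_) is spelled out here; a bare pos would also act on $_.
        pos($_) = $-[0];
        s/\G[+-][0-9]+[ACGTNacgtn]{$n}// or last;
    }
    return $_;
}

print strip_indels_at_minus('..+2AC..-1g,,'), "\n";   # ....,,
```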