PerlMonks  

Re^3: Regex fun

by JadeNB (Chaplain)
on Dec 16, 2009 at 08:22 UTC (#812993)


in reply to Re^2: Regex fun
in thread Regex fun

I would like a single regex since, as was pointed out, the double regex is wasteful because the matching has to be done twice.

The 2-regex version that I proposed avoids a lot of the double matching (it converts 2 number searches into 1 number search and then a hunt for a literal string). However, only benchmarking (which I'm too lazy to do) will show whether it's actually faster.

If s/// set pos (and behaved like m// in a while loop), then one could avoid any doubled effort at all:

s/\G\+$1[$bases]{$1}// while s/\+([0-9]+)//g;
(UPDATE: but note that this is fanciful, non-working code). In fact, that's what I had originally, until I tested it and discovered that it didn't work.

Running yours left strange things in it (+-1 and +0 below).

Oops, sorry! My original regex wasn't smart enough to stop grabbing bases once the count indicated that there were no more needed. I've fixed it, but, since the only point was its brevity (it's frightfully inefficient), it's not much fun any more.

I think adding /c in the match pattern should fix the problem with my while and possibly the (??{...}) version as well?

I'm not sure what you mean. I am pretty sure (but can't find the documentation) that /c only affects what happens to pos after a failed match, and failed matches aren't our problem here. Notice that (??{ }) doesn't have a problem to be fixed: the application you have in mind is essentially exactly what that escape was designed for (I assume), and it doesn't require any extra trickery.
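The /c point can be checked directly. A minimal sketch (the sample string and variable names are made up for illustration): with /g alone a failed match resets pos, while /gc leaves it where it was; successful matches behave the same either way.

```perl
use strict;
use warnings;

my $s = "aXbXc";    # made-up sample string

my $hit = $s =~ /X/g;          # scalar-context //g succeeds; pos is now just past the first X
my $pos_after_hit = pos($s);

my $miss = $s =~ /Z/g;         # fails without /c: pos is reset to undef
my $pos_after_plain_miss = pos($s);

$hit  = $s =~ /X/g;            # the search restarts from the beginning
$miss = $s =~ /Z/gc;           # fails with /c: pos is left untouched
my $pos_after_c_miss = pos($s);

printf "after hit: %s, after /g miss: %s, after /gc miss: %s\n",
    $pos_after_hit,
    defined $pos_after_plain_miss ? $pos_after_plain_miss : 'undef',
    $pos_after_c_miss;
```

Since the loops in this thread only go wrong after matches that succeed, /c would not change their behavior.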

Replies are listed 'Best First'.
Re^4: Regex fun
by Hena (Friar) on Dec 16, 2009 at 10:18 UTC
    It still didn't work as needed: it didn't remove the bases after the +N. However, it did help me a lot, and I managed to get a version that does work as I want (the printf calls are only there to confirm that what happens is what should happen). I also added the || last to prevent the infinite loop you mentioned.

    while (m/[+-]([0-9]+)/g) {
        printf "pos ($1): %d -> ", pos($_)-1;
        pos($_) -= length($1)+1;
        printf "pos ($1): %d\n", pos($_)-1;
        s/\G[+-]$1[ACGTNacgtn]{$1}// || last;
    }
    So thanks for helping me out on this :).

    On the whole, though, I think that (??{...}) would be the best choice, as repositioning pos() is probably not a good idea in general. Note that I matched the negative as well as the positive sign here, since that lets me drop the next regex I have :).
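For comparison, a single-regex (??{...}) version might look like the sketch below. It is an illustration rather than the code posted elsewhere in the thread: the sub name strip_indels and the sample string are made up, and the base alphabet is copied from the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every +N or -N marker together with the
# N bases that follow it.
sub strip_indels {
    my ($str) = @_;
    # (??{ ... }) runs its code at match time and uses the returned string
    # as a sub-pattern, so the count captured in $1 becomes the quantifier
    # for the base character class.
    $str =~ s/[+-]([0-9]+)(??{ "[ACGTNacgtn]{$1}" })//g;
    return $str;
}

print strip_indels("AC+2GTAC-1NT"), "\n";
```

One difference from the loop: a marker that is not followed by enough bases (such as "+2A.") is simply left in place, and later markers are still processed.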
      I think that (??{...}) would be the best choice, as repositioning pos() is probably not a good idea in general.
      What's the problem with setting pos()? (??{...}) and (?{...}) have their share of problems (many related to scoping of variables), and I avoid them as much as possible; if I do use them, I keep the code in them as simple as possible.

      What I don't understand is your use of 'last'. It means that if you have "+2A." in your string, nothing following it will change.

        What's the problem with setting pos()?
        Yes, pos() can be set; however, I "feel" I'm changing something I shouldn't touch if I can avoid it. My opinion is that the (??{...}) construct is better.

        What I don't understand is your use of 'last'. It means that if you have "+2A." in your string, nothing following it will change.
        I have a later test that tells me when the cleaning wasn't complete: a string length comparison against another string, the base quality column of the input, which should be the same length after this step. Input like that is an error and shouldn't be there, so I need to know about it rather than try to fix it in the script.
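A rough sketch of how the early exit and the later length check fit together. The sub name clean_bases, the sample strings, and the quality string are all hypothetical; the loop is the one posted above without the printf calls, operating on a lexical variable instead of $_.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pos()-repositioning loop.
sub clean_bases {
    my ($bases) = @_;
    while ($bases =~ /[+-]([0-9]+)/g) {
        my $n = $1;
        pos($bases) -= length($n) + 1;    # step back onto the sign
        # The (?:...) group keeps "$n" from being followed directly by "[",
        # so it cannot be mistaken for an array subscript.
        $bases =~ s/\G[+-]$n(?:[ACGTNacgtn]){$n}// or last;
    }
    return $bases;
}

my $qual = "IIIII";    # hypothetical quality column, one character per base

for my $raw ("AC+2GTAC-1NT", "AC+2A.GT+1CA") {
    my $clean  = clean_bases($raw);
    my $status = length($clean) == length($qual) ? "ok" : "length mismatch";
    print "$raw -> $clean ($status)\n";
}
```

With the malformed "+2A." marker the loop stops at the first failed substitution, so the string comes back unchanged and the length comparison is what flags it.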
      It still didn't work as needed. That didn't remove the bases after the +N.

      I guess you mean the s/// while s/// code in Re^3: Regex fun? Yes, I know it doesn't work, but was just posting it as an example of what would work if Perl changed to fit my whims. :-) I should probably make that clearer.

      The actual fixed-but-inelegant code is in Re: Regex fun, in an update at the bottom. I tested it on your input, and it seemed to give the desired output. (It doesn't handle -N escapes, which you seem to want, because I didn't see anything about that in your original specification; but it would be easy enough to adapt it.)

      pos($_) -= length($1)+1;

      You may want to try pos = $-[0] instead. See pos (for implicit action on $_) and @-.
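A small sketch of why the two are interchangeable here (the sample string and variable names are made up): $-[0] is the offset at which the last successful match started, which is exactly where the arithmetic on pos() ends up.

```perl
use strict;
use warnings;

my $s = "AC+12GT";    # made-up sample string
my ($by_arithmetic, $by_offset);

if ($s =~ /[+-]([0-9]+)/g) {
    # pos() sits just past the digits; step back over them and the sign.
    $by_arithmetic = pos($s) - (length($1) + 1);
    # $-[0] is the start offset of the whole match, with no arithmetic needed.
    $by_offset = $-[0];
    pos($s) = $-[0];
}

printf "arithmetic: %d, start offset: %d, pos now: %d\n",
    $by_arithmetic, $by_offset, pos($s);
```

The $-[0] form also stays correct if the pattern in front of the capture group changes, since it does not depend on the length of the sign or the digits.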
