Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Efficient walk/iterate along a string

by GrandFather (Cardinal)
on Nov 23, 2010 at 06:52 UTC ( #873121=note: print w/ replies, xml ) Need Help??


in reply to Re: Efficient walk/iterate along a string
in thread Efficient walk/iterate along a string

If I understand the Trypsin example (I have no knowledge of this problem domain) the following code leverages Perl's string handling to produce the results I think you are looking for in this narrow case.

use strict; use warnings; my $str = 'GVGFTVILISFKYVGFFYNVIIAWALHYFFSSFTMDLPRWIHCNNTWNSPNCSDAHASN +SSDGLGLNDTFGTTPAAEYFER'; my @segments = map {[$_, (scalar tr/STY//)]} split /(?<=[KR])/, $str; printf "%45s: %d\n", @$_ for @segments;

Prints:

GVGFTVILISFK: 2 YVGFFYNVIIAWALHYFFSSFTMDLPR: 6 WIHCNNTWNSPNCSDAHASNSSDGLGLNDTFGTTPAAEYFER: 10

I've no idea how the performance may compare with whatever you are doing now, but if this does anything like the task you need then I'm sure it will be an order of magnitude faster than a character at a time implementation.

True laziness is hard work


Comment on Re^2: Efficient walk/iterate along a string
Select or Download Code
Re^3: Efficient walk/iterate along a string
by gje21c (Acolyte) on Nov 24, 2010 at 23:29 UTC
    Monks

    I tried the split/tr/map approach suggested by Grandfather and got ~ 60% reduction in time, great ! (and yes you got my app requirement exactly right)

    I think should withdraw my words that there's nothing to be gained from the special functions when you're just marching down strings and doing a seemingly arbitrary logic function (and also my words about inadequate context). Splitting at simple marker points and counting a simple set of chars within the splits, combined with Perl's superb list handling, is not arbitrary spaghetti logic. Lesson learned.

    One query: the ?<= construct was new to me, I've played with it and read up on it (perlre/Extended Patterns/Look-Around Assertions) and I'm 80% there but if anyone could explain exactly how this RE is ensuring I get exactly what I need I'd be grateful.

    Will continue to check some other posts, thanks all

    GJE

      I'm disappointed at only a 60% improvement! Maybe the segments are shorter than I would have guessed. This technique would give the best improvement for long strings containing few segments.

      I used a look behind in the split regular expression to retain the matched character at the end of the previous segment. Use a [KR] match to drop the 'K' or 'R', or use a look ahead match (?=[KR]) to retain the matched character at the start of the segment.

      Look behind and look ahead are anchors, they don't "consume" the matched characters, they just mark a position where some match condition is true. In this case we are testing for "was the last character R or K?".

      True laziness is hard work
Re^3: Efficient walk/iterate along a string
by Anonymous Monk on Dec 01, 2010 at 06:26 UTC
    Grandfather,

    Hope I've got the techno details and the "protocol" right here.

    I replied to your repy on my main thread as below. Appreciate if you could elucidate (love that word!) the regexp as per my words below. Various explorations with variants on ?<= give me the protein string split on the KR characters, with or without null strings, strings of just K or R, strings where consective K/R were condensed to just one, or none. I'm sure I can work it out with about 5 hours of exploration, but if you could just put into words what the ?<= is doibg I'd ber grateful.

    Thanks, GE.

    Cheers, GE

    -------------

    Monks
    I tried the split/tr/map approach suggested by Grandfather and got ~ 60% reduction in time, great ! (and yes you got my app requirement exactly right)

    I think should withdraw my words that there's nothing to be gained from the special functions when you're just marching down strings and doing a seemingly arbitrary logic function (and also my words about inadequate context). Splitting at simple marker points and counting a simple set of chars within the splits, combined with Perl's superb list handling, is not arbitrary spaghetti logic. Lesson learned.

    One query: the ?<= construct was new to me, I've played with it and read up on it (perlre/Extended Patterns/Look-Around Assertions) and I'm 80% there but if anyone could explain exactly how this RE is ensuring I get exactly what I need I'd be grateful.

    Will continue to check some other posts, thanks all

    GJE

    20101206 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://873121]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (8)
As of 2014-11-22 16:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (123 votes), past polls