PerlMonks  

How to run index on this one?

by Anonymous Monk
on Nov 25, 2009 at 15:03 UTC ( #809345=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!
I have some strings like:
$string='YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDK';
I want to get all positions in the string where the letter N is followed by [115]. The pattern to search for is always the same, namely N[115].
My problem is this: if I use the index function (which I thought was the right way to go), won't it count, for instance, the first occurrence of [115] as 5 characters? And, consequently, wouldn't the index function give a false position for the second 'N' in the string?
Thank you!

Replies are listed 'Best First'.
Re: How to run index on this one?
by cdarke (Prior) on Nov 25, 2009 at 15:17 UTC
    index returns an offset from the beginning of the string, although I'm not really sure what you mean by "false position". For example:
    use strict;
    use warnings;

    my $string='YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDK';
    my $offset = 0;
    my $substring = 'N[115]';
    my @positions;

    while (1) {
        my $pos = index($string, $substring, $offset);
        last if $pos < 0;
        push @positions,$pos;
        $offset = $pos + length($substring);
    }
    print "Positions: @positions\n";
    Gives:
    Positions: 12 35
      Thanks for the reply!
      But if you look at the string, I just want the position of 'N', without taking the [115] into consideration. The code you provided works for the first 'N', but for the second 'N' the correct position is 30 and not 35 (you count the first [115], and this increases the length of the string).
        OK. I said I didn't understand what you meant! How about this then (damn the hard-coded 5):
        use strict;
        use warnings;

        my $string='YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDKN[115]';
        my $offset = 0;
        my $substring = 'N[115]';
        my @positions;

        while (1) {
            my $pos = index($string, $substring, $offset);
            last if $pos < 0;
            $offset = $pos + length($substring);
            $pos -= 5 * @positions;
            push @positions,$pos;
        }
        print "Positions: @positions\n";
        Gives:
        Positions: 12 30 37
        I added another N for testing.

        Update: I have received a /msg puzzling about a line of code. Here is a description of $pos -= 5 * @positions;
        We need to adjust the position by the length of [115] (5 characters), not just once but for each position already found. I use @positions in scalar context to get the number of elements (scalar context is forced by the multiplication operator *). The first time around the loop the number of elements in @positions is zero, so $pos is not decremented. Subsequently it is decremented by 5 * number-of-items-found.
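The adjustment described above can be tried in isolation. This is only a minimal sketch: adjust_offsets is a hypothetical helper name (it does not appear in the thread), and the raw offsets 12, 35 and 47 are the ones index finds in cdarke's extended test string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert raw index() offsets
# into positions as they would be if every earlier tag were removed.
sub adjust_offsets {
    my ($tag_len, @raw) = @_;
    my @adjusted;
    for my $pos (@raw) {
        # The * operator puts @adjusted in scalar context, giving the
        # number of hits recorded so far, i.e. the tags before this one.
        push @adjusted, $pos - $tag_len * @adjusted;
    }
    return @adjusted;
}

my @adjusted = adjust_offsets(5, 12, 35, 47);
print "Positions: @adjusted\n";    # Positions: 12 30 37
```

Passing the tag length as a parameter is one way to avoid the hard-coded 5, assuming every tag has the same length.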
Re: How to run index on this one?
by johngg (Canon) on Nov 25, 2009 at 16:06 UTC

    How about using split rather than index? Using cdarke's extended data shows that you have to pass -1 as the third argument to split, so that the trailing empty element is kept when the split term is at the end of the line.

    $ perl -le '
    > $str = q{YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDKN[115]};
    > @chunks = split m{(?<=N)\[115\]}, $str, -1;
    > $posn = -1;
    > print $posn += length for @chunks[ 0 .. $#chunks - 1 ];'
    12
    30
    37
    $
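To see why the -1 matters, here is a small sketch on a made-up string (AN[115]BN[115] is an assumption for illustration, not data from the thread). Without a limit, split drops trailing empty fields, so the last N[115] would go uncounted.

```perl
use strict;
use warnings;

# Made-up example string whose split term sits at the very end.
my $str = 'AN[115]BN[115]';

# Default limit: trailing empty fields are dropped.
my @default = split /(?<=N)\[115\]/, $str;

# Limit of -1: trailing empty fields are kept.
my @keep = split /(?<=N)\[115\]/, $str, -1;

print scalar(@default), "\n";    # 2
print scalar(@keep), "\n";       # 3
```

The extra empty chunk is what lets the slice 0 .. $#chunks - 1 cover every chunk that ends in an N.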

    I hope this is of interest.

    Cheers,

    JohnGG

Re: How to run index on this one?
by codeacrobat (Chaplain) on Nov 25, 2009 at 20:01 UTC
    How about a lookahead assertion and the perl5.10 (*FAIL) pattern to find all matches :-)
    perl -e '$_="YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDK"; /N(?=\[115\])(?{print pos, "\n"})(*FAIL)/'
    13
    36
    See perlre for more information.
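Note that 13 and 36 are the values of pos just after each N in the unmodified string, so they still count the earlier [115]. As a hedged sketch of the same lookahead idea, a plain //g loop can read the match start from $-[0] and subtract the tags seen so far. The name n_positions is hypothetical, and the sketch assumes [115] is the only bracketed tag in the string.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-based positions of each N that is followed by
# [115], counted as if all earlier [115] tags were absent.
sub n_positions {
    my ($str) = @_;
    my $tag_len = length '[115]';
    my @positions;
    while ( $str =~ /N(?=\[115\])/g ) {
        # $-[0] is the offset at which the current match starts.
        push @positions, $-[0] - $tag_len * @positions;
    }
    return @positions;
}

print join( ' ', n_positions('YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDK') ), "\n";
# 12 30
```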

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
Re: How to run index on this one?
by AnomalousMonk (Bishop) on Nov 25, 2009 at 23:26 UTC
    Yet another approach avoiding split or index (but using the latest requirements):
    >perl -wMstrict -le
    "sub substr_offsets {
       my ($str, $substr, $subst) = @_;
       $subst = qq{\0} unless defined $subst and length $subst;
       die qq{substitution string '$subst' found in '$str'}
          if $str =~ m{\Q$subst\E}xms;
       die 'substring same as substitution string'
          if $substr eq $subst;
       $str =~ s{ \Q$substr\E }{$subst}xmsg;
       my @offsets;
       $str =~ s{ \Q$subst\E }{ push @offsets, pos $str }xmsge;
       return @offsets;
       }
     for (@ARGV) {
       my @offsets = substr_offsets($_, 'n[115]');
       print qq{string: '$_'; offsets: (@offsets)};
       }
    "
    "" a ab n[115] An[115] n[115]A n[115]n[115] n[115]n[115]n[115] An[115]n[115]n[115] n[115]n[115]n[115]A 01234n[115]012345n[115]n[115]
    string: ''; offsets: ()
    string: 'a'; offsets: ()
    string: 'ab'; offsets: ()
    string: 'n[115]'; offsets: (0)
    string: 'An[115]'; offsets: (1)
    string: 'n[115]A'; offsets: (0)
    string: 'n[115]n[115]'; offsets: (0 1)
    string: 'n[115]n[115]n[115]'; offsets: (0 1 2)
    string: 'An[115]n[115]n[115]'; offsets: (1 2 3)
    string: 'n[115]n[115]n[115]A'; offsets: (0 1 2)
    string: '01234n[115]012345n[115]n[115]'; offsets: (5 12 13)
Re: How to run index on this one?
by johngg (Canon) on Nov 26, 2009 at 18:36 UTC

    A solution that uses index and substr.

    $ perl -le '
    > $str = q{YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDKN[115]};
    > $posn = -1;
    > while ( ( $posn = index $str, q{N[115]}, $posn ) > -1 )
    > {
    >     print +( $posn ++ );
    >     substr $str, $posn, 5, q{};
    > }'
    12
    30
    37
    $

    Making the solution more general and avoiding modifying the original string.

    $ perl -Mstrict -wle '
    > print for findPosns(
    >     q{YYSFTVMETDPVN[115]HMVGVISVEGRPGLFWFN[115]ISGGDKN[115]},
    >     q{N},
    >     q{[115]}
    > );
    >
    > sub findPosns
    > {
    >     my ( $str, $textKeep, $textThrow ) = @_;
    >
    >     my $text  = $textKeep . $textThrow;
    >     my $len   = length $textThrow;
    >     my @posns = ();
    >     my $posn  = -1;
    >     while ( ( $posn = index $str, $text, $posn ) > -1 )
    >     {
    >         push @posns, $posn ++;
    >         substr $str, $posn, $len, q{};
    >     }
    >     return @posns;
    > }'
    12
    30
    37
    $

    I hope this is useful.

    Cheers,

    JohnGG

Re: How to run index on this one?
by afresh1 (Hermit) on Nov 25, 2009 at 21:10 UTC

    I wondered what would happen if other letters had square-bracketed numbers, so I solved it this way. A regex seems like the "wrong" solution, but it was what struck me first.

    use strict;
    use warnings;

    my $string='YYSFTVME[13]TDPVN[115]HMVGVISVE[13]GRPGLFWFN[115]ISGGDKN[115]';
    my $substring = 'N[115]';

    my @chars = $string =~ /(.(?:\[\d+\])?)/gxms;

    my @positions;
    foreach my $i (0..$#chars) {
        push @positions, $i if $chars[$i] eq $substring;
    }
    print "Positions: @positions\n";
    l8rZ,
    --
    andrew
Re: How to run index on this one?
by bichonfrise74 (Vicar) on Nov 25, 2009 at 22:41 UTC
    Another way...
    #!/usr/bin/perl
    use strict;

    my $string = 'YYSFTVMETDPVN[115]HMVGV'
               . 'ISVEGRPGLFWFN[115]ISGGDK';
    my @pos;
    my $test = 'N[115]';

    for ( my $i = 0; $i < length( $string ); $i++ ) {
        push( @pos, $i ) if ( substr( $string, $i, 6 ) eq $test );
    }

    # Each earlier match added 5 extra characters ("[115]"), so the n-th
    # match (counting from 0) is shifted back by 5 * n, not a flat 5.
    @pos = map { $pos[$_] - 5 * $_ } (0 .. $#pos );
    print "Found N at positions: ". join( ', ', @pos ), "\n";

Node Type: perlquestion [id://809345]
Approved by biohisham