Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

The nth occurrence of a character

by Anonymous Monk
on Feb 08, 2006 at 11:14 UTC ( [id://528774]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How to write an efficient Perl function to return the position of the nth occurance of a specified character within a string?

2006-02-09 Retitled by planetscape, as per Monastery guidelines
Original title: 'The nth occurance of a character'

Replies are listed 'Best First'.
Re: The nth occurrence of a character
by salva (Canon) on Feb 08, 2006 at 11:27 UTC
    my $char = "C"; my $pat = "[^$char]*$char" x $n; my $where; /^($pat)/ and $where = length $1 - 1;
      You could go one step further and use /^$pat/ and $where = $+[0]-1, but that makes use of the @+ array, and I'd probably leave a comment on that line.

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

        This is certainly cheaper. If capturing can be avoided and this code is important for speed, choose the non-capturing version.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: The nth occurrence of a character
by mickeyn (Priest) on Feb 08, 2006 at 15:03 UTC
    allow me to suggest another way that may serve you if you want to access different characters at different occurances many times:

    my %hash; my @split = split //, $string; my @positions = map { [$split[$_], $_] } 1 .. @split-1; undef @split; foreach (@positions){ push @{$hash{$_->[0]}}, $_->[1]; } undef @positions ;

    now, for each character you have a hash entry with all its positions in the string (array).
    if for example you want to get the 3rd occurance of 'A', you will simply need to access: $hash{A}[2]
    (2 will be the 3rd index in the array)

    You wannted efficient function, and this way allows you to get you data in one hash+array access which IMHO is efficient.
    the price you will pay is the one-time building of the hash (per string).

    Enjoy,
    Mickey

Re: The nth occurrence of a character
by duff (Parson) on Feb 08, 2006 at 18:42 UTC

    Does no one use index() anymore?!?

    #!/usr/bin/perl use warnings; use strict; sub find_nth { my ($s,$c,$n) = @_; my $pos = -1; while ($n--) { $pos = index($s,$c,$pos+1); return -1 if $pos == -1; } return $pos; } my $str="Now is the time for all good men to come to the aid of their +country"; my $n = 5; my $c = "o"; print find_nth($str,$c,$n), "\n";

    I'll leave it to those people who like to benchmark to decide whether or not this is an "efficient Perl function" :-)

      Ah well, in a fit of boredom I decided to benchmark the submissions myself. The results are unsurprising (at least to me).

      Here are some results on my system; the first set of benchmarks are for multple occurences in one string, the second set is for a single occurence in multiple strings:

                    Rate RoyJohnson      salva       duff    mickeyn
      RoyJohnson   373/s         --       -86%       -97%       -98%
      salva       2604/s       598%         --       -78%       -88%
      duff       11887/s      3089%       357%         --       -44%
      mickeyn    21159/s      5576%       713%        78%         --
                    Rate RoyJohnson      salva    mickeyn       duff
      RoyJohnson 15272/s         --        -9%       -42%       -48%
      salva      16847/s        10%         --       -36%       -42%
      mickeyn    26298/s        72%        56%         --       -10%
      duff       29202/s        91%        73%        11%         --
      
Re: The nth occurrence of a character
by Roy Johnson (Monsignor) on Feb 08, 2006 at 17:56 UTC
    my $str = 'one a two a three a four'; my $char = 'a'; my $n = 2; $str =~ /(?:.*?$char){$n}/g; print "Pos of '$char' #$n in '$str' is ", pos($str), "\n";

    Caution: Contents may have been coded under pressure.

      You have to make sure that you reset pos($str) if you use this more than once otherwise it continues from the last match.

      However, for reasons that I cannot explain, this method steadfastly refuses to match beyond position 21165. Ask for any match after that and (on my system) the script quietly self terminates. No segault. No panic. No message at all.

      Is this a bug worthy of reporting or am I missing something here?

      #! perl -w use strict; my $str = 'a' x 100000; my $char = 'a'; for my $n ( map{ $_ * 1000 + 165 } 1 .. 30 ) { pos($str)=0; $str =~ /(?:.*?$char){$n}/g; print "Pos of '$char' #$n in \$str is ", pos($str), "\n"; } __END__ C:\test>junk Pos of 'a' #1165 in $str is 1165 Pos of 'a' #2165 in $str is 2165 Pos of 'a' #3165 in $str is 3165 Pos of 'a' #4165 in $str is 4165 Pos of 'a' #5165 in $str is 5165 Pos of 'a' #6165 in $str is 6165 Pos of 'a' #7165 in $str is 7165 Pos of 'a' #8165 in $str is 8165 Pos of 'a' #9165 in $str is 9165 Pos of 'a' #10165 in $str is 10165 Pos of 'a' #11165 in $str is 11165 Pos of 'a' #12165 in $str is 12165 Pos of 'a' #13165 in $str is 13165 Pos of 'a' #14165 in $str is 14165 Pos of 'a' #15165 in $str is 15165 Pos of 'a' #16165 in $str is 16165 Pos of 'a' #17165 in $str is 17165 Pos of 'a' #18165 in $str is 18165 Pos of 'a' #19165 in $str is 19165 Pos of 'a' #20165 in $str is 20165 Pos of 'a' #21165 in $str is 21165

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Looks to be a version-specific limitation. Mine (ActivePerl) stops somewhere between 31200 and 31300. Interesting find. Since it quietly aborts, I'd say it's worth a bug report.

        Caution: Contents may have been coded under pressure.
Re: The nth occurrence of a character
by parv (Parson) on Feb 09, 2006 at 07:32 UTC

    Won't rindex() be most direct?

    Update: never mind that is just plain WRONG (that will give the last position). I should start waiting a few hours before posting.

Re: The nth occurrence of a character (simple /$c/g)
by tye (Sage) on Feb 09, 2006 at 07:38 UTC

    I'd've probably used something like:

    sub nthPos { my $s= \shift @_; my( $p, $n )= @_; local( pos($$s) ); 0 while $$s =~ /\Q$p/g && 0 < --$n; return if 0 < $n; return 1+pos($$s); }

    Which uses several things that I wouldn't actually use without testing first but that I don't feel like spoiling the mystique of to myself at this point and so I have not yet tested.

    - tye        

      Go on, spoil the mystique :)

      Even fixing up the out-by-two error s/1+pos($$s)/pos($$s)-1/, I can't see how to correct the pos memory factor without explicitly resetting it? Which makes taking the reference and localising it redundant?

      #! perl -slw use strict; sub tye { my $s= \shift @_; my( $p, $n )= @_; local( pos($$s) ); 0 while $$s =~ /\Q$p/g && 0 < --$n; return if 0 < $n; return 1+pos($$s); } my $data = 'a' x 100; print "want: got"; printf "%3d : %3d\n", $_-1, tye( $data, 'a', $_ ) for 1 .. 30; __END__ C:\test>junk want: got 0 : 2 1 : 4 2 : 7 3 : 11 4 : 16 5 : 22 6 : 29 7 : 37 8 : 46 9 : 56 10 : 67 11 : 79 12 : 92 Use of uninitialized value in printf at C:\test\junk.pl line 16. 13 : 0 14 : 16 15 : 32 16 : 49 17 : 67 18 : 86 Use of uninitialized value in printf at C:\test\junk.pl line 16. 19 : 0 20 : 22 21 : 44 22 : 67 23 : 91 Use of uninitialized value in printf at C:\test\junk.pl line 16. 24 : 0 25 : 27 26 : 54 27 : 82 Use of uninitialized value in printf at C:\test\junk.pl line 16. 28 : 0 29 : 31

      About the best performing modification of it I came up with is:

      sub tyem { my( $s, $c, $n ) = @_; 0 while $s =~ /\Q$c/g && 0 < --$n; my $rv = pos( $s )-1; pos($s) = -1; return if 0 < $n; return $rv; }

      which doesn't fair so well, though it is still quicker than any of the other regex-based solutions.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        If you copy the string (and then destroy the copy), then it is a waste of time to set pos($s)= -1.

        It was not an off-by-two error as I consider returning -1 (a 'true' value) for 'not found' to rather suck as an interface, and prefer to avoid returning a false value (0) for 'found at 1st character' so I meant to return the position of the character as asked, not the offset to it.

        - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://528774]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (4)
As of 2024-04-23 19:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found