The nth occurrence of a character

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: The nth occurrence of a character by salva (Canon) on Feb 08, 2006 at 11:27 UTC
`my $char = "C"; my $pat = "[^$char]*$char" x $n; my $where; /^($pat)/ and $where = length $1 - 1;` [download]	[reply] [d/l]
Re^2: The nth occurrence of a character by japhy (Canon) on Feb 08, 2006 at 12:53 UTC
You could go one step further and use `/^$pat/ and $where = $+[0]-1`, but that makes use of the `@+` array, and I'd probably leave a comment on that line. Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply] [d/l] [select]
Re^3: The nth occurrence of a character by diotalevi (Canon) on Feb 08, 2006 at 14:36 UTC
This is certainly cheaper. If capturing can be avoided and this code is important for speed, choose the non-capturing version. ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊	[reply]
Re: The nth occurrence of a character by mickeyn (Priest) on Feb 08, 2006 at 15:03 UTC
allow me to suggest another way that may serve you if you want to access different characters at different occurances many times: `my %hash; my @split = split //, $string; my @positions = map { [$split[$_], $_] } 1 .. @split-1; undef @split; foreach (@positions){ push @{$hash{$_->[0]}}, $_->[1]; } undef @positions ;` [download] now, for each character you have a hash entry with all its positions in the string (array). if for example you want to get the 3rd occurance of 'A', you will simply need to access: `$hash{A}[2]` (2 will be the 3rd index in the array) You wannted efficient function, and this way allows you to get you data in one hash+array access which IMHO is efficient. the price you will pay is the one-time building of the hash (per string). Enjoy, Mickey	[reply] [d/l] [select]
Re: The nth occurrence of a character by duff (Parson) on Feb 08, 2006 at 18:42 UTC
Does no one use `index()` anymore?!? `#!/usr/bin/perl use warnings; use strict; sub find_nth { my ($s,$c,$n) = @_; my $pos = -1; while ($n--) { $pos = index($s,$c,$pos+1); return -1 if $pos == -1; } return $pos; } my $str="Now is the time for all good men to come to the aid of their +country"; my $n = 5; my $c = "o"; print find_nth($str,$c,$n), "\n";` [download] I'll leave it to those people who like to benchmark to decide whether or not this is an "efficient Perl function" :-) duff	[reply] [d/l] [select]
Re^2: The nth occurrence of a character by duff (Parson) on Feb 09, 2006 at 06:07 UTC
Ah well, in a fit of boredom I decided to benchmark the submissions myself. The results are unsurprising (at least to me). Read more... (4 kB) Here are some results on my system; the first set of benchmarks are for multple occurences in one string, the second set is for a single occurence in multiple strings: Rate RoyJohnson salva duff mickeyn RoyJohnson 373/s -- -86% -97% -98% salva 2604/s 598% -- -78% -88% duff 11887/s 3089% 357% -- -44% mickeyn 21159/s 5576% 713% 78% -- Rate RoyJohnson salva mickeyn duff RoyJohnson 15272/s -- -9% -42% -48% salva 16847/s 10% -- -36% -42% mickeyn 26298/s 72% 56% -- -10% duff 29202/s 91% 73% 11% -- duff	[reply] [d/l] [select]
Re: The nth occurrence of a character by Roy Johnson (Monsignor) on Feb 08, 2006 at 17:56 UTC
`my $str = 'one a two a three a four'; my $char = 'a'; my $n = 2; $str =~ /(?:.?$char){$n}/g; print "Pos of '$char' #$n in '$str' is ", pos($str), "\n";` [download] Caution:* Contents may have been coded under pressure.	[reply] [d/l]
Re^2: The nth occurrence of a character by BrowserUk (Patriarch) on Feb 08, 2006 at 21:30 UTC
You have to make sure that you reset pos($str) if you use this more than once otherwise it continues from the last match. However, for reasons that I cannot explain, this method steadfastly refuses to match beyond position 21165. Ask for any match after that and (on my system) the script quietly self terminates. No segault. No panic. No message at all. Is this a bug worthy of reporting or am I missing something here? #! perl -w use strict; my $str = 'a' x 100000; my $char = 'a'; for my $n ( map{ $_ * 1000 + 165 } 1 .. 30 ) { pos($str)=0; $str =~ /(?:.*?$char){$n}/g; print "Pos of '$char' #$n in \$str is ", pos($str), "\n"; } __END__ C:\test>junk Pos of 'a' #1165 in $str is 1165 Pos of 'a' #2165 in $str is 2165 Pos of 'a' #3165 in $str is 3165 Pos of 'a' #4165 in $str is 4165 Pos of 'a' #5165 in $str is 5165 Pos of 'a' #6165 in $str is 6165 Pos of 'a' #7165 in $str is 7165 Pos of 'a' #8165 in $str is 8165 Pos of 'a' #9165 in $str is 9165 Pos of 'a' #10165 in $str is 10165 Pos of 'a' #11165 in $str is 11165 Pos of 'a' #12165 in $str is 12165 Pos of 'a' #13165 in $str is 13165 Pos of 'a' #14165 in $str is 14165 Pos of 'a' #15165 in $str is 15165 Pos of 'a' #16165 in $str is 16165 Pos of 'a' #17165 in $str is 17165 Pos of 'a' #18165 in $str is 18165 Pos of 'a' #19165 in $str is 19165 Pos of 'a' #20165 in $str is 20165 Pos of 'a' #21165 in $str is 21165 [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^3: The nth occurrence of a character by Roy Johnson (Monsignor) on Feb 08, 2006 at 22:07 UTC
Looks to be a version-specific limitation. Mine (ActivePerl) stops somewhere between 31200 and 31300. Interesting find. Since it quietly aborts, I'd say it's worth a bug report. Caution: Contents may have been coded under pressure.	[reply]
Re: The nth occurrence of a character by parv (Parson) on Feb 09, 2006 at 07:32 UTC
Won't `rindex()` be most direct? Update: never mind that is just plain WRONG (that will give the last position). I should start waiting a few hours before posting.	[reply] [d/l]
Re: The nth occurrence of a character (simple /$c/g) by tye (Sage) on Feb 09, 2006 at 07:38 UTC
I'd've probably used something like: `sub nthPos { my $s= \shift @_; my( $p, $n )= @_; local( pos($$s) ); 0 while $$s =~ /\Q$p/g && 0 < --$n; return if 0 < $n; return 1+pos($$s); }` [download] Which uses several things that I wouldn't actually use without testing first but that I don't feel like spoiling the mystique of to myself at this point and so I have not yet tested. - tye	[reply] [d/l]
Re^2: The nth occurrence of a character (simple /$c/g) by BrowserUk (Patriarch) on Feb 09, 2006 at 09:06 UTC
Go on, spoil the mystique :) Even fixing up the out-by-two error `s/1+pos($$s)/pos($$s)-1/`, I can't see how to correct the pos memory factor without explicitly resetting it? Which makes taking the reference and localising it redundant? #! perl -slw use strict; sub tye { my $s= \shift @_; my( $p, $n )= @_; local( pos($$s) ); 0 while $$s =~ /\Q$p/g && 0 < --$n; return if 0 < $n; return 1+pos($$s); } my $data = 'a' x 100; print "want: got"; printf "%3d : %3d\n", $_-1, tye( $data, 'a', $_ ) for 1 .. 30; __END__ C:\test>junk want: got 0 : 2 1 : 4 2 : 7 3 : 11 4 : 16 5 : 22 6 : 29 7 : 37 8 : 46 9 : 56 10 : 67 11 : 79 12 : 92 Use of uninitialized value in printf at C:\test\junk.pl line 16. 13 : 0 14 : 16 15 : 32 16 : 49 17 : 67 18 : 86 Use of uninitialized value in printf at C:\test\junk.pl line 16. 19 : 0 20 : 22 21 : 44 22 : 67 23 : 91 Use of uninitialized value in printf at C:\test\junk.pl line 16. 24 : 0 25 : 27 26 : 54 27 : 82 Use of uninitialized value in printf at C:\test\junk.pl line 16. 28 : 0 29 : 31 [download] About the best performing modification of it I came up with is: `sub tyem { my( $s, $c, $n ) = @_; 0 while $s =~ /\Q$c/g && 0 < --$n; my $rv = pos( $s )-1; pos($s) = -1; return if 0 < $n; return $rv; }` [download] which doesn't fair so well, though it is still quicker than any of the other regex-based solutions. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^3: The nth occurrence of a character (simple /$c/g) by tye (Sage) on Feb 09, 2006 at 14:34 UTC
If you copy the string (and then destroy the copy), then it is a waste of time to set `pos($s)= -1`. It was not an off-by-two error as I consider returning -1 (a 'true' value) for 'not found' to rather suck as an interface, and prefer to avoid returning a false value (0) for 'found at 1st character' so I meant to return the position of the character as asked, not the offset to it. - tye	[reply] [d/l]


Perl Monk, Perl Meditation
	PerlMonks