Re: The nth occurrence of a character
by salva (Canon) on Feb 08, 2006 at 11:27 UTC
|
my $char = "C";
my $pat = "[^$char]*$char" x $n;
my $where;
/^($pat)/ and $where = length $1 - 1;
| [reply] [d/l] |
|
You could go one step further and use /^$pat/ and $where = $+[0]-1, but that makes use of the @+ array, and I'd probably leave a comment on that line.
| [reply] [d/l] [select] |
|
| [reply] |
Re: The nth occurrence of a character
by mickeyn (Priest) on Feb 08, 2006 at 15:03 UTC
|
allow me to suggest another way that may serve you if you want to access different characters at different occurances many times:
my %hash;
my @split = split //, $string;
my @positions = map { [$split[$_], $_] } 1 .. @split-1;
undef @split;
foreach (@positions){
push @{$hash{$_->[0]}}, $_->[1];
}
undef @positions ;
now, for each character you have a hash entry with all its positions in the string (array).
if for example you want to get the 3rd occurance of 'A', you will simply need to access: $hash{A}[2]
(2 will be the 3rd index in the array)
You wannted efficient function, and this way allows you to get you data in one hash+array access which IMHO is efficient.
the price you will pay is the one-time building of the hash (per string).
Enjoy,
Mickey | [reply] [d/l] [select] |
Re: The nth occurrence of a character
by duff (Parson) on Feb 08, 2006 at 18:42 UTC
|
#!/usr/bin/perl
use warnings; use strict;
sub find_nth {
my ($s,$c,$n) = @_;
my $pos = -1;
while ($n--) {
$pos = index($s,$c,$pos+1);
return -1 if $pos == -1;
}
return $pos;
}
my $str="Now is the time for all good men to come to the aid of their
+country";
my $n = 5;
my $c = "o";
print find_nth($str,$c,$n), "\n";
I'll leave it to those people who like to benchmark to decide whether or not this is an "efficient Perl function" :-)
| [reply] [d/l] [select] |
|
Ah well, in a fit of boredom I decided to benchmark the submissions myself. The results are unsurprising (at least to me).
Here are some results on my system; the first set of benchmarks are for multple occurences in one string, the second set is for a single occurence in multiple strings:
Rate RoyJohnson salva duff mickeyn
RoyJohnson 373/s -- -86% -97% -98%
salva 2604/s 598% -- -78% -88%
duff 11887/s 3089% 357% -- -44%
mickeyn 21159/s 5576% 713% 78% --
Rate RoyJohnson salva mickeyn duff
RoyJohnson 15272/s -- -9% -42% -48%
salva 16847/s 10% -- -36% -42%
mickeyn 26298/s 72% 56% -- -10%
duff 29202/s 91% 73% 11% --
| [reply] [d/l] [select] |
Re: The nth occurrence of a character
by Roy Johnson (Monsignor) on Feb 08, 2006 at 17:56 UTC
|
my $str = 'one a two a three a four';
my $char = 'a';
my $n = 2;
$str =~ /(?:.*?$char){$n}/g;
print "Pos of '$char' #$n in '$str' is ", pos($str), "\n";
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
|
You have to make sure that you reset pos($str) if you use this more than once otherwise it continues from the last match.
However, for reasons that I cannot explain, this method steadfastly refuses to match beyond position 21165. Ask for any match after that and (on my system) the script quietly self terminates. No segault. No panic. No message at all.
Is this a bug worthy of reporting or am I missing something here?
#! perl -w
use strict;
my $str = 'a' x 100000;
my $char = 'a';
for my $n ( map{ $_ * 1000 + 165 } 1 .. 30 ) {
pos($str)=0;
$str =~ /(?:.*?$char){$n}/g;
print "Pos of '$char' #$n in \$str is ", pos($str), "\n";
}
__END__
C:\test>junk
Pos of 'a' #1165 in $str is 1165
Pos of 'a' #2165 in $str is 2165
Pos of 'a' #3165 in $str is 3165
Pos of 'a' #4165 in $str is 4165
Pos of 'a' #5165 in $str is 5165
Pos of 'a' #6165 in $str is 6165
Pos of 'a' #7165 in $str is 7165
Pos of 'a' #8165 in $str is 8165
Pos of 'a' #9165 in $str is 9165
Pos of 'a' #10165 in $str is 10165
Pos of 'a' #11165 in $str is 11165
Pos of 'a' #12165 in $str is 12165
Pos of 'a' #13165 in $str is 13165
Pos of 'a' #14165 in $str is 14165
Pos of 'a' #15165 in $str is 15165
Pos of 'a' #16165 in $str is 16165
Pos of 'a' #17165 in $str is 17165
Pos of 'a' #18165 in $str is 18165
Pos of 'a' #19165 in $str is 19165
Pos of 'a' #20165 in $str is 20165
Pos of 'a' #21165 in $str is 21165
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
| [reply] |
Re: The nth occurrence of a character
by parv (Parson) on Feb 09, 2006 at 07:32 UTC
|
Won't rindex() be most direct?
Update: never mind that is just plain WRONG (that will give the last position). I should start waiting a few hours before posting.
| [reply] [d/l] |
Re: The nth occurrence of a character (simple /$c/g)
by tye (Sage) on Feb 09, 2006 at 07:38 UTC
|
sub nthPos
{
my $s= \shift @_;
my( $p, $n )= @_;
local( pos($$s) );
0 while $$s =~ /\Q$p/g && 0 < --$n;
return if 0 < $n;
return 1+pos($$s);
}
Which uses several things that I wouldn't actually use without testing first but that I don't feel like spoiling the mystique of to myself at this point and so I have not yet tested.
| [reply] [d/l] |
|
Go on, spoil the mystique :)
Even fixing up the out-by-two error s/1+pos($$s)/pos($$s)-1/, I can't see how to correct the pos memory factor without explicitly resetting it? Which makes taking the reference and localising it redundant?
#! perl -slw
use strict;
sub tye {
my $s= \shift @_;
my( $p, $n )= @_;
local( pos($$s) );
0 while $$s =~ /\Q$p/g && 0 < --$n;
return if 0 < $n;
return 1+pos($$s);
}
my $data = 'a' x 100;
print "want: got";
printf "%3d : %3d\n", $_-1, tye( $data, 'a', $_ ) for 1 .. 30;
__END__
C:\test>junk
want: got
0 : 2
1 : 4
2 : 7
3 : 11
4 : 16
5 : 22
6 : 29
7 : 37
8 : 46
9 : 56
10 : 67
11 : 79
12 : 92
Use of uninitialized value in printf at C:\test\junk.pl line 16.
13 : 0
14 : 16
15 : 32
16 : 49
17 : 67
18 : 86
Use of uninitialized value in printf at C:\test\junk.pl line 16.
19 : 0
20 : 22
21 : 44
22 : 67
23 : 91
Use of uninitialized value in printf at C:\test\junk.pl line 16.
24 : 0
25 : 27
26 : 54
27 : 82
Use of uninitialized value in printf at C:\test\junk.pl line 16.
28 : 0
29 : 31
About the best performing modification of it I came up with is:
sub tyem {
my( $s, $c, $n ) = @_;
0 while $s =~ /\Q$c/g && 0 < --$n;
my $rv = pos( $s )-1;
pos($s) = -1;
return if 0 < $n;
return $rv;
}
which doesn't fair so well, though it is still quicker than any of the other regex-based solutions.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
If you copy the string (and then destroy the copy), then it is a waste of time to set pos($s)= -1.
It was not an off-by-two error as I consider returning -1 (a 'true' value) for 'not found' to rather suck as an interface, and prefer to avoid returning a false value (0) for 'found at 1st character' so I meant to return the position of the character as asked, not the offset to it.
| [reply] [d/l] |