How to access each char in a string most quickly?

llancet has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to access each char in a string most quickly? by BrowserUk (Patriarch) on Jul 03, 2009 at 02:24 UTC
Here are a few alternatives, assuming you cannot pre-split for your purpose. My favorite hack when I need speed is chop. In the benchmark below it is about 5x as fast as split and substantially faster than substr, even if order matters and you have to reverse and copy the string first. Better still if you can avoid both, which you frequently can.

    #! perl -sw
    use 5.010;
    use strict;
    use Benchmark qw[ cmpthese ];

    our $LEN ||= 100;
    our $string = 'A' x $LEN;
    our @chars = split//, $string;

    cmpthese -3, {
        substr => q[
            our $string;
            for ( 0 .. length( $string ) - 1 ) {
                my $c = substr $string, $_, 1;
            }
        ],
        pre_split => q[
            our @chars;
            for my $c ( @chars ) {
                ;
            }
        ],
        split => q[
            our $string;
            my @chars = split //, $string;
            for my $c ( @chars ) {
            }
        ],
        unpack => q[
            our $string;
            for ( unpack 'C*', $string ) {
                my $c = chr;
            }
        ],
        chop => q[
            our $string;
            my $copy = $string;
            while( my $c = chop $copy ) {
                ;
            }
        ],
        rev_chop => q[
            our $string;
            my $copy = reverse $string;
            while( my $c = chop $copy ) {
                ;
            }
        ],
    };

    __END__
    C:\test>byChar.pl
                  Rate split unpack substr rev_chop  chop pre_split
    split      11018/s    --   -74%   -74%     -81%  -82%      -94%
    unpack     42513/s  286%     --    -0%     -26%  -30%      -75%
    substr     42526/s  286%     0%     --     -26%  -30%      -75%
    rev_chop   57794/s  425%    36%    36%       --   -5%      -66%
    chop       61056/s  454%    44%    44%       6%    --      -64%
    pre_split 171734/s 1459%   304%   304%     197%  181%        --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP PCW
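One caveat about the `chop` loops above, offered as a hedged aside rather than as part of the benchmark: `while( my $c = chop $copy )` tests the truth of the character it just removed, so it stops early if the string contains a `'0'`. That cannot happen with a string of `'A'`s, but it can with arbitrary data. A minimal sketch that tests the remaining length instead (the sub name `chars_by_chop` is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the characters of $string in original
# order by peeling them off the end of a reversed copy with chop.
sub chars_by_chop {
    my ($string) = @_;
    my $copy = reverse $string;    # scalar assignment: the reversed string
    my @out;
    while ( length $copy ) {       # test what is left, not the character
        push @out, chop $copy;     # chop removes and returns the last char
    }
    return @out;
}

print join( ',', chars_by_chop('A0C') ), "\n";    # prints A,0,C
```

Testing `length $copy` adds one cheap call per iteration, so the timings above would shift slightly if the benchmark were rerun this way.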
Re^2: How to access each char in a string most quickly? by ikegami (Patriarch) on Jul 03, 2009 at 02:27 UTC
There's also `while ($string =~ /(.)/sg)` and `for my $c ($string =~ /./sg)`
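For readers unfamiliar with the two forms, here is a small sketch of how they differ. The sample string is an assumption chosen to include a newline, which is why the `/s` flag matters:

```perl
use strict;
use warnings;

my $string = "ACGT\nAC";    # made-up sample with an embedded newline

# Scalar context: each //g match resumes where the previous one ended.
# The /s flag lets . match the newline as well.
my @one_by_one;
while ( $string =~ /(.)/sg ) {
    push @one_by_one, $1;
}

# List context: all matches are returned at once. With no capture
# group, each list element is the whole match.
my @all_at_once = $string =~ /./sg;

print scalar(@one_by_one), " ", scalar(@all_at_once), "\n";    # prints 7 7
```

The list-context form has to build the whole list before the loop body runs, which is one plausible reason it might benchmark differently from the scalar-context loop.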
Re^3: How to access each char in a string most quickly? by BrowserUk (Patriarch) on Jul 03, 2009 at 04:46 UTC
Neither stands up too well, unless I've screwed something up (which, as we all know, is entirely possible :):

Update: Code corrected in light of ikegami's post below:

    ...
    rgx_scalar => q[
        our $string;
        while ( $string =~ /(.)/sg) {
            my $c = $1;
        }
    ],
    rgx_list => q[
        our $string;
        for my $c ( $string =~ /(.)/sg) {
            ;
        }
    ],

    C:\test>byChar.pl
                     Rate split rgx_list substr_refs rgx_scalar unpack substr  chop rev_chop pre_split
    split          9984/s    --     -34%        -74%       -75%   -76%   -77%  -82%     -83%      -94%
    rgx_list      15104/s   51%       --        -61%       -62%   -63%   -65%  -73%     -74%      -91%
    substr_refs   38242/s  283%     153%          --        -3%    -7%   -12%  -33%     -34%      -78%
    rgx_scalar    39574/s  296%     162%          3%         --    -3%    -9%  -30%     -32%      -77%
    unpack        40959/s  310%     171%          7%         3%     --    -6%  -28%     -29%      -76%
    substr        43352/s  334%     187%         13%        10%     6%     --  -24%     -25%      -75%
    chop          56695/s  468%     275%         48%        43%    38%    31%    --      -2%      -67%
    rev_chop      57962/s  481%     284%         52%        46%    42%    34%    2%       --      -67%
    pre_split    173576/s 1639%    1049%        354%       339%   324%   300%  206%     199%        --
Re^4: How to access each char in a string most quickly? by ikegami (Patriarch) on Jul 03, 2009 at 05:01 UTC
Re: How to access each char in a string most quickly? by Marshall (Canon) on Jul 03, 2009 at 02:02 UTC
Can you explain what you are trying to do??? I mean, the slowest thing you have is ~~18K~~ 4.8K char/sec. Why isn't that fast enough? In Perl, one of the "power hitter" features is the ability to use regex (regular expressions) so that we don't have to deal with looking at individual characters. I don't see an application here. Your question is meaningless to me unless you tell me what you are trying to accomplish. Help us out with an application question!
Re^2: How to access each char in a string most quickly? by llancet (Friar) on Jul 03, 2009 at 03:02 UTC
I have to process thousands of 1200-character gene sequences, and I have to access each character of every sequence.
Re^3: How to access each char in a string most quickly? by Marshall (Canon) on Jul 03, 2009 at 04:12 UTC
I am saying that getting a character quickly, in and of itself, is meaningless. getc() will do that. There are some ways to get arrays of characters and process them efficiently. At the end of the day, you aren't asking "how do I read single characters efficiently?"; you want to ask a much broader application question, but you haven't done it yet. Your question doesn't have any apparent connection to your intended application.

Update: it appears to me that you need to find patterns amongst very long strings. Generating a list of 1,200 chars is not going to help you achieve this objective. getc() is NOT the way. Perl excels at processing sequences of characters, not individual characters.
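As an illustration of the "sequences, not individual characters" point, here is a hedged sketch. The thread never says what llancet actually computes, so counting bases is purely an assumed task, and the sequence is made up:

```perl
use strict;
use warnings;

# Hypothetical sequence; the real ones in the thread are ~1200 chars.
my $seq = 'GATTACAGATTACA';

# tr/// with an empty replacement list counts the matching characters
# and leaves the string unmodified, with no explicit per-character loop.
my $gc = ( $seq =~ tr/GC// );

# Per-character tally of the same string, for comparison.
my %count;
$count{$_}++ for split //, $seq;

printf "GC: %d of %d\n", $gc, length $seq;    # prints GC: 4 of 14
```

Whether a whole-string operation like this applies depends entirely on the real task; if each position genuinely needs separate handling, the loop benchmarks elsewhere in the thread are the relevant comparison.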
Re: How to access each char in a string most quickly? by Marshall (Canon) on Jul 03, 2009 at 07:15 UTC
I ran some benchmarks: I used the previous code but added the "shift" case to the @array.

Update: When dealing with an @thing variable in Perl, using shift is an extremely efficient way to get the "top of the list". The benchmarks below show that. I got a comment that indicated that somebody didn't believe that; well, check it out for yourself. But again, this is not about reading characters, ~~it must or seems to be~~ Perl's power is about recognizing pattern matches!

    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw[ cmpthese timethese];

    my $string='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
    my @chars=split //,$string;

    cmpthese ( 1000000, {
        'Substr'=>sub {
            my $string='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
            for (my $i=0;$i<length $string; $i++) {
                my $char=substr $string,$i,1;
            }
        },
        'pre_splitted'=>sub {
            foreach my $char (@chars) {}
        },
        'Split'=>sub {
            my @string=split //,'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
            foreach my $char (@string) {}
        },
        'Shift'=>sub {
            my @string='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
            shift @string;{}
        }
    } );
    __END__
    prints:
                     Rate Split Substr pre_splitted Shift
    Split          8840/s    --   -72%         -94%  -99%
    Substr        31465/s  256%     --         -78%  -95%
    pre_splitted 140667/s 1491%   347%           --  -80%
    Shift        695410/s 7767%  2110%         394%    --
Re^2: How to access each char in a string most quickly? by citromatik (Curate) on Jul 03, 2009 at 08:14 UTC
`{ my @string='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'; shift @string;{} }`

This is nonsense. Did you mean:

`my @string = ('a')x50; while (shift @string){ ; }`

citromatik
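To make the objection concrete, here is a small sketch (variable names are illustrative, not from the thread) showing that assigning a string to an array creates a single element, so a lone `shift` only ever touches one item:

```perl
use strict;
use warnings;

# Assigning a string to an array yields ONE element, not one per character.
my @one   = 'aaaa';
my $n_one = @one;                      # 1

# split // is what produces one element per character.
my @many   = split //, 'aaaa';
my $n_many = @many;                    # 4

# Drain the array with shift, testing the array itself rather than the
# shifted value, so a '0' element would not end the loop early.
my $seen = 0;
while (@many) {
    my $c = shift @many;
    $seen++;
}

print "$n_one $n_many $seen\n";        # prints 1 4 4
```

A fair "shift" benchmark would therefore need to rebuild a per-character array on every iteration, and the cost of building it would be part of what gets timed.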
Re^3: How to access each char in a string most quickly? by Marshall (Canon) on Jul 03, 2009 at 08:24 UTC
I am striking this whole post as this didn't work out very well. Somehow the main point just got completely lost. Perl is not a great character-by-character language and it's just not the right way to use Perl, but somehow I wasn't able to get this across.

Oooops. Yes. You are correct!! This whole thread is a bit weird, as the idea of processing a char at a time is sort of "anti-Perl".

The code should be: `while (my $var =shift @string){}`

But that makes no difference.

                     Rate Split Substr pre_splitted Shift
    Split          8851/s    --   -73%         -94%  -98%
    Substr        32226/s  264%     --         -80%  -95%
    pre_splitted 158028/s 1685%   390%           --  -73%
    Shift        587199/s 6534%  1722%         272%    --

The main point is that shift() is very, very fast, but Perl will work with regex even faster. I mean, so what do you do with these chars that were read individually?

~~Update: Well Darn! the code above is not right, and I think I could write some faster "get a character" code, but that is just not the point at all!~~