http://www.perlmonks.org?node_id=37234

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I tried the parse_csv routine from the Perl Cookbook. It works fine on ASCII-encoded strings, but as soon as I use Shift-JIS, which is a multi-byte character encoding, I get weird results, because the pattern sometimes matches in the middle of a character.

Right now I am using a real parser that works character by character, but it is really slow... The problem comes from this line, I think:

my $sjis = q{
      [\x00-\x7F]
    | [\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]
    | [\xA0-\xDF]
};
@chars = $text =~ /$sjis/gox;
Any ideas on how to speed this/avoid backtracking?
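
For reference, here is a minimal sketch of the character-splitting step on its own, assuming $text holds raw Shift-JIS bytes. The helper name split_sjis and the sample string are made up for illustration and are not part of the Cookbook code. It anchors each match with \G so that a match can only start where the previous character ended. Whether this is any faster has not been measured.

```perl
use strict;
use warnings;

# One Shift-JIS character: ASCII, a two-byte character,
# or a single-byte half-width katakana.
my $sjis_char = qr{
      [\x00-\x7F]
    | [\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]
    | [\xA0-\xDF]
}x;

# split_sjis is a hypothetical helper name used only for this sketch.
sub split_sjis {
    my ($bytes) = @_;
    my @chars;
    pos($bytes) = 0;
    # \G pins each match to the end of the previous one; /c keeps pos()
    # where it was when the match finally fails.
    while ($bytes =~ /\G($sjis_char)/gc) {
        push @chars, $1;
    }
    # If we stopped before the end, the next byte is not valid Shift-JIS.
    die sprintf("invalid Shift-JIS byte at offset %d\n", pos($bytes))
        if pos($bytes) < length $bytes;
    return @chars;
}

# "\x95\x5C" is a two-byte character whose second byte is 0x5C (backslash).
my @chars = split_sjis("a,\x95\x5C,b");
print scalar(@chars), "\n";   # prints 5
```

Without \G, the regex engine silently skips any byte it cannot match and carries on from the next one. With \G, an invalid byte stops the scan and is reported instead. Compiling the pattern once with qr// also makes the /o flag unnecessary.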

Thanks

Originally posted as a Categorized Question.