http://www.perlmonks.org?node_id=629259

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

For a string 'ABBBCC', I want to produce a list ('A', 'BBB', 'CC'). That is, break the string into pieces based on change of character.

My Perl is getting a bit rusty and I found myself struggling today with this simple problem. Though I've found a solution, shown below, there is probably a better way to do it, hence my question. Apologies if this is a FAQ.

use strict;
use warnings;

my $str = 'AAABBCCCC';
# For str 'AAABBCCCC', I want to produce a list ('AAA', 'BB', 'CCCC').
# This works ... but is there a better way to do it?
my $i = 0;    # $i is used to filter out the captured $1 fields
my @x = grep { ++$i % 2 } split(/(?<=(.))(?!\1)/, $str);
for my $e (@x) { print "e='$e'\n" }

Re: Split a string based on change of character (also)
by tye (Sage) on Jul 28, 2007 at 07:41 UTC
    my $str= "AAABBCCCC";
    my @x;
    push @x, $1 while $str =~ /((.)\2*)/g;

    - tye        
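One caveat with the pattern above, offered as a sketch rather than as part of tye's answer: `.` does not match a newline unless the /s modifier is given, so a string containing newlines would silently lose them. The sub below wraps the same loop and adds /s; the name split_runs is hypothetical and does not appear in the thread.

```perl
use strict;
use warnings;

# split_runs is a hypothetical helper name, not something from the thread.
# It returns the maximal runs of identical characters in its argument.
sub split_runs {
    my ($str) = @_;
    my @runs;
    # (.) captures one character, \2* extends the match over its repeats,
    # and the outer group ($1) holds the whole run. /s lets . match "\n" too.
    push @runs, $1 while $str =~ /((.)\2*)/gs;
    return @runs;
}

my @runs = split_runs('AAABBCCCC');
print "run='$_'\n" for @runs;
```

Because the loop only reads the string and collects $1, nothing has to be filtered out afterwards, unlike the split-based version in the question.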

Re: Split a string based on change of character
by moritz (Cardinal) on Jul 28, 2007 at 08:04 UTC
    I don't know if you can use a split here, because your pattern must not consume any characters to achieve what you want.

    I've tried this one: m/((?<=.))(?!\1)/ which should be "a position before which there is a character, and after that a different character", but it doesn't work.

    Can anybody tell me why this doesn't match the string 'aaabbbc'?

    Update: Zoffix told me on IRC that the right thing would be m/(?<=(.))(?!\1)/ (because the assertion is zero-width). However, that doesn't work well in split either, because split also returns the captured character:

    $ perl -MData::Dumper -wle '$Data::Dumper::Indent=0; $_="aaabbc"; print Dumper([split m/(?<=(.))(?!\1)/]);'
    $VAR1 = ['aaa','a','bb','b','c','c'];

    This way you would have to discard every second item of the returned list - not pretty either ;-)
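On why the first attempt, m/((?<=.))(?!\1)/, never matches: the following is a sketch of one reading, consistent with Zoffix's "zero-width" remark. The capture group wraps only the lookbehind, so it always captures the empty string, and a negative lookahead for the empty string cannot succeed. The variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'aaabbbc';

# Group 1 wraps only the zero-width lookbehind, so it always captures ''.
# (?!\1) then means "not followed by the empty string", which always fails.
my @broken = $s =~ /((?<=.))(?!\1)/g;

# Moving the group inside the lookbehind captures the preceding character,
# so (?!\1) means "not followed by that same character".
my @fixed = $s =~ /(?<=(.))(?!\1)/g;

print "broken matched ", scalar(@broken), " times\n";
print "fixed captured: @fixed\n";
```

In list context m//g returns the captures of every match, so @fixed holds one character per run boundary, including the boundary at the end of the string.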

      You can use split:
      my $string= 'AAABBBCCCDD';
      my $i=0;
      my @words= grep $i=!$i,split /(?<=(.))(?!\1)/,$string;
      print join "\n",@words,'';

      s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
      +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
        I never thought of
        grep $i=!$i,
        before. I usually use
        grep $i^=$i,

        Yours is easier to understand, though.

        Update: Yeah, I mean grep $i^=1,.
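The toggle idioms discussed here can be compared side by side. This is a small sketch with illustrative variable names; the input list is the one split returned in moritz's one-liner. It also shows why the update was needed: XOR-ing a variable with itself always yields 0, so that filter keeps nothing.

```perl
use strict;
use warnings;

# Runs alternate with the captured characters, as in the split output above.
my @fields = ('aaa', 'a', 'bb', 'b', 'c', 'c');

my ($not, $xor, $self) = (0, 0, 0);

my @by_not  = grep { $not = !$not }   @fields;   # true, false, true, ...
my @by_xor  = grep { $xor ^= 1 }      @fields;   # 1, 0, 1, ...
my @by_self = grep { $self ^= $self } @fields;   # always 0: nothing is kept

print "not:  @by_not\n";
print "xor:  @by_xor\n";
print "self: @by_self\n";
```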

      <Zoffix> eval: $_='zyxxaabbbcccccc'; push @a, $1 while s/((.)\2*)//;[@a]
      <_ZofBot> Zoffix: ['z','y','xx','aa','bbb','cccccc']

      20070730 Janitored by Corion: Added code tags, as per Writeup Formatting Tips

        Or even:
        $_='zyxxaabbbcccccc'; push @a, $1 while /((.)\2*)/g;
        Which doesn't destroy original data.
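The difference between the two IRC one-liners can be checked directly. The sketch below uses named lexical variables instead of $_ and @a; those names are illustrative only.

```perl
use strict;
use warnings;

# s/// removes each run as it is captured, so the input is consumed.
my $destructive = 'zyxxaabbbcccccc';
my @from_subst;
push @from_subst, $1 while $destructive =~ s/((.)\2*)//;

# m//g only advances the match position; the input is left untouched.
my $kept = 'zyxxaabbbcccccc';
my @from_match;
push @from_match, $1 while $kept =~ /((.)\2*)/g;

print "after s///:  '$destructive' (@from_subst)\n";
print "after m//g:  '$kept' (@from_match)\n";
```

Both loops collect the same runs; only the state of the input string afterwards differs.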