Split a string based on change of character

by eyepopslikeamosquito (Chancellor)
on Jul 28, 2007 at 06:51 UTC (#629259, perlquestion)
eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

For a string 'ABBBCC', I want to produce a list ('A', 'BBB', 'CC'). That is, break the string into pieces based on change of character.

My Perl is getting a bit rusty and I found myself struggling today with this simple problem. Though I've found a solution, shown below, there is probably a better way to do it, hence my question. Apologies if this is a FAQ.

    use strict;
    use warnings;

    my $str = 'AAABBCCCC';
    # For str 'AAABBCCCC', I want to produce a list ('AAA', 'BB', 'CCCC').
    # This works ... but is there a better way to do it?
    my $i = 0;   # $i is used to filter out the captured $1 fields
    my @x = grep { ++$i % 2 } split(/(?<=(.))(?!\1)/, $str);
    for my $e (@x) { print "e='$e'\n" }

Replies are listed 'Best First'.
Re: Split a string based on change of character (also)
by tye (Sage) on Jul 28, 2007 at 07:41 UTC
    my $str= "AAABBCCCC";
    my @x;
    push @x, $1 while $str =~ /((.)\2*)/g;

    - tye        
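A minimal sketch of how the global-match approach above might be wrapped for reuse. The sub name `split_runs` is hypothetical and not from the thread; the regex is the one tye posted.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption); wraps the /g match loop.
sub split_runs {
    my ($str) = @_;
    my @runs;
    # (.) captures one character as $2, \2* extends the match over
    # repeats of that same character, and the outer group $1 holds
    # the whole run. Note: . does not match "\n" without /s.
    while ($str =~ /((.)\2*)/g) {
        push @runs, $1;
    }
    return @runs;
}

print join(',', split_runs('AAABBCCCC')), "\n";   # prints AAA,BB,CCCC
```

Because the pattern consumes each run as it matches, there are no extra captured fields to filter out afterwards.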

Re: Split a string based on change of character
by moritz (Cardinal) on Jul 28, 2007 at 08:04 UTC
    I don't know if you can use a split here, because your pattern must not consume any characters to achieve what you want.

    I've tried this one: m/((?<=.))(?!\1)/ which should be "a position before which there is a character, and after that a different character", but it doesn't work.

    Can anybody tell me why this doesn't match the string 'aaabbbc'?

    Update: Zoffix told me on IRC that the right thing would be m/(?<=(.))(?!\1)/ (because the assertion is zero-width). However, that doesn't work well in split either, because split also returns the captured character:

    $ perl -MData::Dumper -wle '$Data::Dumper::Indent=0; $_="aaabbc"; print Dumper([split m/(?<=(.))(?!\1)/]);'
    $VAR1 = ['aaa','a','bb','b','c','c'];

    This way you would have to discard every second item of the returned list, which is not pretty either ;-)
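A short sketch of why the first pattern never matches and how the extra captured fields can be dropped. It assumes only core Perl; the variable names are illustrative. When the capture group wraps the whole lookbehind, it captures the empty string, so `(?!\1)` asserts "not followed by the empty string", which can never be true.

```perl
use strict;
use warnings;

my $str = 'aaabbbc';

# Group 1 encloses a zero-width assertion, so it always captures ''.
# (?!\1) then always fails, and the pattern matches nowhere.
my $broken_matches = ($str =~ /((?<=.))(?!\1)/) ? 1 : 0;
print "broken pattern matched: $broken_matches\n";   # prints 0

# With the group inside the lookbehind, $1 is the preceding character,
# but split then returns that captured character after each field.
my @fields = split /(?<=(.))(?!\1)/, $str;

# Keep only the even-indexed elements (the runs themselves).
my @runs = @fields[ grep { $_ % 2 == 0 } 0 .. $#fields ];
print join(',', @runs), "\n";                        # prints aaa,bbb,c
```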

      You can use split:
      my $string= 'AAABBBCCCDD';
      my $i=0;
      my @words= grep $i=!$i, split /(?<=(.))(?!\1)/, $string;
      print join "\n", @words, '';

        I never thought of
        grep $i=!$i,
        before. I usually use
        grep $i^=$i,

        Yours is easier to understand, though.

        Update: Yeah, I mean grep $i^=1,.
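To make the difference between these toggles concrete, here is a small sketch. The sample list and variable names are assumptions for illustration. `$i = !$i` and `$i ^= 1` both alternate between true and false, while `$i ^= $i` is always 0, because any number XORed with itself is 0.

```perl
use strict;
use warnings;

# Sample list shaped like split's output: run, captured char, run, ...
my @fields = ('aaa', 'a', 'bb', 'b', 'c', 'c');

my $i = 0;
my @by_not = grep { $i = !$i } @fields;   # logical toggle: true, false, ...

my $j = 0;
my @by_xor = grep { $j ^= 1 } @fields;    # bitwise toggle: 1, 0, 1, 0, ...

my $k = 0;
my @none = grep { $k ^= $k } @fields;     # always 0, so nothing passes

print join(',', @by_not), "\n";           # prints aaa,bb,c
print join(',', @by_xor), "\n";           # prints aaa,bb,c
print scalar(@none), "\n";                # prints 0
```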

      <Zoffix> eval: $_='zyxxaabbbcccccc'; push @a, $1 while s/((.)\2*)//;[@a]
      <_ZofBot> Zoffix: ['z','y','xx','aa','bbb','cccccc']

      20070730 Janitored by Corion: Added code tags, as per Writeup Formatting Tips

        Or even:
        $_='zyxxaabbbcccccc'; push @a, $1 while /((.)\2*)/g;
        which doesn't destroy the original data.
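A side-by-side sketch of the two one-liners, written with named lexicals rather than `$_` (an assumption made for clarity). The `s///` form removes each run from the string as it goes, so it is run on a copy here. The `m//g` form only advances the match position and leaves the string untouched.

```perl
use strict;
use warnings;

my $orig = 'zyxxaabbbcccccc';

# Destructive: each s/// deletes the leading run, so use a copy.
my $copy = $orig;
my @a;
push @a, $1 while $copy =~ s/((.)\2*)//;
# $copy is now empty; $orig is unchanged.

# Non-destructive: /g in scalar context resumes after the last match.
my @b;
push @b, $1 while $orig =~ /((.)\2*)/g;

print join(',', @a), "\n";   # prints z,y,xx,aa,bbb,cccccc
print join(',', @b), "\n";   # prints z,y,xx,aa,bbb,cccccc
```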
