Problems? Is your data what you think it is? PerlMonks

### Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)

by eyepopslikeamosquito (Archbishop)
 on Aug 12, 2007 at 07:35 UTC Need Help??

this meditation describes an arbitrary problem to be solved in as many different languages as possible.

I chose this particular problem purely by chance after being surprised that solving it in Perl turned out to be more awkward than I'd expected. Curious as to whether this awkwardness was specific to Perl, I then tried solving it in Ruby, Python and Haskell. Hence this node. :-)

The Problem

Given an input string, for example "ZBBBCZZ", produce a list, for example, ["Z", "BBB", "C", "ZZ"].

That is, break the input string into pieces based on change of character.

My preferred solutions in Perl, Ruby, Python and Haskell follow. Improvements welcome.

Perl

My favourite solution from Split a string based on change of character was given in the first response by tye:

```my \$s = "ZBBBCZZ";
my @x;
push @x, \$1 while \$s =~ /((.)\2*)/g;

Ruby

This was discussed at length on the Ruby-Talk mailing list, with many different solutions offered. My favourite is similar to tye's Perl solution:

```s = "ZBBBCZZ"
x = []
s.scan(/((.)\2*)/){x.push [\$~[0]]}

Python

This problem was also discussed on Python-list. My favourite uses Python's SML-inspired itertools library:

```import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Browsing the Haskell Data.List documentation, I noticed that a Haskell solution seems trivial courtesy of Data.List's group/groupBy function:

```group "ZBBBCCZZ"

Discussion

As you might expect, the only solution that I found truly satisfying was the Haskell one :-) ... though this seems to be more of a library issue than a language one.

BTW, does anyone know of a CPAN module that can solve this problem directly? I took a quick look at:

but didn't notice any obvious, trivial way to do it with those libraries.

I'm also curious to learn about plans for Perl 6 "functional-style" libraries. I've been impressed by Haskell's libraries and hope they will help inspire some of the new Perl 6 library.

Anyway, I'd find it interesting and educational to compare solutions to this little problem in a variety of other languages. So, please respond away in your favourite language!

References

2-sep-2007: Updated references with some more links based on the responses below.

Replies are listed 'Best First'.
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by moritz (Cardinal) on Aug 12, 2007 at 08:00 UTC
Allow me to add my favorite Perl 6 solutions. The split version is not tested yet because Pugs::Compiler::Rule doesn't seem to implement <before> and <after>:

```my \$s = "ZBBBCZZ";
my @list = \$s.comb: rx/(.)\$0*/;

# or if you are very fond on split:
my @list = \$s.split(rx/<after (.)><!before \$0>/);

The comb-Version seems to be the same as the ruby example, the split-Version shows off the new regex syntax ;-)

Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by BrowserUk (Patriarch) on Aug 12, 2007 at 14:34 UTC

Given that haskell's group is a library routine defined (one flavour) as:

```group                   =  groupBy (==)

groupBy _  []=  []
groupBy eq (x:xs)=  (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs

span p [] = ([],[])
span p xs@(x:xs')
| p x = (x:ys, zs)
| otherwise = ([],xs)
where (ys,zs) = span p xs'

it would be equivalent to hide the complexity of a perl equivalent in a function, and it might be defined something like this:

```sub group {
map{
pack 'C*', @\$_
} @{ +reduce {
defined \$a->[0] && \$a->[-1][0] && \$a->[-1][0] == \$b
? push @{ \$a->[-1] }, \$b
: push @{ \$a }, [ \$b ];
\$a;
} [], unpack 'C*', shift }
}

Giving a roughly equivalent one line usage:

```print for group 'ZBBBCCZZ';
Z
BBB
CC
ZZ

Of course, the equivalence is only rough because Haskell's syntactic sugar of allowing a list of chars to look like a string. The above won't work for an arbitrary list.

I remember suggesting that it would be nice if Perl 6 would allow a scalar to be treated syntactically as a list of chars: \$scalar[3] = 'x'; etc. It would allow a single routine to be polymorphic to arbitrary lists and scalars and simplify the above, but I guess it wouldn't work for many other cases.

BTW, does anyone know of a CPAN module that can solve this problem directly?

I just did a codesearch for lang:haskell 'group', and without promising to have exhaustively searched every link, I couldn't find a single use of group in any of the first 5 or so pages of hits.

Loads of definitions, mostly the same but some variations in different libraries, but no uses.

I guess that's why you won't find it on cpan. Other than for essentially academic uses like this, there are very few uses for it.

Of course, now someone will turn up a 300 lines haskell module with 250 uses of group, but on the basis of what I found, it's not exactly an essential function.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by jwkrahn (Abbot) on Aug 12, 2007 at 08:48 UTC
If you don't mind the different order of the output then:
```\$ perl -le'print for keys %{ { "ZBBBCZZ" =~ /((.)\2*)/g } }'
Z
ZZ
BBB
C
:-)

Excellent excellent excellent.
A solution to the order problem would be to use an order-remembering hash (e.g. Tie::IxHash). Or you could pluck out the even-numbered elements of the list, e.g.

```\$_ = 'abbbccddde';
@a =  /((.)\2*)/g;
@a = @a[ grep { not \$_ & 1 } 0 .. \$#a ];
A word spoken in Mind will reach its own level, in the objective world, by its own weight
If you don't mind the different order of the output...

... and also don't mind the possible loss of some intended output:

```# OP's solution, given a different input:

\$ perl -le'print \$1 while "ZBBBCZZCZ" =~ /((.)\2*)/g'
Z
BBB
C
ZZ
C
Z

# jwkrahn's solution for that input:

\$ perl -le'print for keys %{{"ZBBBCZZCZ" =~ /((.)\2*)/g}}'
Z
ZZ
BBB
C
(Not to mention that the OP approach wins at golfing. :)
If you don't mind the different order of the output then:

Sorry if I miss something, but your solution based on a hash, like all hash based ones seems aimed at uniqueness... but that doesn't seem a requirement as of the root node...

Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by graff (Chancellor) on Aug 12, 2007 at 14:49 UTC
As you might expect, the only solution that I found truly satisfying was the Haskell one :-)

Anyone who loves to browse voluminous manuals that describe huge inventories of library functions would agree with you, I'm sure. :)

... though this seems to be more of a library issue than a language one.

Exactly. If the implementation of Haskell's "group" function has the same parsimony and efficiency as the perl "while /regex/g" solution, and if I really need to use it often, then I would certainly prefer to use the function call (once I've discovered that it actually exists).

But if the "standard operators" provided by the language yield a fairly concise idiom for the task, why bother with the overhead (and relative obscurity) of a function call, especially if the particular task doesn't come up that often?

Anyone who loves to browse voluminous manuals that describe huge inventories of library functions would agree with you, I'm sure. :)

And just how many lines of documentation are on the CPAN?

Not enough ;-)

holli, /regexed monk/
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Limbic~Region (Chancellor) on Aug 14, 2007 at 00:33 UTC
eyepopslikeamosquito,
BTW, does anyone know of a CPAN module that can solve this problem directly?

It is hard to imagine why this specific problem would have its own solution.

The problem can be generalized into something worth solving. Assume we want to divide a list on break points. I will present an iterator solution.

```#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

sub gen_divider {
my %opt = @_;
my \$is_break = \$opt{detect_break};

my (\$curr, @list);

return sub {
for (@_) {
if (! defined \$curr || \$is_break->(\$_, \$curr)) {
push @list, [\$_];
\$curr = \$_;
}
else {
push @{\$list[-1]}, \$_;
}
}
return \@list;
};
}

my \$break_point = sub {
my (\$item_to_test, \$prev_break_point) = @_;
return \$item_to_test ne \$prev_break_point;
};

my \$divide = gen_divider(detect_break => \$break_point);

for (split //, "ZBBBCZZ") {
\$divide->(\$_);
}

my \$list = \$divide->();
print Dumper(\$list);
This leaves a lot to be desired but the idea jumped into my head and so I thought I would share.

Cheers - L~R

Ruby solution
```x = []
"ZBBBCZZ".scan(/((.)\2*)/){ x << [\$~[0]]; x.flatten!}
.flatten is needed else you get
```# => [["Z"], ["BBB"], ["C"], ["ZZ"]]
as result.

But this is quite an ugly solution anyway... in fact, perl looks almost as readable in this example ;)

not sure how to solve this any easier though, hmm i wonder if .each could be used and then another grouping way... I really dont like the regex magic of the ruby solution, cant you guys think of another solution :-)

I really dont like the regex magic of the ruby solution, cant you guys think of another solution :-)
```x = s.split(//).inject([]) {|a,e| (a.last && a.last[e] ? a.last : a) <
+< e; a}

A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by educated_foo (Vicar) on Aug 12, 2007 at 19:03 UTC
Actually, this is a good illustration of Haskell's strengths: given a contrived problem involving lists, there is an obscurely-named library function to solve it in one line.

The Vatican just called and want to commission you to write the software for their soon to be released answer to the current vogue for more and more violent and Satanic video games--The God in a Box console. They antisipate huge sales amongst followers once the Pope edicts that all good Catholic homes should have one.

There is one caveat. To ensure that there are no hidden sex scenes or other inappropriate Easter eggs, they have decreed that all games for this new omni-threaded processor console shall be reviewed, at the source level, by a senior member of the Vatican Council.

To facilitate this review, the source code must be written in Latin. What you gonna do? Does Haskell have an appropriate parser available?

Thanks to theDamian, Perl does!

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I remember suggesting that it would be nice if Perl 6 would allow a scalar to be treated syntactically as a list of chars: \$scalar3 = 'x'; etc. It would allow a single routine to be polymorphic to arbitrary lists and scalars and simplify the above, but I guess it wouldn't work for many other cases. And funny thing, ruby gives me idea on adding some for my handmade jewelry. lol
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by pKai (Priest) on Aug 13, 2007 at 08:34 UTC
My favourite solution from … was given in the first response by tye:

Seems like he has a boiler plate reply prepared for this questions ;-). It was also his reply when I brought up that question with my very first post at PM. And it is a good solution.

Beginning with ysth's reply to my question there was some in-depth discussion of using ??{} to approach this problem. And for me this solution has some appeal:

my @split_on_char_change = \$string =~ /((??{'(.)\1*'}))/g;

Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Anonymous Monk on Sep 11, 2007 at 08:47 UTC
In JavaScript it's pretty concise, but still understandable:
var arr = "ZBBBCZZ".match(/((.)\2*)/g);
I really like that Javascript version.

You either like or hate regexps for this kind of thing, but as regexp-based solutions go "ZBBBCZZ".match(/((.)\2*)/g) is damn concise.

I couldn't get the perl version to be that concise... pity.

I hope perl6's Rules allow one to specify a sub-match that can be referenced and then thrown away.

-David

```    "ZBBBCZZ".comb(/(.)\$0*/)
I think that does what you want here, though in this case it's the .comb method that's forgetting the \$0 capture, not the regex, so you raise an interesting point for the general case.

It's not specced yet, but the current standard grammar has two rules called "same" and "differ" that could be used for this, along with the "quantify via separator" form:

```    "ZBBBCZZ" ~~ m:g/ . ** <?same> /)
Though that is also kinda cheating on your general point by not capturing anything in the first place...

At the moment, forgetting a capture within Perl 6 regex involves forcefully rebinding the capture to nothing, or returning an alternate result via a closure. Perhaps there should be a way to declare temp bindings that are not intended for the final match. But maybe not, if it's just going to turn into an obscure feature. So more likely it'll be some kind of context that just happens to work that way when you expect it to, like .comb does, only inside regex syntax.

I couldn't get that javascript version to work. Sure, it looks nice. It looks very much like something many tried as a first stab in Perl. But I'm not convinced it works in javascript any better than it works in Perl. I was not able to get it to return anything (while in Perl it returns too much).

Update: Thanks, eB. I was eventually able to get your example to demonstrate that the code works.

- tye

Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by tadman (Prior) on Sep 11, 2007 at 03:44 UTC
My take on a slightly more concise Ruby version:
```s = "ZBBBCZZ"

x = s.scan(/((.)\2*)/).map{|v|v[0]}
Generic version in the D programming language
```\$ cat test4.d && gdc test4.d -o test4 && echo "----" && ./test4
import std.stdio;

T[][] group(T)(T[] array) {
T[][] res;
foreach (index, elem; array) {
if (index == 0 || elem != array[index-1])
res~=(T[]).init;
res[\$-1]~=elem;
}
return res;
}

void main() { writefln(group("ZBBBCZZ")); }
----
[Z,BBB,C,ZZ]
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Anonymous Monk on Sep 11, 2007 at 03:12 UTC
My take on solution in Erlang:
```-module(rosetta).
-compile(export_all).

test() ->
lists:reverse(lists:foldl(fun rosetta/2, [], "ZYEEEEEABBCCCDDDDF")).

rosetta(H, [[H|T]|TT]) ->
[[H,H|T]|TT];
rosetta(H, L) ->
[[H]|L].
Another erlangish
```-module(rosetta).
-compile(export_all).

test() ->
rosetta("ZYEEEEEABBCCCDDDDF").

rosetta([H|T]) ->
rosetta([], [H], T).

rosetta(L, [H|_]=Act, [H|T1]) ->
rosetta(L, [H|Act], T1);

rosetta(L, Act, [H|T]) ->
rosetta([Act|L], [H], T);

rosetta(L, Act, []) ->
lists:reverse([Act|L]).
maciek
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by LanX (Saint) on Oct 20, 2021 at 14:25 UTC
I was missing a split solution with lookbehind and lookahead here, so I wanted to add it.

(step by step demo with 'perl -de0')

```  DB<134> p \$in
ZBBBCZZ
DB<135> x split /(?<=(.))(?!\1)/, \$in
0  'Z'
1  'Z'
2  'BBB'
3  'B'
4  'C'
5  'C'
6  'ZZ'
7  'Z'

Alas I couldn't find a way to "forget" the group needed for the backreference \1 °

But this could be turned into something more flexible by applying "pair" functions from List::Util (which were probably not available back then)

```  DB<137> use List::Util qw/pairkeys pairvalues/

DB<138> x pairkeys  split /(?<=(.))(?!\1)/, \$in
0  'Z'
1  'BBB'
2  'C'
3  'ZZ'
DB<139> x pairvalues  split /(?<=(.))(?!\1)/, \$in
0  'Z'
1  'B'
2  'C'
3  'Z'
DB<140>

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) but I didn't search long, what I found is /n modifier, to turn each (...) into (?:...) , but then \1 isn't usable anymore

##### Update

Tho shorter tye's solution without split fits better here

```  DB<8> use List::Util qw/pairkeys/

DB<9> \$_='ZBBBCZZ'

DB<10> x pairkeys /((.)\2*)/g
0  'Z'
1  'BBB'
2  'C'
3  'ZZ'
DB<11>

Here's a way to "forget" the group...

```#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11137799
use warnings;

\$_='ZBBBCZZ';

my @list = split ' ', s/(.)\K(?!\1)/ /gr;
use Data::Dump 'dd'; dd "given \$_ got", \@list;

Outputs:

```("given ZBBBCZZ got", ["Z", "BBB", "C", "ZZ"])
> a way to "forget" the group...

Ha! Someone already found a generic solution 14 years ago....:)

``` DB<1> x "ZBBCZZ" =~ /(??{'(.)\1*'})/g;
0  'Z'
1  'BB'
2  'C'
3  'ZZ'
DB<2>

Brillant! :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by eyepopslikeamosquito (Archbishop) on Sep 13, 2007 at 02:53 UTC

For cheap thrills, a Perl 5 version using reduce:

```use List::Util qw(reduce);
my \$s = "ZBBBCZZ";
my \$x = reduce { if (@\$a && index(\$a->[-1], \$b) >= 0) { \$a->[-1] .= \$b
+ } else { push @\$a, \$b } \$a } [], split //, \$s;

I like the scan/map solution of ruby but I must admit too, that the javascript solution comes out cleanest :) "ZBBBCZZ".match(/((.)\2*)/g)
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Anonymous Monk on Sep 11, 2007 at 23:25 UTC
That would be:
```#!/usr/bin/env python

str = "ABBBCCDDZ"

parts = []
for i in range(len(str)):
if len(parts)>0:
if parts[len(parts)-1][len(parts[len(parts)-1])-1]==str[i]:
parts[len(parts)-1] += str[i]
else:
parts.append(str[i])
else:
parts.append(str[i])

print parts
- Antonio Ognio, Lima-Peru

Another solution, much more verbose but certainly more readable:

```#!/usr/bin/env python

str = "ABBBCCDDZ"

parts = []
last_char = ''
current_chunk = ''
for i in range(len(str)):
current_char = str[i]
if (current_chunk == '') or (last_char == current_char):
current_chunk = current_chunk + current_char
else:
parts.append(current_chunk)
current_chunk = current_char
last_char = current_char
if len(current_chunk)>0:
parts.append(current_chunk)
print parts

Also available here:

http://pastebin.com/f1213db97

- Antonio Ognio, Lima-Peru

I find them equivalently unreadable ;-)
Gnah. No No and No.

The itertools version is quite lovely, but if you must do an iterative version please at least make it idiomatic. If you really needed an index, you could at least use enumerate() ;-P

```mystr = "ABBBCCDDZ"

def get_iterparts(mystr):
buf = ""
for char in mystr:
if not buf or buf.endswith(char):
buf += char
else:
yield buf
buf = char
if buf:
yield buf
print list(get_iterparts(mystr))

def get_parts(mystr):
parts = []
buf = ""
for char in mystr:
if not buf or buf.endswith(char):
buf += char
else:
parts.append(buf)
buf = char
if buf:
parts.append(buf)
return parts
print get_parts(mystr)
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Kjetil_Skotheim (Initiate) on Oct 03, 2008 at 20:34 UTC
Shorter Perl 5 variants: Destroys \$s: \$s="ZBBBCZZ"; \$s=~s/((.)\2*)/push@x,\$1/eg; Nondestructive: \$s="ZBBBCZZ"; \$s=~s/((.)\2*)/push@x,\$1;\$1/eg; Needs a counter \$i: \$s="ZBBBCZZ"; @x=grep++\$i%2,\$s=~/((.)\2*)/g;
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Anonymous Monk on Dec 11, 2011 at 22:17 UTC
``` > s.chars.chunk { |c| c }.map { |c, cs| xs.join }
=> ["Z", "BBB", "C", "ZZ"]
Re: Yet Another Rosetta Code Problem (Perl, Ruby, Python, Haskell, ...)
by Anonymous Monk on Sep 11, 2007 at 23:20 UTC

This is an imperative solution in Python:

http://pastebin.com/f6a5389b0

- Antonio Ognio, Lima-Peru.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://632023]
Approved by bingos
Front-paged by clinton
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?