Regexp: can I do it in one go?

by moxliukas (Curate)
on Aug 22, 2002 at 11:13 UTC (#191980=perlquestion)
moxliukas has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have been writing a regexp that would transform this:

$s = 'aaaabababbbbaaaccccbbbbbbaadddd';

into this:

$s = '4ababa4b3a4c6b2a4d';

Basically, it is something like a mathematical series test (I'm not sure this is the correct translation from Lithuanian): consecutive occurrences of the same character are counted, except that no number is inserted when a character occurs only once.

I have been trying to come up with a regexp that would do this transformation and I got to the point where everything works:

$s = 'aaaabababbbbaaaccccbbbbbbaadddd';
$s =~ s"($_{2,})"length($1).$_"ge for ('a'..'d');
print $s;

However I am not very happy with the for loop. I wonder if the same can be achieved in one regexp, without the need to scan the line for each character. Can character classes be somehow involved in the regexp to avoid looping?

Thanks for any help in advance.

Replies are listed 'Best First'.
Re: Regexp: can I do it in one go?
by jmcnamara (Monsignor) on Aug 22, 2002 at 11:34 UTC

    You can use a backreference to obtain a single regex:

    #!/usr/bin/perl -wl
    use strict;

    my $s = 'aaaabababbbbaaaccccbbbbbbaadddd';

    print $s;
    $s =~ s/((.)\2+)/length($1) . $2/eg;
    print $s;

    __END__
    Prints:
    aaaabababbbbaaaccccbbbbbbaadddd
    4ababa4b3a4c6b2a4d


      Thanks a lot. I can't believe that I didn't think about it this way ;)

      Thank you again
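
    For anyone reading along, here is a minimal sketch of the backreference substitution above, spelled out with the /x modifier so each part of the pattern can carry a comment. The sub name rle_compress is made up for illustration and does not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the name rle_compress is an assumption for this sketch.
sub rle_compress {
    my ($s) = @_;
    $s =~ s{
        (        # $1: the whole run of identical characters
          (.)    # $2: any single character (except newline) ...
          \2+    # ... followed by one or more copies of itself
        )
    }{ length($1) . $2 }gex;   # /e evaluates the replacement as Perl code
    return $s;
}

print rle_compress('aaaabababbbbaaaccccbbbbbbaadddd'), "\n";
# prints: 4ababa4b3a4c6b2a4d
```

    To limit the counting to particular letters, as the question about character classes suggests, the (.) can be swapped for a class such as ([a-d]); runs of any other character are then left untouched.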

Re: Regexp: can I do it in one go?
by Arien (Pilgrim) on Aug 22, 2002 at 11:31 UTC

    What you want to do is globally match a character together with any repetitions of it, and replace what you've found with that character followed by the length of your match:

    $s =~ s/((.)\2*)/$2 . length $1/eg;

    — Arien

    Edit: It seems I misread the output you want. To replace only sequences of two or more repeated letters, change the star to a plus sign. (And after some sleep...) Also, you'd want to swap length $1 and $2 so that the length precedes the letter.
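
    The difference between the star and the plus can be seen in a small side-by-side sketch. The short test string and the variable names $star and $plus are chosen here for illustration only; both substitutions already put the length before the letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'aaaabab';

# Star: \2* also matches zero repeats, so every single character gets a count of 1.
(my $star = $s) =~ s/((.)\2*)/length($1) . $2/eg;

# Plus: \2+ needs at least one repeat, so single characters are left alone.
(my $plus = $s) =~ s/((.)\2+)/length($1) . $2/eg;

print "$star\n";   # prints: 4a1b1a1b
print "$plus\n";   # prints: 4abab
```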
