s/((\D)\2+)/length($1).$2/ge;
This searches for a non-digit character, checks whether it is repeated one or more times, and replaces the whole run with its length followed by the repeated character.
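As a quick check of the substitution above, here is a minimal sketch. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: letters only, no digits.
my $s = 'AAABCCDDDD';

# (\D) captures one non-digit and \2+ requires at least one more copy of it,
# so only runs of two or more characters are rewritten as "<count><char>".
(my $packed = $s) =~ s/((\D)\2+)/length($1).$2/ge;

print "$packed\n";   # prints 3AB2C4D
```

The lone B is left alone because \2+ needs at least one repetition.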
Oha
PS: it could be useful to limit the length of each matched run and to include digits as well. In that case, though, digits must always carry a count:
s/((\D)\2{1,8}|(\d)\3{0,8})/length($1).$2.$3/ge;
This one matches a run of two or more identical non-digits, or of one or more identical digits, and never matches a run longer than 9.
That way the data can be decoded without side effects even when the input contains unusual data:
X2AAAAAAAAAAAAAAAAAAAAAAA1111 become X129A9A5A41
Repetitions of A are grouped into runs of at most 9, and digits are given a count even when they are not repeated.
To decode, use:
s/(\d)(.)/$2x$1/ge;
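The capped encoder and its decoder can be tried together in a small round trip. This is only a sketch: the sub names `pack_runs` and `unpack_runs` are invented here, the `// ''` defaults (Perl 5.10+) are added to silence "uninitialized" warnings, and the input is built with 23 A's, which is what the encoded result 9A9A5A implies.

```perl
use strict;
use warnings;

# Runs are capped at 9 so every count is a single digit;
# digits always get a count, even when they occur only once.
sub pack_runs {
    my ($s) = @_;
    # Only one of $2 / $3 is set per match, so default the other to ''.
    $s =~ s/((\D)\2{1,8}|(\d)\3{0,8})/length($1) . ($2 \/\/ '') . ($3 \/\/ '')/ge;
    return $s;
}

# Each single-digit count expands the character that follows it.
sub unpack_runs {
    my ($s) = @_;
    $s =~ s/(\d)(.)/$2 x $1/ge;
    return $s;
}

my $in  = 'X2' . ('A' x 23) . '1111';
my $out = pack_runs($in);
print "$out\n";                                          # prints X129A9A5A41
print unpack_runs($out) eq $in ? "ok\n" : "mismatch\n";  # prints ok
```

Because every digit in the encoded string is a count, the decoder never has to guess whether a digit is data.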
Oha
$string =~ s/(.)\1+/length($&).$1/eg;
90% of every Perl application is already written. ⇒ | dragonchild |
My guess is you'll need to decode it at some point too. This is all very similar to run-length encoding (RLE), though I think RLE encodes even single-character occurrences. So here's a short snippet to encode and decode, which you can modify to suit your needs:
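use strict;
use warnings;

# Collapse each run of two or more identical characters into "<count><char>".
sub encode {
    my ($str) = @_;
    $str =~ s/((.)\2+)/length($1) . $2/eg;
    return $str;
}

# Expand each optional count followed by a character back into a run.
sub decode {
    my ($str) = @_;
    my @list;
    while ($str =~ /(\d+)?(.)/g) {
        push @list, [$1, $2];
    }
    return join '', map { defined $_->[0] ? $_->[1] x $_->[0] : $_->[1] } @list;
}

while (my $line = <DATA>) {
    chomp $line;
    my $enc = encode($line);
    my $dec = decode($enc);
    print "$line\n$enc\n$dec\n";
}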
__DATA__
XYZAAAAAAAADEFAAcdAA
Which gives the following output:
XYZAAAAAAAADEFAAcdAA
XYZ8ADEF2Acd2A
XYZAAAAAAAADEFAAcdAA
---
s;;:<).>|\;\;_>?\\^0<|=!]=,|{\$/.'>|<?.|/"&?=#!>%\$|#/\$%{};;y;,'} -/:-@[-`{-};,'}`-{/" -;;s;;$_;see;
Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.
You should at least make it symmetric...
sub encode { $_ = shift; s/((\D)\2+)/length($1).$2/eg; $_ }
sub decode { $_ = shift; s/(\d+)(\D)/$2 x $1/eg; $_ }
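A quick round trip shows why the symmetric pair matters. The sketch below uses the same two substitutions but works on lexical copies instead of `$_`; the sample string is hypothetical and deliberately digit-free.

```perl
use strict;
use warnings;

# Same substitutions as above: \D on the way in, (\d+)(\D) on the way out.
sub encode { my ($s) = @_; $s =~ s/((\D)\2+)/length($1) . $2/eg; return $s }
sub decode { my ($s) = @_; $s =~ s/(\d+)(\D)/$2 x $1/eg;          return $s }

my $in  = 'XYZ' . ('A' x 12) . 'DEFAAcd';   # hypothetical sample input
my $enc = encode($in);

print "$enc\n";   # prints XYZ12ADEF2Acd
print decode($enc) eq $in ? "round trip ok\n" : "round trip failed\n";
```

Since the decoder reads `\d+`, counts above 9 such as the 12 here survive the trip, as long as the original text contains no digits.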
And by 'characters' I sincerely hope we are talking about only ALPHAs and no NUMERICs.
ALPHANUMERICS work for RLE because every char gets a count. But if you are only counting runs of two or more, you can't have digits in there as part of your character set.
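The ambiguity can be seen with two short inputs. In this sketch the sub names are invented for illustration, and the count-everything variant caps runs at 9 so that each count stays a single digit (an assumption, not something stated in the thread).

```perl
use strict;
use warnings;

# Counts only for runs of two or more, with digits allowed in the data.
sub encode_multiples {
    my ($s) = @_;
    $s =~ s/((.)\2+)/length($1) . $2/ge;
    return $s;
}

# Every run gets a count, even a run of length 1 (runs capped at 9).
sub encode_all {
    my ($s) = @_;
    $s =~ s/((.)\2{0,8})/length($1) . $2/ge;
    return $s;
}

print encode_multiples('AAA'), "\n";   # prints 3A
print encode_multiples('3A'),  "\n";   # prints 3A  (same output, so undecodable)
print encode_all('AAA'), "\n";         # prints 3A
print encode_all('3A'),  "\n";         # prints 131A
```

With counts on every run, the two inputs encode differently, so a decoder such as `s/(\d)(.)/$2 x $1/ge` can tell them apart.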
--
I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.