http://www.perlmonks.org?node_id=724921

Jts has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

My regex won't let me do what I want. My goal is to build a compression on a string of digits. I only want to transform patterns of exactly 4 times a 3. I want it to count as many as matches as possible,
count the amount of occurence and for example change it to [ 4 ] or [ 8 ] or [ 12 ] and so on. This must sound very confusing, my example should clear things up. So for example, i have a var I need to transform:

my $toTranslate = "3333333333223333322";

I would like the output to become:
[ 8 ] 3322[ 4 ] 322

The script I made so far looks like this:

#!/usr/str/perl -w use strict; use warnings; my $toTranslate ="3333333333333333333333333333333333333333333333333333 +333333333333333333333333333333333333333333333333333333333333333332333 +333333223333333332333333333333333333333333333333333233333333223333333 +32333333333"; my $tmpFound; my $count = 0; while($toTranslate =~ /(3{4,})+/g) { $tmpFound = $1; $tmpFound = length($tmpFound); $toTranslate =~ s/$1/[$tmpFound]/; } print $toTranslate,"\n";


Output: [ 117 ] 2 [ 9 ] 22 [ 9 ] 2 [ 33 ] 2 [ 8 ] 22 [ 8 ] 2 [ 9 ]

The problem here though is the {4,} being ATLEAST 4 times, so it replaces "3333322" to "[ 5 ] 22".

When I make it: $toTranslate =~ /(3{4})+/g)
I face the next problem, it will store the data on exact matches, and the output will look like [ 4 ] [ 4 ] [ 4 ] 33222[ 4 ] [ 4 ] for example.

Do any of you Monks have a clue how to adjust this regex so that it will give me an output like " [ 80 ] 322[ 36 ] 23323332" etc.?

P.S. I want this done by a regex and not for example a function that calculates the output.

Kind regards,
Joris

Replies are listed 'Best First'.
Re: Regex exact pattern match problem!
by moritz (Cardinal) on Nov 20, 2008 at 16:55 UTC
    This regex should do what you want: ((?:3{4})+)

    With the /e modifier you can simplify your program a bit:

    #!/usr/str/perl use strict; use warnings; my $toTranslate = "333333333333333333333333333333333333333333333333333333333333333333333 +333333333333333333333333333333333333333333333333233333333322333333333 +233333333333333333333333333333333323333333322333333332333333333"; 1 while $toTranslate =~ s/((?:3{4})+)/"[" . length($1) . "]"/ge; print $toTranslate,"\n";
      1 while $toTranslate =~ s/((?:3{4})+)/"[" . length($1) . "]"/ge;
      I don't understand the point of the 1 while in the expression above. The substitution seems to work just fine without it.
      >perl -wMstrict -le "for my $string ('3' x 33332, @ARGV) { $string =~ s{ ( (?:3{4})+ ) }{ '[' . length($1) . ']' }xmsge; print $string; } " 3333333333223333322 3333333333333333333333333333333333333333333333333333333333333333333333 +333333333333333333333333333333333333333333333332333333333223333333332 +33333333333333333333333333333333323333333322333333332333333333 [33332] [8]3322[4]322 [116]32[8]322[8]32[32]32[8]22[8]2[8]3
Re: Regex exact pattern match problem!
by JadeNB (Chaplain) on Nov 20, 2008 at 17:18 UTC
    Note that, in fact, moritz's improved version is essential—without it, your match has the potential to find your replaced text (since the s/// resets pos, so that the //g match starts all over again). For example, if you try your code (even suitably modified) on $toTranslate = 3 x 33332, then you'll get [[4]2], not [33332].

    Note also that Perl goes out of its way to make sure that you can do lots of things that seem like they must be multi-step in a single statement. Thus, as moritz implicitly observed, it's probably better to write $tmpFound = length($1) (well, and use a better name) than $tmpFound = $1; $tmpFound = length($tmpFound).

    UPDATE: On testing, moritz's version does the same thing. I think that you want something like

    my $translated; while ( $toTranslate =~ /(.*?)((?:3{4})*)/g ) { $translated .= $1; $translated .= '[' . length($2) . ']' if $2; }

    UPDATE 2: eye and AnomalousMonk pointed out a better, s///g-based solution: just omit the while.

Re: Regex exact pattern match problem!
by eye (Chaplain) on Nov 20, 2008 at 19:52 UTC
    moritz's regex does work. Calling the replace operator (s///) once, produces the desired result. The solution fails because the replace operator is potentially called more than once by the while loop. I think the easiest solution is to remove the while loop.
    #!/usr/bin/perl use strict; use warnings; my $toTranslate = "333333333333333333333333333333333333333333333333333 +333333333333333333333333333333333333333333333333333333333333333333233 +333333322333333333233333333333333333333333333333333323333333322333333 +332333333333"; $toTranslate =~ s/((?:3{4})+)/"[" . length($1) . "]"/ge; print $toTranslate,"\n";
    The documentation notes that the global replace is not progressive like the match operator (in scalar context). That is only applicable to successive calls to the operators, not for the completion of a single call.
      The solution fails because the replace operator is potentially called more than once by the while loop.

      The solution seems to work with or without the while, either way producing identical results. (I changed the 2s to 7s as the 2s were difficult to spot.)

      use strict; use warnings; my $toTranslate = q{3333333333773333377}; # 19 # .........1....1.... # ....5....0....5.... print qq{Sequences of 3s:-\n}; print qq{[@{ [ length $1 ] }]\n} while $toTranslate =~ m{(3+)}g; ( my $translated = $toTranslate ) =~ s{((?:3{4})+)}{ qq{[@{ [ length $1 ] }]} }eg; print qq{@{ [ q{-} x 50 ] }\n$translated\n@{ [ q{-} x 50 ] }\n}; 1 while $toTranslate =~ s{((?:3{4})+)}{ qq{[@{ [ length $1 ] }]} }eg; print qq{$toTranslate\n@{ [ q{=} x 50 ] }\n}; $toTranslate = q{3333333333333333333333333} # 25 . q{3333333333333333333333333} # 50 . q{3333333333333333333333333} # 75 . q{3333333333333333333333333} # 100 . q{3333333333333333373333333} # 125 . q{3377333333333733333333333} # 150 . q{3333333333333333333333733} # 175 . q{3333337733333333733333333} # 200 . q{3}; # 201 # .........1....1....2....2 # ....5....0....5....0....5 print qq{Sequences of 3s:-\n}; print qq{[@{ [ length $1 ] }]\n} while $toTranslate =~ m{(3+)}g; ( $translated = $toTranslate ) =~ s{((?:3{4})+)}{ qq{[@{ [ length $1 ] }]} }eg; print qq{@{ [ q{-} x 50 ] }\n$translated\n@{ [ q{-} x 50 ] }\n}; 1 while $toTranslate =~ s{((?:3{4})+)}{ qq{[@{ [ length $1 ] }]} }eg; print qq{$toTranslate\n@{ [ q{=} x 50 ] }\n};

      Produces

      Sequences of 3s:- [10] [5] -------------------------------------------------- [8]3377[4]377 -------------------------------------------------- [8]3377[4]377 ================================================== Sequences of 3s:- [117] [9] [9] [33] [8] [8] [9] -------------------------------------------------- [116]37[8]377[8]37[32]37[8]77[8]7[8]3 -------------------------------------------------- [116]37[8]377[8]37[32]37[8]77[8]7[8]3 ==================================================

      I hope this is of interest

      Cheers,

      JohnGG

        As I mentioned above, it doesn't work (unless you really want multiple passes) if the result of the first substitution itself contains 4 consecutive 3's (for example, if the original input has 33332 consecutive 3's). Also, to save yourself some typing, you may want to look at the documentation for the string repetition operator x.

        UPDATE: Thanks to johngg for pointing out that my documentation link was broken.

Re: Regex exact pattern match problem!
by CountZero (Bishop) on Nov 20, 2008 at 19:40 UTC
    I want this done by a regex and not for example a function that calculates the output.
    Why this restriction?

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Hey CountZero,

      The reason I wanted to do this strictly by a regex was simply because I knew it was possible (and how to) in a normal loop.
      And as I am still in a very young age of learning Perl, I was trying to learn more about regex's.

      You guys gave alot of new insights of ways how to do this, thanks alot guys!

      Regards,
      Joris