Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

How can I reduce this with a REGEX?

by misterperl (Beadle)
on Mar 14, 2014 at 15:30 UTC ( #1078350=perlquestion: print w/replies, xml ) Need Help??
misterperl has asked for the wisdom of the Perl Monks concerning the following question:

I'm not too shabby at REGEXes but this one isn't behaving as I'd expect.

I want to remove all redundant numbers and their separators in a string (the numbers are always 2 digits) like:

12-12 -> 12
12-12-12 -> 12
12-13-12-13 -> 12-13
12-12-13-13- -> 12-13
12-13-13-14 -> 12-13-14

and so on.

I figured this was a good start:

s/(\d\d)(-\d\d)*?(-\1)/$1$2/g;

but although it DOES remove a number, it (to me, unexpectedly) only removes one. I read that as:
FIND A NUMBER
LOOK FOR other seperator / numbers, miminally
find a seperator and matching number to the first one
replace it with the initial number and the minuimally matched stuff
repeat

I get this:
12-14-12-12 -> 12-14-12

I realize I could put this in some sort of while () loop until it no longer matches but that seems like it should be unnecessary.

I know I need a non-consumiung match on the (-\d\d)*? part so I tried: ?=(-\d\d)*? instead, but then it never matched ANYTHING....

Wise ones please enlighten me? Perhaps there is a switch that tells the regex to go back to ^ after each substitution?

Thank-You.

Replies are listed 'Best First'.
Re: How can I reduce this with a REGEX?
by pvaldes (Chaplain) on Mar 14, 2014 at 15:50 UTC

    You don't need to use regexes for this. In pseudocode:

    use List::MoreUtils qw{uniq} while (<FH>){ chomp; convert each line to an array of numbers splitting by "-" sort the resulting list of numbers and apply uniq to this list: my @list_of_unique_items = uniq @list; join or print again each element followed by a "-" chomp again if needed to delete the last "-" undef the array process the next line
      ..appreciate it guys but I'm trying to make this solution simpler & shorter- not longer!

      This isn't too bad I spose.. Seems to work in a 1-liner:
      s/(\d\d)((-\d\d)*?)(-\1)/$1$2/g while /(\d\d).*(-\1)/;
        Which can probably be further reduced to:
        $ cat 1078350.in 12-12 12-12-12 12-13-12-13 12-12-13-13 12-13-13-14 $ perl -pe '1 while s/(\d\d)(.*?)(?:-\1)/$1$2/g' 1078350.in 12 12 12-13 12-13 12-13-14
Re: How can I reduce this with a REGEX?
by hdb (Monsignor) on Mar 14, 2014 at 17:48 UTC

    Also using a hash but in a different way:

    use strict; use warnings; my @strings = qw( 12-12 12-12-12 12-13-12-13 12-12-13-13 12-13-13-14 ) +; for (@strings) { my %hash; s/(\-?)(\d\d)/ $hash{$2}++ ? '' : $1.$2 /ge; print "$_\n"; }
Re: How can I reduce this with a REGEX?
by johngg (Abbot) on Mar 14, 2014 at 23:38 UTC

    Another way.

    $ perl -Mstrict -Mwarnings -E ' say join q{-}, do { my %seen; grep { not $seen{ $_ } ++ } split m{-}; } for qw{ 12-12 12-12-12 12-13-12-13 12-12-13-13 12-13-13-14 };' 12 12 12-13 12-13 12-13-14 $

    I hope this is helpful.

    Cheers,

    JohnGG

Re: How can I reduce this with a REGEX?
by Laurent_R (Canon) on Mar 14, 2014 at 22:40 UTC
    Or using a data pipeline with sort and grep:
    use strict; use warnings; my @strings = qw( 12-12 12-12-12 12-13-12-13 12-12-13-13 12-13-13-14 ) +; for (@strings) { my $prev = -1; # numbers being separated by dashes, I assume that +a negative value is not possible print join "-", grep {$prev != $_ and $prev = $_} sort {$a <=> $b} + split /-/, $_; print "\n"; }
    which prints:
    12 12 12-13 12-13 12-13-14
Re: How can I reduce this with a REGEX?
by Lennotoecom (Pilgrim) on Mar 14, 2014 at 16:46 UTC
    maybe you'll consider hashes?
    while(<DATA>){ %h = (); undef $h{$1} while(s/(\d\d)//); print join ("-", sort keys %h),"\n"; } __DATA__ 12-12 12-12-12 12-13-12-13 12-12-13-13 12-13-13-14
    updated
    $h{$1} = null became undef $h{$1}
    thank you Mr. choroba
      I would also use matching instead of substitution. You mentioned in a private message that the substitution is faster (using such a trick would be worth a comment), but my benchmark doesn't show the effect. To speed things up, rather use my %h instead of %h = (). Output:
      Rate lenno choro lenno 25358/s -- -4% choro 26458/s 4% --

      Perl 5.16.2, x86_64-linux-thread-multi.

      Update: Significantly faster: use a hash slice undef @h{m/\d\d/g}; instead of the inner loop.

      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Yes, thank you for your reply, I apparently got mistaken,
        I'm deeply sorry. You were right.
        My results:
        use Benchmark(cmpthese); cmpthese(1000000, { 'a' => sub { $_ = '12-12-12-12-12'; undef $h{$1} while (s/(\d\d)//); }, 'b' => sub { $_ = '12-12-12-12-12'; undef $h{$1} while (m/(\d\d)/g); }, 'c' => sub { $_ = '12-12-12-12-12'; undef @h{m/\d\d/g}; } });
        results:
        Rate a b c a 195427/s -- -19% -35% b 241896/s 24% -- -20% c 300933/s 54% 24% --
Re: How can I reduce this with a REGEX?
by Anonymous Monk on Mar 14, 2014 at 15:44 UTC
    still working on it- found a mistake.. this seems to work:

    while ( /(\d\d).*(-\1)/ ) {
    s/(\d\d)((-\d\d)*?)(-\1)/$1$2/g;
    }

    but I still arent in love with having to use the while ()

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1078350]
Approved by mtmcc
help
Chatterbox?
What's the matter? Cat got your tongue?...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (3)
As of 2018-07-22 20:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    It has been suggested to rename Perl 6 in order to boost its marketing potential. Which name would you prefer?















    Results (455 votes). Check out past polls.

    Notices?