PerlMonks  

Minimizing the amount of place holders on long identical regex

by thanos1983 (Parson)
on Jun 20, 2018 at 15:53 UTC ( [id://1217011]=perlquestion )

thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

I am really bad at regex, and my best attempt, although it works, is really poor in syntax. I am sure it can be done in a different and shorter way.

I currently have a string that is 24 numerical characters long, and I have created a regex that splits the string character by character so I can extract the odd place holders, which contain the actual information that I need.

What I have so far is:

#!/usr/bin/env perl use strict; use warnings; my $sample = "041424344454647484940414"; $sample =~ /([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([ +0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])( +[0-9])([0-9])([0-9])([0-9])([0-9])([0-9])/; # 24 times the same patte +rn print "$1$3$5$7$9$11$13$15$17$19$21$23\n"; __END__ $ perl test.pl 012345678901

This is the desired output and it works, but I was wondering if there is a more elegant way that avoids replicating the same group 24 times while still letting me get the odd place holders ($1$3$5...).

I could potentially split the string character by character and store the output in an array. From there I would remove the even elements and rebuild the string with join. But in my case this is not possible, as the system I am writing the regex for does not support the split or join functions; it only supports C format command syntax, so I am using Perl as a test tool before implementation.
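For reference, the split/filter/join approach described above could look roughly like the sketch below. It is only an illustration of the idea in plain Perl (which the target system reportedly cannot run); the helper name `odd_chars` is hypothetical and not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: keep the 1st, 3rd, 5th, ... character of a string.
sub odd_chars {
    my ($str) = @_;

    # One array element per character.
    my @chars = split //, $str;

    # Array indices 0, 2, 4, ... hold the 1st, 3rd, 5th, ... characters.
    my @kept = @chars[ grep { $_ % 2 == 0 } 0 .. $#chars ];

    # Glue the kept characters back into a single string.
    return join '', @kept;
}

print odd_chars("041424344454647484940414"), "\n";   # prints 012345678901
```

The array slice with `grep` on the indices is just one way to drop the even positions; any loop that steps by two would do the same job.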

If anyone has an idea how to make this regex shorter, feel free to drop a comment.

Thanks in advance for your time and effort, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Replies are listed 'Best First'.
Re: Minimizing the amount of place holders on long identical regex
by tybalt89 (Monsignor) on Jun 20, 2018 at 16:05 UTC
    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=1217011
    use strict;
    use warnings;

    my $sample = "041424344454647484940414";
    my @odds = $sample =~ /(\d)\d/g;
    print @odds, "\n";

      Hello tybalt89,

      Thank you for your time and effort. The solution is working, but in my case I cannot use an array as an output or join to put the array in a string. This is why I am using place holders. Is there any way to use only the odd place holders to store the characters?

      Thanks again for your time and effort.

      Seeking for Perl wisdom...on the process of learning...not there...yet!

        This is starting to sound like an XY problem.

        Can you give a little more context on why you "cannot use an array as an output or join to put the array in a string"?

        We might be able to help you overcome this with this additional information.

        Best,

        Jim

        πάντων χρημάτων μέτρον έστιν άνθρωπος.

Re: Minimizing the amount of place holders on long identical regex
by hippo (Bishop) on Jun 20, 2018 at 17:05 UTC

    This is not much more elegant but it does save long lines in your source.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $sample = "041424344454647484940414";
    my $re = '([0-9])' x 24;
    $sample =~ /$re/;
    print "$1$3$5$7$9$11$13$15$17$19$21$23\n";

      Hello hippo,

      Thanks for the time and effort. I had no idea that it can be written like this :)

      BR / Thanos

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Minimizing the amount of place holders on long identical regex
by haukex (Archbishop) on Jun 20, 2018 at 17:06 UTC
    But in my case this is not possible, as the system I am writing the regex for does not support the split or join functions; it only supports C format command syntax, so I am using Perl as a test tool before implementation.

    Does the system you're on (which one - PCRE?) support search and replace?

    my $sample = "041424344454647484940414";
    (my $output = $sample) =~ s/.\K.//g; # alternative s/(.)./$1/g;
    die $output unless $output eq "012345678901";

      Hello haukex,

      Awesome, thanks a lot, that worked perfectly.

      BR / Thanos

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Minimizing the amount of place holders on long identical regex
by Eily (Monsignor) on Jun 20, 2018 at 16:09 UTC

    perl -lE "print '041424344454647484940414' =~ /(\d).?/g"
    012345678901

    The /g flag on a regex means global search, so the regex will be applied several times to the string, each time starting from the end of the previous match. In list context (e.g. when you assign the result to an array), this returns the list of captures (or, if there are none, the list of full matches). In my example above, I capture one digit, then try to match another character. If perl does manage to match that additional character (i.e. it's not the end of the string), it will start looking after that position on the next attempt.

    More info on that in perlretut and the description of the m// operator in perlop.
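To make the context difference concrete, here is a small sketch (an illustration, not part of the original reply) contrasting the list-context match with a scalar-context one. All variable names are made up for the example; `join` is used only to display the list result.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# List context: every successful match adds its capture to the returned list.
my $sample    = "041424344454647484940414";
my @captures  = $sample =~ /(\d).?/g;
my $from_list = join '', @captures;

# Scalar context: each evaluation performs only ONE match.
# Only $1 is set (the pattern has a single group), and pos()
# records where the next /g match on this string would resume.
my $other = "041424344454647484940414";
my ($first, $resume);
if ($other =~ /(\d).?/g) {
    $first  = $1;            # first captured digit
    $resume = pos($other);   # offset just after the two matched characters
}

# Repeating the scalar-context match in a loop walks through all matches
# and builds the string without an intermediate array.
my $text   = "041424344454647484940414";
my $result = '';
while ($text =~ /(\d).?/g) {
    $result .= $1;
}

print "$from_list\n";         # prints 012345678901
print "$first at $resume\n";  # prints 0 at 2
print "$result\n";            # prints 012345678901
```

This also suggests why referring to `$3`, `$5` and so on after a single `/(\d).?/g` match yields "uninitialized value" warnings: the pattern has only one capture group, so only `$1` is ever set.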

      Hello Eily,

      Thank you for your time and effort. The solution is working, but in my case I cannot use an array as an output, because I cannot use join to put the array in a string. I am using place holders as a solution to this problem.

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $sample = "041424344454647484940414";
      $sample =~ /([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])([0-9])/; # 24 times the same pattern
      print "$1$3$5$7$9$11$13$15$17$19$21$23\n";

      my $new = "041424344454647484940414";
      $new =~ /(\d).?/g;
      print "$1$3$5$7$9$11$13$15$17$19$21$23\n";

      my @array = $new =~ /(\d).?/g;
      print Dumper \@array;

      __END__
      $ perl test.pl
      012345678901
      Use of uninitialized value $3 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $5 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $7 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $9 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $11 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $13 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $15 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $17 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $19 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $21 in concatenation (.) or string at test.pl line 12.
      Use of uninitialized value $23 in concatenation (.) or string at test.pl line 12.
      0
      $VAR1 = [
                '1',
                '2',
                '3',
                '4',
                '5',
                '6',
                '7',
                '8',
                '9',
                '0',
                '1'
              ];

      Thanks again for your time and effort.

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Minimizing the amount of place holders on long identical regex
by choroba (Cardinal) on Jun 20, 2018 at 22:30 UTC
    Is pack/unpack a good solution for you?
    #! /usr/bin/perl
    use warnings;
    use strict;

    my $sample = "041424344454647484940414";
    my $expected = '012345678901';

    my $result = pack '(a)*', unpack '(ax)*', $sample;

    use Test::More tests => 1;
    is $result, $expected;

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Minimizing the amount of place holders on long identical regex
by bliako (Monsignor) on Jun 21, 2018 at 00:23 UTC

    Just realised that you want to port that to C eventually so below is no good. I would just use ikegami's - don't forget to free.

    You want to separate even/odd: here are 2 variations on tybalt89's which do retain the even matches as well as the odd. Assuming your system can take it:

    use strict;
    use warnings;

    # trivial variation on tybalt89's to retain
    # both odd and even one after the other
    my $sample = "041424344454647484940414";
    my @allin = $sample =~ /(\d)(\d)/g;

    # more ordered output using subs in regex
    my @odds_vs_evens = ();
    $sample =~ s/(\d)(\d)/push(@odds_vs_evens,[$1,$2])/eg;
    print $_->[0].'->'.$_->[1]."\n" for @odds_vs_evens;

    0->4
    1->4
    2->4
    3->4
    4->4
    5->4
    6->4
    7->4
    8->4
    9->4
    0->4
    1->4

    But this is what I was after:

    my $sample = "041424344454647484940414";
    my %odds_vs_evens = $sample =~ /(\d)(\d)/g;

    Alas it is not ordered and the World just lost some more balance.

    bw, bliako

Node Type: perlquestion [id://1217011]
Approved by Eily