PerlMonks  

Search and replace regex, but only retain a portion of the string

by ghenry (Vicar)
on Dec 22, 2018 at 22:11 UTC ( #1227618=perlquestion )

ghenry has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

It's been a long time, but I know this is the right place to ask. So here goes. I'm looking for a pointer to the right doc section to read in https://perldoc.perl.org/perlre.html to help me with my test code below:

    use Modern::Perl;
    use strict;
    use warnings;

    my $dest = q{UK Mobile - Vodafone [GBRVF] [MSRN]};
    $dest =~ s/(\W+)\s?(\[MSRN\])?/_/g;
    say $dest;

which gives me UK_Mobile_Vodafone_GBRVF_MSRN_

I'm trying to turn UK Mobile - Vodafone [GBRVF] [MSRN] into UK_Mobile_Vodafone_GBRVF_, but [MSRN] might not always be there. I want this done all in one substitution too. I think I even want to get to UK_Mobile_Vodafone_GBRVF.

The software I'm doing this in can possibly process the same variable twice, with the second run being a second substitution, so I could make UK Mobile - Vodafone [GBRVF] [MSRN] become UK Mobile - Vodafone [GBRVF], then process it again to become UK_Mobile_Vodafone_GBRVF_

Many thanks,
Gavin.

Replies are listed 'Best First'.
Re: Search and replace regex, but only retain a portion of the string
by choroba (Archbishop) on Dec 22, 2018 at 23:11 UTC
    Using two substitutions is cleaner:
    $dest =~ s/\W+\[MSRN\]|\W+/_/g;
    $dest =~ s/_$//;
    i.e. replace sequences of non-word characters by underscores, then remove the last underscore.
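
    A minimal sketch of the two substitutions above, wrapped in a helper so both the with-[MSRN] and without-[MSRN] inputs can be checked. The sub name normalise_dest is hypothetical and not from the thread; the second input is an assumed example of the "[MSRN] might not always be there" case.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative, not from the thread).
sub normalise_dest {
    my ($dest) = @_;

    # Pass 1: a run of non-word characters, optionally followed by a
    # literal [MSRN], becomes a single underscore.
    $dest =~ s/\W+\[MSRN\]|\W+/_/g;

    # Pass 2: drop the trailing underscore, if there is one.
    $dest =~ s/_$//;

    return $dest;
}

print normalise_dest('UK Mobile - Vodafone [GBRVF] [MSRN]'), "\n";
print normalise_dest('UK Mobile - Vodafone [GBRVF]'), "\n";
```

    Both calls should print UK_Mobile_Vodafone_GBRVF, since the first alternative only applies when a literal [MSRN] follows the non-word run.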

    I can't think of a non-hackish solution with one substitution only.

    $dest =~ s/(?:\W+(?:MSRN\])?|\W+)(?(?=$)(?{$%=1})|(?{$%=0}))/$%?'':'_'/ge;
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Of course. For goodness' sake, an OR. Thanks a lot! I can live with _.

Re: Search and replace regex, but only retain a portion of the string
by tybalt89 (Parson) on Dec 23, 2018 at 01:12 UTC
    #!/usr/bin/perl -l

    # https://perlmonks.org/?node_id=1227618

    use strict;
    use warnings;

    my $want = q{UK_Mobile_Vodafone_GBRVF};
    my $dest = q{UK Mobile - Vodafone [GBRVF] [MSRN]};
    print $dest;

    $dest =~ s/( )|- |\[|\].*/ '_' x !!$1 /ge;

    print $dest;
    print $want;

      Very nice. Will have a play. Thank you.

Re: Search and replace regex, but only retain a portion of the string (updated x4)
by AnomalousMonk (Bishop) on Dec 22, 2018 at 22:51 UTC
    I think I even want to get to UK_Mobile_Vodafone_GBRVF.

    Assuming that's really what you want:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use constant T => q{UK Mobile - Vodafone [GBRVF] [MSRN]};
     ;;
     my $dest = T;
     $dest =~ s{ \] \s+ \S+ \z }{}xms;
     print qq{2 steps: a: '$dest'};
     ;;
     $dest =~ s{ \W+ }{_}xmsg;
     print qq{2 steps: b: '$dest'};
     ;;
     $dest = T;
     ;;
     $dest =~ s{ \W+ (\w+ \] \z)? }{ $1 ? '' : '_' }xmsge;
     print qq{1 step: '$dest'};
    "
    2 steps: a: 'UK Mobile - Vodafone [GBRVF'
    2 steps: b: 'UK_Mobile_Vodafone_GBRVF'
    1 step: 'UK_Mobile_Vodafone_GBRVF'

    Update 1: Or (with inspiration from tybalt89):
        $dest =~ s{ \W+ (\w+ \] \z)? }{ !$1 && '_' }xmsge;

    Update 2: Or with no eval:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $dest = q{UK Mobile - Vodafone [GBRVF] [MSRN]};
     ;;
     my @xlate = ('', '_');
     ;;
     $dest =~ s{ \W+ (\w+ \] \z)? }{$xlate[! $1]}xmsg;
     print qq{'$dest'};
    "
    'UK_Mobile_Vodafone_GBRVF'
    With Perl version 5.10+, you could use a persistent state variable:
        state $xlate = [ '', '_' ];
    (changing the replacement expression, of course).
    (The original code of this update had a  no warnings 'uninitialized'; statement which turned out to be unneeded. Removed.)
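
    A sketch of how the state-variable variant might look once the replacement expression is changed to dereference the array reference. The sub name to_underscores is hypothetical, and wrapping the substitution in a sub is an assumption made so that the state variable has a scope to persist in.

```perl
use strict;
use warnings;
use feature 'state';    # 'state' needs Perl 5.10 or later

# Hypothetical wrapper (name is illustrative, not from the thread).
sub to_underscores {
    my ($dest) = @_;

    # Initialised on the first call only, then kept across calls.
    state $xlate = [ '', '_' ];

    # ! $1 is false (index 0) when the trailing "word]" was captured,
    # and true (index 1) when the optional group did not take part.
    $dest =~ s{ \W+ (\w+ \] \z)? }{$xlate->[! $1]}xmsg;

    return $dest;
}

print to_underscores('UK Mobile - Vodafone [GBRVF] [MSRN]'), "\n";
```

    The replacement is still a plain interpolated string, so no /e modifier is needed here.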

    Update 3: Or another two-stepper:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $dest = q{UK Mobile - Vodafone [GBRVF] [MSRN]};
     ;;
     $dest =~ tr{A-Za-z}{_}cs;
     $dest =~ s{ _ [^_]+ _ \z }{}xms;
     ;;
     print qq{'$dest'};
    "
    'UK_Mobile_Vodafone_GBRVF'

    Update 4: Or:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $dest = q{UK Mobile - Vodafone [GBRVF] [MSRN]};
     $dest =~ s{ \W+ (\w+ \] \z)? }{ [ '', '_' ]->[! $1] }xmsge;
     print qq{'$dest'};
    "
    'UK_Mobile_Vodafone_GBRVF'
    I'm going to bed now. (Update: I don't know why I based this solution on an anonymous array;  ('', '_')[! $1] is a bit less obscure and probably slightly faster. But beyond that, this eval-ed solution doesn't really offer anything more than the ternary expression used in the eval one-step in my original reply. Well, it was late...)
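
    For reference, a sketch of the list-slice form mentioned in the update above, using the same pattern as the one-step solutions in this reply. The sub name slice_subst is hypothetical. The second call is an assumed example of an input without [MSRN]; because the pattern keys on any final word followed by ] rather than on the literal MSRN, that input appears to lose its last bracketed token.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name is illustrative, not from the thread).
sub slice_subst {
    my ($dest) = @_;

    # ('', '_')[! $1] picks '' when the trailing "word]" was captured
    # and '_' for every other run of non-word characters.
    $dest =~ s{ \W+ (\w+ \] \z)? }{ ('', '_')[! $1] }xmsge;

    return $dest;
}

print slice_subst('UK Mobile - Vodafone [GBRVF] [MSRN]'), "\n";
print slice_subst('UK Mobile - Vodafone [GBRVF]'), "\n";
```

    If inputs without [MSRN] matter, replacing \w+ with the literal MSRN in the capture group is one possible adjustment, at the cost of leaving a trailing underscore for those inputs.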


    Give a man a fish:  <%-{-{-{-<

      Thanks. Amazing effort. Always stunned when someone spends a lot of time on these.

Re: Search and replace regex, but only retain a portion of the string
by kschwab (Vicar) on Dec 23, 2018 at 20:18 UTC
    If the rules are "change any sequence of characters that isn't a-zA-Z into a single underscore", you might like tr// better than a regex:
    $ perl
    my $s="UK Mobile - Vodafone [GBRVF] [MSRN]";
    $s=~tr/a-zA-Z/_/cs;
    print "$s\n";
    ^D
    UK_Mobile_Vodafone_GBRVF_MSRN_

    Still needs the second pass to get rid of MSRN_, but clean and straightforward otherwise.
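
    One way the second pass could look, as a sketch: it removes a trailing underscore together with an optional MSRN_ before the end of the string, so inputs with and without [MSRN] end up the same. The sub name squeeze_dest is hypothetical, and the sketch assumes no destination name legitimately ends in the word MSRN.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative, not from the thread).
sub squeeze_dest {
    my ($s) = @_;

    # Pass 1: every run of characters outside a-zA-Z becomes one underscore
    # (c = complement the search list, s = squeeze repeated replacements).
    $s =~ tr/a-zA-Z/_/cs;

    # Pass 2: drop the final underscore, plus an optional MSRN_ after it.
    $s =~ s/_(?:MSRN_)?\z//;

    return $s;
}

print squeeze_dest('UK Mobile - Vodafone [GBRVF] [MSRN]'), "\n";
print squeeze_dest('UK Mobile - Vodafone [GBRVF]'), "\n";
```

    Note that with the a-zA-Z search list, digits are also turned into underscores, which differs from the \W+ based solutions.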

    Edit: I see someone already updated another comment node to suggest this, apologies for the dupe.

Node Type: perlquestion [id://1227618]
Approved by Athanasius
Front-paged by haukex