Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Define regex substitution $1,$2,... from a string

by sylph001 (Sexton)
on May 24, 2016 at 05:49 UTC ( [id://1163925]=perlquestion: print w/replies, xml ) Need Help??

sylph001 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

Today I'm trying to define a regex substitution from a parameter which is received from outside.

In the s/<regex>/<replacement>/ structure, both the <regex> and the <replacement> are passed from other strings.

Seems the <regex> part works well, however for the <replacement> part, there is a '$1' to capture what's matched from the <regex> part and this seems not working.

In fact, it looks like the '$1' was not interpretted at all.

 

My script is like the following:

$a = 'https://www.domain.com/aaa/bbb/file.zip'; $from = 'aaa.*\/(.+)$'; $to = 'file_publish?file=$1'; $a =~s/$from/$to/i; print "\$1: $1\n"; print "RESUTL: $a\n";

 

The result is like:

> ./pt_demo.pl
$1:  file.zip
RESUTL: https://www.domain.com/file_publish?file=$1

 

But what I need is to replace the '$1' with the actual file name 'file.zip', so the result I actually need is like:

RESUTL: https://www.domain.com/file_publish?file=file.zip

 

My monks, could you please cast your light of wisdom on me to help me get the '$1' work regex substitution?

 

My best wishes

Replies are listed 'Best First'.
Re: Define regex substitution $1,$2,... from a string
by afoken (Chancellor) on May 24, 2016 at 06:03 UTC

    s/// has an /e option that can be used twice:

    Update: example

    >perl -E '$_="h:/p/f.zip";$replace=q[qq<x?f=$1>];s|([^/]+)$|$replace|e +e;say' h:/p/x?f=f.zip

    (End of update)

    Make sure that your strings are properly validated, because the double /e option is nothing else than a string eval. So if the replacement string contains qx|rm -rf /|, you'll have a bad day:

    >perl -E '$_="ABC";$replace="qx|head -5 /etc/passwd|";s/B/$replace/ee; +say' Aroot:x:0:0::/root:/bin/bash bin:x:1:1:bin:/bin:/bin/false daemon:x:2:2:daemon:/sbin:/bin/false adm:x:3:4:adm:/var/log:/bin/false lp:x:4:7:lp:/var/spool/lpd:/bin/false C

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Hello afoken,

      ++ for the s///ee solution. In sylph001’s original problem, your version of the replacement string would become:

      my $to = '"file_publish?file=$1"';

      FWIW, an explicit concatenation could also be used:

      my $to = '"file_publish?file=" . $1';

      I always have trouble coming up with the correct syntax for the replacement part (RHS) of an s///ee expression. It seems the first /e just de-stringifies the contents of $to, and then the second /e evaluates the resulting expression. (In this case, the evaluation consists in a variable interpolation and a string concatenation.)

      Hope that helps (someone),

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I spent some more time and am able to reproduce the example now.

      Thank you again for this.

       

      On the other hand, you are right that the evaluated string may expose some risk especially I'm planning to let other people define this regex string from command parameters.

      So, just limit it in the perl program scope, do you have any suggetions how we could easily check through these regex strings to make sure they do no harm?

        On the other hand, you are right that the evaluated string may expose some risk especially I'm planning to let other people define this regex string from command parameters.

        So, just limit it in the perl program scope, do you have any suggetions how we could easily check through these regex strings to make sure they do no harm?

        Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ;-)
        -- attributed to jwz

        Yes, string eval is problematic with arbiatary input. But: How much harm can be done?

        If the script runs with user privileges (not root privileges) from a login shell, the worst thing that can happen is that the user executes some perl code with his/her own privileges. Nothing worse can happen than when you allow the user to execute perl -e SOMETHING. "Here's some rope, try not to hang yourself."

        If the script runs with user privileges, but from a network context (started from a webserver, from a mail handler, from an IRC bot, from a restricted shell invoked by some networked program), s///ee might be a bad idea. It allows an attacker to execute arbitary perl code, and from there, arbitary code on the machine, with user privileges.

        If the script runs with root privileges, perhaps using a setuid wrapper, s///ee is a plain stupid idea. Any user could run arbitary perl code, and from there, arbitary code on the machine, with root privileges.

        Root privileges + network + s///ee is the digital equivalent of the Darwin award.


        Now what?

        You could try a two-step aproach. First extract all interesting parts of the input, collect that in an array. Then use a template string and replace markers in the template with array elements.

        #!/usr/bin/perl use strict; use warnings; # user input: my $regexp='^http://www.example.com/(\d+)/([a-z]+.zip)$'; my $template='https://secure.foobar.lan/fetch?file=$2&id=$1&missingVal +ue=$3'; # data my $url='http://www.example.com/12345/foo.zip'; # actual program my @array=($url=~/$regexp/); my $result=$template; do { no warnings 'uninitialized'; $result=~s/\$(\d+)/$array[$1]/ge; }; print "'$url' rewritten to '$result'\n";

        Output:

        'http://www.example.com/12345/foo.zip' rewritten to 'https://secure.fo +obar.lan/fetch?file=&id=foo.zip&missingValue='

        <Update>Note that @array is empty if nothing was matched (to be more precise: nothing was captured). You may want to abort with an error if @array is empty.</Update>

        So, we can do that with just one /e. Is it secure now?

        NO, sorry. Regular expression patterns may contain arbitary perl code:

        my $regexp='^(?{system "uname -a"})http://www.example.com/(\d+)/([a-z] ++.zip)$'; # everything else unchanged

        My relatively new perls (5.14.1, 5.18.1) refuses to execute the code, even the old 5.8.8 from a fresh Slackware 12.1 (dated 2008) installation in a virtual machine:

        Eval-group not allowed at runtime, use re 'eval' in regex m/^(?{system + "uname -a"})http://www.example.com/(\d+)/([a-z]+.zip)$/ at pmre.pl l +ine 15.

        The really ancient 5.005_03 from Slackware 4.0 (released in 1999) has a different error message:

        /^(?{system "uname -a"})http://www.example.com/(\d+)/([a-z]+.zip)$/: E +val-group not allowed at runtime, use re 'eval' at pmre.pl line 15.

        BUT: Once you use re 'eval';, or someone manages to trick perl into importing eval from re, this little bit of protection is gone and all perls back to at least 5.005_03 start executing arbitary code from the rexexp:

        $ perl pmre.pl Linux slack121 2.6.24.5-smp #2 SMP Wed Apr 30 13:41:38 CDT 2008 i686 A +MD FX(tm)-8320 Eight-Core Processor AuthenticAMD GNU/Linux 'http://www.example.com/12345/foo.zip' rewritten to 'https://secure.fo +obar.lan/fetch?file=&id=foo.zip&missingValue=' $

        Access to perl without s///ee, and thus the ability to execute arbitary code. Ouch.

        <Update>You may want to use Safe for executing the critical code. But then, Safe is trying to prevent "evil" things instead of only allowing "secure" things. One bug in Safe and you loose again. And even Safe can't prevent everything, see RISKS in Safe.</Update>

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thank you for your suggestion.

      Could you kindly let me know what's wrong with my commands when I'm trying reproduce your example as below?

      @$> perl -e '$_="h:/p/f.zip";$replace=q[qq<x?f=$1>];s/([^/]+)$/$replac +e/e+e;say' Scalar found where operator expected at -e line 1, near "s/([^/]+)$/$r +eplace" Final $ should be \$ or $name at -e line 1, within string syntax error at -e line 1, near ";s/([^/]+)$/" Execution of -e aborted due to compilation errors. @mnsdev10> perl -v This is perl, v5.8.8 built for sun4-solaris Copyright 1987-2006, Larry Wall Perl may be copied only under the terms of either the Artistic License + or the GNU General Public License, which may be found in the Perl 5 source ki +t. Complete documentation for Perl, including FAQ lists, should be found +on this system using "man perl" or "perldoc perl". If you have access to + the Internet, point your browser at http://www.perl.org/, the Perl Home Pa +ge. @$> perl -e '$_="h:/p/f.zip";$replace=q[qq<x?f=$1>];s|([^/]+)$|$replac +e|ee;say' @$>
        There's no e+e . Also, if you want to use s/// with slashes, you need to backslash literal slashes. Moreover, there's no say in 5.8.8. Use perl -l and print instead.
        perl -le '$_="h:/p/f.zip";$replace=q[qq<x?f=$1>];s=([^/]+)$=$replace=e +e;print'

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Define regex substitution $1,$2,... from a string
by ablanke (Monsignor) on May 24, 2016 at 07:25 UTC
    If your parameter is always an URI, you could use the CPAN Modules URI and File::Basename.
    use File::Basename; use URI::Split qw(uri_split uri_join); my @uri = uri_split('https://www.domain.com/aaa/bbb/file.zip'); $uri[3] = 'file='.basename($uri[2]); $uri[2] = 'file_publish'; my $uri = uri_join(@uri); print $uri."\n";
Re: Define regex substitution $1,$2,... from a string
by AnomalousMonk (Archbishop) on May 24, 2016 at 15:14 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1163925]
Approved by beech
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2024-04-23 06:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found