PerlMonks  

regex escaping forward slash in regex

by gilemon (Initiate)
on Dec 09, 2009 at 16:22 UTC [id://811960]

gilemon has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a Perl script to replace PHP functions, such as split, that are deprecated when moving from PHP 5.2 to 5.3.
I'm hitting a problem when it comes to replacing something like
 split('/',$string)
with
preg_split('/\//',$string)

So far I have come up with this regex:

s/[^_]split\s*?\(\s*?(["'])((?:\\?.)*?)\1/preg_split($1\/$2\/$1/g

But this obviously doesn't work if the pattern contains a forward slash.
I also tried this:

s/[^_]split\s*?\(\s*?(["'])((?:\\?.)*?)\1/preg_split($1\/\Q$2\E\/$1/g

which escapes too much, since \Q...\E backslash-escapes every non-word character, not just the forward slash.
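A quick way to see why `\Q...\E` is too blunt here: it is implemented by Perl's built-in `quotemeta`, which backslashes every non-word character. The sketch below compares it with a substitution that touches only slashes, on a made-up sample pattern `a.b/c` chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample pattern taken from a PHP split() call.
my $pattern = 'a.b/c';

# \Q...\E is implemented by quotemeta: it escapes every non-word character.
my $all_escaped = quotemeta $pattern;

# A targeted substitution on a copy escapes only the forward slashes.
(my $slash_escaped = $pattern) =~ s{/}{\\/}g;

print "$all_escaped\n";      # a\.b\/c
print "$slash_escaped\n";    # a.b\/c
```

The escaped dot is not harmless: in the converted preg pattern `\.` matches only a literal dot, whereas the original `.` matched any character, so blanket escaping can silently change what the pattern means.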


Is there some escape sequence that escapes only forward slashes?
Any ideas?

Re: regex escaping forward slash in regex
by moritz (Cardinal) on Dec 09, 2009 at 16:45 UTC
    Any secret escape sequence code that only escape forward slashes?
    s{/}{\\/}g

    Don't try to do it all at once. Write a simple function that escapes forward slashes, and call it in the replacement part of your substitution. You can use s{}{}ge to evaluate the replacement part as code, and comfortably call functions there.
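A minimal sketch of that advice, assuming the first argument is a single- or double-quoted literal; `escape_slashes` and the sample PHP line are hypothetical names chosen for illustration. It also swaps the original `[^_]` for a `(?<!_)` lookbehind, because `[^_]` consumes the character before `split` (which the replacement then drops) and cannot match at the start of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: escape only the forward slashes in a string.
sub escape_slashes {
    my ($s) = @_;
    $s =~ s{/}{\\/}g;
    return $s;
}

# Hypothetical line of PHP source to convert.
my $php = q{$parts = split('/', $string);};

# (?<!_) is a zero-width lookbehind: it skips "preg_split" without
# consuming the character before "split" the way [^_] would.
# /e evaluates the replacement as Perl code, so the helper can be called there.
$php =~ s{(?<!_)split\s*\(\s*(["'])((?:\\.|(?!\1).)*)\1}
         {my ($q, $body) = ($1, $2);
          'preg_split(' . $q . '/' . escape_slashes($body) . '/' . $q}ge;

print "$php\n";    # $parts = preg_split('/\//', $string);
```

Copying `$1` and `$2` into lexicals first keeps the captures safe from any match performed inside the helper.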

Re: regex escaping forward slash in regex
by Marshall (Canon) on Dec 09, 2009 at 17:59 UTC
    I don't know much about PHP, but here is a straightforward idea that might make some sense, or at least give you something to work from. The general idea is to break the problem down into smaller "chunks". I assume that the code your Perl program rewrites compiles and that its parentheses match up. Removing all spaces is also a simplifying assumption (it would also strip spaces inside the quoted pattern).
    #!/usr/bin/perl -w
    use strict;

    my $line = q{ split( '/', $string ) };
    $line =~ s/ //g;    # compress spaces
                        # now just "split('/',$string)"

    if ($line =~ m/^split/)    # check for the split keyword
    {
        # get (parm1, parm2) of the split, i.e. the two
        # things separated by commas within the parens
        my ($parm1, $parm2) = ($line =~ m/\((.*?),(.*?)\)/)[0,1];

        # now get stuff between quotes in parm 1
        my $inside_quote = ($parm1 =~ m|'(.*?)'|)[0];

        # change any / to \/
        $inside_quote =~ s|/|\\/|g;

        # now just print back out
        print "preg_split('/$inside_quote/',$parm2)\n";
    }

    __END__
    Prints:
    preg_split('/\//',$string)
Re: regex escaping forward slash in regex
by ikegami (Patriarch) on Dec 09, 2009 at 17:27 UTC

    What's with "\s*?"? Don't you simply mean "\s*"?

    Anyway,

    s{
        (?<!_)
        ( split \s* \( \s* )
        ( " (?:[^\\"]+|\\.)* " | ' (?:[^\\']+|\\.)* ' )
    }{
        my ($pre, $str) = ($1, $2);
        my ($q, $body) = (substr($str, 0, 1), substr($str, 1, -1));
        $body =~ s{/}{\\/}g;
        "preg_$pre$q/$body/$q"
    }xesg

    Untested.

    Will create an error if the "/" is already escaped.
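That caveat can be handled by treating an existing backslash escape as a single token. The sketch below (the helper name `escape_bare_slashes` is hypothetical) copies every `\` plus the character after it through unchanged and escapes only the slashes left over. It works on the raw pattern text and does not try to model PHP's own string-escaping rules.

```perl
use strict;
use warnings;

# Hypothetical helper: escape "/" only where it is not already escaped.
# The regex engine scans left to right and tries (\\.) first, so an
# existing escape such as \/ or \\ is consumed as one unit and put back
# unchanged; only a bare / falls through to the second branch.
sub escape_bare_slashes {
    my ($s) = @_;
    $s =~ s{(\\.)|/}{defined $1 ? $1 : '\\/'}gse;
    return $s;
}

print escape_bare_slashes('/'),       "\n";    # \/
print escape_bare_slashes('\\/'),     "\n";    # \/  (already escaped)
print escape_bare_slashes('a/b\\/c'), "\n";    # a\/b\/c
```

The only difference from a plain `s{/}{\\/}g` is the `(\\.)` branch, which lets a backslash claim the character after it before the `/` branch gets a chance.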

Re: regex escaping forward slash in regex
by JadeNB (Chaplain) on Dec 10, 2009 at 01:19 UTC

    ‘Dumb’ search-and-replaces (and I mean that as a comment on the code, not on you!) always strike fear in my heart; if we can't even parse such a rigid language as XML with regexes (which we can't, right? Or at least no sane person would?), how can we expect to parse the rich grammar of a programming language correctly? I always think of clbuttic.

    If this were my job, the first thing I'd do would be to look at some means of getting at the internal, not textual, representation of a PHP program. The first result for PHP + AST is php-ast; I'm not sure if it does what you want, but you might be able to fold, spindle, and mutilate it, or else just look further down in the results.

      Dumb search-and-replaces always strike fear in my heart

      I do dumb search-and-replaces a hundred times a day, mostly with my editor's search and replace box. There's nothing wrong with it if the process is supervised.

      In the OP's case, he can use a visual diff tool — I love Beyond Compare — to compare the pre- and post-change versions of his files and fix the mistakes.

      And then he has to do a manual search to change the ones the tool didn't catch.
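One way to keep the process supervised without a separate diff tool is a dry run that reports each line the rewrite would change and writes nothing. This is a sketch under assumptions: `rewrite_line` and `dry_run` are hypothetical helpers, the rule handles only a quoted first argument, and the sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical rewrite rule for one line of PHP source:
# split('pat', ...) -> preg_split('/pat/', ...), escaping forward slashes.
# Only a quoted first argument is handled.
sub rewrite_line {
    my ($line) = @_;
    $line =~ s{(?<!_)split\s*\(\s*(["'])((?:\\.|(?!\1).)*)\1}
              {my ($q, $body) = ($1, $2);
               $body =~ s{/}{\\/}g;
               "preg_split($q/$body/$q"}ge;
    return $line;
}

# Dry run: describe every line the rewrite would change, without writing
# anything, so a person can review the changes first.
sub dry_run {
    my @lines = @_;
    my @report;
    for my $i (0 .. $#lines) {
        my $new = rewrite_line($lines[$i]);
        next if $new eq $lines[$i];    # unchanged lines are not reported
        push @report, sprintf("line %d:\n- %s\n+ %s", $i + 1, $lines[$i], $new);
    }
    return @report;
}

# Hypothetical PHP input; only the first line should be reported.
my @php = (
    q{$parts = split('/', $string);},
    q{$words = explode(' ', $string);},
);
print "$_\n" for dry_run(@php);
```

Calls the rule cannot see, such as `split($myreg, $string)`, never show up in such a report, which is why the manual search is still needed.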

Re: regex escaping forward slash in regex
by gilemon (Initiate) on Dec 10, 2009 at 09:01 UTC
    Thanks a LOT for the answers.

    It's now working as I wanted here:
    http://nthinking.net/scripts/php5-3Migration/repDeprec.php

    Basically, as you said, I had to split the job into easier chunks.

    As for the search-and-replace "paradigm", I agree it's not the best solution... It is actually a very stressful way to go. I have run into automatic code modification a few times now, and the AST approach is definitely the way to go.
    My solution can't deal with cases like:
    split($myreg,$string)
    where:
    $myreg = '/'

    But I guess this approach will be good enough for what it is intended for...
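Cases like `split($myreg,$string)` can at least be located automatically so that none slip through unnoticed. The sketch below (hypothetical helper `non_literal_splits`, made-up sample input) reports the line numbers of `split(` calls whose first argument does not start with a quote; the lookbehind skips `preg_split`, `str_split`, `$split` and `->split`.

```perl
use strict;
use warnings;

# Hypothetical checker: list split() calls whose first argument is NOT a
# quoted string literal, e.g. split($myreg, $string). A text-based rewrite
# cannot know what $myreg holds, so these need a manual look.
sub non_literal_splits {
    my @lines = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        # (?!["']) rejects calls whose first argument starts with a quote.
        push @hits, $i + 1
            if $lines[$i] =~ /(?<![\w\$>])split\s*\(\s*(?!["'])/;
    }
    return @hits;    # 1-based line numbers
}

my @php = (
    q{$myreg = '/';},
    q{$parts = split($myreg, $string);},
    q{$more  = split('/', $string);},
);
print "check line $_\n" for non_literal_splits(@php);    # check line 2
```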

    Thanks Again!

Node Type: perlquestion [id://811960]
Approved by wfsp