Re: Replace part of a regex match

by monarch (Priest)
on Dec 24, 2008 at 05:58 UTC ( [id://732419] )


in reply to Replace part of a regex match

In the past I've often wanted to just jam something into $1; it seems a natural way of expressing what I want to do. Unfortunately Perl doesn't work that way: the capture variables are read-only. So you have to find other means, and there are many approaches (see perlre).

One method might be to calculate where the match of your capture finishes, calculate where the capture started, and replace that portion of the string directly using substr.

Example code:

    use strict;

    my $val = 'http://adserver.adtech.de/'
            . '?addyn|2.0|323|91793|1|277|target=_blank';

    my $re = qr/
        .*\|.*\|.*\|.*\|.*\|
        (.*)
        (?=\|.*)   # look-ahead
    /x;

    if ( $val =~ m/$re/g ) {
        print( "found: \"$1\"\n" );
        print( "pos is: " . scalar( pos( $val ) ) . "\n" );
        print( "behind pos is: \""
            . substr( $val, pos( $val ) - length( $1 ), length( $1 ) )
            . "\"\n" );

        # perform substitution here using calculated offsets
        substr( $val, pos( $val ) - length( $1 ), length( $1 ) ) = "moo";
        print( "$val\n" );
    }

One catch is that pos returns the end of the entire match, not the end of the capture, so the pattern uses a look-ahead to keep the text after the capture out of the match. The other catch is that pos is only set when the match uses the /g (global) flag.

This code replaces the "277" with the word "moo". Try it!
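
As an alternative to the pos arithmetic, here is a minimal sketch that reads the capture's offsets from Perl's built-in @- and @+ arrays ($-[1] is where group 1 starts, $+[1] is where it ends). The helper name replace_first_capture is hypothetical and not part of the original post. Since these arrays are set on any successful match, this version should need neither the look-ahead nor the /g flag.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): replace the text
# matched by capture group 1 of $re inside $str with $new.
sub replace_first_capture {
    my ( $str, $re, $new ) = @_;

    # @- and @+ hold the start and end offsets of each capture group
    # after a successful match; no /g flag or look-ahead is required.
    if ( $str =~ $re && defined $-[1] ) {
        my ( $start, $end ) = ( $-[1], $+[1] );

        # 4-argument substr replaces the span in place
        substr( $str, $start, $end - $start, $new );
    }
    return $str;
}

my $val = 'http://adserver.adtech.de/'
        . '?addyn|2.0|323|91793|1|277|target=_blank';

# Same pattern as above, with a plain trailing \|.* instead of a look-ahead
my $re = qr/ .*\|.*\|.*\|.*\|.*\| (.*) \|.* /x;

print replace_first_capture( $val, $re, 'moo' ), "\n";
```

With the sample URL this also swaps "277" for "moo". If the pattern does not match, the string is returned unchanged.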

Re^2: Replace part of a regex match
by monarch (Priest) on Dec 24, 2008 at 06:09 UTC
    I might also suggest that the regexp you've chosen is hard work for Perl. Each .* first grabs as much as possible and only then checks whether a pipe symbol follows, backtracking when it doesn't. With several of these in a row, the regexp isn't very efficient.

    You might like to try something like the following:

        my $re = qr/
            (?:            # start non-capturing group
                [^\|]+     # as many non-pipe characters as possible
                \|         # followed by a pipe character
            ){5}           # and ensure there are 5 such groups in a row
            ([^\|]+)       # capture as many non-pipe chars as possible
        /x;
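
    To show a pattern of this shape doing the actual replacement, here is a minimal sketch that uses a plain s/// instead of offsets. It is a variation, not the code from this thread: the five leading fields are captured and written back unchanged, and only the sixth field is replaced. The variable $new is made up for the example.

```perl
use strict;
use warnings;

my $val = 'http://adserver.adtech.de/'
        . '?addyn|2.0|323|91793|1|277|target=_blank';

# Copy $val into $new, then substitute in the copy.
( my $new = $val ) =~ s{
    ( (?: [^|]+ \| ){5} )   # group 1: five fields, each ending in a pipe
    [^|]+                   # the sixth field, which gets replaced
}{${1}moo}x;

print "$new\n";
```

    Because group 1 is put back verbatim, everything before the sixth field survives untouched and no position bookkeeping is needed.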

    Using Benchmark, the new regular expression was roughly 400% faster on my computer (about five times the rate):

                   Rate oldway newway
        oldway 108225/s     --   -81%
        newway 563910/s   421%     --
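
    The benchmark script itself is not shown above, so the following is only a sketch of how such a comparison could be set up with cmpthese from the core Benchmark module. The names $old_re, $new_re and the bodies of oldway and newway are assumptions; only the labels come from the table. Timings will differ between machines and Perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $val = 'http://adserver.adtech.de/'
        . '?addyn|2.0|323|91793|1|277|target=_blank';

# The two patterns discussed in this thread
my $old_re = qr/ .*\|.*\|.*\|.*\|.*\| (.*) (?=\|.*) /x;
my $new_re = qr/ (?: [^\|]+ \| ){5} ([^\|]+) /x;

# Each sub returns the captured field (or undef on no match), so the
# two patterns can be checked for agreement as well as timed.
sub oldway { return $val =~ $old_re ? $1 : undef }
sub newway { return $val =~ $new_re ? $1 : undef }

# Run each sub for at least 1 CPU second and print a comparison table
cmpthese( -1, { oldway => \&oldway, newway => \&newway } );
```

    A negative count tells Benchmark to run each sub for at least that many CPU seconds rather than a fixed number of iterations.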

    Update: corrected URL
