PerlMonks  

Function call in regex replacement string

by PoorLuzer (Beadle)
on Feb 24, 2009 at 05:23 UTC (#745900=perlquestion)
PoorLuzer has asked for the wisdom of the Perl Monks concerning the following question:

I have a regex search expression and a replacement expression.
I need the replacement expression to contain function calls, yet be kept separate from the logic, say in a variable set at the beginning of the script. The replacement expression must be evaluated when the search and replace is performed, not when it is assigned to the variable that holds it.

Here is the code I have, with what I want to do shown in comments:
use strict;
use warnings;
use Data::Dumper;
use Tie::File;

# open the file if it exists, but fail if it does not
use Fcntl 'O_RDWR';

our @array = ();
our $file = 'taste';

tie @array, 'Tie::File', $file, mode => O_RDWR or die $!;

our $SearchString = 'HAI WORLD Times ([0-9]+).*$';
#our $ReplaceString = "\t\t<exML LOLZ = \"" . unpack('a1', $1) . '"/>';
# we could use such a statement ONLY if we figured out what something() would be

my $elementIdx = 0;
while($elementIdx <= $#array)
{
    if($array[$elementIdx] =~ $SearchString)
    {
        # Maintain only the first character
        #$array[$elementIdx] = unpack ('a' . (1 + length("\t\t<exML LOLZ = \"\"/>")), something($ReplaceString));
        # I would love to know what something() could be so that I can separate the configuration from the logic
        $array[$elementIdx++] = "\t\t<exML LOLZ = \"" . unpack('a1', $1) . '"/>';
        #++$elementIdx;
    }
    else
    {
        splice @array, $elementIdx, 1; # we don't want this in the output
    }
}

untie @array; # all done!

Sample input file:
HAI WORLD Times 0
HAI WORLD Times 1
HAI WORLD Times 2
HAI WORLD Times 3
HAI WORLD Times 4
HAI WORLD Times 5
HAI WORLD Times 6
HAI WORLD Times 7
HAI WORLD Times 8
HAI WORLD Times 9
HAI WORLD Times 10
HAI WORLD Times 11
HAI WORLD Times 12
HAI WORLD Times 13
HAI WORLD Times 14
HAI WORLD Times 15
HAI WORLD Times 16
HAI WORLD Times 17
HAI WORLD Times 18
HAI WORLD Times 19
HAI WORLD Times 20
HAI WORLD Times 21
HAI WORLD Times 22
HAI WORLD Times 23
HAI WORLD Times 24
HAI WORLD Times 25

Corresponding output file:
<exML LOLZ = "0"/> <exML LOLZ = "1"/> <exML LOLZ = "2"/> <exML LOLZ = "3"/> <exML LOLZ = "4"/> <exML LOLZ = "5"/> <exML LOLZ = "6"/> <exML LOLZ = "7"/> <exML LOLZ = "8"/> <exML LOLZ = "9"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "1"/> <exML LOLZ = "2"/> <exML LOLZ = "2"/> <exML LOLZ = "2"/> <exML LOLZ = "2"/> <exML LOLZ = "2"/> <exML LOLZ = "2"/>

What would something() look like?
Update: Unfortunately, something() looks like eval... oh well...

Re: Function call in regex replacement string
by tilly (Archbishop) on Feb 24, 2009 at 05:43 UTC
    I'm not sure exactly what you want, but you should be able to do it with the /e modifier. At the beginning of your script you have something like this:
    my $replacement = sub { return unpack("a1", shift); };
    and then later
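    for (@array) {
        # Braces as delimiters, because the replacement code contains a slash;
        # double quotes so that \t is a real tab.
        s{HAI WORLD Times ([0-9]+).*}{ "\t\t<exML LOLZ = \"" . $replacement->($1) . '"/>' }eg;
    }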
      :-)

      Yes.. I thought about both the /e and eval ( :-o ) approaches, but they do not satisfy my needs... let me try and clarify:

      I want to provide the replacement string as a configurable parameter, not embedded in the search-and-replace code.

      I would like to have :
      for (@array) {
          s{HAI WORLD Times ([0-9]+).*}{ "\t\t<exML LOLZ = \"" . $replacement->($1) . '"/>' }eg;
      }

      as
      for (@array) { s/$searchString/$replaceString/g; }


      Um.. does that make sense?
        What you're asking for won't work without a bunch of magic. If you really, really want it to look like that, create an object with an overload that causes it to do something particular when you stringify it. If you do that, though, then don't blame me if the maintenance programmer turns out to be a psychopath with an axe who knows where you live.

        However the following can work and at least gives a hint as to the evil lurking within:

        for (@array) { s/$searchString/$replaceString/eeg; }
        Where the ee means you eval it, and then eval the result of that. Or in other words you execute whatever is in $replaceString as code. I would set that up more sanely as:
        for (@array) { s/$searchString/$replace->()/eg; }
        But then again I'm a sane sort of guy who doesn't want to worry about maintenance programmers on a vendetta.
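A minimal sketch of the two options above, side by side. The variable names ($search, $replace, $replace_string) and the sample lines are illustrative assumptions, not taken from the thread. The first option keeps the replacement as a code reference defined with the configuration; the second keeps it as a string of Perl source and runs it through /ee, which executes whatever that string contains.

```perl
use strict;
use warnings;

# --- Configuration (hypothetical names), kept apart from the logic ---
my $search = qr/HAI WORLD Times ([0-9]+).*$/;

# Option 1: the replacement is a code reference; nothing runs until it is called.
my $replace = sub {
    my ($number) = @_;
    return "\t\t<exML LOLZ = \"" . unpack('a1', $number) . '"/>';
};

# Option 2: the replacement is Perl source held in a string, for use with /ee.
my $replace_string = q{"\t\t<exML LOLZ = \"" . unpack('a1', $1) . '"/>'};

# --- Logic ---
my @input = ('HAI WORLD Times 7', 'HAI WORLD Times 12');

# /e evaluates the replacement as code, so the sub is called once per match.
my @via_coderef = @input;
s/$search/$replace->($1)/e for @via_coderef;

# /ee first evaluates $replace_string to get its text, then evals that text.
my @via_ee = @input;
s/$search/$replace_string/ee for @via_ee;

print "$_\n" for @via_coderef, @via_ee;
```

Both variants defer evaluation until a match has been made, so $1 holds the current capture. The code reference is checked at compile time; the /ee string is only compiled when the substitution runs, and should never come from untrusted input.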
Re: Function call in regex replacement string
by Marshall (Prior) on Feb 24, 2009 at 17:57 UTC
    I'm not quite sure what needs to be accomplished here. Consider the following code. Note that within a regex, you can use a $var in place of a fixed regex, e.g. s/$find/$replace/;

    In general I think you will discover that:
    1) regex is better than pack/unpack, except for specific cases where you know that columns are guaranteed to line up, and even then you can just use regex to skip columns and a list slice to get what you want. Of course pack/unpack are necessary when editing binary files, but in general this is not the right way to go.

    2) indexed variables are almost never necessary in Perl. One magic thing about Perl is that it reduces the probability of "off by one" errors.

    3) prefer foreach (@array){..} over any kind of C-style for loop. I write one C-style for loop per about 5K lines of Perl.

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>)
    {
        my $last_first_digit = ($_ =~ m/(\d)\d*\s*$/)[0];
        # print "$last_first_digit\n"; #for debugging
        print "<exML LOLZ = \"$last_first_digit\"/>\n";
    }
    __DATA__
    HAI WORLD Times 0
    HAI WORLD Times 1
    HAI WORLD Times 2
    HAI WORLD Times 3
    HAI WORLD Times 4
    HAI WORLD Times 5
    HAI WORLD Times 6
    HAI WORLD Times 7
    HAI WORLD Times 8
    HAI WORLD Times 9
    HAI WORLD Times 10
    HAI WORLD Times 11
    HAI WORLD Times 12
    HAI WORLD Times 13
    HAI WORLD Times 14
    HAI WORLD Times 15
    HAI WORLD Times 16
    HAI WORLD Times 17
    HAI WORLD Times 18
    HAI WORLD Times 19
    HAI WORLD Times 20
    HAI WORLD Times 21
    HAI WORLD Times 22
    HAI WORLD Times 23
    HAI WORLD Times 24
    HAI WORLD Times 25

    This prints:

    <exML LOLZ = "0"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "3"/>
    <exML LOLZ = "4"/>
    <exML LOLZ = "5"/>
    <exML LOLZ = "6"/>
    <exML LOLZ = "7"/>
    <exML LOLZ = "8"/>
    <exML LOLZ = "9"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "1"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "2"/>
    <exML LOLZ = "2"/>
    I think you just need to get the right regex to feed a simple loop and that will do what you want.
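Points 2 and 3 above can be sketched against the original poster's filter-and-rewrite loop. This is only an illustration with made-up sample lines held in memory; it does not use Tie::File, and the name @lines is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sample data; the middle line should be dropped from the output.
my @lines = ('HAI WORLD Times 4', 'not a match', 'HAI WORLD Times 25');

# map yields the rewritten line on a match and an empty list otherwise,
# so non-matching lines vanish without any index bookkeeping or splice.
my @out = map {
    /HAI WORLD Times ([0-9])[0-9]*/ ? qq{\t\t<exML LOLZ = "$1"/>} : ()
} @lines;

print "$_\n" for @out;
```

Capturing only the first digit in the regex takes the place of the unpack('a1', ...) call in the original code.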
      I agree with two of your points.

      I vehemently disagree with the first one: "regex is better than pack/unpack".
      Regex is almost never better than pack/unpack.

      Even substr is almost never better than pack/unpack.

      Please go ahead and disagree, with facts on the table.

      What would ever make you think that?

      In fact this post gives me an idea for a discussion I have been wanting to have for a long time: the misuse of things like regex and substr, which makes Perl so much slower at runtime than it really should be.
        Well like I said, except in cases where you know for sure that you have a fixed column alignment. Virtually all the data that I work with does not have fixed byte alignment. And in some of the files I work with, even if alignment is "fixed", the alignment shifts when some new release comes out of the other program. There are always trade-offs between efficiency and maintainability, etc.

        I've done some testing with the regex engine in Perl 5.10 vs Perl 5.8 and earlier... it's a LOT faster now. I've got one application that does a LOT of I/O and I've been considering using Storable for intermediate steps. This of course uses a byte stream (and pack/unpack) to dump and re-create internal Perl structures. At the end of the day, final output will be in ASCII format of some type.

        Most performance issues that I've found can be traced to improper algorithm or just flawed implementation. Perl allows very sophisticated algorithms to be implemented quickly and better algorithms can make a big difference! I can write Perl code about 5-10x faster than in C. Code runs maybe 1/3 the speed of C. So there are trade-offs!

        I've seen some really bad code here on Monks and some of it will run just like a "herd of turtles". Sometimes that doesn't matter and sometimes it does!

        So I guess this is a "your mileage may vary" sort of thing.

        Update: Now that I think more about this, misuse of OO techniques is probably a far greater performance hit. The OO performance hit is about 30%. This stuff is great for DB, GUI, but I've seen some situations where it is just plain goofy.
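Neither side of this exchange shows measurements. A minimal sketch of how the three approaches could be compared with the core Benchmark module follows; the sample value and the hash name %first_char are assumptions, and the timings depend on the Perl version and machine, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $value = '25';   # hypothetical sample, shaped like the thread's data

# Three ways to take the first character of a string.
my %first_char = (
    regex  => sub { my ($c) = $value =~ /^(.)/; $c },
    substr => sub { substr($value, 0, 1) },
    unpack => sub { unpack('a1', $value) },
);

# Sanity check: all three must agree before the timings mean anything.
my %results = map { $_ => $first_char{$_}->() } keys %first_char;
print "$_ => $results{$_}\n" for sort keys %results;

# Run each candidate 500_000 times and print a comparison table.
cmpthese(500_000, \%first_char);
```

For operations this small, the cost of calling the sub is a large part of what gets measured, so the differences say little about a real program whose time goes to I/O.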

Re: Function call in regex replacement string
by Tanktalus (Canon) on Feb 25, 2009 at 06:36 UTC

    This node gave me the idea to post Yet another perl-rename tool which has the way I approached this problem. I simply created a code ref on the fly using eval STR, and then called the function to make the change. Not recommended for insecure environments, but, really, if you're logged in as your own user, all you can screw up is yourself ;-) (i.e., don't use this for CGI or stuff)

      Oh how I wish there was another way out than eval.
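The "code ref on the fly using eval STR" idea can be sketched as below. This is not the code from the rename tool mentioned above; the names $replace_src and $replace and the sample lines are assumptions for illustration. The string is compiled once into a subroutine, so a syntax error in the configuration surfaces before any line is processed.

```perl
use strict;
use warnings;

# Hypothetical configuration: the replacement is Perl source text in a string.
# $_[0] is the first argument passed to the generated sub.
my $search      = qr/HAI WORLD Times ([0-9]+).*$/;
my $replace_src = q{ "\t\t<exML LOLZ = \"" . unpack('a1', $_[0]) . '"/>' };

# Compile the string into a code reference once, up front.
my $replace = eval "sub { $replace_src }"
    or die "Cannot compile replacement: $@";

# Logic: call the compiled sub for every match.
my @lines = ('HAI WORLD Times 3', 'HAI WORLD Times 25');
s/$search/$replace->($1)/e for @lines;

print "$_\n" for @lines;
```

This still runs arbitrary code from the configuration string, so the same warning applies: do not feed it input from CGI or other untrusted sources.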
