http://www.perlmonks.org?node_id=513731

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that I would like to modify in place, without creating any additional file.

Somewhere in the file I have: A P B, and elsewhere: C X P.

A, B and C are fixed items, whereas X and P are not.

I have Q, which is the first few characters of P. I would like to convert the second line, C X P, to C X Q.

I need help.

Re: Second Time Replace Regex
by xdg (Monsignor) on Dec 02, 2005 at 22:33 UTC

    Well, that's charmingly abstract. Some questions:

    • Can X contain P? Or, rather, how can you be sure you've found the right P?
    • Do either APB or CXP appear multiple times? I.e. APB APB CXP CXP? If so, how should they be treated?
    • Do A?B and C?P appear multiple times with different values? E.g. APB CXP ASB CXS? How should they be treated?
    • Can you give some real examples of APB and CXP?
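
The first question above can be made concrete with a small sketch. The values of P, Q and the sample line are hypothetical, chosen so that X also contains P; the two substitutions then pick different occurrences.

```perl
use strict;
use warnings;

# Hypothetical values: P, its prefix Q, and a line where X also contains P.
my $P    = 'abcdef';
my $Q    = 'abc';
my $line = 'C xx abcdef yy abcdef';

# Unanchored and non-greedy: replaces the first P, which sits inside X.
(my $unanchored = $line) =~ s/(C.*?)\Q$P\E/$1$Q/;

# Anchored at both ends: only a P at the end of the line is replaced.
(my $anchored = $line) =~ s/^(C.*)\Q$P\E$/$1$Q/;

print "$unanchored\n";   # C xx abc yy abcdef
print "$anchored\n";     # C xx abcdef yy abc
```

Which of the two results is wanted depends on the answers to the questions above.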

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Second Time Replace Regex
by ikegami (Patriarch) on Dec 02, 2005 at 22:29 UTC

    How about something like:

    my $file_name = ...;

    my $file;
    {
        open(my $fh, '<', $file_name)
            or die("Unable to open input file: $!\n");
        local $/;    # slurp mode: read the whole file at once
        $file = <$fh>;
    }

    my ($P) = $file =~ /A(.*?)B/
        or die("Unable to find 'P'\n");

    $file =~ s/(C.*?)\Q$P\E/$1Q/g;

    {
        open(my $fh, '>', $file_name)
            or die("Unable to open output file: $!\n");
        print $fh $file;
    }

    Update: Added \Q...\E.

      Correct me if I'm wrong, but my understanding is that \Q and \E would not be needed in the regex if you quotemeta $P, or vice versa (quotemeta is not needed if you use \Q .. \E). From quotemeta:

      quotemeta
      Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.

      If EXPR is omitted, uses $_ .


      They say that time changes things, but you actually have to change them yourself.

      —Andy Warhol

        I accidentally updated prematurely. You saw bad code that I had there for about 5 seconds. quotemeta plus \Q...\E would indeed give incorrect results.
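
The point about double escaping can be checked with a minimal sketch. The value of `$P` and the sample text are hypothetical; the behaviour of `quotemeta` and `\Q...\E` is as quoted from the documentation above.

```perl
use strict;
use warnings;

# Hypothetical value for P containing regex metacharacters.
my $P    = 'a.b*c';
my $text = 'C X a.b*c';

# \Q...\E inside the pattern and quotemeta() outside it are equivalent.
my $with_qe        = $text =~ /\Q$P\E/  ? 1 : 0;
my $quoted         = quotemeta $P;
my $with_quotemeta = $text =~ /$quoted/ ? 1 : 0;

# Applying both escapes the backslashes themselves, so the pattern
# looks for literal backslashes and no longer matches the text.
my $with_both = $text =~ /\Q$quoted\E/ ? 1 : 0;

print "QE=$with_qe quotemeta=$with_quotemeta both=$with_both\n";
# prints: QE=1 quotemeta=1 both=0
```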

Re: Second Time Replace Regex
by robin (Chaplain) on Dec 02, 2005 at 22:40 UTC
    If I understand your question correctly (which is far from guaranteed), you might try something like:
    #!perl -i~
    use strict;
    use warnings;

    my $p;
    my $done_replacement = 0;
    while (<>) {
        if (/^A (.*) B$/) {
            $p = $1;
        }
        elsif (defined $p && !$done_replacement) {
            # Set the flag only when the substitution actually happens.
            $done_replacement = 1 if s/^(C .*)\Q$p\E$/${1}Q/;
        }
        print;    # with -i, output goes back into the file being edited
    }
    Of course you must replace A, B, C and Q with your actual data, and pass the filename as a command-line argument to this script. It would be easier to give a precise answer if you could make your question more concrete by giving some example data and saying what A, B and C really are.

    I'm assuming that your A P B and C X P each constitutes an entire line of the file, because your use of “second line C X P” made me think that's what you meant.

    Also, why is it important not to create an additional file? Usually it's safer to save the results of a transformation in a new file, in case something goes wrong. (In the code above, the switch -i~ causes the original file to be backed up with a name like oldname~, which gives you some protection at least.)
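
To see the backup behaviour without touching a real file, here is a self-contained sketch of the same line-by-line idea. It assumes hypothetical data (P is `foobar`, Q is `foo`), writes it to a temporary file, and sets `$^I` inside the script, which has the same effect as the `-i~` switch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway file with hypothetical sample data.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";
open(my $out, '>', $path) or die "Cannot write $path: $!\n";
print $out "A foobar B\nother line\nC X foobar\n";
close $out;

{
    local $^I   = '~';        # in-place edit, backup suffix "~"
    local @ARGV = ($path);
    my ($p, $done);
    while (<>) {
        if (/^A (.*) B$/) {
            $p = $1;
        }
        elsif (defined $p && !$done) {
            $done = 1 if s/^(C .*)\Q$p\E$/${1}foo/;
        }
        print;                # goes into the edited file
    }
}

# Read the edited file back to inspect the result.
open(my $in, '<', $path) or die "Cannot read $path: $!\n";
my $contents = do { local $/; <$in> };
close $in;
print $contents;
print "backup exists\n" if -e "$path~";
```

After the run, the edited file ends with `C X foo`, and the original content is still available in the `~` backup.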

Re: Second Time Replace Regex
by davidrw (Prior) on Dec 03, 2005 at 14:34 UTC
    xdg had some valid questions, but barring additional info, here's a one-liner (see perlrun) solution:
    perl -i -0777 -pe 's/(\bAAAA )(QQQQ)(PPPP)( BBBB\b.*?\bCCCC XXXX )\2\3(?= +\b)/$1$2$3$4$2/sg' /tmp/f
    Tested against this data:
    blah stuff AAAA QQQQPPPP BBBB stuff more stuff blah and CCCC XXXX QQQQPPPP and more stuff. blah stuff AAAA QQQQPPPP BBBB stuff more stuff blah and CCCC XXXX QQQQPPPP and more stuff.
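
The substitution can also be tried on a string, without editing a file. This is a sketch using the sample data above as two records. The pattern here checks the trailing space with a lookahead, `(?= +\b)`, so that the space after the replaced text is not consumed by the substitution.

```perl
use strict;
use warnings;

# Two records, mirroring the sample data.
my $record = "blah stuff AAAA QQQQPPPP BBBB stuff more stuff blah and "
           . "CCCC XXXX QQQQPPPP and more stuff.\n";
my $data = $record x 2;

# /s lets .*? cross newlines; /g handles every A..C pair in turn.
(my $result = $data) =~
    s/(\bAAAA )(QQQQ)(PPPP)( BBBB\b.*?\bCCCC XXXX )\2\3(?= +\b)/$1$2$3$4$2/sg;

# Count how many C lines now end in the shortened value.
my $count = () = $result =~ /CCCC XXXX QQQQ and more stuff\./g;

print $result;
print "replaced: $count\n";   # replaced: 2
```

The A line keeps its full `QQQQPPPP`, and only the occurrence after `CCCC XXXX` is shortened to `QQQQ`.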