Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Substitute Question

by RiotTown (Scribe)
on May 30, 2001 at 00:37 UTC ( [id://84071]=perlquestion: print w/replies, xml ) Need Help??

RiotTown has asked for the wisdom of the Perl Monks concerning the following question:

This was posted to the vim mailing list, and it got my mind thinking "Perl can do that better":
Given the following file: *** begin file *** old_word OLD_WORD oLd_wOrd olD_worD *** end file *** Is there a substitute command that will transform the file to: *** begin substituted file *** new_word NEW_WORD nEw_wOrd neW_worD *** end substituted file *** That is to say, the replacement text will presereve the original.
I came up with this (I'm sure there are much shorter, easier ways to do it, hence this post).
#!/usr/bin/perl @words = qw[ old_word OLD_WORD old_WORD oLd_WoRd OLD_word ]; foreach $word ( @words ) { $new = 'this is the new string'; print "$word:$new:".UpperLower( $word, $new )."\n"; } sub UpperLower { my ( $old, $new ) = @_; my ( $a ); ( length( $old ) >= length( $new ) ) ? $length = length( $old ) : $ +length = length( $new ); for ( $i = 1; $i < $length+1; $i++ ) { if ( uc( substr( $old, $i-1, 1 ) ) eq substr( $old, $i-1, 1 ) && substr( $old, $i-1, 1 ) ) { $a .= uc( substr( $new, $i-1, 1 ) ); } elsif ( uc( substr( $old, $i-1, 1 ) ) ne substr( $old, $i-1, 1 ) && substr( $old, $i-1, 1 ) ) { $a .= lc( substr( $new, $i-1, 1 ) ); } else { $a .= substr( $new, $i-1, 1 ); } } return $a; }

Replies are listed 'Best First'.
Re: Substitute Question
by japhy (Canon) on May 30, 2001 at 01:05 UTC
    Nat Torkington had an approach like yours, probably in the Perl Cookbook. I offer a much simpler solution:
    s/(old)/preserve_case("new", $1)/ieg; sub preserve_case { my ($to, $from) = @_; die "strings don't match in length" unless length($to) == length($fr +om); my $mask = $from & (' ' x length($from)); return uc($to) | $mask; }
    Update: it works by creating a mask of NULs and spaces from the "from" string ("OlD" maps to "NUL SP NUL"). Then, it turns the "to" string to uppercase, and bitwise ORs the mask onto it ("NEW" & "\x00\x20\x00" is "NeW").

    Golfing it:

    # requires $" hasn't been altered # 123456789_123456789_12345678 = 28 sub pc {uc$_[0]|$_[1]&$"x length pop}
    Update 2: if you use this to change "ThIs{" to "ThAt[", you'll end up getting "ThAt{", because of the bitwise operations. This is a side effect of the unprejudiced ANDing. It can be "fixed" like so (this, and the corresponding golf, are locale sensitive):
    sub preserve_case { my ($to, $from) = @_; die "strings don't match in length" unless length($to) == length($fr +om); my $mask = join '', map /\w/ ? $_ & ' ' : "\0", split //, $from; return uc($to) | $mask; }
    An alternative to the $mask creation is to use a regex:
    (my $mask = $from) =~ s/(\w)|./ $1 ? $1 & ' ' : "\0" /egs;
    Golfing that, I get:
    # requires $" hasn't been altered # 123456789_123456789_123456789_123456789_123456789_ = 50 sub pc {(my$x=pop)=~s/(\w)|./$1?$1&' ':"\0"/egs;uc$_[0]|$x}
    Update 3: another fix, to account for changing "a" to "BcDe", and for changing "AbCd" to "e":
    sub preserve_case { my ($to, $from) = @_; my $len = length $to; if ($len < length $from) { $from = substr $from, 0, $len } else { $from .= substr $to, length($from) } (my $mask = $from) =~ s{(\w)|.}{ $1 ? $1 & ' ' : "\0" }egs; return uc($to) | $mask; }
    Update 4: Larry Rosler has code in the FAQ for this, and I've found faster way to form the mask from it:
    sub preserve_case { my ($to, $from) = @_; my $len = length $to; if ($len < length $from) { $from = substr $from, 0, $len } else { $from .= substr $to, length($from) } return uc($to) | ($from ^ uc $from); }
    This works slightly differently from Larry's code. His code turns a "TeSt" => "succesS" to "SuCcESS", whereas mine preserves the case of the new word, if it's longer than the original ("SuCcesS").

    japhy -- Perl and Regex Hacker
(Ovid - coming in late) Re: Substitute Question
by Ovid (Cardinal) on May 30, 2001 at 02:27 UTC
    Okay, so this is a bit late. I'm just adding it because it's an obvious use of the range operator for extracting the data. Using some of the tomfoolery above but trying to match the specs as I understand them (no, not trying to golf):
    use strict; use warnings; my $start_tag = qr/\*\*\* begin file \*\*\*/; my $end_tag = qr/\*\*\* end file \*\*\*/; my $case = ''; while ( <DATA> ) { # The range operator is sorely underused. if ( /$start_tag/ .. /$end_tag/ ) { if ( /$start_tag/ or /$end_tag/ ) { s/file/substituted file/; } else { $case = $_ & ' ' x length; s/old/NEW/i; } print $_ | $case; } } __DATA__ test *** begin file *** old_word OLD_WORD oLd_wOrd olD_worD *** end file *** test2
    Prints:
    *** begin substituted file *** new_word NEW_WORD nEw_wOrd neW_worD *** end substituted file ***

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      You can replace:
      if ( /$start_tag/ .. /$end_tag/ ) { if ( /$start_tag/ or /$end_tag/ ) {
      with
      if ( my $where = /$start_tag/ .. /$end_tag/ ) { if ( $where == 1 or $where =~ /E/ ) {
      for more reliable (and faster!) operation.

      -- Randal L. Schwartz, Perl hacker

        Hey, that's sweet! If you want to get rid of another regex and make it even faster:

        if ( $where == 1 or substr($where, -2) eq 'E0' ) {

        Though I am wondering one thing: why is 'E0' appended to the return value of the final sequence number? Is it just an easy test for the end or is there something significant here that I'm not aware of? (i.e., why not just append an 'x' or 'mssux' or something like that?).

        Cheers,
        Ovid

        Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re: Substitute Question
by Albannach (Monsignor) on May 30, 2001 at 01:10 UTC
Re: Substitute Question
by jeroenes (Priest) on May 30, 2001 at 01:26 UTC
    ASCII space conveniently has only the lowercase-bit set. If tye hasn't told you yet, this is some code to remember and set that bit:
    for (<DATA>){ chomp; $case = $_ & ' ' x length; s/old/NEW/i; print $_ | $case. "\n"; } __DATA__ old_word OLD_WORD oLd_wOrd olD_worD
    Cheers,

    Jeroen
    "We are not alone"(FZ)

Re: Substitute Question
by I0 (Priest) on May 30, 2001 at 02:38 UTC
    s/(old_word)/"new_word" ^ $1 ^ lc $1/egi
Re: Substitute Question
by chipmunk (Parson) on May 30, 2001 at 07:04 UTC
Re: Substitute Question
by Vynce (Friar) on May 30, 2001 at 00:46 UTC

    Sounds like a potential Golf to me, but i have a few questions. by "preserve the original" i assume you mean preserve the case of the original... what if the old word was "ELEphant" and the new word is "donkey"?

    as for faster ways to do it, i bet bitwise functions will come in useful here, since all capital letters are bitwise-and-equal to 64, and lowers are all bitwise-and-equal to 96.

Re: Substitute Question
by wog (Curate) on May 30, 2001 at 01:10 UTC
    Here's one idea:

    sub UpperLower { my( $old, $new ) = @_; $new = lc $new; my @new = split //, $new; my @mask = map { uc $_ eq $_ } split //, $old; foreach (0..$#new) { $new[$_] = uc $new[$_] if $mask[$_]; } return join '' => @new; }

    Or:

    sub UpperLower2 { my( $old, $new ) = @_; $new = lc $new; while ($old =~ /[[:upper:]]/g) { $_ = uc $_ for substr($new, pos($old) - 1, 1); } return $new; }

    All carefully tuned to not be dependent upon oddities of ASCII.

Re: Substitute Question
by myocom (Deacon) on May 30, 2001 at 00:49 UTC

    Methinks your spec is incomplete. What if the new word is of a different length than the old word? I see that you wrote code that acts on the longer of the two, but how did you determine that was what the poster wanted to happen?

      Sorry for the incompleteness, but that is how it was posted to the vim mailing list. I took it upon myself (and forgot to mention it) to simply append/disregard characters in cases where the old/new lengths do not match. Not really sure if that is what the poster wanted, but for the purposes of this lets assume it is acceptable. Also, where is the proper place to put 'golf' type questions?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://84071]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-03-28 13:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found