modifying a string in place

davidj has asked for the wisdom of the Perl Monks concerning the following question:

My fellow monks,
I have an interesting text processing task before me (not homework). I need to open a file, skip the first 4 lines, then on all the remaining lines duplicate each character except for the '^' and '#' characters, and write the result to a new file.

On an input file of:

andromeda:davidj perl_test > cat f.txt
^this^
^is^
^a^
^test^
^david#jenkins^
^ cinea#jenkins ^

the output should be:

andromeda:davidj perl_test > cat out.txt
^this^
^is^
^a^
^test^
^ddaavviidd#jjeennkkiinnss^
^  cciinneeaa#jjeennkkiinnss  ^

I currently have the following code which works perfectly well:

#!/usr/bin/perl

use strict;

open(FILE, "<f.txt");
open(OUT, ">out.txt");
while(<FILE>) {
    my $str = "";
    chomp $_;
    if( 1 .. 4 ) {
        print OUT "$_\n";
        next;
    }
    while( $_ =~ m/(.)/g ) {
        if( $1 =~ m/(\^|\#)/ ) {
            $str .= "$1";
        } else {
            $str .= "$1$1";
        }
    }
    print "$str\n";
    print OUT "$str\n";
}
close(FILE);
close(OUT);

I didn't like the idea of creating a temporary string, so I have the following which modifies the text as it is processing it, and also works perfectly well:

#!/usr/bin/perl

use strict;

open(FILE, "<f.txt");
open(OUT, ">out.txt");
while(<FILE>) {
    chomp $_;
    if( 1 .. 4 ) {
        print OUT "$_\n";
        next;
    }

    for( my $i = 0; $i < length($_); $i++ ) {
        if( substr($_, $i, 1) =~ m/(\^|\#)/ ) {
            substr($_, $i, 1) = "$1";
        } elsif( substr($_, $i, 1) =~ m/(.)/ ) {
            substr($_, $i, 1) = "$1$1";
            $i++;
        }
    }
    print OUT "$_\n";
}
close(FILE);
close(OUT);

I don't like this solution because it breaks the cardinal rule of not modifying a for loop counter inside the loop. (Not that I'm any kind of coding purist, mind you :)

Benchmarking the solutions indicates that (not surprisingly) using a temporary string is quicker. The following results are on 250000 iterations of a file with 1750 lines, each line no more than 50 characters.

andromeda:davidj perl_test > perl test.pl
              Rate 2nd string   In place
2nd string 28969/s         --       -17%
In place   35112/s        21%         --
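
The test.pl script behind these numbers is not shown. A minimal sketch of how such a comparison could be set up with the core Benchmark module follows. The sub names, the sample lines, and the iteration count are assumptions for illustration, and the two subs are only modelled on the scripts above; this is not the original test script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the first script: build the result in a second string.
sub with_second_string {
    my ($line) = @_;
    my $str = '';
    while ( $line =~ m/(.)/g ) {
        my $ch = $1;
        # Copy '^' and '#' once, every other character twice.
        $str .= ( $ch eq '^' || $ch eq '#' ) ? $ch : $ch x 2;
    }
    return $str;
}

# Hypothetical stand-in for the second script: grow the string in place with substr.
sub in_place {
    my ($line) = @_;
    for ( my $i = 0; $i < length($line); $i++ ) {
        my $ch = substr( $line, $i, 1 );
        next if $ch eq '^' || $ch eq '#';
        substr( $line, $i, 1 ) = $ch x 2;
        $i++;    # step over the copy that was just inserted
    }
    return $line;
}

# Assumed sample data: the two lines from f.txt that actually get modified.
my @lines = ( '^david#jenkins^', '^ cinea#jenkins ^' );

# Run each candidate a fixed number of times and print a comparison table.
cmpthese(
    50_000,
    {
        '2nd string' => sub { with_second_string($_) for @lines },
        'In place'   => sub { in_place($_)           for @lines },
    }
);
```

Both subs work on a copy of the line rather than on a file, so the timings would reflect only the string handling, not the I/O.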

Now to my curiosity: Both of these solutions work and I am satisfied with using either of them. What I'd like to have, purely for the educational value, is a more "Perlish" way of doing this, and/or a more efficient way.

As always, thank you for your assistance,

davidj


Replies are listed 'Best First'.
Re: modifying a string in place by Roy Johnson (Monsignor) on Jan 19, 2006 at 18:15 UTC
`while (<>) { s/([^^#\n])/$1$1/g if ($. > 4); print; }`

or just

`perl -pe 's/([^^#\n])/$1$1/g if ($. > 4)' file > newfile`

Caution: Contents may have been coded under pressure.
Re: modifying a string in place by japhy (Canon) on Jan 19, 2006 at 18:16 UTC
This substitution, `s/([^^#])/$1$1/g`, will do what you want. I'd expect it to be the fastest. It replaces any non-^ non-# character with itself twice.

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
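
A minimal sketch of the substitution approach applied to the sample data from the question. The helper name `double_chars` and the in-memory filehandle are illustrative assumptions, and the character class also excludes `\n` so that a line that has not been chomped keeps a single newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: double every character except '^', '#', and newline.
sub double_chars {
    my ($line) = @_;
    # In [^^#\n] the first ^ negates the class; the second is a literal caret.
    $line =~ s/([^^#\n])/$1$1/g;
    return $line;
}

# Sample input from the question, read through an in-memory filehandle
# so the sketch does not depend on f.txt being present.
my $input = <<'END';
^this^
^is^
^a^
^test^
^david#jenkins^
^ cinea#jenkins ^
END

open my $in, '<', \$input or die "Cannot open in-memory file: $!";
while ( my $line = <$in> ) {
    # $. is the current line number of the filehandle last read.
    $line = double_chars($line) if $. > 4;
    print $line;
}
close $in;
```

Because the substitution edits the line directly, no second string or manual index is needed.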
Re: modifying a string in place by graff (Chancellor) on Jan 20, 2006 at 04:12 UTC
Thank you for demonstrating this use of the range operator:

for ( @whatever ) {
    if ( 1 .. 4 ) {
        # enter this block during the first four iterations, then never again
    }
}

It's documented in the perlop man page, but I had never seen it before, and when I first looked at your post, I thought "that can't be right -- how could that possibly work". But once I ran it myself, and studied the man page carefully, I saw the beauty of it, and I'm grateful for that.

Update: Having said that, it seems I'm still missing something. I stepped through the OP code with "perl -d", and sure enough, the "if ( 1 .. 4 )" worked as the OP says it should: the "if" block is entered on the first four iterations, then the rest of the loop body runs on the remaining iterations. But when I tried the simplest possible snippet to do the same basic thing, it didn't work that way:

$_ = 0;
while ($_<6) {
    $_++;
    if ( 1 .. 4 ) {
        print "$_: True\n";
        next;
    }
    print "$_: False\n";
}
__OUTPUT__
1: False
2: False
3: False
4: False
5: False
6: False

When I study the man page again, this is sort of what I should have expected (but I think I should have gotten at least one "True" output):

In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once.

I get even more puzzled when I try this variation, which should evaluate to false on the first iteration (but doesn't -- and it doesn't flip-flop either):

$_ = 0;
while ($_<6) {
    $_++;
    if ( 0 .. 3 ) {
        print "$_: True\n";
        next;
    }
    else {
        print "$_: False\n";
    }
}
__OUTPUT__
1: True
2: True
3: True
4: True
5: True
6: True

What am I doing wrong here?
Re^2: modifying a string in place by BrowserUk (Patriarch) on Jan 20, 2006 at 06:20 UTC
The constant form of the flip-flop only operates against `$.`, i.e. the line number of the current file being read. Hence the first loop below produces the output expected, but the second does not.

#! perl -slw
use strict;

while( <DATA> ) {
    print if 1..4;
}

for ( 1 .. 8 ) {
    print if 1..4;
}

__DATA__
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
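
Per perlop, a constant operand of the scalar `..` is treated as true when it equals `$.`, so `1 .. 4` behaves like `($. == 1) .. ($. == 4)`. That would be consistent with the outputs in the parent post: with no file being read `$.` is undefined, so `1 .. 4` never switches on, while `0 .. 3` switches on at once (an undefined value compares equal to 0) and never reaches a line 3 to switch off. A minimal sketch of both forms follows; the helper names and the explicit counter `$n` are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: constant flip-flop while reading from a filehandle.
sub first_four_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @kept;
    while ( my $line = <$fh> ) {
        # Constant operands are compared against $., the current line number.
        push @kept, $line if 1 .. 4;
    }
    close $fh;
    return @kept;
}

# Hypothetical helper: the same idea for a plain list, where $. is of no use
# and the operands must be written out against an explicit counter.
# Assumes at least four items, so the flip-flop is off again on return.
sub first_four_items {
    my @items = @_;
    my @kept;
    my $n = 0;
    for my $item (@items) {
        $n++;
        push @kept, $item if ( $n == 1 ) .. ( $n == 4 );
    }
    return @kept;
}

my $text = join '', map "line $_\n", 1 .. 8;
print first_four_lines($text);
print join( ',', first_four_items( 1 .. 8 ) ), "\n";
```

Each `..` keeps its own state between evaluations, which is why the second helper notes its assumption about the list length.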

