negzero7 has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to teach myself perl with a few books and am stuck on several exercises. The first is this:

Write a program that substitutes every occurrence of the string "th" (either or both letters can be capital) with the string "TH" in file part1.txt (attached). The program must print the entire file to standard output, not just the replaced lines.

The Unix command line should look like:

$ proj2_1.pl part1.txt

The output should look like:

BIG BLUE LOOKED TO BIG RED -- IBM received great publicity when its
Deep Blue chess machine beat THe world chess champion, Gary Kasparov.
Deep Blue is a supercomputer THat runs on THe IBM AIX version of THe
UNIX system. After THe computer won THe second game of THe match, THe
head of THe IBM team introduced to THe audience his consultant and THe
creator of UNIX, Lucent's legendary Dennis Ritchie. THe audience gave
Ritchie a standing ovation. Maybe Bell Labs should get its deserved
share of publicity in THis man vs. machine hyperbole.

-- Alex Lubashevsky, Warren, N.J. (Lucent Today 5/13/97)

That's what it is supposed to do. My code looks like this:

#!usr/bin/env perl
open (test, "@ARGV");
while (<test>) {
    $_ =~ <test>;
    chomp;
    s/(th)/TH/gi;
    print "$_\n";
}
close (test);

The problem is this: when it prints out the paragraph of text, it skips every other line, like this:

BIG BLUE LOOKED TO BIG RED -- IBM received great publicity when its
Deep Blue is a supercomputer THat runs on THe IBM AIX version of THe
head of THe IBM team introduced to THe audience his consultant and THe
Ritchie a standing ovation. Maybe Bell Labs should get its deserved

-- Alex Lubashevsky, Warren, N.J. (Lucent Today 5/13/97)

Can anyone provide some insight?

Replies are listed 'Best First'.
Re: Why are lines being skipped in output?
by toolic (Bishop) on Mar 18, 2008 at 14:01 UTC
    In addition to the solutions provided by others, here are some other things to consider:
    • Always check the success of open.
    • There is no need for these quotes: "@ARGV"
    • Use the strictures to make your code more robust:
      use warnings; use strict;

    By adding use warnings;, I get the message:

    Unquoted string "test" may clash with future reserved word

    Putting it all together, the code could be written as:

    #!/usr/bin/env perl
    use warnings;
    use strict;

    open my $test_fh, '<', @ARGV or die "Can not open file $!\n";
    while (<$test_fh>) {
        s/th/TH/gi;
        print;
    }
    close $test_fh;

      ++toolic. You forgot to explicitly state a couple of pieces of good advice that you followed in your fixed code. Since negzero7 is an admitted newbie, it's worth spelling them out.

      • Use the 3 argument version of open.
      • Use lexical filehandles.

      negzero7, when I am trying to debug a thorny bit of code or learn a new language, I'll sometimes resort to using CS 101 style comments.

      #!usr/bin/env perl
      open (               # open a filehandle
          test,            # called 'test'
          "@ARGV"          # pointing to file specified in arguments
      );
      while (              # Loop
          <test>           # read line from filehandle "test" and assign to $_
      ) {
          $_ =~ <test>;    #
          chomp;           # remove newline from end of $_
          s/(th)/TH/gi;    # Case insensitive replace 'th' with TH everywhere
          print "$_\n";    # print $_ with a new line
      }

      By over-commenting everything I can document my assumptions about every bit of code. Then I can RTFM and verify my understanding is correct. If my first pass does not find the problem, I return and look for anything that I did not comment fully enough.

      For example, my notes on the open call were not exhaustive. "@ARGV" stringifies the @ARGV array, by stringifying each element of the array and inserting a space between members of the array: "$ARGV[0] $ARGV[1] $ARGV[2] ... $ARGV[n]".

      Sometimes a second pass is not detailed enough. A third pass would show that the list separator is inserted between elements; it is usually a space, but it can be redefined by setting $". So "@ARGV" is really: "$ARGV[0]$"$ARGV[1]$"$ARGV[2]$"...$"$ARGV[n]".
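
The stringification described above can be checked with a small sketch. The array @args and its file names are made up for illustration and stand in for a real @ARGV.

```perl
use strict;
use warnings;

# Hypothetical argument list, standing in for a real @ARGV.
my @args = ('part1.txt', 'part2.txt');

# Interpolating an array in double quotes joins its elements with $",
# the list separator, which defaults to a single space.
my $default = "@args";
print "$default\n";    # part1.txt part2.txt

# Change the separator only inside this block.
my $custom;
{
    local $" = ',';
    $custom = "@args";
}
print "$custom\n";     # part1.txt,part2.txt

# With a single element there is nothing to join, so the string is just
# the file name. That is why open(test, "@ARGV") works with one argument.
my @one = ('part1.txt');
print "single element: same as the file name\n" if "@one" eq $one[0];
```

With two or more arguments the joined string is no longer a single file name, so the open would fail.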

      I don't leave these kinds of comments in production code--there they'd do more harm than good. But for learning a new language or debugging a tough problem, I find this procedure hard to beat.


      TGI says moo

Re: Why are lines being skipped in output?
by andreas1234567 (Vicar) on Mar 18, 2008 at 13:43 UTC
    Remove the line
    $_ =~ <test>;
    You are reading 2 lines for each iteration of your while loop.
    --
    Andreas
      Would you explain that? I'm not sure why 2 lines are being read. Of course, I've never seen =~ used in such a way either...

        Each time Perl sees a <test> it reads another line from 'test'. There is no problem with using =~ in conjunction with your <test> construct...but it does cause another read from test. I'm not sure what you intended the line $_ =~ <test> to do. Can you say what you were thinking?

        In case you're wondering, the other read occurs in your loop control line while(<test>).

        To keep it from reading twice but to still do what it looks like you're wanting to do you can use a construct like:

        while (<test>) {
            chomp;
            s/(th)/TH/gi;
            print "$_\n";
        }
        close (test);

        Several other monks that have answered your inquiry show this and I think it is the right answer. But since I'm not sure what you intended with the line $_ =~ <test>, I'm not positive that deleting the line really results in what you were intending.
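
To see the double read in isolation, here is a minimal sketch that runs the same loop with and without the extra read. The sample text, the in-memory filehandle, and the helper name lines_seen are assumptions made for the demonstration; they are not part of the original script.

```perl
use strict;
use warnings;

# Assumed sample data: four numbered lines in an in-memory "file".
my $text = "line 1\nline 2\nline 3\nline 4\n";

# Returns the lines the loop body gets to see, with or without the extra read.
sub lines_seen {
    my ($extra_read) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @seen;
    while (<$fh>) {                     # read 1: the loop condition fills $_
        $_ =~ <$fh> if $extra_read;     # read 2: consumes a line, $_ is unchanged
        push @seen, $_;
    }
    close $fh;
    return @seen;
}

print "with the extra read:\n", lines_seen(1);   # line 1, line 3
print "without it:\n",          lines_seen(0);   # all four lines
```

The second read uses the line it consumed as a pattern and then discards it, which is why every other line never reaches print.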

        ack Albuquerque, NM
        C:\>perl -MO=Deparse,-p
        #!usr/bin/env perl
        open (test, "@ARGV");
        while (<test>) {
            $_ =~ <test>;
            chomp;
            s/(th)/TH/gi;
            print "$_\n";
        }
        close (test);
        ^Z
        open(test, "@ARGV");
        while (defined(($_ = <test>))) {
            ($_ =~ <test>);
            chomp($_);
            s/(th)/TH/gi;
            print("$_\n");
        }
        close(test);
        - syntax OK
Re: Why are lines being skipped in output?
by lidden (Curate) on Mar 18, 2008 at 13:43 UTC
    Change:
    while (<test>) { $_ =~ <test>; chomp; s/(th)/TH/gi; print "$_\n"; }
    To:
    while (<test>) { s/th/TH/gi; print; }
    and it should work.

      Fantastic! Thanks for the help everyone, I didn't realize it would read two separate lines like that, but now it makes sense.

Re: Why are lines being skipped in output?
by halley (Prior) on Mar 18, 2008 at 16:37 UTC
    By way of insight (or explanation): the while statement has some magic to do common tasks like this without stating everything specifically.
    while (<FOO>) { ... }
    seems to a newcomer like it would mean
    while (not_the_end_of_the_file(*FOO)) { ... }
    but in actuality it means
    while (defined ($_ = readline(*FOO))) { ... }
    So it does both tasks at once: breaks the loop at the end of the file, AND reads the next line into the default variable $_.
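
The equivalence can be sketched by writing the loop both ways and comparing what each collects. The sample text and the two helper names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";    # assumed sample input

# Collect lines using the short form of the loop.
sub read_implicit {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    while (<$fh>) {
        push @lines, $_;
    }
    close $fh;
    return @lines;
}

# Collect lines with the assignment and the defined() test written out.
sub read_explicit {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    while (defined($_ = readline($fh))) {
        push @lines, $_;
    }
    close $fh;
    return @lines;
}

my @short = read_implicit();
my @long  = read_explicit();
print "same lines either way\n" if "@short" eq "@long";
```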

    --
    [ e d @ h a l l e y . c c ]

Re: Why are lines being skipped in output?
by sundialsvc4 (Abbot) on Mar 18, 2008 at 23:21 UTC

    One of the favorite phrases that you'll hear around Perl water-coolers is:   TMTOWTDI™ – “There's More Than One Way To Do It.”

    This can be both a blessing and a curse.

    On the one hand, Perl can do a lot with a little. On the other hand, you can wind up writing an unintended contribution to the (very amusing) section of this site called Obfuscations.

    Let your watchword above all be clarity. The computer does not care if you minimize or if you maximize the amount of source-code that you write. It does not care if you include comments or if you omit them. But mark my words, you will, and at the worst possible time. So, make sure that whatever you write, however you write it, is simple and clear and very defensive of possible errors. (“Don't assume” the file will open... Get familiar with die and with the package Carp. Yes, it's much better for your code to croak than to briefly be among the walking-dead.)
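
As one possible reading of "don't assume the file will open", here is a small sketch using croak from Carp. The helper name open_for_reading and the nonexistent path are invented for the example.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: open a file for reading, or croak with the reason.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or croak "Cannot open '$path' for reading: $!";
    return $fh;
}

# Deliberately nonexistent path, used only to show the failure branch.
my $ok = eval { open_for_reading('/no/such/dir/part1.txt'); 1 };
print "open failed: $@" unless $ok;
```

croak reports the error from the caller's point of view, which tends to be more useful than die when the check lives inside a helper.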

    Unfortunately, the “DWIM principle” (Do What I Mean...) has been applied to the design of the language and to its implementation. Perl sometimes tries to “guess” what you mean, and to “do ‘it’ anyway,” whatever ‘it’ might mean. This is once again an argument for simplicity and clarity on your part ... and good documentation. The simpler and easier-to-understand you make your code, the less likely you will be to run into a “feature.”

Re: Why are lines being skipped in output?
by bigearsbilly (Novice) on Mar 19, 2008 at 11:24 UTC
      Aw, shucks.
      #!perl -pe s/th/TH/g
      Can't emulate -e on #! line at being_too_clever.pl line 1.

      --
      [ e d @ h a l l e y . c c ]

      mmmm, need help with your "quick problem."
      1. what is the question?
      2. what is the problem?
      3. what is the input?
      4. what output are you looking for?
      Duh! Misread post above as OP. Apologies.
Re: Why are lines being skipped in output?
by MichaelORourke (Initiate) on Mar 19, 2008 at 19:18 UTC
    I'll include the entire thing here and --sorry I'm still not with the wiki style for 'code' tags. I'll get it eventually.
    #!usr/bin/env perl
    open (test, "@ARGV");
    while (<test>) {
        $_ =~ <test>;
        chomp;
        s/(th)/TH/gi;
        print "$_\n";
    }
    close (test);
    The open is for reading and surely you mean
    open(test, "<$ARGV[0]"); # what happens when more args?
    This, I believe, is the issue, so instead of this
    s/(th)/TH/gi;
    try this
    s/th/TH/go;
    It's likely that a simpler
    tr/th/TH/;
    might work.

      tr/th/TH/; won't work as required: it replaces every t with a T and every h with an H, even those that are not part of a th.
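
A quick comparison on one made-up sentence shows the difference. The sample string is an assumption, and the /go variant is included only because it was suggested above.

```perl
use strict;
use warnings;

my $sample = "The cat sat with that hat.";    # assumed sample text

# Substitution: replaces the two-letter sequence, ignoring case.
(my $with_s = $sample) =~ s/th/TH/gi;

# Without /i only lowercase "th" matches; /o adds no case-insensitivity.
(my $with_go = $sample) =~ s/th/TH/go;

# Transliteration: maps single characters, t to T and h to H, wherever they occur.
(my $with_tr = $sample) =~ tr/th/TH/;

print "$with_s\n";     # THe cat sat wiTH THat hat.
print "$with_go\n";    # The cat sat wiTH THat hat.
print "$with_tr\n";    # THe caT saT wiTH THaT HaT.
```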

      Alexander

Re: Why are lines being skipped in output?
by oko1 (Deacon) on Mar 20, 2008 at 19:14 UTC

    I'm going to comment your script one line at a time - it really is an excellent opportunity to explain some good bits of Perl (presumably, that's just what the author of your book intended.) My comments are prefixed with a '###'.

    #!usr/bin/env perl
    ### I suspect that you're retyping your code, since the
    ### above would fail with a "Bad interpreter" error: there
    ### is no such thing as "usr/bin/env" (note absence of
    ### initial '/'.) As well, the 'env' trick isn't usually
    ### necessary unless you're moving your script between
    ### radically different systems where Perl is located
    ### in different places; usually, "#!/usr/bin/perl" will do.
    ### In addition, you should always - yes, *always* - enable
    ### warnings (i.e., let the computer do your troubleshooting
    ### for you) - so that shebang line should also have a ' -w'
    ### added to its end.

    open (test, "@ARGV");
    ### Problem #1: you're trying to use an array (@ARGV) where
    ### you should be using a string containing a filename. This
    ### _will_ break if @ARGV contains more than one element.
    ### Problem #2: Whenever you perform any system-related task
    ### - i.e., an operation that is external to Perl - you
    ### should always (yes, *always*) check the result. As a
    ### minor additional correction, you should elide any
    ### unnecessary punctuation in your Perl - and you should
    ### either use sentence-case or all-caps names for file
    ### handles. As a result, the corrected version of the above
    ### line should read
    ###
    ### open Test, $ARGV[0] or die "$ARGV[0]: $!\n";

    while (<test>) {
        $_ =~ <test>;
        ### This line is a simple error, and should be omitted.
        chomp;
        ### This is unnecessary, since you're going to want
        ### the newline character when you print the output.
        ### Omit it.
        s/(th)/TH/gi;
        ### Neither grouping nor capturing are used in the
        ### above regex; therefore, the parentheses do
        ### nothing and should be omitted.
        print "$_\n";
        ### If you omit the above 'chomp', it will become
        ### unnecessary to explicitly specify '$_' or '\n'.
        ### This line then simply becomes 'print;'.
    }
    close (test);
    ### Again, the parens are unnecessary. Omit them.

    As a result, the entire script should look like this:

    #!/usr/bin/perl -w
    open Test, $ARGV[0] or die "$ARGV[0]: $!\n";
    while (<Test>) {
        s/th/TH/gi;
        print;
    }
    close Test;

    Alternatively, you could use Perl's "diamond operator", which does the same thing without your having to explicitly open the file specified on the command line:

    #!/usr/bin/perl -w
    while (<>) {
        s/th/TH/gi;
        print;
    }
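
For anyone who wants to try the diamond operator without a part1.txt at hand, here is a self-contained sketch. It writes a made-up line to a temporary file and points @ARGV at it, which is an assumption standing in for the real command-line argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file; its content is invented for the example.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "the Three THieves\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

my @out;
{
    # <> opens each name in @ARGV in turn, so no explicit open is needed.
    local @ARGV = ($tmp_name);
    while (<>) {
        s/th/TH/gi;
        push @out, $_;
    }
}
print @out;    # THe THree THieves
```

When @ARGV is empty, <> reads from standard input instead, so the same loop also works in a pipeline.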
Re: Why are lines being skipped in output?
by wardy3 (Scribe) on Mar 21, 2008 at 02:15 UTC
    Hi negzero7

    There was a bit of a discussion in one of your previous nodes "Help with split() function" where the @ARGV thing was brought up.

    There was a reply by toolic about it specifically, as well as other comments that were given to help you along the Perl learning path.

    I think you should try to incorporate past lessons into your new problems as much as you can. It certainly makes for a better foundation in Perl, as well as allowing others to assist with new problems and questions rather than the same things again.

    My 2c

Re: Why are lines being skipped in output?
by Anonymous Monk on Mar 20, 2008 at 19:26 UTC
    You are running the <> operator twice per loop. You can loop through every line of a file and read each line in one statement: while ($variable = <test>). In your case you are using $_ as your variable. While naming the variable can help readability, it is not necessary, as while (<test>) will automatically assign the next line in the file to $_. So, because <test> is running twice per loop (once in the while condition and then again right beneath it), two lines of text are being pulled each time the loop runs, causing the loop to appear to skip a line.
Re: Why are lines being skipped in output?
by Anonymous Monk on Mar 21, 2008 at 20:19 UTC
    Hello! $_ =~ <test>; is your problem. You do not need this line because Perl automatically reads a line from the input file (<test> in your case) into $_. By using $_ =~ <test>, the program actually reads TWO lines every iteration, so every other line will be skipped! It's a good idea to get in the habit of running Perl scripts with -w, as it will report warnings for you. -w makes your life easier. Hope this helps. Have a nice day.
Re: Why are lines being skipped in output?
by alpha (Scribe) on Mar 22, 2008 at 20:45 UTC
    Well that was some funny piece of code :D An illustration called "how many errors could be done in like 5 lines of perl code". LMAO
Re: Why are lines being skipped in output?
by linuxer (Curate) on Mar 23, 2008 at 13:10 UTC
    Hi, ignoring other issues in your script: your problem is located at:
    # read linewise from handle test to $_
    while (<test>) {
        # read again one line from handle test
        $_ =~ <test>;
        chomp;
        s/(th)/TH/gi;
        print "$_\n";
    }
    Solution:
    while ( my $line = <test> ) {
      $line =~ s/th/TH/gi;
      print $line;
    }