
What is the difference?

by YAFZ (Pilgrim)
on Apr 08, 2003 at 14:20 UTC (#248932)

YAFZ has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have two small Perl programs that are intended to calculate the same thing, namely the character count of a text file. By character, I mean any character except blanks and tabs.
The first version is something like this:
#!/bin/perl -w

while (<>) {
    chomp;
    $line = join('', split /[ \t]*/);
    print $line;
}

And the second version is like this:
#!/bin/perl -w

while (<>) {
    chomp;
    $line += split /[ \t]*/;
}
print $line;

I run the first one like this:

$ ./ < | wc
      0       1      69

And I run the second one like this:

$ ./ <
Use of implicit split to @_ is deprecated at ./ line 5.
72

As you can see, the first program combined with the wc command tells me that the text file contains 69 characters, while the second program by itself reports 72 characters for the same file.

Why do I get a different number of characters?

Re: What is the difference?
by Thelonius (Priest) on Apr 08, 2003 at 14:45 UTC
    You'll get different counts on lines that have leading white space: the second program counts an extra, empty field at the beginning of each such line. The correct way to write this program is:
    #!/bin/perl -w
    my $line = 0;
    while (<>) {
        chomp;
        $line += tr/ \t/./c;
    }
    print $line, "\n";
    or
    #!/bin/perl -w
    my $line = 0;
    while (<>) {
        $line += tr/ \t\r\n/c;
    }
    print $line, "\n";
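A minimal sketch of Thelonius's explanation, assuming a few made-up sample lines (the second and third are indented). It counts each line both ways, using an explicit array instead of the deprecated implicit split to @_, and shows that the split count is one higher only on the lines with leading white space. The helper names split_count and tr_count are illustrative, not part of the original programs.

```perl
use strict;
use warnings;

# Number of fields split /[ \t]*/ produces for one line (the second
# program's approach, but assigning to an array instead of implicit @_).
sub split_count {
    my ($line) = @_;
    my @fields = split /[ \t]*/, $line;
    return scalar @fields;
}

# Number of characters other than space and tab (the tr///c approach).
sub tr_count {
    my ($line) = @_;
    return ($line =~ tr/ \t//c);
}

# Hypothetical sample lines; the second and third have leading white space.
my @lines = ("print 1;", "    print 2;", "\tprint 3;");

my ($split_total, $tr_total) = (0, 0);
for my $i (0 .. $#lines) {
    my ($s, $t) = (split_count($lines[$i]), tr_count($lines[$i]));
    $split_total += $s;
    $tr_total    += $t;
    print "line ", $i + 1, ": split=$s tr=$t\n";
}
print "totals: split=$split_total tr=$tr_total\n";   # prints "totals: split=23 tr=21"
```

Each indented line adds exactly one to the split total, which is consistent with the 72 vs 69 difference in the original post if three lines of that file start with blanks or tabs.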
      Thanks for enlightening me! ;-) I prefer your second example. The only problem with it is that you forgot a single slash at the end of the tr:
      $line += tr/ \t\r\n/c;
      which must be:
      $line += tr/ \t\r\n//c;
      After fixing that, the script works great ;-)
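For reference, a sketch of the corrected counter wrapped in a sub so it can be tried on a string. The sub name count_nonblank and the sample text are hypothetical, and opening a filehandle on a scalar reference needs Perl 5.8 or later (newer than the 5.6.1 mentioned in this thread); the original while (<>) loop behaves the same way on real files.

```perl
use strict;
use warnings;

# Sum the tr/// counts over every line read from a filehandle.
# The replacement list is empty and there is no /d, so $_ is not
# modified; only the return value (the number of matches) is used.
sub count_nonblank {
    my ($fh) = @_;
    my $count = 0;
    while (<$fh>) {
        $count += tr/ \t\r\n//c;
    }
    return $count;
}

# Demo: an in-memory filehandle stands in for the text file.
my $text = "my \$x = 1;\n    print \$x;\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print count_nonblank($fh), "\n";   # 7 + 8 non-blank characters
close $fh;
```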

      I also realized that I can handle the situation using the command line:
      $ tr -d '[:space:]' < | wc
            0       1      69
      Now I think that tr (in Perl or in the shell) is the better way to get rid of unwanted characters. Am I wrong?

      P.S.: If Thelonious S. Monk had programmed (in any language), woe to whoever had to maintain his code ;-)

        Your tr (the program) example does, indeed, get rid of whitespace. The Perl program, however, does not. What it does is take every character except some whitespace (those with ASCII codes 32, 9, 13 and 10, in that order), replace each one with itself, and return how many were matched (none are actually changed).

        The /c option complements the first list of tr///, while leaving the second list empty makes it equal to the first (in this case, the complemented first). In any case, tr/// returns the number of characters matched by the transliteration.
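To illustrate the difference described above, a small sketch with an assumed sample string, contrasting tr///c, which only counts, with tr///d, which deletes the way the shell's tr -d does. Perl's tr/// takes a literal character list, so this list covers only space, tab, CR and LF rather than everything [:space:] matches.

```perl
use strict;
use warnings;

my $original = "a b\tc\n";   # hypothetical sample input

# tr///c with an empty replacement list: every character NOT in the list
# is "replaced" by itself, so the string is unchanged; only the count matters.
my $copy    = $original;
my $counted = ($copy =~ tr/ \t\r\n//c);

# tr///d with an empty replacement list: the listed characters are deleted.
my $stripped = $original;
my $deleted  = ($stripped =~ tr/ \t\r\n//d);

print "counted=$counted unchanged=", ($copy eq $original ? "yes" : "no"), "\n";
print "deleted=$deleted stripped='$stripped' length=", length($stripped), "\n";
# prints:
# counted=3 unchanged=yes
# deleted=3 stripped='abc' length=3
```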

                dakkar - Mobilis in mobile

        Most of my code is tested...

Re: What is the difference?
by hardburn (Abbot) on Apr 08, 2003 at 14:26 UTC

    I'm not sure if this is the root of the problem, but you're attempting to "add" a string numerically. That just isn't going to work. The offending code in the second example is:

    $line += split /[ \t]*/; # Notice '+=' operator

    Which should be:

    $line .= split /[ \t]*/; # Notice '.=' operator

    As I said, this may or may not fix your stated problem.

    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      split doesn't return a string; it returns a list, the scalar value of which is the number of elements. Since this split breaks the line into individual characters, $line (although a poorly chosen variable name) holds the count of all the characters so far. Your 'fixed' example would instead concatenate the per-line counts: if the text file contained two lines of 8 characters each, it would print '88'.
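A sketch of the arithmetic above, using two hypothetical 8-character lines: adding the per-line field counts gives the total, while concatenating them just glues the digits together.

```perl
use strict;
use warnings;

# Two hypothetical lines of 8 non-blank characters each.
my @lines = ("abcdefgh", "ijklmnop");

my ($sum, $concat) = (0, '');
for my $line (@lines) {
    my @fields = split /[ \t]*/, $line;   # one field per character here
    my $n = @fields;                      # array in scalar context: 8
    $sum    += $n;                        # numeric addition: 8 + 8
    $concat .= $n;                        # string concatenation: "8" . "8"
}
print "sum=$sum concat=$concat\n";        # prints "sum=16 concat=88"
```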

      We're not surrounded, we're in a target-rich environment!
Re: What is the difference?
by jasonk (Parson) on Apr 08, 2003 at 14:32 UTC

    What is in your text file? On everything I've tried, both techniques return the same count.

      My text file consists of the Perl source code of the, which is listed in my original message under the header of

      I've edited this file and the other file, and tried my examples on a CYGWIN system running Perl v5.6.1 on Win2000.

      I'm still a little confused, since you say your results differ from mine. :)

Re: What is the difference?
by zby (Vicar) on Apr 08, 2003 at 14:29 UTC
    Look at the output without wc. The += operator does not concatenate the right side to the left. It is the .= operator that does it.

    Update: OK, there is no wc in the second example. But look again at the line with +=: its right-hand side is a list, and a list does not return its length in scalar context (only an array does).

    Update: As Abigail-II pointed out, split in scalar context returns the number of fields.
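A sketch of the distinction discussed here, assuming Perl 5.12 or later (where split in scalar context no longer writes to @_, so the deprecation warning from the original post does not appear). An array in scalar context gives its length, a list returned from a sub in scalar context gives its last element, and split in scalar context gives the field count. The sub name letters is purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical sub returning a list; the result depends on the caller's context.
sub letters { return ('x', 'y', 'z') }

my @array       = letters();
my $array_count = @array;      # array in scalar context: its length
my $list_last   = letters();   # list in scalar context: the last element

# split in scalar context: the number of fields.
my $field_count = split /,/, "a,b,c,d";

print "$array_count $list_last $field_count\n";   # prints "3 z 4"
```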

      Well, yeah, += doesn't do that. += doesn't multiply the right-hand side with the left-hand side either.

      You're the second person to assume something needs to be concatenated. Why, I have no idea. split doesn't return a string, it returns a number, and the program is counting things. I see no reason at all to come up with concatenation.

      To answer the question, split returns the number of fields. And if you split on [ \t]*, the number of fields isn't necessarily the same as the number of characters.
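To see why fields and characters can differ, this sketch (with a hypothetical helper show_fields) prints the fields split /[ \t]*/ produces, each wrapped in brackets so empty fields are visible. A run of blanks at the start of a line yields an empty first field, while trailing empty fields are discarded, so only leading white space inflates the count.

```perl
use strict;
use warnings;

# Return the fields of split /[ \t]*/, bracketed so empty fields show up.
sub show_fields {
    my ($line) = @_;
    my @fields = split /[ \t]*/, $line;
    return join ' ', map { "[$_]" } @fields;
}

print show_fields("  ab c"), "\n";   # prints "[] [a] [b] [c]" (empty first field)
print show_fields("ab c  "), "\n";   # prints "[a] [b] [c]" (no trailing field)
```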


      Look at the command line, the second one doesn't use wc at all, it does the counting itself.


Node Type: perlquestion [id://248932]
Approved by Corion