PerlMonks  

char count windows vs linux

by Anonymous Monk
on Dec 17, 2002 at 12:36 UTC ( [id://220497]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have this code
open(FILE, "$file");
@lines = <FILE>;
$count=();
foreach $line (@lines){
    $count+=length($line);
}
print "\n$count";
If I use this to count characters in a text file on Windows, I get a different result from running the same code on Linux. It seems to have something to do with carriage returns, but I'm not sure. I would like to get the same result on both platforms without altering the file content. Can anyone help, please?

Replies are listed 'Best First'.
Re: char count windows vs linux
by gjb (Vicar) on Dec 17, 2002 at 12:43 UTC

    If you chomp the lines you read, the number of chars should be the same.

    while (<FILE>) { chomp($_); $count += length($_) + 1; }

    Hope this helps, -gjb-

      While this is absolutely correct, it does raise the question - What should be considered "characters" in a text file? To my mind, all characters, including carriage return (\r) and line feed (\n), should be counted as these do contribute to the size of the file. The difference in reported size encountered by the venerable Anonymous Monk, as gjb has rightly alluded to, is due to platform differences in the interpretation of these characters.

      An alternative method of counting the number of characters in a file, including carriage returns and line feeds, which should return the same result irrespective of platform, would be:

      print length do { local $/; local @ARGV = ( $file ); <> }, "\n";

      Where the variable $file contains the text file name whose characters are to be counted.
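If the goal is to count every byte exactly as it sits on disk, a variation on the one-liner above is to open the file explicitly and call binmode on the handle, so that Windows does not translate CRLF into a single \n while reading. This is only a minimal sketch: the sub name raw_char_count is made up for illustration, and it assumes the file is small enough to slurp into memory.

```perl
use strict;
use warnings;

# Count every byte in a file, line terminators included.
# binmode turns off CRLF translation on Windows, so the result
# should match the on-disk size on any platform.
sub raw_char_count {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? length($content) : 0;
}

# Usage: pass a file name on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    printf "%d characters, %d bytes on disk\n",
        raw_char_count($file), -s $file;
}
```

If the two numbers printed agree on both machines but differ between machines, the files themselves are different, which points at the transfer rather than at Perl.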

       

      Update

      With regard to the follow-up post from Anonymous Monk, I would concur with the direction suggested by gjb in this post. It sounds as if there *really is* a difference between the files being compared on the two different machines (presumably as a result of the file transfer via FTP), hence the differing character counts.

       

      perl -le 'print+unpack("N",pack("B32","00000000000000000000000111111110"))'

        Would something like md5sum work across systems to tell you whether the files have been copied/FTP'ed correctly? I know that md5sum is available for both Linux and Windows.

        The md5sum utility computes a 128-bit checksum (or fingerprint, or message digest) for a file. Matching fingerprints mean the files are, for all practical purposes, the same.

        A.A.
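The same check can be done from Perl with the Digest::MD5 module, which may be handy if md5sum is not installed on one of the machines. A rough sketch, assuming Digest::MD5 is available (it ships with Perl from 5.8 onwards); the sub name file_md5 is just a label chosen here:

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 digest of a file as a 32-character hex string.
sub file_md5 {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);             # hash the raw bytes, not translated text
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Usage: print one digest per file named on the command line.
print file_md5($_), "  $_\n" for @ARGV;
```

Run it against the file on each machine and compare the hex strings. If the file was transferred in ASCII mode and the line endings were rewritten, the digests will differ.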

        Hi, thanks for your help. I've tried both these methods and the results are still not the same! Linux appears to be counting an extra character per line. If I strip out \n or chomp it makes no difference.

      What about file content and locale? There can be a difference if the text contains UTF-8 characters and is read once with use bytes; and a second time with use utf8;

      Try adding the use bytes; pragma to the script and test it again, if you are sure the files are the same...
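To illustrate how byte counts and character counts can diverge, here is a small sketch using the core Encode module. It assumes the data is UTF-8; the sample string is invented for the example. Note that use utf8; on its own describes the encoding of the script's source code, so decoding input data is shown here with Encode instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "é" is one character but two bytes (C3 A9) in UTF-8.
my $bytes = "caf\xC3\xA9\n";           # raw UTF-8 bytes, as read in binmode
my $chars = decode('UTF-8', $bytes);   # the same data as Perl characters

print "as bytes:      ", length($bytes), "\n";   # 6
print "as characters: ", length($chars), "\n";   # 5
```

If one script sees bytes and the other sees decoded characters, the totals will differ by one for every two-byte character in the file.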

Re: char count windows vs linux
by BrowserUk (Patriarch) on Dec 17, 2002 at 13:48 UTC

    I would like to get the same result for both platforms

    The right answer to your question (if there is one) really depends on why you want to get the same result on both platforms.

    If, for instance, you hope to use the information as the mechanism for some sort of comparison metric of two files on different systems, then counting chars is fundamentally the wrong way to do it.

    The only halfway legitimate reason I can think of for wanting to do this is if you want to report the size before and after transfer from one system type to the other, in which case chomping the lines and totalling the line lengths would probably work.

    However, even for this purpose there are better methods of verifying transfers than just counting bytes, as this won't detect corruption of bytes en route.

    A little more information may get you a better answer.


    Examine what is said, not who speaks.

Re: char count windows vs linux
by mce (Curate) on Dec 17, 2002 at 13:46 UTC
    Hi,
    Is it a binary or an ASCII file?
    (Consider binmode if it is binary.)
    Was the text file FTP'd in ASCII or binary mode?
    On NT, $/ is different from Unix, but chomp should take care of this.

    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    IT Masters, Belgium
Re: char count windows vs linux
by ph0enix (Friar) on Dec 17, 2002 at 17:06 UTC

    Try following code

    open(FILE, "$file");
    $line;
    {
        local $/;
        $line = <FILE>;
    }
    @lines = split(/\015?\012/, $line);
    $count=();
    foreach $line (@lines){
        $count+=length($line);
    }
    print "\n$count";

    Now you should obtain the same values on both systems. I think that the problem is with line endings. Is the CRLF counted as one character?

Re: char count windows vs linux
by John M. Dlugosz (Monsignor) on Dec 17, 2002 at 16:34 UTC
    If the file has DOS-style line endings, the Linux code will see an extra \r and count it. Is the number higher in Linux?
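One way to check this directly is to read the file in binmode and count how many lines end in CRLF versus a bare LF. The following is only a diagnostic sketch (the sub name line_ending_counts is hypothetical), and it slurps the whole file, so it suits modest file sizes:

```perl
use strict;
use warnings;

# Report how many CRLF and bare LF line endings a file contains.
sub line_ending_counts {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);             # keep the \r bytes visible, even on Windows
    local $/;                 # slurp mode
    my $content = <$fh>;
    close($fh);
    $content = '' unless defined $content;
    my $crlf = () = $content =~ /\015\012/g;        # CR followed by LF
    my $lf   = () = $content =~ /(?<!\015)\012/g;   # LF not preceded by CR
    return ($crlf, $lf);
}

# Usage: report the counts for each file named on the command line.
for my $file (@ARGV) {
    my ($crlf, $lf) = line_ending_counts($file);
    print "$file: $crlf CRLF, $lf bare LF\n";
}
```

A non-zero CRLF count on the Linux copy would explain one extra character per line there.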
