Re: char count windows vs linux
by gjb (Vicar) on Dec 17, 2002 at 12:43 UTC
If you chomp the lines you read, the number of chars should be the same.
while (<FILE>) {
    chomp($_);                   # strip the trailing newline, if any
    $count += length($_) + 1;    # +1 counts each line ending as one character
}
Hope this helps, -gjb-
While this is absolutely correct, it does raise the question - What should be considered "characters" in a text file? To my mind, all characters, including carriage return (\r) and line feed (\n), should be counted as these do contribute to the size of the file. The difference in reported size encountered by the venerable Anonymous Monk, as gjb has rightly alluded to, is due to platform differences in the interpretation of these characters.
An alternative way of counting the characters in a file, carriage returns and line feeds included, which should return the same result irrespective of platform, would be:
print length do { local $/; local @ARGV = ( $file ); <> }, "\n";
where the variable $file holds the name of the text file whose characters are to be counted.
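A minimal sketch of the same idea with an explicit file handle, in case it helps. The sub name count_bytes is made up for illustration, and the binmode call is my own addition: without it, Perl on Windows translates CRLF to LF on input, so the slurped length can come out smaller than the size on disk.

```perl
use strict;
use warnings;

# Count every byte in a file, line endings included.
# binmode stops Windows from translating CRLF to LF on input,
# so the result matches the on-disk size on any platform.
sub count_bytes {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);
    local $/;                      # slurp mode: read the whole file at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? length($content) : 0;
}

# Hypothetical usage: perl count.pl somefile.txt
if (@ARGV) {
    my $file = shift @ARGV;
    printf "%d bytes read, %d bytes on disk\n", count_bytes($file), -s $file;
}
```

If the two numbers printed for the same file differ between the Windows and Linux boxes, the files themselves are different.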
Update
With regard to the follow-up post from Anonymous Monk, I would concur with the direction suggested by gjb in this post: it sounds as if there *really is* a difference between the files being compared on the two different machines (presumably as a result of the file transfer via FTP), hence the differing character counts.
perl -le 'print+unpack("N",pack("B32","00000000000000000000000111111110"))'
Hi, thanks for your help. I've tried both these methods and the results are still not the same! Linux appears to be counting an extra character per line. If I strip out \n or chomp it makes no difference.
What about file content and locale? There can be a difference if the text contains UTF-8 characters and is read once with use bytes; and once with use utf8;
Try adding the use bytes; pragma to the script and test it again, if you are sure the files are the same...
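To show how bytes and characters can disagree, here is a small sketch using the core Encode module on an in-memory string instead of a real file. The sample text is my own assumption, and the comparison is between raw bytes and a decoded string, which is what decides what length reports.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "e" with an acute accent is one character but two bytes in UTF-8.
my $bytes = "caf\xC3\xA9\n";              # raw bytes as they sit on disk
my $chars = decode('UTF-8', $bytes);      # the same data as a character string

printf "bytes: %d\n", length($bytes);     # bytes: 6
printf "chars: %d\n", length($chars);     # chars: 5
```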
Re: char count windows vs linux
by BrowserUk (Patriarch) on Dec 17, 2002 at 13:48 UTC
I would like to get the same result for both platforms
The right answer to your question (if there is one) really depends on why you want to get the same result on both platforms.
If, for instance, you hope to use the information as some sort of comparison metric for two files on different systems, then counting chars is fundamentally the wrong way to do it.
The only halfway legitimate reason I can think of for wanting to do this is if you want to report the size before and after transfer from one system type to the other, in which case chomping the lines and totalling the line lengths would probably work.
However, even for this purpose there are better methods of verifying transfers than just counting bytes, as this won't detect corruption of bytes en route.
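One such method, as a sketch only: compare a checksum of the file on each side. The post above does not name a technique, so the choice of MD5 and the sub name file_md5 are my assumptions. Digest::MD5 ships with Perl 5.8.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 digest (hex) of a file's raw bytes.
sub file_md5 {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);                          # hash the bytes exactly as stored
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical usage: perl md5.pl file1 file2 ...
print file_md5($_), "  $_\n" for @ARGV;
```

Bear in mind that an ASCII-mode FTP transfer which rewrites line endings will also change the digest, so this only matches for byte-for-byte identical copies.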
A little more information may get you a better answer.
Examine what is said, not who speaks.
Re: char count windows vs linux
by mce (Curate) on Dec 17, 2002 at 13:46 UTC
Hi,
Is it a binary or an ASCII file?
(Consider binmode in case it is binary.)
Was the text file transferred by FTP in ASCII or binary mode?
On NT the line ending on disk differs from Unix (CRLF rather than LF), but chomp should take care of this.
---------------------------
Dr. Mark Ceulemans
Senior Consultant
IT Masters, Belgium
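To check which line endings a file actually carries after the transfer, something like the following rough sketch could be run on both machines. The sub name line_ending_counts and the sample string are invented for illustration; for a real file, slurp it after binmode and pass the contents in.

```perl
use strict;
use warnings;

# Count CRLF pairs, bare LFs and bare CRs in a string of raw file bytes.
sub line_ending_counts {
    my ($data) = @_;
    my $crlf = () = $data =~ /\015\012/g;        # DOS/Windows endings
    my $lf   = () = $data =~ /(?<!\015)\012/g;   # Unix endings
    my $cr   = () = $data =~ /\015(?!\012)/g;    # stray carriage returns
    return (crlf => $crlf, lf => $lf, cr => $cr);
}

my %n = line_ending_counts("one\015\012two\015\012three\012");
print "CRLF: $n{crlf}, LF: $n{lf}, CR: $n{cr}\n";   # CRLF: 2, LF: 1, CR: 0
```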
Re: char count windows vs linux
by ph0enix (Friar) on Dec 17, 2002 at 17:06 UTC
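open(FILE, "<", $file) or die "Cannot open $file: $!";
my $text;
{
    local $/;               # slurp mode: read the whole file in one go
    $text = <FILE>;
}
close(FILE);
my @lines = split(/\015?\012/, $text);    # split on LF or CRLF
my $count = 0;
foreach my $line (@lines) { $count += length($line); }
print "\n$count";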
Now you should obtain the same values on both systems. I think that the problem is with line endings. Is the CRLF counted as one character?
Re: char count windows vs linux
by John M. Dlugosz (Monsignor) on Dec 17, 2002 at 16:34 UTC
If the file has DOS-style line endings, the Linux code will see an extra \r and count it. Is the number higher in Linux?
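A quick sketch of that effect, using an in-memory string to stand in for a line read on Linux from a file with CRLF endings (the sample text is made up):

```perl
use strict;
use warnings;

# A line as Linux sees it when the file on disk uses CRLF endings.
my $line = "hello\015\012";

my $chomped = $line;
chomp($chomped);                             # removes only the "\n", the \r stays

(my $stripped = $line) =~ s/\015?\012\z//;   # removes LF or CRLF

printf "after chomp:            %d\n", length($chomped);    # 6
printf "after CRLF-aware strip: %d\n", length($stripped);   # 5
```

That leftover \r would account for exactly one extra character per line.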