Count number of lines in a text file

Re: Count number of lines in a text file by QM (Parson) on Mar 23, 2006 at 20:31 UTC
What about this oft-quoted golf shot? `perl -pe '}{$_=$.' filename` -QM -- Quantum Mechanics: The dreams stuff is made of
Re^2: Count number of lines in a text file by ambrus (Abbot) on Mar 24, 2006 at 10:29 UTC
Very tricky. So because of the `}`, the `continue { print $_; }` block is no longer attached to the loop, but to the `{$_=$.}` bare block. I never knew this trick with -p. (I knew about `-ne '...}{...'` but not what it did with -p.)
Re^3: Count number of lines in a text file by QM (Parson) on Mar 28, 2006 at 15:07 UTC
Yes. But `-p` and `-n` only differ by the continue/print. Compare `-p`:

    > perl -MO=Deparse -pe"#stuff#"
    LINE: while (defined($_ = <ARGV>)) {
        ();
    }
    continue {
        print $_;
    }

to `-n`:

    > perl -MO=Deparse -ne"#stuff#"
    LINE: while (defined($_ = <ARGV>)) {
        ();
    }

-QM -- Quantum Mechanics: The dreams stuff is made of
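To make the trick concrete, here is a minimal sketch of the structure that `-p` ends up with once the code `}{$_=$.` is dropped into it. The sub name `count_lines_like_p` and the `$printed` variable are hypothetical, and an explicit filehandle stands in for `ARGV` so the sketch is self-contained; it illustrates the shape of the expanded loop and is not the exact code perl generates.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the loop that -p builds around '}{$_=$.'.
sub count_lines_like_p {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    my $printed;
    local $_;
    while (defined($_ = <$fh>)) {
        # the leading '}' closes the -p loop body straight away,
        # so nothing happens per line
    }
    {
        $_ = $.;          # the '{' opens a bare block that runs once
    }
    continue {
        $printed = $_;    # -p would "print $_" here, once, after the loop
    }
    close $fh;
    return $printed;
}
```

The `continue` block belongs to the bare block rather than to the `while` loop, which is why the count is emitted once instead of once per line.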
Re: Count number of lines in a text file by aweeraman (Novice) on Mar 23, 2006 at 20:04 UTC
Here's another variation that gets rid of a few lines: `open (FILE, $ARGV[0]) or die "Can't open '$ARGV[0]': $!"; $lines++ while (<FILE>); close FILE; print "$lines\n";`
Re^2: Count number of lines in a text file by davidrw (Prior) on Mar 23, 2006 at 20:45 UTC
don't need `$lines` -- `$.` does it for you (see perlvar): `perl -le 'open FILE, "/etc/passwd"; @_=<FILE>; print $.'` And (similar to QM's golf reply): `perl -lne 'END{print $.}' /etc/passwd`
Re: Count number of lines in a text file by davidrw (Prior) on Mar 23, 2006 at 19:10 UTC
A different approach using Tie::File (should also work with large files): `perl -MTie::File -MFcntl=O_RDONLY -le 'tie @array, "Tie::File", shift, mode => O_RDONLY or die $!; print scalar @array' /etc/hosts` ... and of course there's `wc` on *nix or from ppt (Perl Power Tools).
Re: Count number of lines in a text file by ambrus (Abbot) on Mar 24, 2006 at 10:36 UTC
Here's yet another golf variation, provided there's only one filename and that's a real filename (not a glob or something you can give to the two-arg open): `exec wc,-ll,pop`
Re: Count number of lines in a text file by Anonymous Monk on Aug 18, 2014 at 23:10 UTC
The simple and efficient mmap version is both efficient and simple: `use File::Map q(map_file); map_file $_, shift, '<'; print tr/\n// + /[^\n]\z/;`
Re^2: Count number of lines in a text file by Anonymous Monk on Aug 19, 2014 at 14:45 UTC
Seems like a lot of stuff to install just to count the lines in a file. File::Map is not installed by default on any of the 3 systems I am currently using, and see http://www.perlmonks.org/?node_id=989383 for the problems with large files. There may be other examples.

I sense some confusion about what is being discussed here, based on all of the responses. Perl is known for having many ways to do the same thing, but that does not mean that every one of those ways is a good way to do it. If all you want to know is how many lines are in a file, then why would you want to load the entire file into memory (if it even fits) or go through the mapping process? Seems like overkill to me.

I wonder if the original post was meant as a discussion about doing a line count without using the *nix wc command. Many of the responses seem to be trying to come up with a command line to do the line count. Perhaps we as software developers should remember the KISS method and save ourselves a lot of trouble. CPAN is a source of many good tools to solve many problems, but sometimes CPAN may be a sledgehammer when all you really are trying to do is push in a thumbtack.
Re: Count number of lines in a text file by ajbryant59 (Initiate) on Sep 12, 2007 at 21:25 UTC
Here's a one-liner inspired by some of the other solutions: `$num_lines = scalar(() = ${${new IO::File "<filename"}});`
Re^2: Count number of lines in a text file by ajbryant59 (Initiate) on Sep 13, 2007 at 20:20 UTC
Sorry to have to submit a retraction but the one-liner I submitted before doesn't work. I was sure I had this one working but c'est la vie. Sorry to clutter the postings.
Re^2: Count number of lines in a text file by Anonymous Monk on Aug 14, 2009 at 16:57 UTC
what about `my($NumLines)=scalar <$Filename>`
Re^3: Count number of lines in a text file by JavaFan (Canon) on Aug 14, 2009 at 17:08 UTC
That only works if the current line just happens to contain a number which equals the number of lines. `<>` in scalar context just reads a single line. But this works: `perl -wE'say~~(()=<>)' yourfile`
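For readers unfamiliar with the `()=` idiom used above: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, and `~~` is just a terse way of forcing scalar context. A minimal sketch of the same idea without the golf (the sub name `slurp_count` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines via the "countof" idiom.
# Note that <$fh> in list context reads ALL lines into a temporary
# list first, so memory use grows with the size of the file.
sub slurp_count {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    my $count = () = <$fh>;   # list assignment in scalar context
    close $fh;
    return $count;
}
```

This is fine for small files; for large ones a `while` loop that reads one line at a time avoids building the list.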
Re^4: Count number of lines in a text file by BrowserUk (Patriarch) on Aug 14, 2009 at 17:40 UTC
Re: Count number of lines in a text file by Anonymous Monk on Aug 18, 2014 at 21:34 UTC
The sysread problem can be corrected with a fairly simple change - just check the last buffer for the end of line. Sample program showing some of the variations:

    #
    # demonstrate the sysread (and read) problem and correction.  we really only need a small
    # file to demonstrate the problem....
    #
    use strict;

    my $test_file="sysread_test_file.txt";    # change name to test size/length differences

    #
    # create the test file
    #
    sub create_test_file
    {
        return if -e $test_file;    # we do not want to create if it already exists....
        open TOUT,">$test_file";
        #
        # write a small file with an extra line missing the EOL.
        #
        for (my $line=0;$line<1000;$line++)
        {
            print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n";
        }
        print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890";    # no EOL!
        close TOUT;
    }

    sub test_while_variable
    {
        my $linecount=0;
        open TIN,"<$test_file";
        while (<TIN>)
        {
            $linecount++;
        }
        close TIN;
        print "test_while_variable: $linecount\n";
    }

    sub test_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        print "test_block_read: $newlinecount\n";
    }

    sub test_fixed_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if $data !~ /\012$/;
        print "test_fixed_block_read: $newlinecount\n";
    }

    sub test_block_sysread($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        my ($data, $n);
        my $newlinecount=0;
        while ((sysread TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        print "test_block_sysread: $newlinecount\n";
    }

    sub test_fixed_block_sysread($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        my ($data, $n);
        my $newlinecount=0;
        while ((sysread TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if $data !~ /\012$/;
        print "test_fixed_block_sysread: $newlinecount\n";
    }

    #
    # do the test
    #
    create_test_file;    # create the test file if not already present
    test_while_variable;
    test_block_read 4096;
    test_fixed_block_read 4096;
    test_block_sysread 4096;
    test_fixed_block_sysread 4096;
    exit 0;
Re^2: Count number of lines in a text file by Anonymous Monk on Aug 29, 2014 at 15:37 UTC
Correction to the "fixed" routines in the demonstration code - they did not handle the "good" EOL case correctly. Once the loop ends, the final read at end-of-file has left `$data` empty, so the original check always thought the (good) EOL was missing (I guess I learn something every day). This fix may not be perfect - I think it will return 1 for an empty file (untested), but I am sure there is a fix for that if someone really wants to do it. I will not be offended if someone finds yet another problem (besides the empty file problem) and comes up with a solution. That is what this site is for - to help people do a better job writing Perl code and to teach people little things about Perl that they may not have known or thought much about.

    sub test_fixed_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        my $block_ends_with_eol=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            # re-evaluate for every block so only the last block decides
            $block_ends_with_eol=((substr $data,-1,1) eq "\n") ? 1 : 0;
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if (!$block_ends_with_eol);
        print ">>>test_fixed_block_read: $newlinecount\n";
    }

    sub test_fixed_block_sysread($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        my ($data, $n);
        my $newlinecount=0;
        my $block_ends_with_eol=0;
        while ((sysread TIN, $data, $_[0]) != 0)
        {
            # re-evaluate for every block so only the last block decides
            $block_ends_with_eol=((substr $data,-1,1) eq "\n") ? 1 : 0;
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if (!$block_ends_with_eol);
        print ">>>test_fixed_block_sysread: $newlinecount\n";
    }
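For anyone who does want the empty-file case handled, one possible approach is to remember the last byte seen and treat "nothing read yet" the same as "ended with a newline". The following is a minimal sketch under that assumption; the sub name `count_lines_blockwise` and the default block size are hypothetical choices, not something taken from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: block-wise line count that also counts a final
# line lacking a trailing newline, and returns 0 for an empty file.
sub count_lines_blockwise {
    my ($path, $block_size) = @_;
    $block_size ||= 65536;                 # assumed default, tune as needed
    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;
    my $count     = 0;
    my $last_byte = "\n";                  # empty file: nothing left to add
    my $buf;
    while (1) {
        my $n = sysread $fh, $buf, $block_size;
        die "Read failed on '$path': $!" unless defined $n;
        last if $n == 0;                   # end of file
        $count += ($buf =~ tr/\n//);       # tr without /d only counts
        $last_byte = substr $buf, -1;      # remember how this block ended
    }
    close $fh;
    $count++ if $last_byte ne "\n";        # unterminated final line
    return $count;
}
```

Because the last byte is captured inside the loop, the check after the loop no longer depends on what the final end-of-file read leaves in the buffer.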
Re: Count number of lines in a text file by Anonymous Monk on Aug 18, 2014 at 21:57 UTC
If you are counting lines for some kind of "production" process, you probably want to avoid most of the "tricks" mentioned here since they are likely to be harder to maintain. Any variation of for or foreach processing needs to be avoided because you may run out of memory on a large file. I also tend to avoid `$.` because I have run into cases where the value is not correct when I am using multiple files (I maintain my own count). As noted in a previous comment, there is a problem with sysread resulting in an incorrect count (and there is now a fix posted in a separate comment). The best generic method is probably a simple while loop, but if you are processing large files and need to provide some kind of progress indication, you may find that a while sysread or read variation is a better choice. This program can be used to test several variations on a file in a single run (edit as needed for file size and/or progress).

    #
    # a test program to test the various line count methods under Perl.  it turns
    # out that they each may have their own specific issues.
    #
    use strict;

    my $test_file="large_test_file.txt";    # change name to test size/length differences

    #
    # progress related code
    #
    sub quick_touch($)
    {
        open TCH,">>$_[0]";    # will create if file does not exist but not destroy existing
        close TCH;
    }

    $|=1;    # so we can provide screen feedback
    my $last_progress_file;
    my $progress_interval=3;
    my $next_progress_time=0;    # time greater than this requires feedback - setting to 0 will feedback immediately

    sub progress_message($$)
    {
        return if time()<$next_progress_time;
        $next_progress_time=time()+$progress_interval;
        print "$_[0] $_[1]\r";
        #
        # to touch a tag file instead of using screen feedback
        #
        #unlink $last_progress_file if defined $last_progress_file && $last_progress_file ne "";
        #$last_progress_file="$_[0].$_[1]";
        #quick_touch $last_progress_file;
    }

    #
    # create the test file
    #
    sub create_test_file
    {
        return if -e $test_file;    # we do not want to create if it already exists....
        open TOUT,">$test_file";
        #
        # write a file of a specified number of lines.  want to have enough to
        # defeat the cache and/or take long enough to minimize external random
        # effects.
        #
        # 10 million lines is just over 1GB with this for output:
        #
        #     "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n"
        #
        #for (my $line=0;$line<200_000_000;$line++)
        for (my $line=0;$line<50_000_000;$line++)
        {
            print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n";
            progress_message "create_test_file",$line;    # show progress when writing since it may be taking a while
        }
        close TOUT;
    }

    ####sub create_test_file_short_line
    ####{
    ####    return if -e $test_file;    # we do not want to create if it already exists....
    ####    open TOUT,">$test_file";
    ####    #
    ####    # write a file of a specified number of lines.  100 million lines is about 600MB.
    ####    # shorter lines may impact various things.  it may be better to write many short
    ####    # lines in a single print to save time when writing the file.
    ####    # it can be very slow doing one line at a time so it may make sense to write several
    ####    # qwert\n strings in a single line (say 10) to speed up the write process.
    ####    #
    ####    for (my $line=0;$line<1_000_000_000;$line++)
    ####    {
    ####        print TOUT "qwert\n";
    ####        progress_message "create_test_file",$line;
    ####    }
    ####    close TOUT;
    ####}

    sub test_while_dot
    {
        open TIN,"<$test_file";
        while (<TIN>)
        {
            ####progress_message "test_while_dot",$.;
        }
        close TIN;
        return $.;
    }

    sub test_while_variable
    {
        my $linecount=0;
        open TIN,"<$test_file";
        while (<TIN>)
        {
            $linecount++;
            ####progress_message "test_while_variable",$linecount;
        }
        close TIN;
        return $linecount;
    }

    sub test_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
            ####progress_message "test_block_read",$newlinecount;
        }
        close(TIN);
        return $newlinecount;    # return the line count
    }

    #
    # calling these routines requires loading entire file into memory which crashes
    # Linux 32 bit on large files.  problem occurs with both for and foreach.
    #
    ###sub test_foreach_dot
    ###{
    ###    open TIN,"<$test_file";
    ###    for (<TIN>)
    ###    {
    ###        ####progress_message "test_foreach_dot $.",$.;
    ###    }
    ###    close TIN;
    ###    return $.;
    ###}

    ###sub test_foreach_variable
    ###{
    ###    my $linecount=0;
    ###    open TIN,"<$test_file";
    ###    for (<TIN>)
    ###    {
    ###        $linecount++;
    ###        ####progress_message "test_foreach_variable $.",$linecount;
    ###    }
    ###    close TIN;
    ###    return $linecount;
    ###}

    #
    # do the test
    #
    my $start_time;
    my $delta_time;

    $start_time=time();
    create_test_file;    # create the test file if not already present
    $delta_time=time()-$start_time;
    print "\ncreate_test_file: $delta_time\n";

    $start_time=time();
    test_while_dot;
    $delta_time=time()-$start_time;
    print "\ntest_while_dot: $delta_time\n";

    $start_time=time();
    test_while_variable;
    $delta_time=time()-$start_time;
    print "\ntest_while_variable: $delta_time\n";

    foreach my $block_size (4096,256*1024,1*1024*1024,4*1024*1024,16*1024*1024)
    {
        $start_time=time();
        test_block_read $block_size;
        $delta_time=time()-$start_time;
        my $bsizek=$block_size/1024;
        print "\ntest_block_read ${bsizek}K: $delta_time\n";
    }
Re: Count number of lines in a text file by Anonymous Monk on Jan 23, 2013 at 08:53 UTC
Everybody should be aware that in an ASCII file, the last line may NOT end with a '\n' character. This means the following code: `while (sysread FILE, $buffer, 4096) { $lines += ($buffer =~ tr/\n//); }` will miss the last line whenever that line lacks a trailing newline! Please do not use the above code as it stands! Use: `while (<FILE>) { $lines++ }`
Re: Count number of lines in a text file by gube (Parson) on Mar 28, 2006 at 07:24 UTC
Hi try this,

    #! /usr/bin/perl
    use strict;
    open(IN, "test.txt");
    my @str = <IN>;
    close(IN);
    print scalar(@str);
Re^2: Count number of lines in a text file by QM (Parson) on Mar 28, 2006 at 14:23 UTC
Excuse my French, but why the censored would you store the whole file in memory just to count the lines? Why hardcode the filename in the script? Why bother to open and close the file, when `<>` is so handy?

Sorry, please forgive the tirade, I don't know what came over me. It must be the ghost of Abigail-II... Certainly TIMTOWTDI. (I find many of my cow-orkers skip over the "gather requirements" phase of programming, and jump headfirst into the shallow end of the implementation pool.)

While playing with this, I wanted to check how similar schemes work. For instance, don't do this either:

    perl -e 'print scalar(()=<>),"\n"' filename

I tried this on a 300MB file, which took a long time (I waited several minutes before killing it), lots of memory, and started swapping to disk. I tried the following on the same 300MB file, which took about 10 seconds, and never went above 2MB memory:

    perl -pe "}{$_=$." filename

Inside a script, you could do this:

    #!/your/perl/here -p
    }{$_=$.

(yes, that compiles and runs too) though you may prefer the more conventional

    #!/your/perl/here
    use strict;
    use warnings;

    while (<>) {}
    print "$.\n";

If you want to get fancy, and feed it more than one file at a time, keeping track of each file, try this:

    #!/your/perl/here
    use strict;
    use warnings;

    my $file_count = @ARGV;

    while (<>) {}
    continue {
        if (eof) {
            # print file names for multiple files
            print "$ARGV: " if ($file_count > 1);
            print "$.\n";
            close ARGV;
        }
    }

Someone will ask me for command line arguments to leave off the filenames, and provide summary statistics for multiple files. I'll leave that to OMAR. (Wow, there really is an OMAR! But he doesn't write much :( )

-QM -- Quantum Mechanics: The dreams stuff is made of
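As a rough illustration of the "summary statistics" idea, here is a minimal sketch that keeps its own counter per file instead of relying on `$.`, and adds up a grand total. The sub name `count_lines_per_file` and its return shape are hypothetical; this is one possible layout rather than what any poster in the thread wrote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (\%count_by_path, $total) for the given files.
sub count_lines_per_file {
    my @paths = @_;
    my %count;
    my $total = 0;
    for my $path (@paths) {
        open my $fh, '<', $path or die "Can't open '$path': $!";
        my $n = 0;
        # one line at a time, so memory use stays flat on large files
        while (defined(my $line = <$fh>)) {
            $n++;
        }
        close $fh;
        $count{$path} = $n;
        $total += $n;
    }
    return (\%count, $total);
}
```

A caller could do `my ($by_file, $total) = count_lines_per_file(@ARGV);` and then print each `$by_file->{$name}` followed by `$total`. Since every file gets its own handle and its own counter, no `close ARGV` bookkeeping is needed.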

