PerlMonks  

Count number of lines in a text file

by Scott7477 (Chaplain)
on Mar 23, 2006 at 18:52 UTC ( #538824=snippet )

Description: Short and simple... I borrowed the counting code from ActiveState's Perl docs and added a little to make it usable from the command line.
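#Count number of lines in a file
use strict;
use warnings;

# take the first command line argument (<@ARGV> would treat it as a glob pattern)
my $filename = shift @ARGV;
die "Usage: $0 filename\n" unless defined $filename;
my ($lines, $buffer) = (0, '');
open(FILE, '<', $filename) or die "Can't open '$filename': $!";
while (sysread FILE, $buffer, 4096) {
    # tr counts the newline characters in this block without changing it
    $lines += ($buffer =~ tr/\n//);
}
close FILE;
print "The number of lines in $filename is $lines.\n";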

Re: Count number of lines in a text file
by davidrw (Prior) on Mar 23, 2006 at 19:10 UTC
    different approach using Tie::File (should also work w/large files):
    perl -MTie::File -MFcntl=O_RDONLY -le 'tie @array, "Tie::File", shift, mode => O_RDONLY or die $!; print scalar @array' /etc/hosts
    .. and of course there's wc on *nix or from ppt
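The Tie::File one-liner can also be written as a small script. This is a minimal sketch, not the poster's code: `count_lines_tie` is a hypothetical helper name, while Tie::File and Fcntl are core modules. Asking the tied array for its size makes Tie::File scan the file for record separators, so the whole file is still read once.

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# count_lines_tie is a hypothetical helper name, not part of Tie::File
sub count_lines_tie {
    my ($file) = @_;
    # open read-only so the file on disk is never modified
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "Can't tie '$file': $!";
    my $count = scalar @lines;    # number of records (lines) in the file
    untie @lines;
    return $count;
}

# command line use: perl script.pl /etc/hosts
print count_lines_tie($ARGV[0]), "\n" if @ARGV;
```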
Re: Count number of lines in a text file
by aweeraman (Novice) on Mar 23, 2006 at 20:04 UTC
    Here's another variation that gets rid of a few lines:
    open (FILE, $ARGV[0]) or die "Can't open '$ARGV[0]': $!";
    $lines++ while (<FILE>);
    close FILE;
    print "$lines\n";
      don't need $lines -- $. does it for you (see perlvar)
      perl -le 'open FILE, "/etc/passwd"; @_=<FILE>; print $.'
      And (similar to QM's golf reply):
      perl -lne 'END{print $.}' /etc/passwd
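For use inside a script, the `$.` approach might look like the sketch below. `count_with_dot` is a hypothetical helper name; the one detail that matters is reading `$.` before closing the handle, because an explicit close resets it (see perlvar).

```perl
use strict;
use warnings;

# count_with_dot is a hypothetical helper name used for illustration
sub count_with_dot {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    while (my $line = <$fh>) { }    # read and discard every line
    my $count = $.;                 # line number of the last line read from $fh
    close $fh;                      # an explicit close resets $. for this handle
    return $count;
}

print count_with_dot($ARGV[0]), "\n" if @ARGV;
```

Unlike slurping the file into `@_`, this keeps only one line in memory at a time.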
Re: Count number of lines in a text file
by QM (Vicar) on Mar 23, 2006 at 20:31 UTC
    What about this oft-quoted golf shot?
    perl -pe '}{$_=$.' filename

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Very tricky. So because of the }, the continue { print $_; } block is no longer attached to the loop, but to the {$_=$.} bare block. I never knew this trick with -p. (I knew about -ne '...}{...' but not what it did with -p.)
        Yes. But -p and -n only differ by the continue/print. Compare -p:
        > perl -MO=Deparse -pe"#stuff#"
        LINE: while (defined($_ = <ARGV>)) {
            ();
        }
        continue {
            print $_;
        }
        to -n:
        > perl -MO=Deparse -ne"#stuff#"
        LINE: while (defined($_ = <ARGV>)) {
            ();
        }

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of
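To make the trick concrete, here is roughly what `perl -pe '}{$_=$.'` becomes once -p has wrapped it, written out as an ordinary sub. This is a sketch for illustration: `count_like_golf` is a hypothetical name, and the `print $_` that -p supplies is replaced by an assignment so the result can be returned.

```perl
use strict;
use warnings;

# count_like_golf is a hypothetical name; the body mirrors the shape of
# perl -pe '}{$_=$.' after -p has wrapped the code.
sub count_like_golf {
    my ($file) = @_;
    local @ARGV = ($file);    # <ARGV> reads the files named in @ARGV
    local $_;
    my $shown;

    LINE: while (defined($_ = <ARGV>)) {
        # text before "}{": an empty loop body
    }
    {
        $_ = $.;              # text after "}{": runs once, after the loop
    }
    continue {
        $shown = $_;          # -p would "print $_" here
    }
    return $shown;
}

print count_like_golf($ARGV[0]), "\n" if @ARGV;
```

The continue block runs exactly once because a bare block is a loop that executes a single time.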

Re: Count number of lines in a text file
by ambrus (Abbot) on Mar 24, 2006 at 10:36 UTC

    Here's yet another golf variation, provided there's only one filename and that's a real filename (not a glob or something you can give to the two-arg open):

    exec wc,-ll,pop
Re: Count number of lines in a text file
by gube (Parson) on Mar 28, 2006 at 07:24 UTC

    Hi try this,

    #! /usr/bin/perl
    use strict;
    open(IN, "test.txt");
    my @str = <IN>;
    close(IN);
    print scalar(@str);
      Excuse my French, but why the **censored** would you store the whole file in memory just to count the lines? Why hardcode the filename in the script? Why bother to open and close the file, when <> is so handy?

      Sorry, please forgive the tirade, I don't know what came over me. It must be the ghost of Abigail-II...Certainly TIMTOWTDI. (I find many of my cow-orkers skip over the "gather requirements" phase of programming, and jump headfirst into the shallow end of the implementation pool.)

      While playing with this, I wanted to check how similar schemes work. For instance, don't do this either:

      perl -e 'print scalar(()=<>),"\n"' filename
      I tried this on a 300MB file, which took a long time (I waited several minutes before killing it), lots of memory, and started swapping to disk.

      I tried the following on the same 300MB file, which took about 10 seconds, and never went above 2MB memory:

      perl -pe "}{$_=$." filename
      Inside a script, you could do this:
      #!/your/perl/here -p
      }{$_=$.
      (yes, that compiles and runs too) though you may prefer the more conventional
      #!/your/perl/here
      use strict;
      use warnings;
      while (<>) {}
      print "$.\n";
      If you want to get fancy, and feed it more than one file at a time, keeping track of each file, try this:
      #!/your/perl/here
      use strict;
      use warnings;
      my $file_count = @ARGV;
      while (<>) {}
      continue {
          if (eof) {
              # print file names for multiple files
              print "$ARGV: " if ($file_count > 1);
              print "$.\n";
              close ARGV;
          }
      }
      Someone will ask me for command line arguments to leave off the filenames, and provide summary statistics for multiple files. I'll leave that to OMAR. (Wow, there really is an OMAR! But he doesn't write much :(

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of
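One possible shape for the summary-statistics request is sketched below. It is not QM's code: it swaps the `<>`/`eof` loop for an explicit open per file, so that empty files are reported too, and `line_report` together with its `names` and `total` options are hypothetical stand-ins for real command line switches.

```perl
use strict;
use warnings;

# line_report is a hypothetical helper. $opt->{names} adds the file name to
# each line of output, $opt->{total} appends a grand total.
sub line_report {
    my ($opt, @files) = @_;
    my @out;
    my $total = 0;
    for my $file (@files) {
        open(my $fh, '<', $file) or die "Can't open '$file': $!";
        my $count = 0;
        while (my $line = <$fh>) { $count++ }
        close $fh;
        $total += $count;
        push @out, $opt->{names} ? "$file: $count" : $count;
    }
    push @out, "total: $total" if $opt->{total};
    return @out;
}

# with several files on the command line, show names and a total
print "$_\n" for line_report({ names => @ARGV > 1, total => @ARGV > 1 }, @ARGV);
```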

Re: Count number of lines in a text file
by ajbryant59 (Initiate) on Sep 12, 2007 at 21:25 UTC
    Here's a one-liner inspired by some of the other solutions: $num_lines = scalar(() = ${${new IO::File "<filename"}});
      Sorry to have to submit a retraction but the one-liner I submitted before doesn't work. I was sure I had this one working but c'est la vie. Sorry to clutter the postings.
      what about my($NumLines)=scalar <$Filename>
        That only works if the current line just happens to contain a number which equals the number of lines. <> in scalar context just reads a single line.

        But this works:

        perl -wE'say~~(()=<>)' yourfile
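The difference between the two attempts comes down to context. A list assignment evaluated in scalar context returns the number of elements on its right-hand side, which is what `()=<>` exploits; the leading `~~` is just two bitwise negations that force scalar context. A sketch of the same idea in script form, with `count_by_list_assignment` as a hypothetical name:

```perl
use strict;
use warnings;

# count_by_list_assignment is a hypothetical name for illustration
sub count_by_list_assignment {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    # () = <$fh> calls readline in list context (all lines at once);
    # assigning that to a scalar yields the number of lines returned.
    my $count = () = <$fh>;
    close $fh;
    return $count;
}

print count_by_list_assignment($ARGV[0]), "\n" if @ARGV;
```

Note that this builds the full list of lines in memory before counting it, so it shares the large-file problem QM described earlier in the thread.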
Re: Count number of lines in a text file
by Anonymous Monk on Jan 23, 2013 at 08:53 UTC
    Everybody should be aware that the last line of a text file may NOT end with a '\n' character. This means the following code:
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    will not count such a last line! Please do not use the above code !!!

    Use:

    while (<FILE>) { $lines++ }
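The discrepancy is easy to reproduce. In the sketch below, both helper names are hypothetical: one counts newline characters the way the sysread loop does, the other counts records the way `while (<FILE>)` does. The two results differ only when the final line is unterminated.

```perl
use strict;
use warnings;

# count_newlines: what the sysread/tr loop measures (hypothetical name)
sub count_newlines {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    binmode $fh;
    my ($buffer, $n) = ('', 0);
    while (sysread $fh, $buffer, 4096) {
        $n += ($buffer =~ tr/\n//);
    }
    close $fh;
    return $n;
}

# count_records: what while (<FILE>) measures (hypothetical name)
sub count_records {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    my $n = 0;
    while (my $line = <$fh>) { $n++ }
    close $fh;
    return $n;
}

if (@ARGV) {
    printf "newlines: %d, records: %d\n",
        count_newlines($ARGV[0]), count_records($ARGV[0]);
}
```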

Re: Count number of lines in a text file
by Anonymous Monk on Aug 18, 2014 at 21:34 UTC
    the sysread problem can be corrected with a fairly simple change - just check the last buffer for the end of line. sample program showing some of the variations:
    #
    # demonstrate the sysread (and read) problem and correction.  we really only need a small
    # file to demonstrate the problem....
    #
    use strict;

    my $test_file="sysread_test_file.txt";    # change name to test size/length differences

    #
    # create the test file
    #
    sub create_test_file
    {
        return if -e $test_file;    # we do not want to create if it already exists....
        open TOUT,">$test_file";
        #
        # write a small file with an extra line missing the EOL.
        #
        for (my $line=0;$line<1000;$line++)
        {
            print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n";
        }
        print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890";    # no EOL!
        close TOUT;
    }

    sub test_while_variable
    {
        my $linecount=0;
        open TIN,"<$test_file";
        while (<TIN>)
        {
            $linecount++;
        }
        close TIN;
        print "test_while_variable: $linecount\n";
    }

    sub test_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        print "test_block_read: $newlinecount\n";
    }

    sub test_fixed_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if $data !~ /\012$/;
        print "test_fixed_block_read: $newlinecount\n";
    }

    sub test_block_sysread($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        my ($data, $n);
        my $newlinecount=0;
        while ((sysread TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        print "test_block_sysread: $newlinecount\n";
    }

    sub test_fixed_block_sysread($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        my ($data, $n);
        my $newlinecount=0;
        while ((sysread TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
        }
        close(TIN);
        $newlinecount++ if $data !~ /\012$/;
        print "test_fixed_block_sysread: $newlinecount\n";
    }

    #
    # do the test
    #
    create_test_file;    # create the test file if not already present
    test_while_variable;
    test_block_read 4096;
    test_fixed_block_read 4096;
    test_block_sysread 4096;
    test_fixed_block_sysread 4096;
    exit 0;

      correction to the "fixed" routines in the demonstration code - it did not handle the "good" EOL case correctly because the EOL check ran after the loop, and by then the final read (the one that returns 0) had left $data empty, so the original code thought the (good) EOL was missing (I guess I learn something every day). this fix may not be perfect - I think it will return 1 for an empty file (untested) but I am sure there is a fix for that if someone really wants to do it.

      I will not be offended if someone finds yet another problem (besides the empty file problem) and comes up with a solution. that is what this site is for - to help people do a better job writing Perl code and to teach people little things about Perl that they may not have known or thought much about.

      sub test_fixed_block_read($)
      {
          my $block_size=$_[0];
          open TIN,"<$test_file";
          binmode TIN;
          my ($data, $n);
          my $newlinecount=0;
          my $block_ends_with_eol=0;
          while ((read TIN, $data, $_[0]) != 0)
          {
              # remember only whether the most recent block ends with an EOL
              $block_ends_with_eol=((substr $data,-1,1) eq "\n") ? 1 : 0;
              $newlinecount+=($data =~ tr/\012//);
          }
          close(TIN);
          $newlinecount++ if (!$block_ends_with_eol);
          print ">>>test_fixed_block_read: $newlinecount\n";
      }

      sub test_fixed_block_sysread($)
      {
          my $block_size=$_[0];
          open TIN,"<$test_file";
          my ($data, $n);
          my $newlinecount=0;
          my $block_ends_with_eol=0;
          while ((sysread TIN, $data, $_[0]) != 0)
          {
              # remember only whether the most recent block ends with an EOL
              $block_ends_with_eol=((substr $data,-1,1) eq "\n") ? 1 : 0;
              $newlinecount+=($data =~ tr/\012//);
          }
          close(TIN);
          $newlinecount++ if (!$block_ends_with_eol);
          print ">>>test_fixed_block_sysread: $newlinecount\n";
      }
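For the empty-file case mentioned above, one possible fix is to track the last byte seen and start it out as a newline, so that a file with no data adds nothing to the count. The sketch below is an assumption-laden variant, not the poster's routine: `count_lines_blockwise` and its default block size are hypothetical.

```perl
use strict;
use warnings;

# count_lines_blockwise is a hypothetical helper name; the 64K default
# block size is an arbitrary choice for illustration.
sub count_lines_blockwise {
    my ($file, $block_size) = @_;
    $block_size ||= 64 * 1024;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    binmode $fh;
    my $count     = 0;
    my $last_char = "\n";    # so an empty file is reported as 0 lines
    my $data;
    while (1) {
        my $got = sysread $fh, $data, $block_size;
        die "Read error on '$file': $!" unless defined $got;
        last if $got == 0;                  # end of file
        $count += ($data =~ tr/\n//);       # newlines in this block
        $last_char = substr $data, -1;      # last byte of the newest block
    }
    close $fh;
    $count++ if $last_char ne "\n";         # unterminated final line
    return $count;
}

print count_lines_blockwise($ARGV[0]), "\n" if @ARGV;
```

The last byte is captured inside the loop because the final sysread, the one that returns 0, leaves the buffer empty.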
Re: Count number of lines in a text file
by Anonymous Monk on Aug 18, 2014 at 21:57 UTC

    if you are counting lines for some kind of a "production" process you probably want to avoid most of the "tricks" mentioned here since they are likely to be harder to maintain. and any variation of for or foreach processing needs to be avoided because you may run out of memory on a large file. I also tend to avoid $. because I have run into cases where the value is not correct when I am using multiple files (I maintain my own count).

    as noted in a previous comment there is a problem with sysread resulting in an incorrect count (and there is now a fix posted in a separate comment).

    the best generic method is probably a simple while loop case but if you are processing large files and need to provide some kind of progress indication you may find that a while sysread or read variation is a better choice. this program can be used to test several variations on a file in a single run (edit as needed for file size and/or progress).

    #
    # a test program to test the various line count methods under Perl.  it turns
    # out that they each may have their own specific issues.
    #
    use strict;

    my $test_file="large_test_file.txt";    # change name to test size/length differences

    #
    # progress related code
    #
    sub quick_touch($)
    {
        open TCH,">>$_[0]";    # will create if file does not exist but not destroy existing
        close TCH;
    }

    $|=1;    # so we can provide screen feedback
    my $last_progress_file;
    my $progress_interval=3;
    my $next_progress_time=0;    # time greater than this requires feedback - setting to 0 will feedback immediately

    sub progress_message($$)
    {
        return if time()<$next_progress_time;
        $next_progress_time=time()+$progress_interval;
        print "$_[0] $_[1]\r";
        #
        # to touch a tag file instead of using screen feedback
        #
        #unlink $last_progress_file if defined $last_progress_file && $last_progress_file ne "";
        #$last_progress_file="$_[0].$_[1]";
        #quick_touch $tag_file;
    }

    #
    # create the test file
    #
    sub create_test_file
    {
        return if -e $test_file;    # we do not want to create if it already exists....
        open TOUT,">$test_file";
        #
        # write a file of a specified number of lines.  want to have enough to
        # defeat the cache and/or take long enough to minimize external random
        # effects.
        #
        # 10 million lines is just over 1GB with this for output:
        #
        #     "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n"
        #
        #for (my $line=0;$line<200_000_000;$line++)
        for (my $line=0;$line<50_000_000;$line++)
        {
            print TOUT "qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890qwertyuiopasdfghjklzxcvbnm1234567890\n";
            progress_message "create_test_file",$line;    # show progress when writing since it may be taking a while
        }
        close TOUT;
    }

    ####sub create_test_file_short_line
    ####{
    ####    return if -e $test_file;    # we do not want to create if it already exists....
    ####    open TOUT,">$test_file";
    ####    #
    ####    # write a file of a specified number of lines.  100 million lines is about 600MB.
    ####    # shorter lines may impact various things.  it may be better to write many short
    ####    # lines in a single print to save time when writing the file.
    ####    # it can be very slow doing one line at a time so it may make sense to write several
    ####    # qwert\n strings in a single line (say 10) to speed up the write process.
    ####    #
    ####    for (my $line=0;$line<1_000_000_000;$line++)
    ####    {
    ####        print TOUT "qwert\n";
    ####        progress_message "create_test_file",$line;
    ####    }
    ####    close TOUT;
    ####}

    sub test_while_dot
    {
        open TIN,"<$test_file";
        while (<TIN>)
        {
            ####progress_message "test_while_dot",$.;
        }
        close TIN;
        return $.;
    }

    sub test_while_variable
    {
        my $linecount=0;
        open TIN,"<$test_file";
        while (<TIN>)
        {
            $linecount++;
            ####progress_message "test_while_variable",$linecount;
        }
        close TIN;
        return $linecount;
    }

    sub test_block_read($)
    {
        my $block_size=$_[0];
        open TIN,"<$test_file";
        binmode TIN;
        my ($data, $n);
        my $newlinecount=0;
        while ((read TIN, $data, $_[0]) != 0)
        {
            $newlinecount+=($data =~ tr/\012//);
            ####progress_message "test_block_read",$newlinecount;
        }
        close(TIN);
        return $newlinecount;    # return the line count
    }

    #
    # calling these routines requires loading entire file into memory which crashes
    # Linux 32 bit on large files.  problem occurs with both for and foreach.
    #
    ###sub test_foreach_dot
    ###{
    ###    open TIN,"<$test_file";
    ###    for (<TIN>)
    ###    {
    ###        ####progress_message "test_foreach_dot $.",$.;
    ###    }
    ###    close TIN;
    ###    return $.;
    ###}
    ###sub test_foreach_variable
    ###{
    ###    my $linecount=0;
    ###    open TIN,"<$test_file";
    ###    for (<TIN>)
    ###    {
    ###        $linecount++;
    ###        ####progress_message "test_foreach_variable $.",$linecount;
    ###    }
    ###    close TIN;
    ###    return $linecount;
    ###}

    #
    # do the test
    #
    my $start_time;
    my $delta_time;

    $start_time=time();
    create_test_file;    # create the test file if not already present
    $delta_time=time()-$start_time;
    print "\ncreate_test_file: $delta_time\n";

    $start_time=time();
    test_while_dot;
    $delta_time=time()-$start_time;
    print "\ntest_while_dot: $delta_time\n";

    $start_time=time();
    test_while_variable;
    $delta_time=time()-$start_time;
    print "\ntest_while_variable: $delta_time\n";

    foreach my $block_size (4096,256*1024,1*1024*1024,4*1024*1024,16*1024*1024)
    {
        $start_time=time();
        test_block_read $block_size;
        $delta_time=time()-$start_time;
        my $bsizek=$block_size/1024;
        print "\ntest_block_read ${bsizek}K: $delta_time\n";
    }
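The built-in time() has one-second resolution, so on smaller files the timings above may all come out as 0. A minimal sketch of a finer-grained harness, assuming the core Time::HiRes module; `time_counter` and `count_while` are hypothetical names, and unlike the program above the harness also keeps the count each routine returns so the results can be compared.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # core module; sub-second replacement for time()

# count_while: plain while loop with its own counter (hypothetical name)
sub count_while {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    my $count = 0;
    while (my $line = <$fh>) { $count++ }
    close $fh;
    return $count;
}

# time_counter: run one counting routine, return (line count, seconds)
sub time_counter {
    my ($counter, $file) = @_;
    my $start   = time();
    my $count   = $counter->($file);
    my $elapsed = time() - $start;
    return ($count, $elapsed);
}

if (@ARGV) {
    my ($count, $elapsed) = time_counter(\&count_while, $ARGV[0]);
    printf "count_while: %d lines in %.3f s\n", $count, $elapsed;
}
```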
Re: Count number of lines in a text file
by Anonymous Monk on Aug 18, 2014 at 23:10 UTC

    The simple and efficient mmap version is both efficient and simple:

    use File::Map q(map_file);
    map_file $_, shift, '<';
    print tr/\n// + /[^\n]\z/;

      seems like a lot of stuff to install just to count the lines in a file. File::Map is not installed by default on any of the 3 systems I am currently using. and see http://www.perlmonks.org/?node_id=989383 for the problems with large files. there may be other examples.

      I sense some confusion about what is being discussed here based on all of the responses. Perl is known for having many ways to do the same thing but it does not mean that every one of those ways is a good way to do it. if all you want to know is how many lines are in a file then why would you want to load the entire file in memory (if it even fits) or go thru the mapping process - seems like overkill to me.

      I wonder if the original post was meant as a discussion about doing a line count without using the *nix wc command. many of the responses seem to be trying to come up with a command line to do the line count.

      perhaps we as software developers should remember the KISS method and save ourselves a lot of trouble. CPAN is a source of many good tools to solve many problems but sometimes CPAN may be a sledgehammer when all you really are trying to do is push in a thumbtack.
