PerlMonks  

Estimate line count in text file

by xaprb (Scribe)
on Jul 18, 2008 at 12:43 UTC ( #698594=perlquestion )
xaprb has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that may need to estimate its completion time on large files. Any thoughts on the best way to quickly estimate the line count of a very large text file? My idea was to get the file size and, if it's less than 100MB, just use wc -l. Otherwise, take 100 aligned 4 KiB samples by seeking to pre-calculated offsets in the file and reading 4096 bytes at each, count the bytes between consecutive newlines, and take the average as the line length; the number of lines is then $filesize / ($avg_line_len + length("\n")).

Update: replaced "seeking through" with "seeking to pre-calculated offsets in"
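
The sampling idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: the name estimate_line_count and its $samples and $block parameters are made up for the example. It also uses a slight variant of the formula in the question: instead of averaging line lengths, it counts newlines per sampled byte and scales that density to the file size, which sidesteps the partial lines at the edges of each block.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the number of lines in $path by reading
# up to $samples blocks of $block bytes at evenly spaced, block-aligned
# offsets and scaling the observed newline density to the file size.
sub estimate_line_count {
    my ($path, $samples, $block) = @_;
    $samples ||= 100;     # number of samples (assumed default)
    $block   ||= 4096;    # bytes per sample (assumed default)

    my $size = -s $path;
    return 0 unless $size;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Number of blocks in the file; never take more samples than blocks.
    my $nblocks = int(($size + $block - 1) / $block);
    $samples = $nblocks if $samples > $nblocks;

    my ($bytes_read, $newlines) = (0, 0);
    for my $i (0 .. $samples - 1) {
        # Pre-calculated, block-aligned offset for this sample.
        my $offset = int($i * $nblocks / $samples) * $block;
        seek $fh, $offset, 0 or die "seek failed: $!";
        my $got = read $fh, my $buf, $block;
        die "read failed: $!" unless defined $got;
        $bytes_read += $got;
        $newlines   += ($buf =~ tr/\n//);   # count newlines in the sample
    }
    close $fh;

    return 0 unless $bytes_read;
    # lines ~ filesize * (newlines seen / bytes sampled), rounded
    return int($size * $newlines / $bytes_read + 0.5);
}
```

Calling estimate_line_count($file) with the defaults reads at most about 400 KiB no matter how large the file is. When the file has no more blocks than the requested number of samples, every block is read and the result is the exact newline count.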

Re: Estimate line count in text file
by marto (Chancellor) on Jul 18, 2008 at 12:51 UTC
      Sure. I saw all of these. (Though I do not see any reply by davorg). They are all exact, not estimated. The key here is "estimated because the file is Very, Very Large." Reading the whole file may be unacceptable.
        xaprb,

        My mistake, I was referring to davidrw's reply. I would suggest (if you have not already done so) benchmarking their Tie::File solution with some 'large' files, since people's definitions of what constitutes a large file differ :)

        Martin
Re: Estimate line count in text file
by GrandFather (Cardinal) on Jul 18, 2008 at 13:12 UTC

    Why not use -s to find the file size then use tell from time to time to determine how far through you are and make a time remaining estimate from that?
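
    A minimal sketch of this suggestion, assuming the file is processed line by line. The helper names fraction_done, seconds_remaining and process_with_progress, and the reporting interval of 100_000 lines, are invented for illustration; only -s and tell come from the reply above.

```perl
use strict;
use warnings;

# Fraction of the file consumed so far, based on the current read position.
sub fraction_done {
    my ($fh, $size) = @_;
    return 1 unless $size;          # treat an empty file as finished
    return tell($fh) / $size;
}

# Seconds remaining, extrapolated linearly from the elapsed time.
# Returns nothing (undef) until some progress has been made.
sub seconds_remaining {
    my ($elapsed, $fraction) = @_;
    return unless $fraction > 0;
    return $elapsed * (1 - $fraction) / $fraction;
}

# Hypothetical driver: hand each line to $callback and print a progress
# report to STDERR every 100_000 lines.
sub process_with_progress {
    my ($path, $callback) = @_;
    my $size  = -s $path;
    my $start = time;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        $callback->($line);
        next if $. % 100_000;       # only report occasionally
        my $frac = fraction_done($fh, $size);
        my $eta  = seconds_remaining(time - $start, $frac);
        printf STDERR "%.1f%% done, about %d s left\n",
            100 * $frac, $eta // 0;
    }
    close $fh;
}
```

    This needs no line count at all: progress is measured in bytes, so the estimate holds up even when line lengths vary widely across the file.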


    Perl is environmentally friendly - it saves trees
      That's a great idea!
