PerlMonks  

Re: Read Some lines in Tera byte file

by cdarke (Prior)
on Oct 13, 2010 at 08:32 UTC [id://865042]


in reply to Read Some lines in Tera byte file

Is it possible without reading entire file, only to fetch required lines from the file ?

Well, you do not say what type of file it is, whether the lines are fixed length, or which operating system you are on.

Back in the olden days, file formats were many and varied, and often supported indexes, even on lines in a file containing text. That is not generally done these days on UNIX or Windows: a text file no longer contains physical line records; it is just a stream of bytes. So when a file looks like this in a text editor or file viewer:
This is line 1
This is line 2
This is line 3
in fact the file really looks like this (on UNIX):
This is line 1\nThis is line 2\nThis is line 3\n
where "\n" is a newline (line feed) character. Windows text files by convention have "\r\n" (carriage return, line feed) at the end of each line, and might be terminated by ^Z (control-Z).
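To illustrate that "lines" are nothing more than newline bytes inside the stream, here is a minimal sketch that reads a file in raw blocks and counts the "\n" bytes. The sub name count_newlines and the 1 MiB default block size are illustrative choices, not anything from the question. It handles UNIX and Windows files alike, because every "\r\n" pair contains exactly one "\n".

```perl
use strict;
use warnings;

# Count line terminators by scanning raw bytes for "\n" (LF).
# Works for "\n" and "\r\n" files, since each CRLF contains one LF.
# (count_newlines is a hypothetical helper name used for illustration.)
sub count_newlines {
    my ($file, $blocksize) = @_;
    $blocksize ||= 1024 * 1024;    # read 1 MiB at a time by default
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    my $count = 0;
    while (1) {
        my $got = read($fh, my $buf, $blocksize);
        die "read failed: $!" unless defined $got;
        last if $got == 0;               # end of file
        $count += ($buf =~ tr/\n//);     # tr/// returns the number of matches
    }
    close $fh;
    return $count;
}
```

Note that a final line with no trailing "\n" is not counted, and that the whole file still has to be read: nothing in the bytes themselves tells you where a given line starts.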

So a text file is just a stream of bytes. To seek to line 100, you need the byte position at which line 100 starts, and there is no index of line positions attached to the file unless you construct one yourself. If the lines are of fixed length, that position is easy to derive. Some log files do have fixed-length lines, but most do not.
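As a sketch of both cases, assuming 1-based line numbers: with fixed-length lines the start of line n is just (n - 1) * record length, where the record length includes the terminator. For variable-length lines, one sequential pass can build your own index file of byte offsets, after which any line can be fetched with two seeks. The sub names and the 8-byte packed index format are hypothetical choices for illustration; pack 'Q' needs a perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Fixed-length lines: start of line $n (1-based) is simple arithmetic.
# $reclen must include the terminator (1 byte for "\n", 2 for "\r\n").
sub fixed_line_offset {
    my ($n, $reclen) = @_;
    return ($n - 1) * $reclen;
}

# Variable-length lines: one full pass records where each line starts,
# written as 8-byte unsigned integers (pack 'Q') to a separate index file.
sub build_line_index {
    my ($data_file, $index_file) = @_;
    open my $in,  '<:raw', $data_file  or die "Can't open $data_file: $!";
    open my $idx, '>:raw', $index_file or die "Can't create $index_file: $!";
    my $offset = 0;
    while (my $line = <$in>) {
        print {$idx} pack('Q', $offset);
        $offset += length $line;    # :raw, so length is in bytes
    }
    close $in;
    close $idx or die "Can't write $index_file: $!";
}

# Fetch line $n (1-based): read its 8-byte entry from the index,
# then seek straight to that offset in the data file.
sub fetch_line {
    my ($data_file, $index_file, $n) = @_;
    open my $idx, '<:raw', $index_file or die "Can't open $index_file: $!";
    seek($idx, ($n - 1) * 8, 0) or die "seek failed: $!";
    read($idx, my $buf, 8) == 8 or return;    # no such line
    my $offset = unpack('Q', $buf);
    close $idx;

    open my $in, '<:raw', $data_file or die "Can't open $data_file: $!";
    seek($in, $offset, 0) or die "seek failed: $!";
    my $line = <$in>;
    close $in;
    return $line;
}
```

Building the index costs one full read of the file, and the index itself takes 8 bytes per line, so on a terabyte file of short lines it can be large. It also has to be rebuilt whenever the data file changes. After that, each lookup touches only a few bytes of each file.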

One possibility to improve performance, particularly if the file is accessed over a network, is to compress it (for example with gzip) and then use a decompression program to pipe the data to you, for example:
open (my $zip, '-|', 'gzip -dc compressed_file.gz') || die "Can't run gzip: $!";
while (<$zip>) {
    # do some stuff
}
close $zip;
There are modules on CPAN that will do this as well, but I don't have any experience of them. How much I/O this saves depends on how well the data compresses, which varies from one data set to another. It might even be slower; you will have to experiment.
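One such module is IO::Uncompress::Gunzip, which ships with Perl itself (core since 5.10). The following is a minimal sketch, not a tested recipe for terabyte files, that decompresses in-process instead of through an external gzip pipe and returns only a requested range of lines. The sub name gz_line_range and its 1-based, inclusive range arguments are hypothetical.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Return lines $first..$last (1-based, inclusive) from a gzip file.
# Everything before $first must still be decompressed, but reading
# stops as soon as line $last has been collected.
sub gz_line_range {
    my ($file, $first, $last) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "Can't open $file: $GunzipError";
    my @wanted;
    my $lineno = 0;
    while (defined(my $line = $z->getline())) {
        $lineno++;
        next if $lineno < $first;
        push @wanted, $line;
        last if $lineno >= $last;
    }
    $z->close();
    return @wanted;
}
```

A plain gzip stream cannot be entered in the middle, so this saves I/O only through compression and early exit; it does not give you random access to a given line.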
