Re: Read Some lines in Tera byte file

Is it possible to fetch only the required lines from the file, without reading the entire file?

Well, you do not say what type of file it is, whether the lines are of fixed length, or which operating system you are on.

Back in the olden days file formats were many and varied, and often supported indexes, even on lines in a file containing text. That is not generally done these days on UNIX or Windows. A text file no longer contains physical line records; it is just a stream of bytes. So when a file looks like this in a text editor or file viewer:

This is line 1
This is line 2
This is line 3

in fact the file really looks like this (on UNIX):

This is line 1\nThis is line 2\nThis is line 3\n

where "\n" is a newline character. Windows text files by convention end each line with "\r\n", and might be terminated by ^Z (control-Z).
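If you want to see those terminators for yourself, a small sketch like the following makes them visible. The helper name visible_bytes is made up for illustration, and the sample string is just the Windows form of the example above:

```perl
use strict;
use warnings;

# visible_bytes is a hypothetical helper: it returns a copy of its input
# with CR, LF and Ctrl-Z replaced by printable escapes.
sub visible_bytes {
    my ($data) = @_;
    my %name = ("\r" => '\r', "\n" => '\n', "\x1A" => '^Z');
    $data =~ s/([\r\n\x1A])/$name{$1}/g;
    return $data;
}

# A Windows-style sample: CRLF after each line, Ctrl-Z at the end.
print visible_bytes("This is line 1\r\nThis is line 2\r\n\x1A"), "\n";
```

When reading a real file this way, open it with the ':raw' layer so that Perl does not translate the line endings before you get to look at them.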

So, a text file is just a stream of bytes. Seeking to line 100 means you need the byte position of the start of line 100, and no index of line positions is attached to the file unless you construct one yourself. If the lines are of fixed length then that position is easy to derive: it is (line number - 1) multiplied by the record length, terminator included. Some log files do have fixed-length lines, but most do not.
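To make that concrete, here is a rough sketch of both cases: computing the position directly for fixed-length lines, and building your own index of line starts for variable-length lines. The names read_fixed_line, build_line_index and read_line_at are made up for illustration. Building the index still costs one full pass over the file, so it only pays off if you fetch lines more than once.

```perl
use strict;
use warnings;

# Fixed-length lines: the offset is computed, no index needed.
# $record_len must include the line terminator.
sub read_fixed_line {
    my ($path, $record_len, $line_no) = @_;    # $line_no is 1-based
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    seek($fh, ($line_no - 1) * $record_len, 0) or die "Can't seek: $!";
    my $got = read($fh, my $buf, $record_len);
    close $fh;
    return $got ? $buf : undef;                # undef past end of file
}

# Variable-length lines: one sequential pass records where each line starts.
sub build_line_index {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    my @offsets = (0);                # line 1 starts at byte 0
    while (<$fh>) {
        push @offsets, tell($fh);     # start of the following line
    }
    pop @offsets;                     # last entry is end of file, not a line
    close $fh;
    return \@offsets;
}

# Fetch one line using the index built above.
sub read_line_at {
    my ($path, $offsets, $line_no) = @_;       # $line_no is 1-based
    return undef if $line_no < 1 || $line_no > @$offsets;
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    seek($fh, $offsets->[$line_no - 1], 0) or die "Can't seek: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}
```

For a file of this size the offset list itself will be large, so in practice you would probably want to write the offsets to a separate index file (for example with pack) instead of holding them all in memory. That part is left out of the sketch.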

One possibility to improve performance, particularly if the file is accessed over a network, is to compress it and then have a decompression program pipe the data to you, for example:

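# Read the decompressed stream from gzip through a pipe
open(my $zip, '-|', 'gzip -dc compressed_file.gz')
    or die "Can't run gzip: $!";

while (my $line = <$zip>) {
    # do some stuff with $line
}
close($zip) or warn "gzip did not exit cleanly: $! (status $?)";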

There are modules on CPAN that will do this as well, but I don't have any experience of them. How much I/O this saves depends on how well the data compresses, and that is data dependent. It might even be slower; you will have to experiment.
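As one example of the module route, IO::Uncompress::Gunzip (part of the IO-Compress distribution) can read a gzip file line by line without starting a child process. This is a minimal, unbenchmarked sketch; count_matching_lines is a made-up helper, and whether it beats the pipe approach is something you would have to measure:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# count_matching_lines is a hypothetical helper: it streams a gzip file
# line by line and counts the lines that match a pattern.
sub count_matching_lines {
    my ($gz_path, $pattern) = @_;
    my $z = IO::Uncompress::Gunzip->new($gz_path)
        or die "Can't open $gz_path: $GunzipError";
    my $count = 0;
    while (defined(my $line = $z->getline())) {
        $count++ if $line =~ $pattern;
    }
    $z->close();
    return $count;
}
```

Note that a gzip stream still has to be decompressed from the start, so this saves I/O but does not give you random access to a particular line.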


