Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Is it possible without reading entire file, only to fetch required lines from the file ?

Well, you do not say what type of file it is, if the lines are a fixed length or not, and which operating system you are on.

Back in the olden days file formats were many and varied, and often supported indexes, even on lines in a file containing text. That is not generally done these days on UNIX or Windows. A text file does not contain physical line records anymore, it is just a stream of bytes. So when a file looks like this in a text editor or file viewer:
This is line 1 This is line 2 This is line 3
in fact the file really looks like this (on UNIX):
This is line 1\nThis is line 2\nThis is line 3\n
where "\n" is a newline character. Windows text files by convention have "\r\n" between each line, and might be terminated by ^Z (control-Z).

So, a text file is just a stream of bytes. Saying that you want to seek to line 100 means that you need the position of the start of line 100 in the file, there is no index of line positions attached to the file unless you construct one yourself. If the lines are of fixed length then it is easy to derive that position. Some log files do have fixed length lines, but most do not.

One possibility to improve performance, particularly if the file is accessed over a network, is to zip it up then use an unzip program to pipe the data to you, for example:
open (my $zip, '-|', 'gzip -dc compressed_file.gz') || die "Can't run gzip: $!"; while (<$zip>) { # do some stuff } close $zip;
There are modules on CPAN that will do this as well, but I don't have any experience of them. How much I/O this will save depends on how much compression can be done, and that is data dependant. It might even be slower, you will have to experiment.

In reply to Re: Read Some lines in Tera byte file by cdarke
in thread Read Some lines in Tera byte file by Anonymous Monk

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others lurking in the Monastery: (5)
    As of 2018-01-19 21:56 GMT
    Find Nodes?
      Voting Booth?
      How did you see in the new year?

      Results (223 votes). Check out past polls.