Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Is it possible without reading entire file, only to fetch required lines from the file ?

Well, you do not say what type of file it is, if the lines are a fixed length or not, and which operating system you are on.

Back in the olden days file formats were many and varied, and often supported indexes, even on lines in a file containing text. That is not generally done these days on UNIX or Windows. A text file does not contain physical line records anymore, it is just a stream of bytes. So when a file looks like this in a text editor or file viewer:
This is line 1 This is line 2 This is line 3
in fact the file really looks like this (on UNIX):
This is line 1\nThis is line 2\nThis is line 3\n
where "\n" is a newline character. Windows text files by convention have "\r\n" between each line, and might be terminated by ^Z (control-Z).

So, a text file is just a stream of bytes. Saying that you want to seek to line 100 means that you need the position of the start of line 100 in the file, there is no index of line positions attached to the file unless you construct one yourself. If the lines are of fixed length then it is easy to derive that position. Some log files do have fixed length lines, but most do not.

One possibility to improve performance, particularly if the file is accessed over a network, is to zip it up then use an unzip program to pipe the data to you, for example:
open (my $zip, '-|', 'gzip -dc compressed_file.gz') || die "Can't run gzip: $!"; while (<$zip>) { # do some stuff } close $zip;
There are modules on CPAN that will do this as well, but I don't have any experience of them. How much I/O this will save depends on how much compression can be done, and that is data dependant. It might even be slower, you will have to experiment.

In reply to Re: Read Some lines in Tera byte file by cdarke
in thread Read Some lines in Tera byte file by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chanting in the Monastery: (8)
    As of 2014-08-23 14:03 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The best computer themed movie is:











      Results (174 votes), past polls