PerlMonks  

Random accessing a gzip file

by anmol.itbhu (Initiate)
on Oct 16, 2012 at 20:18 UTC (#999418)
anmol.itbhu has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I want to fetch a few lines from a gzip file (about 1 GB in size). Because the file contains a lot of data, I have already worked out the start and end byte offsets of the lines I want to fetch. Please suggest how I can randomly access a gzip file when I know the start byte of the line to be fetched.
Is it possible with the IO::Uncompress::Gunzip module?
Can it be done using file-handling functions like seek and tell?

Re: Random accessing a gzip file
by grondilu (Pilgrim) on Oct 17, 2012 at 01:05 UTC

    From the manual page of IO::Uncompress::Gunzip, I read:

    seek
        $z->seek($position, $whence);
        seek($z, $position, $whence);
    Provides a sub-set of the "seek" functionality, with the restriction that it is only legal to seek forward in the input file/buffer. It is a fatal error to attempt to seek backward.

    tell
    Usage is
        $z->tell()
        tell $z
    Returns the uncompressed file offset.

    So yes, you can, but of course it is not as easy as with an uncompressed file: seeking works only in the forward direction, and the offsets refer to the uncompressed data.
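
A minimal sketch of what that could look like, assuming the known start and end offsets are positions in the uncompressed data (which is what tell reports). The helper name fetch_range and its half-open [start, end) convention are made up for illustration; only new, seek, read and close come from IO::Uncompress::Gunzip.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not part of the module): return the uncompressed
# bytes in the half-open range [$start, $end) of a gzip file.
sub fetch_range {
    my ($path, $start, $end) = @_;

    my $z = IO::Uncompress::Gunzip->new($path)
        or die "cannot open $path: $GunzipError\n";

    # Forward-only seek: the module decompresses and discards
    # everything before $start.
    $z->seek($start, SEEK_SET);

    my $want = $end - $start;
    my $data = '';
    while ($want > 0) {
        my $got = $z->read(my $chunk, $want);
        die "read error in $path: $GunzipError\n" if $got < 0;
        last if $got == 0;          # reached end of data early
        $data .= $chunk;
        $want -= $got;
    }

    $z->close;
    return $data;
}
```

Calling fetch_range($file, $start, $end) then returns the text of the wanted lines, which can be split on newlines if needed.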

      Note that the implementation of seek in IO::Uncompress::Gunzip does not provide true random access to a compressed file. It works by uncompressing data from the current offset in the file/buffer until it reaches the uncompressed offset specified in the parameters to seek. For very small files this may be acceptable behaviour. For large files it may cause an unacceptable delay.
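
Given that cost, one way to limit the delay when several ranges are wanted is to visit them in ascending order on a single handle, so the file is decompressed at most once. The following is only a sketch under assumptions: fetch_ranges is a hypothetical helper, ranges are half-open [start, end) pairs of uncompressed offsets, and they are assumed not to overlap (an overlap would need a backward seek, which is fatal).

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: fetch several [start, end) ranges of uncompressed
# bytes in one forward pass. Results come back in ascending order of
# start offset, not in the order the ranges were passed in.
sub fetch_ranges {
    my ($path, @ranges) = @_;

    my $z = IO::Uncompress::Gunzip->new($path)
        or die "cannot open $path: $GunzipError\n";

    my @out;
    for my $r (sort { $a->[0] <=> $b->[0] } @ranges) {
        my ($start, $end) = @$r;

        # Seeking backward is a fatal error, so reject overlaps up front.
        die "range starting at $start overlaps an earlier range\n"
            if $start < $z->tell;

        $z->seek($start, SEEK_SET);

        my ($data, $want) = ('', $end - $start);
        while ($want > 0) {
            my $got = $z->read(my $chunk, $want);
            die "read error in $path: $GunzipError\n" if $got < 0;
            last if $got == 0;      # reached end of data early
            $data .= $chunk;
            $want -= $got;
        }
        push @out, $data;
    }

    $z->close;
    return @out;
}
```

For example, fetch_ranges($file, [5000, 5100], [100, 180]) reads the range at 100 first and then continues forward to 5000 without reopening the file.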
