PerlMonks  

Seeking through a large gzipped file

by roysperlarnab (Initiate)
on May 14, 2013 at 15:05 UTC (#1033500=perlquestion)
roysperlarnab has asked for the wisdom of the Perl Monks concerning the following question:

I need to seek through a large gzipped data file. As far as I know, there are two ways to read a '.gz' file: 'open "zcat $filename|"' and Compress::Zlib. I have found the latter to be much slower than the former. But the zcat method doesn't let me seek, at least not with 'seek'. Is there a fast way to seek through a large '.gz' data file?

Re: Seeking through a large gzipped file
by NetWallah (Abbot) on May 14, 2013 at 15:30 UTC
    PerlIO::via::gzip will probably work as fast as "zcat", but without the need to shell out.

    Since it gives you a file handle, "seek" should work.
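This thread doesn't confirm that PerlIO::via::gzip actually implements seek, so test that before relying on it. A different, core alternative is IO::Uncompress::Gunzip. Its objects have a seek method that only allows seeking forward: it decompresses and discards data up to the target offset, so it is not true random access. Here is a minimal sketch that assumes offsets are in uncompressed bytes. read_at is a hypothetical helper name.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: return up to $len bytes starting at uncompressed
# offset $offset. The seek is emulated by IO::Uncompress::Gunzip, so all
# data before $offset is still decompressed (and thrown away).
sub read_at {
    my ($file, $offset, $len) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "Cannot open $file: $GunzipError";
    # Only forward seeks are legal; a fresh object starts at offset 0.
    $z->seek($offset, SEEK_SET) or die "seek failed: $GunzipError";
    my $buf = '';
    my $got = $z->read($buf, $len);
    die "read failed: $GunzipError" if !defined $got || $got < 0;
    $z->close();
    return $buf;
}
```

For example, read_at('data.gz', 1_000_000, 80) would return 80 bytes starting one million bytes into the uncompressed stream. The file name is made up.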

      I'll try that, thanks

      If you have at least one idle CPU, using open "zcat $file |" lets you spread the load across two CPUs, one for the perl process and one for the zcat process. That way, all the CPU cycles needed for decompressing the data are offloaded from perl. This should be at least as fast as using PerlIO::via::gzip, if not faster, since transferring data between processes is usually fairly fast.
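Here is a minimal sketch of the pipe approach. It assumes GNU gzip is on the PATH; on GNU systems, gzip -dc is what zcat runs. The list form of open skips the shell, so file names with spaces or shell metacharacters pass through unchanged. Checking the return value of close catches a failing child process. grep_gz is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a .gz file that match $re.
# Decompression runs in a separate gzip process (equivalent to zcat),
# so it can use a second CPU while perl does the matching.
sub grep_gz {
    my ($file, $re) = @_;
    # List-form open: no shell is involved, so $file is passed verbatim.
    open my $fh, '-|', 'gzip', '-dc', $file
        or die "Cannot start gzip for $file: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ $re;
    }
    # close() on a pipe waits for the child; false means gzip failed.
    close $fh or die "gzip -dc $file failed (status $?)";
    return @hits;
}
```

A call would look like print for grep_gz('access.log.gz', qr/ERROR/); where the file name is a placeholder.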

Re: Seeking through a large gzipped file
by vsespb (Hermit) on May 14, 2013 at 15:35 UTC

    Obviously, with 'open "zcat $filename|"' you need your own seek mechanism (for example, caching previously decompressed data).

    Perhaps, once you have implemented it, it will be as slow as Compress::Zlib.
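This is one possible shape for such a home-grown seek, offered as a sketch only. It does not cache decompressed data. Instead, it tracks the current uncompressed offset, reads and discards data to move forward, and restarts the decompressor to move backward. As a result, every backward seek costs a full re-read from the start. GzPipe and its methods are hypothetical names, and the sketch assumes gzip -dc (equivalent to zcat) is on the PATH.

```perl
package GzPipe;    # hypothetical helper, not a CPAN module
use strict;
use warnings;

sub new {
    my ($class, $file) = @_;
    my $self = bless { file => $file }, $class;
    $self->_restart;
    return $self;
}

# Start (or restart) the decompressor at uncompressed offset 0.
sub _restart {
    my ($self) = @_;
    # Exit status ignored: an abandoned child may die of SIGPIPE.
    close($self->{fh}) if $self->{fh};
    open my $fh, '-|', 'gzip', '-dc', $self->{file}
        or die "Cannot start gzip for $self->{file}: $!";
    binmode $fh;
    $self->{fh}  = $fh;
    $self->{pos} = 0;
    return;
}

# Absolute seek to uncompressed offset $target.
# Forward: read and discard. Backward: restart the pipe, then read forward.
# Returns 1 on success, 0 if EOF is reached before $target.
sub seek_to {
    my ($self, $target) = @_;
    $self->_restart if $target < $self->{pos};
    my $fh = $self->{fh};
    while ($self->{pos} < $target) {
        my $want = $target - $self->{pos};
        $want = 65536 if $want > 65536;    # discard in 64 KiB chunks
        my $n = read($fh, my $junk, $want);
        die "read error: $!" unless defined $n;
        return 0 if $n == 0;
        $self->{pos} += $n;
    }
    return 1;
}

# Read up to $len bytes from the current position (fewer only at EOF).
sub read_bytes {
    my ($self, $len) = @_;
    my $fh = $self->{fh};
    my $n = read($fh, my $buf, $len);
    die "read error: $!" unless defined $n;
    $self->{pos} += $n;
    return $buf;
}

sub position { return $_[0]{pos} }

package main;
```

If backward seeks are frequent, that is where the caching suggested above would pay off.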

      Actually, with zcat we have to read through the lines sequentially, and you know what? This is faster than using Compress::Zlib's gzseek!!!
Re: Seeking through a large gzipped file
by karlgoethebier (Curate) on May 14, 2013 at 19:40 UTC
    «...PerlIO::via::gzip will probably work as fast as "zcat"» (NetWallah)
    «If you have at least one idle CPU, using open "zcat $file |" will allow you to spread the load across two CPUs...» (Corion)
    «...Perhaps when you implemented it will be as slow as Compress::Zlib.» (vsespb)
    «...This is faster than using Compress::Zlib's gzseek!!!» (roysperlarnab)
    «Variatio delectat» ("variety delights") (Phaedrus Augusti Libertus)

    No volunteers to benchmark this?
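Here is one way to start such a benchmark, as a sketch. It decompresses the same file with three of the readers discussed in this thread: a gzip -dc pipe standing in for zcat, Compress::Zlib's gzreadline, and IO::Uncompress::Gunzip. It reports the best wall-clock time of three runs. It uses wall-clock rather than CPU time because any advantage of the pipe comes from running gzip on a second CPU. No results are claimed here; they will depend on the file, the machine, and the module versions. The sub names are hypothetical.

```perl
use strict;
use warnings;
use Compress::Zlib;                          # gzopen, $gzerrno
use IO::Uncompress::Gunzip qw($GunzipError);
use Time::HiRes qw(time);                    # sub-second wall-clock time

# Each reader decompresses the whole file and returns its line count,
# so the three results can be cross-checked.
sub lines_pipe {                             # external process, like zcat
    my ($file) = @_;
    open my $fh, '-|', 'gzip', '-dc', $file or die "Cannot start gzip: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh or die "gzip -dc $file failed (status $?)";
    return $n;
}

sub lines_zlib {                             # Compress::Zlib gz* interface
    my ($file) = @_;
    my $gz = gzopen($file, 'rb') or die "gzopen $file: $gzerrno";
    my ($n, $line) = (0, '');
    # gzreadline returns bytes read: 0 at EOF, -1 on error
    $n++ while $gz->gzreadline($line) > 0;
    $gz->gzclose();
    return $n;
}

sub lines_gunzip {                           # IO::Uncompress::Gunzip object
    my ($file) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "gunzip $file: $GunzipError";
    my $n = 0;
    $n++ while defined $z->getline();
    $z->close();
    return $n;
}

# Best-of-$runs wall-clock seconds for one reader, plus its line count.
sub time_reader {
    my ($reader, $file, $runs) = @_;
    my ($best, $lines);
    for (1 .. $runs) {
        my $t0 = time();
        $lines = $reader->($file);
        my $dt = time() - $t0;
        $best = $dt if !defined $best || $dt < $best;
    }
    return ($best, $lines);
}

if (my $file = shift @ARGV) {
    my %readers = (
        zcat_pipe     => \&lines_pipe,
        compress_zlib => \&lines_zlib,
        io_gunzip     => \&lines_gunzip,
    );
    for my $name (sort keys %readers) {
        my ($secs, $lines) = time_reader($readers{$name}, $file, 3);
        printf "%-14s %8.3f s  %d lines\n", $name, $secs, $lines;
    }
}
```

Run it as perl gz_bench.pl big_file.gz. The script and file names are placeholders. If PerlIO::via::gzip is installed, it could be added as a fourth reader.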

    Regards, Karl


Node Type: perlquestion [id://1033500]
Front-paged by Corion