
Re^2: (OT) help reading a bgzip file

by Anonymous Monk
on May 03, 2011 at 14:10 UTC ( #902720=note )

in reply to Re: (OT) help reading a bgzip file
in thread (OT) help reading a bgzip file

Hi. No, I don't mean bzip2. This program is called bgzip. I have never used it before; I just know it is a compression/decompression tool. This is its page on sourceforge. This is the file info. The test.txt.gz file is the file created by 'bgzipping' test.txt:
bgzip test.txt
file test.txt.gz
test.txt.gz: gzip compressed data, extra field

Replies are listed 'Best First'.
Re^3: (OT) help reading a bgzip file
by Utilitarian (Vicar) on May 03, 2011 at 14:50 UTC
    Ah, OK, new one on me - anyway, what does the decompressed file that won't load into your editor say it is when you run file on it?

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      Hi. It says what I put in the last post: that it is gzip compressed data with an extra field. But when I read it with zless it is nonsense, and contains mostly the string ^@ intermixed with ASCII characters.
        If the file is really just a gzip file, then try checking that it isn't corrupt:
        gunzip -tv file.bgzip
        Looking at (which I think is the reference for the file format used by bgzip), it looks like the payload data (BAM) in the bgzip file is not ASCII text. That would explain why you are seeing non-ASCII text when you run it through zless.
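The two checks suggested above (is the gzip stream intact, and is the payload really text?) can be sketched in a few lines of Python using only the standard library. This is a rough equivalent of `gunzip -t`, not a reimplementation of it; the function name `check_gzip` and its return shape are made up for this illustration.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1 << 16):
    """Decompress the whole file and discard the output, roughly like `gunzip -t`.

    Returns (ok, total_bytes, nul_bytes). `check_gzip` is a hypothetical helper
    written for this note, not part of bgzip or gzip.
    """
    total = 0
    nuls = 0
    try:
        with gzip.open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
                # NUL bytes are what zless/less displays as ^@
                nuls += chunk.count(b"\x00")
    except (OSError, EOFError, zlib.error):
        # bad header or CRC, truncated file, or damaged deflate data
        return False, total, nuls
    return True, total, nuls
```

If `ok` comes back true and a large share of the decompressed bytes are NUL, the file is most likely intact and simply holds binary data, which is consistent with the `^@` noise seen in zless.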
Re^3: (OT) help reading a bgzip file
by barvin (Beadle) on Jan 19, 2012 at 20:14 UTC

    Just thought I'd add a note here since this is popping up fairly high on Google searches for bgzip. Bgzip uses the BGZF format, which is a fully backward-compatible but application-specific extension of gzip. In other words, you can unzip a bgzipped file with gunzip, but you can't create one with gzip.
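To make the backward compatibility concrete, here is a small Python sketch. It takes the 28-byte empty block that, as far as I understand the SAM/BAM specification, bgzip appends as an end-of-file marker, and shows two things: an ordinary gzip reader accepts it, and the gzip "extra field" (the thing `file` reports) carries a `BC` subfield holding the block size. The helper name `bgzf_block_size` is made up for this note.

```python
import gzip
import struct

# Empty BGZF block used as an EOF marker (my reading of the SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f8b0804"        # gzip magic, CM=8 (deflate), FLG=4 (FEXTRA set)
    "00000000"        # MTIME
    "00ff"            # XFL, OS
    "0600"            # XLEN = 6 bytes of extra field
    "424302001b00"    # subfield 'B','C', SLEN=2, BSIZE=27
    "0300"            # empty deflate stream
    "00000000"        # CRC32 of empty data
    "00000000"        # ISIZE = 0
)


def bgzf_block_size(block):
    """Return the total block size from the 'BC' extra subfield, or None."""
    magic1, magic2, method, flags = struct.unpack_from("<4B", block, 0)
    if (magic1, magic2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
        return None  # not gzip/deflate, or no extra field present
    (xlen,) = struct.unpack_from("<H", block, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        si1, si2, slen = struct.unpack_from("<2BH", block, pos)
        if (si1, si2) == (ord("B"), ord("C")) and slen == 2:
            (bsize,) = struct.unpack_from("<H", block, pos + 4)
            return bsize + 1  # BSIZE stores the block size minus one
        pos += 4 + slen
    return None


# A plain gzip reader ignores the extra field and just sees valid gzip data.
empty_payload = gzip.decompress(BGZF_EOF)
eof_block_size = bgzf_block_size(BGZF_EOF)
```

A file written by plain gzip has no `BC` subfield, so `bgzf_block_size` returns `None` for it. That is the "can't create one with gzip" half of the statement.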

    What bgzip adds is block-level compression. You can use the library to compress and uncompress input data in blocks, which provides a level of random access to the compressed file. The format was developed by Bob Handsaker of the Broad Institute for use in genomics/bioinformatics applications. It has been modified and used by Bob and Heng Li (also currently at the Broad) in next-generation sequence alignment and sequence variant analysis tools developed as part of the 1000 Genomes Project. Applications such as the BAM file format, samtools, and tabix use bgzip/BGZF to compress sequence alignment and sequence variant files and allow rapid random access to the data compressed within those files.
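The block idea can be sketched in Python with the standard library alone. This is an illustration of the principle, not bgzip's actual code: each chunk of input becomes its own gzip member with a `BC` extra subfield recording the member's length, so a reader can hop from header to header and decompress just one block. The function names, the 4096-byte chunk size, and the assumption that `BC` is the only extra subfield (so BSIZE sits at byte 16) are all choices made for this sketch; real tools also keep an index that maps positions in the data to block offsets, which is not shown here.

```python
import gzip
import struct
import zlib


def bgzf_block(data):
    """Wrap `data` in one gzip member with a BGZF-style 'BC' extra subfield."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = deflater.compress(data) + deflater.flush()
    bsize = len(cdata) + 25  # 18 header + 8 trailer bytes, minus one
    header = struct.pack(
        "<4BI2BH2BHH",
        0x1F, 0x8B, 8, 4,   # magic, deflate, FEXTRA flag
        0,                  # MTIME
        0, 0xFF,            # XFL, OS (unknown)
        6,                  # XLEN
        66, 67, 2, bsize,   # 'B', 'C', SLEN, BSIZE
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + cdata + trailer


def write_blocks(data, chunk=4096):
    """Compress `data` as a series of independent blocks."""
    return b"".join(bgzf_block(data[i:i + chunk]) for i in range(0, len(data), chunk))


def block_offsets(buf):
    """List block start offsets by reading only the headers."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        (bsize,) = struct.unpack_from("<H", buf, pos + 16)  # assumes BC comes first
        pos += bsize + 1
    return offsets


def read_block(buf, offset):
    """Decompress exactly one block, leaving the rest of the file untouched."""
    (bsize,) = struct.unpack_from("<H", buf, offset + 16)
    return gzip.decompress(buf[offset:offset + bsize + 1])


records = b"".join(b"record %06d\n" % n for n in range(20000))
packed = write_blocks(records)
offsets = block_offsets(packed)
third_block = read_block(packed, offsets[2])  # bytes 8192..12287 of the input
```

Because every block is a complete gzip member, `gzip.decompress(packed)` (or gunzip on the command line) still returns the whole input, while `read_block` touches only the bytes of the block it was asked for.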

    There are Perl libraries that provide an API to files compressed in BAM and to the tabix library.

    For more information see:

    The forum would be a good place for questions about the format and its applications, as the authors and many users are active there.
