
Segmentation Fault in IO::Uncompress::Bunzip2

by megaframe (Novice)
on Apr 09, 2013 at 19:14 UTC (#1027818)
megaframe has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

First, let me start by saying that the file I have does appear to be damaged.


I have a log file parser that uses IO::Uncompress::Bunzip2 so I can read compressed files without having to decompress the entire contents first. This is much faster, since sometimes I only need a few lines from a file. The problem is that I have an ASCII text file with a corrupted line. I'm not sure what it contains; it's very large, and Vim stalls and truncates the file at that line. If I read the file with the Bunzip2 module, it stalls at that line and then dies with a segmentation fault (see the trace below). If I decompress the file so that it is a regular ASCII file, and step through it with Perl's built-in open and while ($line = <$file>), everything is fine: Perl still seems to "pause" at the broken line, but it eventually moves past it and parses the rest of the file. I'd love to get the same behavior from Bunzip2, but I just keep getting segmentation faults.

Any help would be appreciated.

>> /home/utils/perl-5.14/5.14.1-threads-64/lib/5.14.1/IO/Uncompress/Ba if (*$self->{Encoding}) {
>> /home/utils/perl-5.14/5.14.1-threads-64/lib/5.14.1/IO/Uncompress/Ba if ($status == STATUS_ENDSTREAM) {
>> /home/utils/perl-5.14/5.14.1-threads-64/lib/5.14.1/IO/Uncompress/Ba return $buf_len ;
>> /home/utils/perl-5.14/5.14.1-threads-64/lib/5.14.1/IO/Uncompress/Ba return $len ;
>> /home/utils/perl-5.14/5.14.1-threads-64/lib/5.14.1/IO/Uncompress/Ba my $offset = index($line, $/);
Segmentation fault

Replies are listed 'Best First'.
Re: Segmentation Fault in IO::Uncompress::Bunzip2
by BrowserUk (Pope) on Apr 09, 2013 at 19:57 UTC

    It sounds very much like that one line is very, very long. Hence the attempt to search it for a newline is overrunning some internal buffer.

    Since, as you say, Perl can process the decompressed file line by line, you could pass the decompressed copy through a Perl script that checks the length of each line and inserts one or more newlines into any line that exceeds some reasonable length. Then you can recompress it.
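A minimal sketch of the wrapping step suggested above, assuming that plain line-by-line reading of the decompressed copy works (as the original poster reports). The sub name `wrap_long_lines` and the 4096-character limit are illustrative choices, not anything from the thread:

```perl
use strict;
use warnings;

# Hypothetical limit; pick whatever "reasonable" means for your logs.
my $MAX_LEN = 4096;

# Copy $in to $out, breaking any line longer than $max characters
# into pieces of at most $max characters, each ending in a newline.
sub wrap_long_lines {
    my ($in, $out, $max) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if (length($line) <= $max) {
            print {$out} $line, "\n";
            next;
        }
        # Walk the overlong line in $max-sized steps.
        for (my $pos = 0; $pos < length($line); $pos += $max) {
            print {$out} substr($line, $pos, $max), "\n";
        }
    }
    return;
}

# Usage: perl wrap.pl damaged.txt > wrapped.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;
    binmode STDOUT;
    wrap_long_lines($in, \*STDOUT, $MAX_LEN);
    close $in;
}
```

Note that this still holds the whole overlong line in memory once, and it adds a trailing newline to a final line that lacks one.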

    Perhaps as a first pass, you could print out any line(s) that are greater than the maximum you might reasonably expect and that way determine whether there isn't some obvious way of 'wrapping' them such that they make sense.
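The first pass suggested above could look something like this. It only reports line numbers and lengths, never the line contents, so nothing huge is sent to the terminal. The sub name `find_long_lines` and the 4096 threshold are assumptions made for illustration:

```perl
use strict;
use warnings;

# Return [line_number, length] pairs for every line longer than $max.
sub find_long_lines {
    my ($in, $max) = @_;
    my @found;
    my $line_no = 0;
    while (my $line = <$in>) {
        $line_no++;
        chomp $line;
        push @found, [ $line_no, length($line) ] if length($line) > $max;
    }
    return @found;
}

# Usage: perl longlines.pl decompressed.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;
    printf "line %d: %d bytes\n", @$_ for find_long_lines($in, 4096);
    close $in;
}
```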

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Segmentation Fault in IO::Uncompress::Bunzip2
by pmqs (Pilgrim) on Apr 09, 2013 at 19:51 UTC
    If you can read the uncompressed version ok, can you find out how long the corrupt line is?

      It doesn't actually dump to the screen. Perl's open handler seems to be doing some magic that other programs can't. I know the exact line, so I tried head -n 44816 FILE.txt | tail -n 1, and the whole thing stalls at 100% CPU usage. (The uncompressed file is 2 GB and 500k+ lines, by the way.)

      Contents of the line seem to be really messed up, but I'm finding a lot of these corruptions and I need to just try and get past them back to the real content.
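Since Perl reportedly gets past the line where head and tail stall, one way to answer the length question without dumping the line is to print only its length and a short escaped preview. This is a sketch; `describe_line` and the 64-byte preview size are hypothetical, and the line number 44816 is the one quoted above:

```perl
use strict;
use warnings;

# Return (length, escaped preview) for line number $target,
# or an empty list if the file has fewer lines than that.
sub describe_line {
    my ($in, $target, $preview_len) = @_;
    my $line_no = 0;
    while (my $line = <$in>) {
        next unless ++$line_no == $target;
        chomp $line;
        my $preview = substr($line, 0, $preview_len);
        # Show bytes outside printable ASCII as \xNN so they cannot upset the terminal.
        $preview =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
        return (length($line), $preview);
    }
    return;
}

# Usage: perl describe.pl FILE.txt 44816
if (@ARGV == 2) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;
    my ($len, $preview) = describe_line($in, $ARGV[1], 64);
    print defined $len ? "length $len, starts with: $preview\n" : "no such line\n";
    close $in;
}
```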

Re: Segmentation Fault in IO::Uncompress::Bunzip2
by flexvault (Monsignor) on Apr 10, 2013 at 13:53 UTC

    Welcome megaframe,

    Today this type of error is rare, but I've seen this problem on older systems in the past. My guess is that you have a hardware failure on one of your system disks. Try this:

    • Rename the file.
    • Compress the ascii file to the original name.
    • See if your non-Perl utilities work.
    If they work, then the area of the disk holding the original file (now renamed) has problems. Do not erase that file, since doing so would put the bad area back into the pool of free disk blocks. I'd also look at the log files for hardware error messages. Note: also consider replacing the disk if you see too many of these errors.

    Now if the error moves to the new file, then you've found a real bad software problem.

    As others have already mentioned, you have a long line that is overflowing an internal buffer. When you get that line using Perl, 'print' its length to the console to see if that helps solve the software problem.

    Good Luck...Ed!

    "Well done is better than well said." - Benjamin Franklin


      It seems the issue occurred on another system, in the original ASCII file, and was then copied into the compressed archive, so I ended up with the bad chunk.

      I couldn't figure out a clean way to move past it. IO::Uncompress::Bunzip2 allocates a reasonable amount of memory to store the line, but the corrupted line is huge and causes a seg fault. The only fix would be at the C code level in that module.
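One thing that might be worth trying before patching the module is to avoid line-oriented reads altogether: pull fixed-size blocks with the module's read method and split lines by hand, dropping anything over a length limit. This is only a sketch and was not tested against the poster's files; it assumes the crash comes from assembling the huge line rather than from decompression itself. The sub name `each_bounded_line`, the 64 KB chunk size, and the callback interface are all illustrative:

```perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Call $callback->($line) for every line of at most $max_len characters in
# the bzip2 file $file. Longer lines are dropped. Returns the number dropped.
sub each_bounded_line {
    my ($file, $max_len, $callback) = @_;
    my $z = IO::Uncompress::Bunzip2->new($file)
        or die "Cannot open $file: $Bunzip2Error\n";

    my $pending  = '';   # partial line carried over between chunks
    my $skipping = 0;    # true while inside an overlong line
    my $skipped  = 0;    # number of overlong lines dropped
    my $chunk    = '';
    my $n;

    while (($n = $z->read($chunk, 65536)) > 0) {
        $pending .= $chunk;
        while ((my $nl = index($pending, "\n")) >= 0) {
            my $line = substr($pending, 0, $nl + 1, '');   # cut the line off the front
            if ($skipping) { $skipping = 0; next; }         # tail of an overlong line
            if (length($line) - 1 > $max_len) { $skipped++; next; }
            $callback->($line);
        }
        # What remains has no newline; if it is already too long, discard it
        # and keep discarding until the next newline shows up.
        if (length($pending) > $max_len) {
            $skipped++ unless $skipping;
            $skipping = 1;
            $pending  = '';
        }
    }
    die "Read error on $file: $Bunzip2Error\n" if $n < 0;

    # Final line without a trailing newline.
    $callback->($pending) if length($pending) && !$skipping;
    $z->close;
    return $skipped;
}
```

A call such as `each_bounded_line('FILE.txt.bz2', 4096, sub { print $_[0] })` would then print every line of reasonable length. Memory use stays bounded by roughly the length limit plus one chunk, regardless of how long the damaged line is.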

      So I kind of had to leave it there and hunt down and manually clear the several hundred corrupted files.

        Hello megaframe,

        I'm curious about your answer/comment. If I understand you correctly, the problem was created on another system and then transferred to your system, where it surfaced for you. If that is correct, then you have found a software bug in the utility that generates the archive.

        That implies that you'll get this problem again and will have to fix the corrupted files manually. That doesn't sound like a good fix, but your time is important, and maybe it will be years before it happens again. If not, try to change the utility on the offending computer.

        Good Luck...Ed

        "Well done is better than well said." - Benjamin Franklin
