
Re: Reading partial/corrupt zip files

by abcde (Scribe)
on Jan 01, 2006 at 12:11 UTC (#520230)

in reply to Reading partial/corrupt zip files

I don't know of any library that can read corrupted files, but it might be possible to turn an incomplete file into a complete one by removing the last (truncated) file and adding a phony footer.

Look at the ZIP file format:
The files in the zip are not connected to each other, so it is possible to read through the file, parsing each file as it comes and bailing out if the file ends unexpectedly:
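#!/usr/bin/perl
use strict;
use warnings;

# The original post left the filename blank; taking it from the
# command line is an assumption.
my $file = shift @ARGV or die "Usage: $0 file.zip\n";
open( Z, '<', $file ) or die "Cannot open $file: $!\n";
binmode Z;

sub error {
    print "The file is corrupt.\n";
    exit;
}

# Read $_[0] raw bytes and return them as a string.
sub readstr {
    my $received = '';
    for ( 1 .. $_[0] ) {
        error() if eof Z;
        $received .= getc(Z);
    }
    return $received;
}

# Read a $_[0]-byte integer. ZIP stores integers little-endian,
# so the first byte read is the least significant one.
sub readint {
    my $received = 0;
    for my $i ( 0 .. $_[0] - 1 ) {
        error() if eof Z;
        $received += ord( getc(Z) ) << ( 8 * $i );
    }
    return $received;
}

while ( !eof Z ) {
    my $head           = readstr(4);    # PK^C^D
    my $versions       = readstr(4);    # ...
    my $filenamelength = readint(2);    # ...
    # Parse the rest of this file
    # Until we get an error or go on to the next file header
}
close(Z);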

My code doesn't actually produce a correct footer for the file, but it should start you off.
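For what it's worth, here is a rough sketch of how the "drop the last file, add a phony footer" idea could be finished off. It is not tested against real-world archives and makes several assumptions: the whole file fits in memory, members do not use a trailing data descriptor (flag bit 3), and there are no ZIP64 or multi-disk archives. The names `scan_members` and `repair_zip` are made up for this example.

```perl
use strict;
use warnings;

# Collect every member whose local header and data are fully present.
# Returns (\@entries, $end); $end is the offset just past the last
# complete member.
sub scan_members {
    my ($bytes) = @_;
    my @entries;
    my $pos = 0;
    while ( length($bytes) - $pos >= 30 ) {    # 30 = fixed part of a local header
        my ( $sig, $version, $flags, $method, $time, $date,
             $crc, $csize, $usize, $name_len, $extra_len )
            = unpack 'V v5 V3 v2', substr( $bytes, $pos, 30 );
        last if $sig != 0x04034b50;    # no "PK\x03\x04" here
        last if $flags & 0x08;         # sizes are in a data descriptor: not handled
        my $next = $pos + 30 + $name_len + $extra_len + $csize;
        last if $next > length($bytes);    # this member is cut off
        push @entries, {
            offset => $pos,    version => $version, flags => $flags,
            method => $method, time    => $time,    date  => $date,
            crc    => $crc,    csize   => $csize,   usize => $usize,
            name   => substr( $bytes, $pos + 30, $name_len ),
        };
        $pos = $next;
    }
    return ( \@entries, $pos );
}

# Keep the complete members, then append a freshly built central
# directory and end-of-central-directory record.
sub repair_zip {
    my ($bytes) = @_;
    my ( $entries, $end ) = scan_members($bytes);
    my $cd = '';
    for my $e (@$entries) {
        $cd .= pack( 'V v6 V3 v5 V2',
            0x02014b50,                       # central directory header signature
            $e->{version}, $e->{version},     # version made by / needed
            @{$e}{qw(flags method time date crc csize usize)},
            length( $e->{name} ),
            0, 0, 0, 0,    # extra length, comment length, disk start, internal attrs
            0,             # external attrs
            $e->{offset},  # where the local header starts
        ) . $e->{name};
    }
    my $count = @$entries;
    my $eocd  = pack 'V v4 V2 v',
        0x06054b50, 0, 0, $count, $count, length($cd), $end, 0;
    return substr( $bytes, 0, $end ) . $cd . $eocd;
}
```

To use it, slurp the damaged file with `binmode` set, pass the contents to `repair_zip`, and write the result to a new file, again with `binmode`. Whatever comes after the last complete member is thrown away.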

Re^2: Reading partial/corrupt zip files
by steves (Curate) on Jan 01, 2006 at 15:10 UTC

    You've hit on the key: the files are not connected. Looking at the code, Archive::Zip appears to always read the central directory first, and that sits at the end of the file. For a file that was not fully sent this can never work, since the end is exactly the part that's missing. It makes sense to build the code around the central directory, as that's surely faster than parsing the entire zip file to find the pieces that are available. So I think a recovery method would have to piece things together the slow way, as you describe.
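A quick way to tell whether a file has lost its tail is to look for the end-of-central-directory record, which starts with the signature `PK\x05\x06`. The sketch below is only an illustration of that check, not code taken from Archive::Zip; `find_eocd` is a made-up name, and it assumes the file contents are already in a string.

```perl
use strict;
use warnings;

# Return the offset of the end-of-central-directory record in $bytes,
# or -1 if none is found (typical for a truncated download).
sub find_eocd {
    my ($bytes) = @_;
    # The record is 22 bytes plus an optional comment of at most 65535
    # bytes, so a genuine one can only start in the tail of the file.
    my $tail_start = length($bytes) - ( 22 + 65535 );
    $tail_start = 0 if $tail_start < 0;
    my $pos = rindex( $bytes, "PK\x05\x06" );    # last occurrence, or -1
    return -1 if $pos < $tail_start;
    return -1 if $pos + 22 > length($bytes);     # record itself is cut off
    return $pos;
}
```

If this returns -1, the central directory cannot be trusted and the members have to be recovered by walking the local headers from the start of the file.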
