PerlMonks
Re^3: Have you ever lost your work? (disaster recovery)
by afoken (Chancellor)
on Jan 09, 2024 at 22:53 UTC ( [id://11156824]=note )
if you haven't tested your backup/restore procedure, then it's a little like Schroedinger's Cat. You don't know if you have a backup .. until you actually successfully do a restore.

Hey, it's war story time again! ;-)

In the last few days of my final year at university, a small student-managed server in my favorite lab lost a lot of data. I don't remember the exact details; I think it lost an entire harddisk. The server was an old tower PC, built around something like a Pentium-II, with no redundancy at all, all consumer parts, no server parts, filled with old harddisks, and a big fan tied to the front of the case with old wires. I guess all of its parts were picked out of the dumpster. It ran Linux, probably an early version of Debian, and it had a SCSI tape streamer. Actually, two streamers, one online, one "offline" in the spare parts bin.

Someone had set up a cron job to use tar to write a backup to tape. Great idea, that's what tar was designed for. One of the students must have swapped the tapes each morning. Larger disks were added, and some day, the tape was full. Backup failed. Some "clever" guy must have found tar's -z option to compress data using gzip, and added that option to the cron job. Backup worked again, tapes had some room again. Nobody verified or tested the backup.

Then, data was lost. Restoring the backups failed. The tapes were worn out and had several read errors, and the streamers were dirty as hell.

tar can handle tapes with errors. It uses fixed-size blocks, and if a block is not readable, it can at least find the next file on tape and continue from there. That way, you won't get all of your data back, but probably a lot of it.

Remember the -z option? The cron job wrote a gzip compressed byte stream to the tapes. No more fixed blocks, and gzip absolutely does not like I/O errors while decompressing a compressed data stream. All tape-handling advantages of tar were lost.
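The difference between a damaged plain tar archive and a damaged gzip-compressed one can be reproduced in memory. The following is only a rough sketch, not what happened on that server: the file names, contents and helper functions are all made up for illustration, the "bad tape block" is modeled as one 512-byte block overwritten with zeros, and the salvage scanner is a simplified stand-in for tar's own resynchronization, not a copy of it.

```python
import gzip
import io
import random
import tarfile
import zlib

BLOCK = 512  # tar's fixed block size
VOCAB = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]


def make_files(count=20, words=500, seed=0):
    """Generate hypothetical, compressible text files as {name: bytes}."""
    rng = random.Random(seed)
    return {
        f"file{i:02d}.txt": " ".join(rng.choice(VOCAB) for _ in range(words)).encode()
        for i in range(count)
    }


def make_tar(files):
    """Pack the files into an uncompressed ustar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def damage(blob):
    """Zero one 512-byte block about a third of the way in (assumed damage model)."""
    offset = len(blob) // 3 // BLOCK * BLOCK
    hurt = bytearray(blob)
    hurt[offset:offset + BLOCK] = bytes(BLOCK)
    return bytes(hurt)


def parse_header(block):
    """Return (name, size) if the block has a valid tar header checksum, else None."""
    try:
        stored = int(block[148:156].strip(b"\0 ") or b"0", 8)
        size = int(block[124:136].strip(b"\0 ") or b"0", 8)
    except ValueError:
        return None
    # The checksum is computed with the checksum field itself counted as 8 spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    if stored != computed:
        return None
    return block[:100].split(b"\0", 1)[0].decode("utf-8", "replace"), size


def salvage(raw):
    """Walk the archive block by block, resyncing on the next valid header."""
    found = {}
    pos = 0
    while pos + BLOCK <= len(raw):
        header = parse_header(raw[pos:pos + BLOCK])
        if header is None:
            pos += BLOCK  # damaged or non-header block: try the next one
            continue
        name, size = header
        start = pos + BLOCK
        found[name] = raw[start:start + size]
        pos = start + -(-size // BLOCK) * BLOCK  # skip data, rounded up to blocks
    return found


def gunzip_until_error(blob, chunk=256):
    """Decompress a gzip stream in small pieces and stop at the first zlib error."""
    decomp = zlib.decompressobj(wbits=31)  # 31 = gzip container
    out = bytearray()
    for i in range(0, len(blob), chunk):
        try:
            out += decomp.decompress(blob[i:i + chunk])
        except zlib.error:
            break
    return bytes(out)


def count_intact(files, found):
    """Count files whose salvaged contents match the originals byte for byte."""
    return sum(1 for name, data in files.items() if found.get(name) == data)


def run_demo():
    files = make_files()
    raw = make_tar(files)
    plain_ok = count_intact(files, salvage(damage(raw)))
    gz_ok = count_intact(files, salvage(gunzip_until_error(damage(gzip.compress(raw)))))
    return plain_ok, gz_ok, len(files)


if __name__ == "__main__":
    plain_ok, gz_ok, total = run_demo()
    print(f"plain tar: {plain_ok}/{total} intact, tar.gz: {gz_ok}/{total} intact")
```

In the plain archive, the zeroed block costs exactly the one file it belongs to, because the scanner finds the next header on a block boundary. In the compressed archive, only files stored before the damaged spot should come back intact, since the decompressor loses its place in the stream from there on.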
In the end, I had a lot of free time that day, and so I could help recover data from the tape. We found another large, empty harddisk, and used something like dd if=/dev/tape conv=noerror of=/mnt/tmpdisk/backup.tar.gz to get a damaged, but readable compressed tape archive. It could be decompressed, at least partially, and tar was then able to extract a lot of files. Swapping the streamers allowed us to read some more data from the current tape. The other tape could also be read partially, and a few more, but older, files were recovered. I left it to the admin to sort out old and new, damaged and sane files and to copy them back to the replacement disk, and told him to fix some things:
In the end, a lot of data was recovered, some from the tapes, some from student PCs in the lab, some from some old disks in the junk bin. But a lot was lost.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)