may a corrupted .gz file be repaired?


 
# 1  
Old 01-11-2006
may a corrupted .gz file be repaired?

Preparing for a move to a new server, I needed to offload somewhat over a gigabyte of newsfeeds that my website collects and that I had been saving on the server. I tarred them and gzipped them into about a dozen smaller files of about 150 MB each. All seemed well. I downloaded them onto my Windows PC. The website was moved (http://schema-root.org). My plan was to move them back to the new server, strip them from their RSS formats and load the news items into a database. However, in my newbieness I managed to transfer the gzipped files in ASCII mode (both directions!). So they won't unzip now, either on my PC or on the server.

Using:

> gunzip < d200512.tar.gz | tar xvf -

I was able to extract a few percent of the files from the first archive I tried, maybe a hundred out of five thousand or so.
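A minimal sketch, assuming Python 3 and its standard zlib module, of how one might measure how much of a damaged .gz file still decompresses before the first detected error. The function name salvage_prefix and the chunk size are hypothetical choices for illustration, not something used in this thread.

```python
import zlib


def salvage_prefix(stream, chunk_size=64 * 1024):
    """Decompress a gzip stream until the first error; return what was recovered.

    `stream` is any binary file-like object (an open file, io.BytesIO, ...).
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    recovered = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        try:
            recovered += decomp.decompress(chunk)
        except zlib.error:
            # Corruption detected: keep only what was decoded from earlier chunks.
            break
    return bytes(recovered)
```

With a real archive this would be called as salvage_prefix(open("d200512.tar.gz", "rb")). Note that the decompressor may emit some garbage before it notices the damage, so the tail of the recovered data should be treated with suspicion.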

My question is: Would it be possible to get rid of the carriage return/linefeed pairs that were inserted into the .gz file by being FTP'ed in ASCII mode, restoring the files to what they were before I screwed them up? In my innocence, I am imagining that every existing linefeed byte in the original .gz file (bytes that happened to be linefeeds) had a carriage return byte added before it during the FTP transfer in ASCII mode. And so I am wondering whether there might be some utility somewhere that would strip them back out, and if there were such a utility, whether it would be likely to produce a .gz file that could be unzipped.

Otherwise I lose a ton of newsfeeds.

Thanks for any help.
John
# 2  
Old 01-11-2006
You do have programs that convert files from MS-DOS/Windows format to Unix. Check dos2unix on Solaris and dos2ux on HP-UX. However, those are meant to work on ASCII files transferred from a DOS/Windows environment to Unix; I don't think they will be of any help to you with zipped files.

You can of course try using the utilities - best of luck!
# 3  
Old 01-12-2006
dos2unix did not help

Thank you blowtorch for the suggestion. I tried dos2unix on the .gz files, and it did produce smaller files. But they were still unacceptable to gunzip. I guess it would only take one "legitimate" occurrence of the byte pair 0D0A in the .gz file to make the approach I am taking fail, because dos2unix would replace it with 0A (I think). I would want that to happen most of the time, but not if the 0D0A were in the original good .gz file.
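The worry above can be shown in a few lines. This is a sketch assuming the conversion simply replaces every 0D 0A pair with 0A, as the post supposes dos2unix does; the function name and the sample bytes are made up for illustration.

```python
def strip_cr_before_lf(data: bytes) -> bytes:
    """Replace every CR LF (0D 0A) pair with a bare LF (0A)."""
    return data.replace(b"\r\n", b"\n")


# Two different inputs (hypothetical byte strings, not real gzip data):
legit_pair = bytes.fromhex("420d0a43")  # 0D 0A genuinely belongs to the data
bare_lf = bytes.fromhex("420a43")       # only a plain 0A

# Both collapse to the same output, so the conversion cannot be undone
# without knowing which of the two was the original.
print(strip_cr_before_lf(legit_pair) == strip_cr_before_lf(bare_lf))
```

Once two different inputs map to the same output, no utility can tell from the output alone which bytes to put back.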

I appreciate your help, though. You provided information that I had already wanted to try, but I didn't know the name dos2unix.

Since I can extract a few of my original files from the corrupted .gz files, I will keep them around. At this point I think I need to learn more about how .gz files are organized, and about how gunzip works.

(My files contain several million newsfeed blurbs related to over 8,000 topics. They represent a form of current events history, so I really want them back. But, at the same time, there is no real rush.)

Thanks again for your suggestion.
# 4  
Old 03-19-2008
fixing corrupt gzip files

[This forum posting is two years late, but I'm adding it because this is a frequently asked question and somebody out there may search for it again and find some hope.]

It is very difficult to fix gzip files corrupted by FTP ASCII transfer. The problem is that (on average) 1/256 of the bytes have 3/8 of their bits flipped, and there is no way to distinguish whether one of those CR/LF bytes was supposed to be the way it is, or got that way by the ASCII transfer - so it can't be inverted in any simple way.
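One way to read those figures, sketched in Python under two assumptions: compressed bytes are roughly uniformly distributed (so about 1 in 256 is a linefeed), and the "3/8" refers to LF (0x0A) and CR (0x0D) differing in 3 of their 8 bits. The 150 MB file size is borrowed from the first post purely as an example.

```python
LF, CR = 0x0A, 0x0D

# XOR leaves a 1 wherever the two byte values differ.
bits_differing = bin(LF ^ CR).count("1")

# Rough count of affected bytes in one 150 MB archive,
# assuming each byte value is equally likely.
file_size = 150 * 10**6
expected_affected = file_size / 256

print(bits_differing, "of 8 bits differ between LF and CR")
print(expected_affected, "bytes expected to be affected")
```

Even under these rough assumptions, that is several hundred thousand damaged positions per archive, spread evenly through the file.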

So-called zip recovery programs only fix CRC/checksum errors (to avoid error messages); they don't actually fix the data. That's only useful if the file is truncated or has a bad block late in the compressed data. It doesn't work for files in which about 1/256 of the bytes are corrupted, throughout the file.

So for 99% of the cases, give up, find the original if you can, and re-transfer in binary mode.

Despite what I said above, there is a computationally expensive way to search for the necessary repair. However, it requires custom coding of a search heuristic, and lots of computation. Therefore it's not a turn-key process. It's only feasible if you have no other backup and the data is so critical that you're willing to invest in some custom coding. I documented some details on what it took to recover a pile of 20MB gzip files containing 250MB million-line web server logs, at my web site.
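To give a feel for why the search is expensive, here is a toy exhaustive version in Python. It is not the method documented on the poster's site; it assumes the only damage was CR LF pairs collapsed to LF, and it simply tries every combination of "put a CR back before this LF or not" until gzip's own CRC check passes. The names is_valid_gzip, brute_force_repair and the max_lf limit are hypothetical.

```python
import gzip
import itertools
import zlib


def is_valid_gzip(data: bytes) -> bool:
    """True if the data decompresses cleanly, including the CRC check."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        return False


def brute_force_repair(damaged: bytes, max_lf: int = 16):
    """Try every way of re-inserting CR before each LF; return the first valid stream.

    The number of candidates is 2 ** (number of LF bytes), so this is only
    feasible for tiny inputs. Returns None if no candidate validates.
    """
    pieces = damaged.split(b"\n")
    lf_count = len(pieces) - 1
    if lf_count > max_lf:
        raise ValueError(f"{lf_count} LF bytes: 2**{lf_count} candidates is too many")
    for choice in itertools.product((b"\n", b"\r\n"), repeat=lf_count):
        candidate = b"".join(p + sep for p, sep in zip(pieces, choice)) + pieces[-1]
        if is_valid_gzip(candidate):
            return candidate
    return None
```

If about 1 byte in 256 is a linefeed, a 20 MB file has on the order of 78,000 of them, so 2 to that power is far out of reach. That is why a real repair needs a heuristic that prunes the search as it goes rather than testing whole-file candidates.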
 