cp to copy only non-corrupt files


 
# 1  
Old 12-21-2011

I don't know if I am asking this correctly, but I have a hard drive with some bad sectors and it appears that some of the data is corrupt. I am having a lot of trouble copying the data to a new drive. The issue is not in copying files, but that the new drive to which files are copied is not acting in a stable manner after the files are copied to it. Check disk runs every time I restart, but stops with an error before it finishes. Data on the drive will be good, but after a couple of restarts, the same data will be corrupt and files won't open.

I realize that the problem could be the drive, but it seems more complicated than that. There is only one partition on the drive that is causing problems. There is a second partition on the drive that check disk does not run on.

It would be very helpful if I could confirm that all the files I am copying to the drive are non-corrupted files and skip those that are. I don't know if there is any way to test the files before they are copied. I know that sometimes you can't change permissions on corrupt files, or can't open them, but I don't know how that helps.

Suggestions would be appreciated.

LMHmedchem
# 2  
Old 12-21-2011
Ordinary files don't start malfunctioning just because you're copying them from a dying drive. Their contents may be suspect, but they're not magic; their badness doesn't leak into the filesystem at large. Bad files don't have the power to corrupt good filesystems when copied.

This means, I suspect, you've got bigger problems than a dying drive. Your system itself may be corrupting data somewhere along the line.

My approach to rescuing this would be to remove both drives and install them into a scratch computer. Doesn't have to be a good computer, as long as it can boot a rescue CD of some sort. Then block-copy the old drive onto the new one, raw. This will overwrite all current contents, and it must be equal or greater size. Use dd_rescue if you have it, dd conv=noerror,sync if you don't.

If your drive has bad sectors, they'll stick out during this process, but that can't be helped. dd and dd_rescue will fill in bad blocks with pure zeroes when they can't be read. The resulting blind copy may be good enough to mount and recover data from.
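
A minimal sketch of the dd fallback, run here against scratch files instead of real disks so it is safe to try. The temp files are stand-ins only; on real hardware the input and output would be the whole-disk device nodes, and the block size of 4096 is just an assumption. With conv=noerror,sync a failed read zeroes out the whole block, so a smaller bs should lose less data around each bad spot.

```shell
#!/bin/sh
# Stand-ins for the real disks; on actual hardware these would be
# whole-disk device nodes (e.g. /dev/sdX), not these scratch files.
src=$(mktemp)   # hypothetical "old drive"
dst=$(mktemp)   # hypothetical "new drive"

# Fill the fake source with 1 MiB of data so there is something to copy.
dd if=/dev/urandom of="$src" bs=4096 count=256 2>/dev/null

# noerror: keep going after a read error instead of aborting.
# sync:    pad every short or failed input block with zeroes up to bs,
#          so later data stays at the right offset on the destination.
dd if="$src" of="$dst" bs=4096 conv=noerror,sync 2>/dev/null

# On healthy input the copy should be byte-for-byte identical.
cmp -s "$src" "$dst" && echo "copy matches source"
```

On a real rescue, swapping the two device names would overwrite the drive you are trying to save, so it is worth checking them twice before pressing enter.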

If it didn't have any bad sectors, it probably means it was a good drive but being fed mangled data. Bad RAM perhaps, causing operating system malfunctions?

Only then, once your data isn't in danger of flopping over and dying the more you touch it, should you start playing around with it.

How's it supposed to tell "good" files from "bad" ones, by the way?

Last edited by Corona688; 12-21-2011 at 02:06 AM..
# 3  
Old 12-21-2011
Quote:
Originally Posted by Corona688
Ordinary files don't start malfunctioning just because you're copying them from a dying drive. Their contents may be suspect, but they're not magic; their badness doesn't leak into the filesystem at large. Bad files don't have the power to corrupt good filesystems when copied.
That is more or less what I thought, but since the drive seemed to work alright until I transferred a lot of data, I wasn't so sure. In the last effort, I did a low level format, and then copied about 1GB of data onto the drive. Then I restarted and check disk ran. It found some errors, fixed them, and then finished. On subsequent restarts, check disk didn't run, so I thought I was in the clear. I was able to open files and use apps in the data I had copied. Then I copied about 50GB more data and restarted. The same check disk cycle started, but this time it wouldn't finish. After restart, some of the original 1GB of data was corrupted and those apps would fail to run. There are many variables here, so the logical thing to do would be to try to ensure that the fault was not in the data being moved. The problem is that the data is on a drive with bad sectors, but it works in the main. Check disk does not run on every start up when that drive is in the machine, which it should if the source file system is really borked.

Quote:
Originally Posted by Corona688
This means, I suspect, you've got bigger problems than a dying drive. Your system itself may be corrupting data somewhere along the line.
This is a problem that is proving difficult to diagnose. There are two other platter drives and an SSD in this box and they are not acting up at all. This leads me to believe that the new drive is just bad. The fact that the drive passes WD's diagnostic software makes that a bit less clear. Memory, the motherboard SATA controller, SATA cables, power supply, operating system, etc, are all other places where the problem could reside. In most of those cases, I would expect the problem to be more widespread. I moved the drive off of the motherboard SATA controller and onto a brand new PCI SATA card in case the controller was going.

Quote:
Originally Posted by Corona688
My approach to rescuing this would be to remove both drives and install them into a scratch computer. Doesn't have to be a good computer, as long as it can boot a rescue CD of some sort. Then block-copy the old drive onto the new one, raw. This will overwrite all current contents, and it must be equal or greater size. Use dd_rescue if you have it, dd conv=noerror,sync if you don't.
Is this something that I could do in windows cygwin, or would a flavor of linux be better? I have CentOS and Ubuntu on one computer here. I have some live linux CDs, but the computers I could use those on are number crunching servers that don't have space for hard drives. Another issue is that once I have moved data onto these drives, when I delete the partition, I can't create a new one with a quick format. After this blew up again last night, I deleted the partition on the drive. When I replaced it, windows couldn't format the new partition. The format failed. This happened before and I had to do a low level format to get it back. That takes about 6 hours, so it's not a trivial step.

Quote:
Originally Posted by Corona688
Only then, once your data isn't in danger of flopping over and dying the more you touch it, should you start playing around with it.

How's it supposed to tell "good" files from "bad" ones, by the way?
Yeah, I'm not sure. I know that I get OS messages about corrupt files from time to time. I guess you could try to open the file with the default app and that would trigger some exceptions if the file is bad. I guess you could try chown or chmod; I have gotten error messages about these not working on files that may be bad. Anything like that would take forever.
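
A rough sketch of one cheaper test than opening each file in its app: read every file from start to finish and list the ones where the read itself fails. The function name list_unreadable and the demo file names are made up for illustration. This only catches errors the drive actually reports; a file whose contents are silently wrong will still pass, and telling that apart would need a checksum taken while the file was known to be good.

```shell
#!/bin/sh
# list_unreadable DIR: print every regular file under DIR that cannot be
# read from start to finish. A read error from the disk makes cat fail.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f do
            cat "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Demo on a throwaway directory (hypothetical file names).
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
mkdir "$demo/sub"
printf 'world\n' > "$demo/sub/b.txt"

bad=$(list_unreadable "$demo")
if [ -z "$bad" ]; then
    echo "all files readable"
else
    printf 'unreadable:\n%s\n' "$bad"
fi
```

Reading a whole drive this way still touches every sector, so on a failing disk it is probably better done on a block-level copy than on the original.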

At this point, I am inclined to RMA the drive (I have an open ticket on it) and do the dd_rescue copy with the new drive. What do you think about that?

LMHmedchem
# 4  
Old 12-21-2011
Quote:
Originally Posted by LMHmedchem
In the last effort, I did a low level format
'Low level formatting' hasn't been possible anywhere but the factory for decades now. What did you actually do?
Quote:
The same check disk cycle started, but this time it wouldn't finish.
Checking what? The bad drive, or the new one?
Quote:
There are many variables here, so the logical thing to do would be to try to ensure that the fault was not in the data being moved.
Only you'd know whether your data's any good. If your application can't tell you, then nobody knows. Application errors can't corrupt a filesystem, though. That takes a hardware or kernel fault. (Checking dmesg may be illuminating.)
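
A small sketch of what checking dmesg could look like on one of the Linux installs. The helper name disk_errors and the two search patterns are assumptions, a first guess at what is worth looking for and not an exhaustive list of kernel messages.

```shell
#!/bin/sh
# disk_errors: keep only kernel log lines that mention an I/O error or an
# ATA link (ata1, ata2, ...). The patterns are a guess at what is useful,
# not a complete list.
disk_errors() {
    grep -iE 'i/o error|ata[0-9]+'
}

# Typical use on a Linux box (may need root on some systems):
#   dmesg | disk_errors
```

If nothing turns up while the suspect drive is being written to, that points away from the kernel seeing the failure, though it does not rule out bad RAM.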

And if you're getting data corruption on good disks, something in that server must be malfunctioning, therefore any backups you make using that server are suspect. The longer you keep toying with the original disk in the original machine, the more likely it gets that something worse will happen to your data.
Quote:
Memory, the motherboard SATA controller, SATA cables, power supply, operating system, etc, are all other places where the problem could reside. In most of those cases, I would expect the problem to be more widespread.
Does your system have lots of free memory? If yes, most of it's going to be used as disk cache. That makes pretty good odds that disk will be the first thing trashed by a bad spot in RAM, in a highly unpredictable way.
Quote:
I moved the drive off of the motherboard SATA controller and onto a brand new PCI SATA card in case the controller was going.
Which PCI SATA card? It's easy to get a lemon.
Quote:
Is this something that I could do in windows cygwin
That basically means doing it in Windows, since Cygwin isn't an operating system. It might technically be possible in Windows, but there'd be lots of hoops to jump through and proprietary software nobody would know how to help you with.

CentOS or Ubuntu should do.
Quote:
Another issue is that once I have moved data onto these drives, when I delete the partition, I can't create a new one with a quick format. After this blew up again last night, I deleted the partition on the drive. When I replaced it, windows couldn't format the new partition.
I wouldn't recommend using Microsoft Windows to manage partitions for any system except Microsoft Windows.
Quote:
This happened before and I had to do a low level format to get it back. That takes about 6 hours, so it's not a trivial step.
Again, what do you mean by "low level format"?

The form of backup I'm thinking of wouldn't need partitions on the destination disk at all. It'd just be a raw dump of data from one disk to another, sector by sector, which clones all partition layout in the process.

Quote:
At this point, I am inclined to RMA the drive (I have an open ticket on it) and do the dd_rescue copy with the new drive. What do you think about that?
Um, dd_rescue first, then RMA. You kind of need the drive to make a copy of it.

dd_rescue will also tell you whether you get read errors or not.
# 5  
Old 12-21-2011
chkdsk.exe is a very basic Microsoft program. Unless you run it manually, it is triggered by a crude mechanism which decides whether there were incomplete disc writes.

What Operating System did you use to format this disc? Can we assume that this new disc is formatted NTFS rather than basic FAT? If not, it will not be able to deal with large files.
How did you format the disc? Did you run chkdsk.exe on the new disc before using it?

I too am amazed that you have the equipment for a low-level disc format. You will have needed to enter all bad sectors manually.

Because you have posted on unix.com, we must assume that unix is involved somewhere in this process.
Does the source disc belong to the system on which you are trying to do the copy? If not, where did it come from? What is the format of the source disc and what Operating System and software wrote the files on the disc? What proof do you have that the source disc is corrupt? What did you type when trying to copy the files? What error messages do you get? How big is the largest file (especially if bigger than 2 GB)?
A detailed hardware and software inventory would help. I wonder if you are fitting modern disc drives to an old computer?

Last edited by methyl; 12-21-2011 at 05:50 PM..
# 6  
Old 12-26-2011
Quote:
Originally Posted by methyl
What Operating System did you use to format this disc?
The format was done under windows XP 32-bit. This is a multi boot box, but this data drive is primarily used for windows data. It does have a second NTFS partition that I share with linux installations in other boot partitions. Check disk never ran on that partition.

Quote:
Originally Posted by methyl
Can we assume that this new disc is formatted NTFS rather than basic FAT? If not, it will not be able to deal with large files.
Yes it was NTFS

Quote:
Originally Posted by methyl
How did you format the disc? Did you run chkdsk.exe on the new disc before using it?
Normally I create partitions using EASEUS partition master (v9.1). I believe that this does a quick format by default. After the drive started acting up, I reverted to creating and formatting partitions with windows disk manager. It is an adequate tool if all you need to do is to create or delete partitions. I tried both long and quick formats. One curious thing is that after the drive started acting up, I deleted the partition in windows, but I could not quick format a new partition after creating one. I got an error that the format failed. If I did a long format, it would finish and the drive was usable. That suggests bad sectors that a quick format can't work around, but the WDLD tool doesn't find bad sectors, nor does HDtune free.

I did not run checkdisk on the drive before using it. Is that a standard practice? I guess it makes sense, I try to test most of my other components. I have run the WDLD tool on some of my new drives before but I can't remember if I did it this time.

Quote:
Originally Posted by methyl
I too am amazed that you have the equipment for a low-level disc format. You will have needed to enter all bad sectors manually.
A hardware forum suggested to me that a low level format may fix issues, especially if there was a problem in the partition tables or MBR. It would also work around bad sectors if possible. I just used a software tool called HDD Low Level Format Tool (HDDGURU: Laptop and Desktop Hard Disk Drives, Tests, Software, Firmware, Tools, Data Recovery, HDD Repair). I don't know that this does much of anything different than the windows long format, but it does remove the MBR. I had to activate the drive in windows after running the tool. It also took like 10 hours to run.

Quote:
Originally Posted by methyl
Because you have posted on unix.com, we must assume that unix is involved somewhere in this process.
I have cygwin installed and make significant use of it, so I do a lot of things in bash. I copied the files from the bad source disk to the replacement drive using,

cp -Rfp sourceDir/ destinationDir/ >& sourceDir_copylog.txt

This has been my standard procedure for quite a while. It is much faster than using any windows tool and it doesn't quit if it suddenly runs into a file it can't copy. The redirected stderr and stdout gives a record of any files that couldn't be copied. This box also has ubuntu, cent, scientific, and suse installed, so those are available if there are some native linux techniques to try.
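
A sketch of one way to confirm the copy afterwards, shown on throwaway directories so it can be run anywhere. The directory and log names are placeholders. It uses the same cp options as above, with a redirection that also works in shells other than bash, and then compares the two trees with diff -r.

```shell
#!/bin/sh
# Throwaway stand-ins for sourceDir and destinationDir.
work=$(mktemp -d)
mkdir -p "$work/sourceDir/sub"
printf 'data\n' > "$work/sourceDir/sub/file.txt"

# Same copy as above, with a redirection that also works outside bash.
cp -Rfp "$work/sourceDir" "$work/destinationDir" > "$work/copylog.txt" 2>&1

# diff -r reads both trees and reports any file that differs or is missing.
if diff -r "$work/sourceDir" "$work/destinationDir" > "$work/difflog.txt" 2>&1
then
    echo "trees are identical"
else
    echo "differences found, see difflog.txt"
fi
```

A comparison run right after the copy may be served from the RAM cache instead of the disk, so repeating it after a restart is likely to be the more telling check here.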

Quote:
Originally Posted by methyl
Does the source disc belong to the system on which you are trying to do the copy? If not, where did it come from? What is the format of the source disc and what Operating System and software wrote the files on the disc?
The source disk is a windows NTFS drive of the same make and model (WD Caviar Black 1TB, 6 Gb/s). Partitions on the source disk were created and formatted using EASEUS.

Quote:
Originally Posted by methyl
What proof do you have that the source disc is corrupt? What did you type when trying to copy the files? What error messages do you get?
I run rsync every night to back up the data drive to a backup drive in the same box. About a month ago, I noticed the rsync wasn't finishing and it seemed like the issue was with the destination drive. I ran the WDLD tool on it and it failed the short test. I ran the long test and it said there were bad sectors that it tried to fix. The tool failed trying to fix the bad sectors, so I RMAd the drive. I have an external backup of the same drive, so when the replacement arrived, I used cp to restore my internal backup from my external. The internal backup drive has worked well since.

Recently I noticed that there were some issues with the primary data drive, so I ran WDLD, found errors, and RMAd that drive as well. This was the same bad sectors error that I got on the backup drive.

This new problem arose when trying to copy data onto the replacement for the backup drive when it arrived. I did the same file copy with cp -Rfp as before.

Quote:
Originally Posted by methyl
How big is the largest file (especially if bigger than 2 GB)? A detailed hardware and software inventory would help. I wonder if you are fitting modern disc drives to an old computer?
I'm not sure how big the largest file is. I have some linux iso files stored on this drive, and those are pushing 5GB. Those are probably the largest thing I have.

The hardware is as follows,
PSU: CORSAIR CMPSU-750TX 750W
MOBO: GA-EP45T-DS3R f3 BIOS
CPU: Q9550
RAM: 2x2GB DDR3 OCZ3RPR13334GK 1333MHz, 6-6-6-20, 1.75v
GPU: EVGA 896-P3-1257-AR GeForce GTX260 Core 216
SSD-OS1: OCZ VertexII 60GB, WinXP-32bit sp3
HDD-OS2: VelociRaptor 150GB, Ubuntu 10.10 64-bit, CentOS 5.1 64-bit, Suse 12.1 64-bit
HDD-Data: 1TB Western Digital Caviar Black
HDD-Backup: 1TB Western Digital Caviar Black (10GB pagefile partition)

As far as software, I'm not sure what is relevant, but I have cygwin installed for the gnu compilers, Zone Alarm ISS, java JRE, eclipse, MS office, Adobe CS, a bunch of chemistry and statistics tools, and various system tools.

I wouldn't call this an old computer by any means, but it is certainly not the most current hardware either. I found on another forum that there is an issue with using the western digital 6 Gb/s SATA III drives on a SATA II controller. These are all supposed to be backward compatible, but apparently you need to add a jumper to restrict the drive to 3 Gb/s. Others have reported bad sectors popping up over time without the jumper. It would have been nice for WD to advertise that a bit better. I am confident that is what was causing the bad sectors to pop up in the first place. But I put a jumper on the replacement drive that is acting up now and that didn't help.

At this point, I am able to use all of the other drives in this box, so I am inclined to think that the problem is not with the SATA controller. I have also run memtest 86+ and it didn't find anything wrong with the memory. I will run a long prime95 later today to see if my system is generally stable.

At this point, I have the data drive with the bad sectors unplugged and I'm waiting for an RMA of the replacement drive.

Corona688, this is the replacement that I did an RMA on. I still have the original bad sector drive that I'm trying to get the data off of. If I plug that drive in, it works and I can open files and such. I don't know what data on it is affected by the bad sectors, so I've left it unplugged for now. The rest of the computer seems to work fine. I can boot into the OS, run apps, etc, and the other drives in the box aren't triggering checkdisk.

My plan is to load the bad sector hdd and the RMA replacement drive into a new computer I have and use ubuntu live linux to do dd_rescue to try to recover the data. Then I will boot windows in the new computer and see if the new drive is stable. If it is, I will put it into the suspect machine and see what happens. Hopefully the replacement drive I got was just bad and that is all there is to it.

Did I answer everyone's questions? Sorry for the delay in response, it has been a busy weekend.

LMHmedchem

Last edited by LMHmedchem; 12-26-2011 at 02:39 PM..
# 7  
Old 12-27-2011
I have the new replacement drive in a new computer along with the old drive with the bad sectors. I have ubuntu 10 loaded from a flash stick and I installed ddrescue (the gnu version I think).

I'm not sure how to go about a device to device copy. I believe that the new drive (unformatted, no partitions) is sdc and the old drive with the data is sda. It would be nice to confirm this.

Can someone point me to a tutorial on how to do this or post a list of instructions?

The source disk with the bad sectors is NTFS with two partitions. It seems like it should be something like,

ddrescue -f -n /dev/hda /dev/hdc logfile

The example indicates this is for ext2 partitions, so I don't know if you need to do something else for NTFS.
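
A hedged sketch of how the session might be laid out with GNU ddrescue. The copy itself works on raw blocks and does not look at the filesystem, so NTFS should not need anything different for this step. The function name rescue_plan and the mapfile name rescue.map are made up, and the device names are only the guess given above (sda as the old drive, sdc as the new one). The function prints the commands without running them, so the device names can be checked first.

```shell
#!/bin/sh
# rescue_plan SRC DST MAPFILE: print (do not run) a two-pass GNU ddrescue
# session. Printing first gives a chance to double-check which device is which.
rescue_plan() {
    src=$1 dst=$2 map=$3
    # Pass 1: -f allows overwriting the output device, -n skips the slow
    # scraping of bad areas so the easy data is copied first.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: -d uses direct disc access, -r3 retries bad sectors 3 times.
    # The mapfile lets this pass pick up where pass 1 left off.
    echo "ddrescue -d -f -r3 $src $dst $map"
}

# ASSUMPTION: sda is the old drive and sdc the new one, as guessed above.
# Confirm with 'sudo fdisk -l' (sizes, partition tables) before running anything.
rescue_plan /dev/sda /dev/sdc rescue.map
```

The source drive has two partitions and the destination has none, which is one way to tell them apart in the fdisk listing, since the two drives are the same make and model.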

LMHmedchem

Last edited by LMHmedchem; 12-27-2011 at 10:03 PM..