Gentoo Forums
[solved] corrupt files
dtjohnst
Apprentice


Joined: 23 Apr 2006
Posts: 178

Posted: Thu Nov 26, 2009 8:53 pm    Post subject: [solved] corrupt files

I have a strange problem. I currently have Gentoo installed on /dev/sdc and want to install a new copy on a RAID1. So I emerged mdadm, partitioned sda and sdd (I used type fd), then created my arrays and filesystems. I then mounted them and downloaded the stage3. When I went to untar it, I received a bzip2 error about an incomplete file. So I redownloaded it and checked the md5sum against the digest, and it failed. I then redownloaded from a different mirror, with the same result.
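For reference, here is a minimal sketch of the digest check described above, written in Python instead of the md5sum command the post used. The function names `md5_of_file` and `matches_digest` are made up for illustration, and the expected digest is assumed to be the hex string from the mirror's digest file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large stage3 tarball never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch only says that the bytes read differ from the bytes published. It does not say whether the download, the disk, the RAM or something else changed them.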

Step 2: I downloaded the files onto a USB key on another machine to try to isolate the problem. The files passed the md5sum check. So I popped the USB key into my server (the one I'm trying to install on) and cp'd the files over. During the untar, I got the bzip2 error again. So I checked the md5sum on the USB key, and suddenly it failed as well. I thought I must have screwed up somewhere, so I put the USB key back in the desktop, redownloaded the files, and verified the md5sum. It came back good. I moved the USB key to the server and checked the md5sum: passed. I copied the files to my new md root and checked the md5sum: failed. I checked the md5sum on the USB key again: failed.

So somehow, copying the file corrupts it on the original source, which doesn't make sense to me. I've tried this with my root filesystem as ext3, ext4 and xfs with the same result. Is it possible there's something wrong with mdadm? Should I re-emerge it? Or is there something else I'm missing?


Last edited by dtjohnst on Thu Dec 03, 2009 1:23 am; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Thu Nov 26, 2009 8:58 pm

dtjohnst,

I suspect a hardware error. Try memtest from the liveCD for a few hours.
If memtest reports errors, it does not always mean it's a RAM issue. The tests use your CPU, RAM and motherboard.

A hardware error could cause unexpected writes.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
dtjohnst
Apprentice


Joined: 23 Apr 2006
Posts: 178

Posted: Thu Nov 26, 2009 9:19 pm

I had a previous problem and ran memtest for about 4 hours with no errors reported. It turned out to be a power problem, which I think is why memtest didn't report any RAM errors. I was only about 10W shy.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Thu Nov 26, 2009 9:32 pm

dtjohnst,

memtest does not thrash your PSU, and I presume this error is not related.

Run memtest again.
LesCoke
n00b


Joined: 02 Jun 2008
Posts: 48
Location: Denton, Tx

Posted: Fri Nov 27, 2009 10:02 am

I second the RAM. I had a system that would run memtest86(+) for many days without failure. But as soon as I started manipulating files larger than about 20 MB, errors would occur (I detected the file differences using md5sum). To this day, I keep md5 / sha1 hashes of all my large archived media files because of that experience. The problem would only occur when the RAM was being hit using DMA. I finally swapped out the memory sticks one at a time to identify which one it was (fortunately I had identical spares).

I wouldn't rule out the hard drive either. Use smartctl (smartmontools) to verify the drive(s) are not creating pending / bad sectors. I always burn in a new drive by writing and verifying several patterns before putting it into service. (shred will simply write patterns but does not verify them; I use a custom script that uses badblocks to write and verify.)
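To illustrate the write-and-verify idea, here is a rough Python sketch. It is not the custom badblocks script mentioned above (that script isn't shown in the thread). It works on an ordinary scratch file rather than a raw device, and the function name `write_and_verify` and the byte patterns are assumptions picked for illustration.

```python
import os

# Illustrative byte patterns (alternating bits, all ones, all zeros).
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_and_verify(path, size_bytes, block_size=1 << 20, patterns=PATTERNS):
    """Fill `path` with each pattern in turn and read it back.

    Returns a list of (pattern, byte_offset) for every block that did not
    read back exactly as written. An empty list means no mismatch was seen.
    """
    mismatches = []
    for pattern in patterns:
        # Write pass: fill the scratch file with the current pattern.
        with open(path, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(block_size, remaining)
                f.write(bytes([pattern]) * n)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        # Verify pass: read every block back and compare it with the pattern.
        with open(path, "rb") as f:
            offset = 0
            while True:
                data = f.read(block_size)
                if not data:
                    break
                if data != bytes([pattern]) * len(data):
                    mismatches.append((pattern, offset))
                offset += len(data)
    return mismatches
```

One caveat: unless the scratch file is larger than RAM, the verify pass may be served from the kernel's cache rather than from the disk, so a clean result on a small file says little about the drive itself.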

Your problem, if it hasn't been isolated to the disk subsystem, could be network card related. I had an old SoHo Macronix card that would suddenly drop connections every time I used scp to transfer a large file, but it would stay up with a plain ssh terminal session. Swapped the card with another and the problem went away.

Easiest way to troubleshoot such a problem is to find a repeatable test case that generates the errors, and swap hardware one thing at a time until you find the problem component.

It is strange that you only noticed the problem once the RAID array was established. It is also strange that your USB key / thumb drive would get corrupted during the copy. You didn't say if you rechecked the md5sum on the other machine after seeing the failure on the server, before downloading a new copy. With buffered filesystems and plenty of RAM, a file will not necessarily be re-read from the actual disk; the copy still buffered in RAM will be used to satisfy the repeated reads. This brings me back to a RAM problem. Memtest keeps the RAM busy during its tests, but a running yet idle system will have most of the code and data needed by the idle process in cache, leaving the RAM largely idle. This is when weak RAM bits will rear their heads.

Les


Last edited by LesCoke on Fri Nov 27, 2009 10:05 am; edited 1 time in total
qubix
Tux's lil' helper


Joined: 22 Sep 2003
Posts: 146
Location: Warsaw/Poland

Posted: Fri Nov 27, 2009 10:05 am

Try putting a file at least 2 times larger than the amount of RAM in your box on the drive you suspect of having problems, or even on each of your drives. Then run md5sum on the same file a number of times, one after another. If you get different results on each run with no complaints in dmesg, your mainboard might be toast.

It's important for your file to be bigger than RAM. If it's smaller, the kernel will cache it and every run of md5sum will read from memory, not from the drive.

For confirmation, do the same on the same machine but use USB storage.
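The repeated-md5sum test can be sketched in Python like this. It is only an illustration of the procedure above: the names `repeated_digests` and `is_stable` are hypothetical, and the caller is assumed to point it at a test file that is already larger than RAM.

```python
import hashlib


def repeated_digests(path, runs=5, chunk_size=1 << 20):
    """Hash the same file `runs` times and return the list of hex MD5 digests."""
    digests = []
    for _ in range(runs):
        h = hashlib.md5()
        # Reopen the file on every run so each pass reads it from the start.
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests.append(h.hexdigest())
    return digests


def is_stable(digests):
    """True if every run produced the same digest."""
    return len(set(digests)) == 1
```

If `is_stable` comes back False for a file nobody is writing to, the reads themselves are unreliable, which is the symptom being described. The sketch does nothing to bypass the kernel cache, so the size of the test file is what forces the reads to come from the drive.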

Do you have Fujitsu servers? I've had that problem often there... each box was replaced under warranty.

Btw, did you check dmesg for strange messages?
_________________
qubix
dtjohnst
Apprentice


Joined: 23 Apr 2006
Posts: 178

Posted: Thu Dec 03, 2009 1:23 am

Sorry it took me so long to reply. Things have been a little hectic.

As you said, NeddySeagoon, the PSU shouldn't be affected by memtest, which is why I assumed that if it passed memtest then, it would pass memtest now. After all, I only had power problems if my PC tried to access 2 drives at the same time with a USB stick, keyboard and mouse plugged in (I imagine I was only a few W shy). However, when I reran memtest, it did fail this time. Last time I ran it for 4 hours without an issue. This time my PC reset after 2 hours. I relaxed the timings a bit and then it failed after about 20 mins. I relaxed the timings a bit more and it failed after about 3 mins. At that point the system failed to POST. I tried several timings, including BIOS defaults, to no effect. I tried both sticks of RAM with the same result. So I ordered RAM from a different manufacturer and it works fine now. It POSTs fine, memtest ran for 8 hours without an error (I left it overnight) and without rebooting or crashing, and my files no longer show up as corrupt. Thanks for your help.
All times are GMT