Buffer I/O Error on device sda

firejdl · n00b Joined: 09 Apr 2006 Posts: 9

Hello all:

I'm getting some lovely errors with my 300GB SATA drive.

A little background information: my system has been acting a bit weird lately in general. Some programs refuse to run or just hang forever. I had that mostly squared away, but today I was trying to run a bash script and got an error message about /bin/sh, something like "permission denied", even when run as root. I tried changing the shebang to /bin/bash, but got the same thing. So I just reinstalled bash. There was an update available anyway, so why not?

Well, the first thing I noticed was that my 300GB SATA drive was all of a sudden read-only. No remounting or anything, just bam! Read-only. I rebooted, and suddenly I couldn't mount the drive at all. Booting from my liveCD and trying to mount just hangs. So I ran fsck.ext3 -v /dev/sda. It's been running for... a good few hours now. And I keep getting this set of error messages popping up:

ata2: command 0x35 timeout, stat 0xd0 host_stat 0x1
ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xd0 { Busy }
end_request: I/O error, dev sda, sector xxxxxxxx
Buffer I/O error on device sda, logical block yyyyyyyy
lost page write due to I/O error on sda
ATA: abnormal status 0xD0 on port 0xE407
ATA: abnormal status 0xD0 on port 0xE407
ATA: abnormal status 0xD0 on port 0xE407

xxxxxxxx increments by 8 each time
yyyyyyyy increments by 1 each time

Like I said, it's been going for a good few hours now. It's currently on sector 573309072 and block 71663634. Actually, no, it just gave me the same error for sector 0, logical block 0, then 8,1, then 16,2, and so on. Looks like it restarted for some reason. And now it just jumped up to 262144,32768. Okay, enough of the play-by-play...
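The two counters move in lockstep: 8 sectors per logical block is what you would expect if the kernel counts 512-byte sectors while the filesystem uses 4096-byte blocks. That block size is an assumption inferred from the ratio, not something the log states. A minimal Python sketch of the conversion, checked against the pairs quoted above:

```python
SECTOR_SIZE = 512    # bytes per sector in the end_request lines (assumed)
BLOCK_SIZE = 4096    # ext3 block size (assumed, inferred from the ratio of 8)
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 8


def sector_to_block(sector: int) -> int:
    """Logical (4 KiB) block that contains a given 512-byte sector."""
    return sector // SECTORS_PER_BLOCK


def block_to_sector(block: int) -> int:
    """First 512-byte sector of a given logical block."""
    return block * SECTORS_PER_BLOCK


def sector_to_byte_offset(sector: int) -> int:
    """Distance in bytes from the start of the disk."""
    return sector * SECTOR_SIZE


if __name__ == "__main__":
    # (sector, logical block) pairs taken from the log above
    for sector, block in [(0, 0), (8, 1), (16, 2), (262144, 32768), (573309072, 71663634)]:
        assert sector_to_block(sector) == block
        assert block_to_sector(block) == sector
    print(sector_to_byte_offset(573309072))  # 293534244864, about 293.5 GB in
```

Since the filesystem sits directly on /dev/sda, these offsets are absolute, so sector 573309072 lies roughly 293.5 GB into the disk, near the end of a 300GB drive.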

I have an MSI K8T Neo-FSR, which has a VIA VT8237 SATA controller. The disk is, I believe, a WD, about a month or two old, as are my Gentoo 2006.0 install and liveCD. My install was running 2.6.16-gentoo-r4, I believe.

I have a lot of data on that drive - please don't tell me it's dead!

Philantrop · Posted: Thu May 18, 2006 6:33 am

The error messages do not indicate a filesystem error, so I would not run fsck for now. Do not let it try to repair anything yet: the repair could go badly wrong if the timeouts occur in the middle of it.

Do you have a chance to attach the disk to another computer to see if it works there and make a backup?

firejdl

I have the drive in another machine right now, this one with an nvidia SATA chipset (I forgot I had that!). The liveCD recognizes the drive, but when I run mount /dev/sda /mnt/gentoo, I get this error message:

EXT3-fs error (device sda): ext3_check_descriptors: Inode table for group 25 not in group (block 786434)!
EXT3-fs: group descriptors corrupted !
mount: wrong fs type, bad option, bad superblock on /dev/sda, missing codepage or other error.
In some cases useful info is found in syslog - try dmesg | tail or so

dmesg only contains the first two lines from above.
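The first line can be checked with a little arithmetic. Assuming the usual ext3 geometry for 4 KiB blocks (32768 blocks per group, first data block 0; both are assumptions, and `dumpe2fs -h` on the device or an image would show the real values), every block group owns a fixed range of blocks, and the kernel refuses to mount when a group descriptor places that group's inode table outside its range. A simplified Python sketch of that check:

```python
from typing import Tuple

BLOCKS_PER_GROUP = 32768  # assumed default for ext3 with 4 KiB blocks
FIRST_DATA_BLOCK = 0      # 0 for 4 KiB blocks (it is 1 for 1 KiB blocks)


def group_range(group: int) -> Tuple[int, int]:
    """Inclusive (first, last) block numbers owned by a block group."""
    first = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    return first, first + BLOCKS_PER_GROUP - 1


def group_of_block(block: int) -> int:
    """Block group containing a given block."""
    return (block - FIRST_DATA_BLOCK) // BLOCKS_PER_GROUP


def inode_table_in_group(group: int, inode_table_block: int) -> bool:
    """Simplified form of the descriptor sanity check done at mount time
    (the real check also makes sure the whole inode table fits)."""
    first, last = group_range(group)
    return first <= inode_table_block <= last


if __name__ == "__main__":
    print(group_range(25))                   # (819200, 851967)
    print(group_of_block(786434))            # 24
    print(inode_table_in_group(25, 786434))  # False
```

Under those assumptions, block 786434 belongs to group 24 (which starts at block 786432), not group 25, which is consistent with a corrupted group descriptor rather than an unreadable sector.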

This sounds more like a filesystem error, no? Should I try fsck.ext3 on this machine as well? I'll wait for some feedback before I try anything, because I don't want to make things worse.

Thanks a lot!

PS. I have a feeling there are underlying problems with my other motherboard. When I tried to shut down the liveCD earlier after the fsck, it gave errors on unmounting /proc, /dev, and the liveCD itself... That machine stays off until I at least resolve this drive's issues.

Philantrop · Posted: Thu May 18, 2006 8:03 pm

Yes, that does look like a filesystem error. It was probably caused by the hardware errors you saw earlier.

If you have enough disk space somewhere, you might want to try to make a backup:

dd if=<your partition> of=<some file on another disk>

This should work, and if everything goes wrong during the filesystem repair, your data is not completely lost.
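Conceptually, such a backup is just a block-by-block copy. Plain `dd` stops at the first read error; with `conv=noerror,sync` it carries on and pads the failed block with zeros so that everything after it stays at the right offset. The hypothetical Python helper `copy_image` below sketches that second behaviour. It is an illustration, not a replacement for dd, and the paths in the usage line are placeholders:

```python
import os


def copy_image(src: str, dst: str, block_size: int = 64 * 1024) -> list:
    """Copy src (a device or file) to dst, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        total_size = fin.seek(0, os.SEEK_END)  # works for files and block devices
        offset = 0
        while offset < total_size:
            length = min(block_size, total_size - offset)
            try:
                fin.seek(offset)  # re-seek explicitly after a possible error
                data = fin.read(length)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # Zero-pad failed or short reads so later data keeps its offset
            fout.write(data.ljust(length, b"\0"))
            offset += length
    return bad_offsets
```

Usage would be something like `bad = copy_image("/dev/sda", "/mnt/backup/sda.img")`. Note the trade-off: a single bad sector costs the whole 64 KiB block in this sketch.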

First check your partition for the correct type:

fdisk -l /dev/<your hdd>

If the type is correct, go on. Otherwise set it to the correct value (usually 83) and try mounting it again.
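`fdisk -l` reads the classic MBR (DOS) partition table in the first sector of the disk. A hypothetical Python sketch of the same check, assuming the standard MBR layout: four 16-byte entries starting at byte 446, the type byte at offset 4 of each entry, and the signature 0x55 0xAA at bytes 510-511. Type 0x83 is "Linux":

```python
import struct

MBR_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of sector 0
PART_TABLE_OFFSET = 446       # four 16-byte entries start here
ENTRY_SIZE = 16
LINUX_TYPE = 0x83             # "Linux" partition type


def partition_entries(sector0: bytes) -> list:
    """Return (number, type, start_lba, num_sectors) for each used MBR slot.

    Raises ValueError when the MBR signature is missing, roughly the case
    fdisk reports as "doesn't contain a valid partition table".
    """
    if len(sector0) < 512 or sector0[510:512] != MBR_SIGNATURE:
        raise ValueError("no valid MBR partition table")
    entries = []
    for i in range(4):
        start = PART_TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        ptype = entry[4]                                   # partition type byte
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:                                     # 0 marks an empty slot
            entries.append((i + 1, ptype, start_lba, num_sectors))
    return entries


def linux_partitions(sector0: bytes) -> list:
    """Numbers of the partitions whose type is 0x83 (Linux)."""
    return [n for n, ptype, _, _ in partition_entries(sector0) if ptype == LINUX_TYPE]
```

Pass it the first 512 bytes of the disk; reading a device node normally needs root.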

If the above stuff doesn't help, I would indeed try an fsck.ext3 now. If it wants to correct the errors, let it do it - you don't have much of a choice anyway until you can mount the drive again.

It might complain about a damaged superblock. In that case, try re-running fsck with the "-b" option to specify an alternative (backup) superblock. To find their locations, run:

mke2fs -n <your partition>

Don't worry: -n makes mke2fs NOT do anything; it only reports what it *would* do.
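Where those backups are follows from a simple rule. With the `sparse_super` feature (normally enabled by default), backup superblocks live only in group 1 and in groups whose number is a power of 3, 5 or 7. The sketch below computes the block numbers under assumed geometry (4 KiB blocks, 32768 blocks per group, first data block 0). Keep in mind that `mke2fs -n` only reports the right locations if it is given the same options the filesystem was originally created with:

```python
def sparse_backup_groups(group_count: int) -> list:
    """Groups that hold a backup superblock under sparse_super:
    group 1 plus every power of 3, 5 and 7 below group_count."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


def backup_superblocks(total_blocks: int, blocks_per_group: int = 32768,
                       first_data_block: int = 0) -> list:
    """Block numbers of the backup superblocks, as mke2fs -n would list them."""
    data_blocks = total_blocks - first_data_block
    group_count = (data_blocks + blocks_per_group - 1) // blocks_per_group  # round up
    return [first_data_block + g * blocks_per_group
            for g in sparse_backup_groups(group_count)]


if __name__ == "__main__":
    # Hypothetical size: a 300 GB disk with 4 KiB blocks has about 73,242,187 blocks
    print(backup_superblocks(73_242_187)[:5])  # [32768, 98304, 163840, 229376, 294912]
```

A backup found this way is then passed to e2fsck together with the block size, for example `e2fsck -b 32768 -B 4096 <your partition>`. On a 1 KiB-block filesystem the same rule gives 8193, 24577, and so on (first data block 1, 8192 blocks per group).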

If all of this fails, there are still more desperate measures but let's first see what the above does.

I can't guarantee this will help, of course, but there's hope now. :)

firejdl

Thanks. I do have another 300GB drive that I can try dding to. Would it be better to use dd or the dd_rescue programs that I've been reading about?
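On the dd versus dd_rescue question: plain `dd` gives up at the first read error (or, with `conv=noerror,sync`, zero-fills the whole failed block), while dd_rescue keeps going and drops to a smaller block size around read errors, so that only the truly unreadable sectors are lost. A simplified Python sketch of that idea, not dd_rescue's actual code. `read_at` is a hypothetical callable standing in for a positioned read on the failing disk and is assumed to return exactly the requested length or raise `OSError`:

```python
from typing import Callable, List, Tuple

ReadAt = Callable[[int, int], bytes]  # (offset, length) -> data, may raise OSError


def rescue_copy(read_at: ReadAt, total_size: int,
                soft_bs: int = 64 * 1024, hard_bs: int = 512) -> Tuple[bytes, List[int]]:
    """Copy total_size bytes, retrying failed large reads in hard_bs pieces.

    Returns the recovered image and the offsets of hard_bs-sized pieces
    that still could not be read (those are zero-filled in the image).
    """
    image = bytearray()
    bad: List[int] = []
    offset = 0
    while offset < total_size:
        length = min(soft_bs, total_size - offset)
        try:
            image += read_at(offset, length)  # fast path: one large read
        except OSError:
            # Fall back to small reads so only the bad sectors are lost
            for sub in range(offset, offset + length, hard_bs):
                sub_len = min(hard_bs, offset + length - sub)
                try:
                    image += read_at(sub, sub_len)
                except OSError:
                    image += bytes(sub_len)
                    bad.append(sub)
        offset += length
    return bytes(image), bad
```

Keeping the whole image in memory only makes sense for a sketch; a real tool writes each piece straight to the destination.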

The problem with using fdisk -l is that when I formatted my drive, I did it wrong. I used mkfs.ext3 /dev/sda instead of /dev/sda1. So fdisk -l thinks that it doesn't contain a valid partition table. It's always been like that, though, and it's never caused a problem before.
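Because mkfs.ext3 was run on the whole of /dev/sda, there is no partition table for fdisk to find, but the ext2/ext3 superblock still sits at byte offset 1024 of the device, and it records the block size and group size used in the other calculations. A hypothetical Python sketch that pulls those fields out, assuming the standard ext2/ext3 superblock layout (little-endian fields, magic number 0xEF53 at offset 56):

```python
import struct

SUPERBLOCK_OFFSET = 1024
EXT2_MAGIC = 0xEF53


def read_ext_superblock(head: bytes) -> dict:
    """Parse a few ext2/ext3 superblock fields from the first 2048 bytes."""
    sb = head[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("need at least 2048 bytes")
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2/ext3 superblock at byte 1024")
    (blocks_count,) = struct.unpack_from("<I", sb, 4)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 20)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    return {
        "blocks_count": blocks_count,
        "first_data_block": first_data_block,
        "block_size": 1024 << log_block_size,  # stored as a shift of 1024
        "blocks_per_group": blocks_per_group,
    }
```

Feed it the first 2048 bytes of the disk or, better, of the dd image; `dumpe2fs -h` reports the same fields and more.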

I'll start running dd/dd_rescue later today; I have some data I need to move off the other 300GB drive before I can. I don't want to do any fsck/mke2fs things until *after* I have this backed up. That way, if something goes wrong, I at least still have things as they are now...

Thanks again, I appreciate your help.

firejdl

I just tried using dd_rhelp to back up the drive to another 300GB [non-SATA] drive, but I started getting more errors on the non-SATA drive! I believe it was an
ata2: status=0x51 { DriveReady SeekComplete Error }

This is in a completely different computer, too.

Is it possible that my Gentoo liveCD is corrupted and is having problems? That's something I can easily fix...

*edit* Oh, and the computer seems to be in constant I/O Wait.

Philantrop · Posted: Fri May 19, 2006 6:16 am

There should be a second line below the error you posted. Please post that, too.
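For reference: the byte in `status=0x51 { DriveReady SeekComplete Error }` is the ATA status register, and the line libata typically prints right after it (`error=0x..`) decodes the ATA error register, which says *why* the command failed. A small Python sketch of both decodings. The bit positions are the classic ATA ones (several are obsolete in newer ATA revisions), the status labels are modelled on the ones in the logs in this thread, and the error bits use the ATA mnemonics. It is an illustration, not the kernel's own code:

```python
# Status register bits, labelled like the kernel log lines in this thread
STATUS_BITS = [
    (0x80, "Busy"), (0x40, "DriveReady"), (0x20, "DeviceFault"),
    (0x10, "SeekComplete"), (0x08, "DataRequest"), (0x04, "CorrectedError"),
    (0x02, "Index"), (0x01, "Error"),
]
# Error register bits, using the ATA mnemonics
ERROR_BITS = [
    (0x80, "ICRC"),   # interface CRC error during an Ultra DMA transfer
    (0x40, "UNC"),    # uncorrectable data error
    (0x20, "MC"),     # media changed
    (0x10, "IDNF"),   # sector ID not found
    (0x08, "MCR"),    # media change requested
    (0x04, "ABRT"),   # command aborted
    (0x02, "TK0NF"),  # track 0 not found
    (0x01, "AMNF"),   # address mark not found
]


def decode_bits(value: int, table) -> list:
    """Names of all bits in table that are set in value, high bit first."""
    return [name for mask, name in table if value & mask]


def decode_status(status: int) -> list:
    # While Busy is set the other status bits are not valid, which is
    # presumably why 0xd0 was printed as just { Busy } in the first post.
    if status & 0x80:
        return ["Busy"]
    return decode_bits(status, STATUS_BITS)


def decode_error(error: int) -> list:
    return decode_bits(error, ERROR_BITS)
```

For example, an error byte of 0x84 decodes to ICRC and ABRT. An ICRC bit points at data damaged on its way over the cable or controller rather than on the platters, which is one reason DMA settings and cabling are common suspects for this kind of error.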

I've seen those errors a lot, and they often signify DMA problems. You could try disabling DMA (with hdparm -d0 /dev/<your hdd>) for the backup. For an installed system (i.e. not a LiveCD) there's a kernel option that *might* help. It's "multi"-something. OK, found it: CONFIG_IDEDISK_MULTI_MODE

Unfortunately, this is almost certainly not a problem with your Gentoo CD. You could try a Knoppix CD in the hope that its kernel copes better or has that option set. If you decide to try this, enter the following kernel name and parameter at the boot prompt:

knoppix 2

This boots to a text console only and skips all the stuff you won't need right now anyway.

If this doesn't help either, I would (but then, I do like living dangerously ;) ) proceed without a backup. I don't trust an hdd that is, or at least might be, failing (the original SATA one).

I'd therefore suggest giving Knoppix a try for the backup and, if that fails, going on with repairing the original disk. But that's just what I would do, and you *might* lose your data trying. Keep that in mind. It would be unethical not to mention this when giving potentially dangerous advice.

If that data is *really* important and you have the money (here in Germany it would cost at least about EUR 1500), quit right now and give the hdd to a forensic lab to get the data backed up.

firejdl

Okay, good idea. I'm downloading Knoppix now. Thankfully, it's quite fast.

I'm going to try the good old dd_rhelp again with that tomorrow. I'd give you the other line of the error, but that computer is in another room, and it's time for bed. I'll most definitely post it tomorrow if I get it again, though.

Unfortunately, I'm about 1500 EUR short on being able to afford forensics. The data IS pretty important to me, especially since all of my photographs are on that drive. But I happened to copy about 70GB [out of 220GB] of data to another drive for use in another computer the other day, so at least I have that!

Oh, one question. When I run dd_rhelp, should I send the output to the device [/dev/hdb1 or whatever it is on that machine...] or to an image file on that drive's filesystem? Since they're both 300GB drives, would there be enough space on the receiving drive to fit the image? And if I should send it to the device, should I use /dev/hdb1 or /dev/hdb, since the original is /dev/sda and not /dev/sda1?

Philantrop · Posted: Fri May 19, 2006 12:23 pm

I couldn't afford forensics either. I just wouldn't have felt comfortable if I hadn't mentioned it. :)

I would put the image into a regular file on another filesystem. As a file you can more easily handle it later on. But that's more of a feeling and belief than hard facts this time. :)

The resulting image *should* fit onto the other device if they're both equal in size, but I wouldn't do it that way. If you decide to do it, send it to /dev/hdb (no partition specified), since that's how the original disk is set up. Keep as close to the original configuration as possible if you decide to use devices instead of a regular file.
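Whether an image *file* fits can be checked before starting. Two drives both sold as "300GB" need not have exactly the same number of sectors, and a filesystem on the destination drive spends part of the disk on its own metadata, so a raw image of a whole 300GB device will generally not fit as a file on another 300GB drive. A hypothetical Python check (the paths in the usage line are placeholders):

```python
import os
import shutil


def device_size(path: str) -> int:
    """Size in bytes of a block device or regular file (seek to the end)."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def image_fits(source_size: int, dest_dir: str) -> bool:
    """True if a raw image of source_size bytes fits on dest_dir's filesystem.

    shutil.disk_usage().free excludes the blocks reserved for root, so the
    answer errs on the cautious side when the copy runs as root.
    """
    return source_size <= shutil.disk_usage(dest_dir).free
```

Usage would be something like `image_fits(device_size("/dev/sda"), "/mnt/backup")`, run as root so the device can be opened.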

firejdl

Okay. That'll be the first thing I try when I get home from work. Well, after I burn the Knoppix CD, that is.

Thanks for all your advice! :)

firejdl

It looks like last night's dd_rhelp worked. I tried running fsck -y /dev/hdb [or /dev/hdb1, I can't remember], but the first question was "Abort?" :( So I've been sitting there holding 'y' for the past two hours. I've given up for tonight, because that chair is horrible and it's too hot in that room. Tomorrow, when people aren't sleeping, I can at least bring out a better chair.

Most of the errors I was getting were
Inode 139438 has imagic flag set. Clear?
and
Inode #### is in use, but has dtime set. Fix?

There were also some illegal blocks and other errors, but after the first time I saw them I just started holding 'y'.

I'll continue tomorrow. Too bad the KNOPPIX liveCD doesn't recognize the network card in that computer, otherwise I would have done it over ssh....

Philantrop · Posted: Sat May 20, 2006 11:47 am

Good luck with repairing. I hope you'll be able to recover your data.