Author |
Message |
big_gie Apprentice
Joined: 31 Aug 2004 Posts: 158
|
Posted: Tue Apr 05, 2011 8:08 pm Post subject: XFS corruption |
|
|
Hi all,
I had a system failure on a machine running hardware RAID1 for the OS and RAID60 for our simulation data.
Booting from systemrescuecd, I was able to recover the OS (even though it does not boot anymore; grub doesn't even show up).
Now I'm trying to recover the RAID60 data, or at least part of it. First, I tried mounting the single XFS partition read-only but got a strange error. The error showed up in dmesg and it looked like the kernel had crashed. To keep a kernel problem from further affecting the data, I rebooted the livecd before checking the filesystem. But I can't check it:
Quote: |
# xfs_check /dev/sdb1
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this. |
Trying to mount the filesystem gave (probably) the same error as before. This time I saved it:
Quote: |
# dmesg > dmesg1.txt
# mount -o ro /dev/sdb1 raid/
mount: Structure needs cleaning
# dmesg > dmesg2.txt
# diff dmesg1.txt dmesg2.txt
1098a1099,1128
> XFS mounting filesystem sdb1
> Starting XFS recovery on filesystem: sdb1 (logdev: internal)
> Filesystem "sdb1": XFS internal error xlog_valid_rec_header(1) at line 3428 of file fs/xfs/xfs_log_recover.c. Caller 0xffffffff812e39cc
>
> Pid: 4125, comm: mount Not tainted 2.6.35-std164-amd64 #2
> Call Trace:
> [<ffffffff812d0772>] xfs_error_report+0x3c/0x3e
> [<ffffffff812e39cc>] ? xlog_do_recovery_pass+0x1b5/0x5ee
> [<ffffffff812e0acf>] xlog_valid_rec_header+0xcb/0xd2
> [<ffffffff812e39cc>] xlog_do_recovery_pass+0x1b5/0x5ee
> [<ffffffff812e3e41>] xlog_do_log_recovery+0x3c/0x75
> [<ffffffff812e3e8d>] xlog_do_recover+0x13/0xd8
> [<ffffffff812e3fce>] xlog_recover+0x7c/0x8a
> [<ffffffff812de5d9>] xfs_log_mount+0xd7/0x143
> [<ffffffff812e68ec>] xfs_mountfs+0x310/0x61c
> [<ffffffff812eefd1>] ? kmem_zalloc+0x11/0x2c
> [<ffffffff812e722a>] ? xfs_mru_cache_create+0x117/0x147
> [<ffffffff812f9b6a>] xfs_fs_fill_super+0x1f8/0x372
> [<ffffffff811052b8>] get_sb_bdev+0x137/0x19a
> [<ffffffff812f9972>] ? xfs_fs_fill_super+0x0/0x372
> [<ffffffff812f7c00>] xfs_fs_get_sb+0x13/0x15
> [<ffffffff81104969>] vfs_kern_mount+0xb8/0x1a2
> [<ffffffff81104ab1>] do_kern_mount+0x48/0xe8
> [<ffffffff81119fe8>] do_mount+0x73c/0x7b2
> [<ffffffff810d1cff>] ? copy_from_user+0x3c/0x44
> [<ffffffff810d24e6>] ? strndup_user+0x58/0x82
> [<ffffffff8113876e>] compat_sys_mount+0x262/0x29c
> [<ffffffff81033fd3>] ia32_sysret+0x0/0x5
> XFS: log mount/recovery failed: error 117
> XFS: log mount failed
|
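The before/after dmesg comparison in the quote above can be wrapped in a small helper so that only the new kernel messages are kept around any risky command. This is a minimal sketch: the function name `new_dmesg_lines` and the output file name are made up for illustration, and it assumes the second snapshot is simply the first one plus new lines at the end (that is, the kernel ring buffer has not wrapped in between). For what it is worth, error 117 on Linux is EUCLEAN, the same errno that mount reports as "Structure needs cleaning".

```shell
# Hypothetical helper: print only the lines appended between two snapshots.
# $1 = "before" snapshot file, $2 = "after" snapshot file.
# Assumes $2 starts with the exact contents of $1.
new_dmesg_lines() {
    before_count=$(wc -l < "$1")
    # tail -n +K starts printing at line K of the file
    tail -n "+$((before_count + 1))" "$2"
}

# Typical use around a mount attempt (not run here):
#   dmesg > dmesg1.txt
#   mount -o ro /dev/sdb1 raid/
#   dmesg > dmesg2.txt
#   new_dmesg_lines dmesg1.txt dmesg2.txt > mount-error.txt
```

Unlike plain `diff`, this prints the new lines without the leading `> ` markers, which makes the trace easier to paste into a bug report.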
What's wrong? Is there a bug in the kernel, or is it just its own way of telling me the filesystem is really broken?
Could mounting with "-o ro,norecovery" cause more trouble? There are a couple of files I really need to restore, but I don't want to break things even more for so little gain.
Thanks for your help! |
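For reference, the read-only mount without log replay asked about above would look roughly like this. It is only a sketch: `ro_norecovery_cmd` is a made-up helper that prints the command so it can be reviewed before being run by hand as root, and the device and mount point are the ones from the post. With `norecovery` the log is not replayed, so the filesystem is likely to look inconsistent (some files or directories may be missing or unreadable), but the mount itself should not write to the device.

```shell
# Hypothetical helper: print (do not run) a read-only XFS mount command
# that skips log recovery. $1 = block device, $2 = mount point.
ro_norecovery_cmd() {
    printf 'mount -t xfs -o ro,norecovery %s %s\n' "$1" "$2"
}

# Review the printed command first, then run it by hand as root.
ro_norecovery_cmd /dev/sdb1 raid/
```

XFS only accepts `norecovery` together with `ro`, so leaving out `ro` makes the mount fail rather than silently going read-write.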
|
|
|
madchaz l33t
Joined: 01 Jul 2003 Posts: 995 Location: Quebec, Canada
|
Posted: Tue Apr 05, 2011 8:19 pm Post subject: |
|
|
I would suggest you make a copy of your disk to offline media first (use dd).
That way, you can always go back.
However, as long as you are read-only, it "should" be OK, if it works.
You might want to look at the XFS mount options to force the journal replay as well. (Again, do a backup first.)
And when this is all over, I strongly suggest you schedule daily backups. _________________ Someone asked me once if I suffered from mental illness. I told him I enjoyed every second of it. |
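A dd image along the lines suggested here might be taken as follows. This is a sketch under assumptions: `dd_image_cmd` and the target path are invented for illustration, and the function only prints the command rather than running it. `conv=noerror,sync` tells dd to carry on past read errors and pad short or unreadable input blocks with zeros so that offsets in the image stay aligned with the source.

```shell
# Hypothetical helper: print (do not run) a dd command that images a
# block device to a file. $1 = source device, $2 = image file.
dd_image_cmd() {
    printf 'dd if=%s of=%s bs=1M conv=noerror,sync\n' "$1" "$2"
}

# The target path is an example; it must have room for the whole device.
dd_image_cmd /dev/sdb1 /mnt/backup/sdb1.img
```

With `bs=1M`, one bad sector costs the whole 1 MiB block around it; a smaller block size loses less per error at the price of speed.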
|
|
|
big_gie Apprentice
Joined: 31 Aug 2004 Posts: 158
|
Posted: Tue Apr 05, 2011 8:22 pm Post subject: |
|
|
Thanks for the suggestion.
This is exactly what I've done for the OS partitions. Unfortunately, I just can't do this here, at least not yet. The XFS partition is... 40TB. |
|
|
|
madchaz l33t
Joined: 01 Jul 2003 Posts: 995 Location: Quebec, Canada
|
Posted: Tue Apr 05, 2011 8:25 pm Post subject: |
|
|
Yikes. And no backup? You like living dangerously. |
|
|
|
big_gie Apprentice
Joined: 31 Aug 2004 Posts: 158
|
Posted: Tue Apr 05, 2011 8:31 pm Post subject: |
|
|
No, no backup...
We hoped we would be fast enough at replacing bad drives that the RAID could be rebuilt. But we suspect the controller is bad... Around 75% of the drives failed within 20 minutes, which crashed the OS and made things a lot more complicated than they should have been...
Backing up 40TB is quite hard. We'll probably explore the different possibilities after this.
For now, it's only a couple hundred megs that I need to salvage. Everything else is luxury. |
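If the filesystem can be mounted read-only at all, pulling out just the few needed files could be scripted along these lines. It is a rough sketch: `salvage_files` is a made-up name, the list file is assumed to hold one path per line relative to the mount point, and files that cannot be read are reported and skipped instead of aborting the whole run.

```shell
# Hypothetical helper: copy a list of files off a (read-only) mount.
# $1 = source root, $2 = destination root, $3 = file listing relative paths.
salvage_files() {
    src=$1; dst=$2; list=$3
    failed=0
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue                # skip empty lines
        mkdir -p "$dst/$(dirname "$rel")"        # recreate the directory layout
        if cp -p "$src/$rel" "$dst/$rel" 2>/dev/null; then
            echo "ok: $rel"
        else
            echo "FAILED: $rel"
            failed=$((failed + 1))
        fi
    done < "$list"
    [ "$failed" -eq 0 ]                          # non-zero status if anything failed
}

# Example (paths are placeholders):
#   salvage_files raid/ /mnt/usb/salvage wanted-files.txt
```

Copying the most important files first means that if the damaged filesystem trips the kernel again partway through, the essentials are already safe on other media.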
|
|
|
madchaz l33t
Joined: 01 Jul 2003 Posts: 995 Location: Quebec, Canada
|
Posted: Thu Apr 07, 2011 12:50 pm Post subject: |
|
|
I suggest you take a good look at the various differential backup solutions. |
|
|
|
trippels Tux's lil' helper
Joined: 24 Nov 2010 Posts: 137 Location: Berlin
|
Posted: Thu Apr 07, 2011 7:25 pm Post subject: |
|
|
In your case I would try the "-L" option of xfs_repair.
You may lose a few seconds of data from shortly before the system failure,
because it will zero the file system log. But most of the data should
still be OK.
You could also contact the friendly xfs developers on their mailing list
and describe your problem at:
linux-xfs@oss.sgi.com |
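Before zeroing the log, a more cautious sequence is possible. The sketch below only prints the commands; `repair_plan` and the dump path are made up for illustration. `xfs_repair -n` is the no-modify mode that reports what would be changed (with an unreplayed log it may report spurious inconsistencies), `xfs_metadump` copies the filesystem metadata, not the file contents, to a file that is far smaller than the data and is the sort of thing the developers may ask for, and `xfs_repair -L` is the last resort named in the xfs_check error.

```shell
# Hypothetical helper: print (do not run) a cautious repair sequence.
# $1 = block device, $2 = file to hold the metadata dump.
repair_plan() {
    dev=$1; dump=$2
    echo "xfs_repair -n $dev"         # dry run: report problems, change nothing
    echo "xfs_metadump $dev $dump"    # save metadata to other media first
    echo "xfs_repair -L $dev"         # last resort: zero the log, then repair
}

# The dump path is an example; it must be on a different, healthy disk.
repair_plan /dev/sdb1 /mnt/usb/sdb1.metadump
```

Each step should be run by hand and its output read before moving on to the next one; only the last command modifies the filesystem.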
|
|
|
|