Gentoo Forums
Corruption in reiserfs partition
jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Thu Sep 06, 2007 5:06 pm    Post subject: Corruption in reiserfs partition

Hi. My computer crashed (probably due to a hardware problem with my TV capture card), and when it rebooted it started the automatic fsck. Since the fsck was taking too long and I was in a hurry to use the computer, I rebooted and booted from another partition. AFAIK it is not dangerous to interrupt an fsck, unless of course it is in the middle of an operation like --rebuild-tree. The fsck was checking, not fixing.

From this other partition, I ran fsck and got:

Code:


Will read-only check consistency of the filesystem on /dev/hda2
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Thu Sep  6 15:46:23 2007
###########
Replaying journal..
Reiserfs journal '/dev/hda2' in blocks [18..8211]: 0 transactions replayed
Checking internal tree../  2 (of   3)/ 61 (of 114)/ 91 (of 159)block 11173978: The number of items (8) is incorrect, should be (7)
 the problem in the internal node occured (11173978), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Thu Sep  6 15:49:37 2007
###########



This partition is 79GB out of an 80GB HD, so a full backup is difficult.
I have already mounted it once and read a file from it, and I saw no problem with the partition.

So I don't know if I:

1) Should simply keep using the partition despite this corruption (i.e. pretend the corruption does not exist), after backing up the most important files
2) Should use --rebuild-tree. Is it really dangerous? Is it really likely to destroy my partition?

Within each scenario (1 and 2), how likely is disaster (loss of the partition)? What if I back up only the most important files and accept the risk of losing my movies, music and ISOs (such a risk is acceptable if it is small, since those files could be obtained again)?

Sadako
Advocate

Joined: 05 Aug 2004
Posts: 3792
Location: sleeping in the bathtub

Posted: Thu Sep 06, 2007 5:33 pm

All I can tell you is that I've used --rebuild-tree more than once and have never encountered any problems with it (though I have heard some horror stories about it).

I'd back up whatever you can and then use --rebuild-tree. I wouldn't even consider continued use of the filesystem in its current state an option at all.
_________________
"You have to invite me in"

Habbit
Apprentice

Joined: 01 Sep 2007
Posts: 237
Location: 3.7137 W, 40.3873 N

Posted: Thu Sep 06, 2007 5:34 pm

ReiserFS is not exactly known for its armouring against on-disk corruption, so if you have anything like that, it is extremely probable that you will lose some data. If you try to use the partition as it is, you risk that some day reiser will simply refuse to mount it at all, locking you out of your data. However, fsck --rebuild-tree is a very risky operation too, as the on-disk format of ReiserFS has nearly zero redundancy and thus no backup copies of its metadata. The whole B+ tree is rebuilt from scratch: if a leaf node is rendered unreadable, all its files will be lost and the filesystem will be none the wiser. The result can be better if a non-leaf node is lost, as the nodes it pointed to will be found and reassigned to another parent in the tree, but the risk is high nevertheless.
_________________
Code:
~ $ objdump -d ./habbit_mind
90      xchg %rax, %rax
EB FD   jmp $-3

jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Thu Sep 06, 2007 5:57 pm

Will --rebuild-tree at least tell me if the operation was not perfect? Or will I be left with the doubt that maybe some file out there has disappeared, which I may need later?

jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Thu Sep 06, 2007 8:53 pm

Is there a way to back up all the metadata? Since I can't back up the whole disk, I could back up the most important data plus all the metadata. Is that possible?

PS: What I refer to as metadata is all the information that reiserfs stores about the disk, minus the actual data in the files. That is, file names, directory structure, where on the disk a file is located... I'm not sure the term "metadata" is accurate here, since I normally see people use it for file permissions and things like that, but here I mean not only file permissions but also lower-level stuff such as where on the media a certain file is located.


Last edited by jorgepeixoto on Thu Sep 13, 2007 5:58 am; edited 1 time in total

darkphader
Veteran

Joined: 09 May 2002
Posts: 1217
Location: Motown

Posted: Thu Sep 06, 2007 9:02 pm

The only time --rebuild-tree didn't work for me was when the drive itself was defective.

Chris
_________________
WYSIWYG - What You See Is What You Grep

Habbit
Apprentice

Joined: 01 Sep 2007
Posts: 237
Location: 3.7137 W, 40.3873 N

Posted: Thu Sep 06, 2007 11:20 pm

darkphader wrote:
The only time --rebuild-tree didn't work for me was when the drive itself was defective.

Indeed. That is the only case in which --rebuild-tree will fail to recover everything on the disk, as the whole partition is scanned for ReiserFS metadata block by block. That is the reason why we are told not to store an unmodified image of a ReiserFS volume inside another ReiserFS filesystem: should fsck --rebuild-tree be run against it, the metadata, files and directories could be suddenly linked into the main FS.
_________________
Code:
~ $ objdump -d ./habbit_mind
90      xchg %rax, %rax
EB FD   jmp $-3

jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Fri Sep 07, 2007 3:29 pm    Post subject: Results

I ran into problems; see below.
I saved parts of my home directory and the /etc directory to tarballs.
I also saved the output of the find command.

Then I ran reiserfsck --rebuild-tree.

After that I ran find again and compared the two listings:

Code:

jorge@jorge:/media/hdc1/backup$ diff findoutputbefore findoutputafter
223045a223046,223054
> /media/hda2/usr/portage/dev-java/poi
> /media/hda2/usr/portage/dev-java/poi/poi-3.0.1-r1.ebuild
> /media/hda2/usr/portage/dev-java/poi/Manifest
> /media/hda2/usr/portage/dev-java/poi/files
> /media/hda2/usr/portage/dev-java/poi/files/poi-3.0.1-src-isDateFormat.patch
> /media/hda2/usr/portage/dev-java/poi/files/poi-2.5-jikes-fix.patch
> /media/hda2/usr/portage/dev-java/poi/poi-2.5.1-r1.ebuild
> /media/hda2/usr/portage/dev-java/poi/ChangeLog
> /media/hda2/usr/portage/dev-java/poi/metadata.xml
269207d269215
< /media/hda2/home/jorge/ita/catalogo.pdf
289345,289358d289352
< /media/hda2/home/jorge/.claws-mail/actionsrc
< /media/hda2/home/jorge/.claws-mail/tempfolder
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/core
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/.claws_mark
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/.claws_cache
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/.mh_sequences
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/IMG0054B.jpg
< /media/hda2/home/jorge/.claws-mail/tempfolder/processing/IMG_3525.JPG
< /media/hda2/home/jorge/.claws-mail/command_history
< /media/hda2/home/jorge/.claws-mail/actionsrc.bak
< /media/hda2/home/jorge/.claws-mail/newscache
< /media/hda2/home/jorge/.claws-mail/clawsrc
< /media/hda2/home/jorge/.claws-mail/templates
325824a325819,325871
> /media/hda2/lost+found
> /media/hda2/lost+found/2257391_11
> /media/hda2/lost+found/2257391_16
> /media/hda2/lost+found/2257391_19
> /media/hda2/lost+found/2257391_20
> /media/hda2/lost+found/2257391_22
> /media/hda2/lost+found/2257391_23
> /media/hda2/lost+found/2257391_24
> /media/hda2/lost+found/2257391_25
> /media/hda2/lost+found/2257391_26
> /media/hda2/lost+found/2257391_29
> /media/hda2/lost+found/2257391_30
> /media/hda2/lost+found/2257391_31
> /media/hda2/lost+found/2257391_32
> /media/hda2/lost+found/2257391_33
> /media/hda2/lost+found/2257391_36
> /media/hda2/lost+found/2257391_38
> /media/hda2/lost+found/2257391_39
> /media/hda2/lost+found/2257391_40
> /media/hda2/lost+found/2257391_41
> /media/hda2/lost+found/2257391_42
> /media/hda2/lost+found/2257391_49
> /media/hda2/lost+found/2257391_51
> /media/hda2/lost+found/2257391_53
> /media/hda2/lost+found/2257391_54
> /media/hda2/lost+found/2257391_55
> /media/hda2/lost+found/2257391_56
> /media/hda2/lost+found/2257391_58
> /media/hda2/lost+found/2257391_59
> /media/hda2/lost+found/2257391_60
> /media/hda2/lost+found/2257391_61
> /media/hda2/lost+found/2257391_62
> /media/hda2/lost+found/2257391_63
> /media/hda2/lost+found/2257391_66
> /media/hda2/lost+found/2257391_67
> /media/hda2/lost+found/2257391_68
> /media/hda2/lost+found/2257391_69
> /media/hda2/lost+found/2257391_71
> /media/hda2/lost+found/2257391_78
> /media/hda2/lost+found/2257391_80
> /media/hda2/lost+found/9076_22978
> /media/hda2/lost+found/9076_41274
> /media/hda2/lost+found/328283_355297
> /media/hda2/lost+found/9076_106368
> /media/hda2/lost+found/9076_1016807
> /media/hda2/lost+found/9076_1250027
> /media/hda2/lost+found/9076_1250027/processing
> /media/hda2/lost+found/9076_1250027/processing/core
> /media/hda2/lost+found/9076_1250027/processing/.claws_mark
> /media/hda2/lost+found/9076_1250027/processing/.claws_cache
> /media/hda2/lost+found/9076_1250027/processing/.mh_sequences
> /media/hda2/lost+found/9076_1250027/processing/IMG0054B.jpg
> /media/hda2/lost+found/9076_1250027/processing/IMG_3525.JPG


I looked into some of the files in lost+found, and it looks like some of them are from the Firefox cache. Perhaps reiserfsck undeleted some files from the Firefox cache?
But what really matters is that only catalogo.pdf and some files in .claws-mail are missing. So far so good. But some files are corrupted, see below.
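For anyone who wants to do the same before/after comparison without reading raw diff output, here is a minimal Python sketch. It assumes two listings with one path per line, as saved from find. The function name and the idea of matching missing paths to lost+found entries by base name are only an illustration, not something reiserfsck provides, and top-level lost+found entries (which get numeric names like 9076_1250027) cannot be matched this way.

```python
import os


def compare_listings(before_lines, after_lines, lost_found="/media/hda2/lost+found"):
    """Compare two path listings (one path per line, e.g. saved find output).

    Returns (missing, appeared, candidates):
      missing    - paths present before but not after
      appeared   - paths present after but not before
      candidates - maps a missing path to lost+found entries with the same base name
    """
    before = {line.rstrip("\n") for line in before_lines if line.strip()}
    after = {line.rstrip("\n") for line in after_lines if line.strip()}

    missing = sorted(before - after)
    appeared = sorted(after - before)

    # Index the new lost+found entries by base name. Only names that survived
    # inside a relinked directory can be matched like this.
    by_name = {}
    for path in appeared:
        if path.startswith(lost_found + "/"):
            by_name.setdefault(os.path.basename(path), []).append(path)

    candidates = {
        path: by_name[os.path.basename(path)]
        for path in missing
        if os.path.basename(path) in by_name
    }
    return missing, appeared, candidates
```

With the two files from above, this would be called as `compare_listings(open("findoutputbefore"), open("findoutputafter"))`. A base-name match is only a hint about where a file may have ended up; the contents still need to be checked by hand.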

When I ran reiserfsck --rebuild-tree, I redirected stdout to a file. This file is 1.8 MB even after compression with gzip, so I'm not pasting it here.
However, the information that went to my screen (presumably from stderr) is not that big, so I can post it here:

Code:


jorge@jorge:/media/hdc1/backup$ sudo reiserfsck --rebuild-tree /dev/hda2 > reiserfsckoutput
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** Do not  run  the  program  with  --rebuild-tree  unless **
** something is broken and MAKE A BACKUP  before using it. **
** If you have bad sectors on a drive  it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive  to the good one -- dd_rescue is  a good tool for **
** that -- and only then run this program.                 **
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************

Will rebuild the filesystem (/dev/hda2) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal..
Reiserfs journal '/dev/hda2' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Thu Sep  6 17:50:25 2007
###########

Pass 0:
Loading on-disk bitmap .. ok, 12166329 blocks marked used
Skipping 8799 blocks (super block, journal, bitmaps) 12157530 blocks will be read
0%....20%....40%....60%....80%....100%                       left 0, 10355 /sec
        "r5" hash is selected
Flushing..finished
        Read blocks (but not data blocks) 12157530
                Leaves among those 50333
                        - leaves all contents of which could not be saved and deleted 1
                Objectids found 325612

Pass 1 (will try to insert 50332 leaves):
Looking for allocable blocks .. finished
0%....20%....40%....60%....80%....100%                         left 0, 384 /sec
Flushing..finished
        50332 leaves read
                49716 inserted
                        - pointers in indirect items pointing to metadata 91 (zeroed)
                616 not inserted
        non-unique pointers in indirect items (zeroed) 2528

Pass 2:
0%....20%....40%....60%....80%....100%                         left 0, 616 /sec
Flushing..finished
        Leaves inserted item by item 616
Pass 3 (semantic):
Flushing..finished
        Files found: 269299
        Directories found: 44436
        Symlinks found: 6541
        Others: 5139
        Broken (of files/symlinks/others): 1
        Names pointing to nowhere (removed): 2
Pass 3a (looking for lost dir/files):
Looking for lost directories:
Looking for lost files:2 /sec
Flushing..finishede 0, 0 /sec
        Objects without names 45
        Empty lost dirs removed 4
        Dirs linked to /lost+found: 1
        Files linked to /lost+found 44
        Objects having used objectids: 39
                files fixed 39
Pass 4 - finished done 27951, 681 /sec
        Deleted unreachable items 15
Flushing..finished
Syncing..finished
###########
reiserfsck finished at Thu Sep  6 18:13:45 2007
###########



I have already tried booting the computer, and it got to the xdm screen, but during the boot process I saw some error messages. It looks like some files, such as /bin/rm and /bin/true, were corrupted.

1) How can I know if other files were corrupted?
2) Do you think I should just use the computer, after reemerging the packages that provide the corrupted files? Or should I reinstall the system from scratch?
3) Should I use the personal data on the partition, or should I restore my backup?

jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Thu Sep 13, 2007 6:16 am

Because of the forums outage, I asked for help on the gentoo-users mailing list. A guy named Volker Armin Hemmann thinks that I can use the partition; there will be no more filesystem damage, and once I fix the corrupted files, the system will be OK. Do you agree with him?
Also, it seems to me that there is nothing more I can do to save data from this partition. The zeroed files *are gone*. If there are any more corrupted files (apart from the zeroed ones), *they are gone* too. So I should retrieve what I can from the backup, reemerge the affected packages, and go on with my life. Do you agree?

I realized that the damaged files were filled with nulls, so I made a script that detected null-filled files across my filesystem. There are some 250 files like this. The full list is in http://pastebin.com/m60f4ea6e
Here are some (the number following each file name is the number of null bytes):

/media/hda2/bin/du: 59084
/media/hda2/bin/rm: 34100
/media/hda2/bin/tr: 27492
/media/hda2/bin/wc: 23108
/media/hda2/bin/dir: 79420
/media/hda2/bin/cut: 27048
/media/hda2/bin/env: 13436
/media/hda2/bin/seq: 17740
/media/hda2/bin/tty: 12800
/media/hda2/bin/yes: 12876
/media/hda2/bin/expr: 23336
/media/hda2/bin/head: 24516
/media/hda2/bin/sort: 65168
/media/hda2/bin/stty: 37380
/media/hda2/bin/sync: 12584
/media/hda2/bin/true: 12200
/media/hda2/bin/vdir: 79420
/media/hda2/bin/dirname: 13836
/media/hda2/bin/rmdir: 14684
/media/hda2/bin/sleep: 14772
/media/hda2/bin/touch: 34328
/media/hda2/bin/uname: 15316
/media/hda2/bin/chroot: 13500
/media/hda2/bin/mkfifo: 14644
/media/hda2/bin/readlink: 18988
/media/hda2/bin/basename: 13772
/media/hda2/etc/enlightenment/sysactions.conf: 2964
/media/hda2/etc/laptop-mode/laptop-mode.conf: 15916
/media/hda2/etc/laptop-mode/lm-profiler.conf: 1561
/media/hda2/etc/portage/package.keywords/wanted~: 796
/media/hda2/var/lib/scrollkeeper/scrollkeeper_docs: 1812
/media/hda2/usr/lib/aspell-0.60/nroff-filter.info: 232
/media/hda2/usr/lib/aspell-0.60/iso-8859-1.cset: 13848
/media/hda2/usr/lib/aspell-0.60/iso-8859-2.cset: 14133
/media/hda2/usr/lib/aspell-0.60/cp1250.cmap: 31404
/media/hda2/usr/lib/aspell-0.60/cp1252.cset: 14039
/media/hda2/usr/lib/aspell-0.60/cp1253.cset: 13682
/media/hda2/usr/lib/aspell-0.60/iso-8859-8.cmap: 27758
/media/hda2/usr/lib/aspell-0.60/iso-8859-8.cset: 12557
/media/hda2/usr/lib/aspell-0.60/cp1255.cmap: 35133
/media/hda2/usr/lib/aspell-0.60/cp1256.cset: 13307
/media/hda2/usr/lib/aspell-0.60/cp1257.cmap: 31235
/media/hda2/usr/lib/aspell-0.60/cp1258.cset: 13920
/media/hda2/usr/lib/aspell-0.60/context-filter.so: 26976
/media/hda2/usr/lib/aspell-0.60/iso-8859-10.cmap: 31046
/media/hda2/usr/lib/aspell-0.60/iso-8859-10.cset: 14259
/media/hda2/usr/lib/aspell-0.60/texinfo-filter.info: 914
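
The script itself is not posted in this thread, so the following is only a rough Python sketch of the idea. It assumes that "null-filled" means a file made up of nothing but NUL bytes (the `min_fraction` parameter is a hypothetical knob to loosen that), and the function names are made up for illustration.

```python
import os


def count_nulls(path, chunk_size=1 << 20):
    """Return (null_bytes, total_bytes) for one file, reading it in chunks."""
    nulls = total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            nulls += chunk.count(b"\x00")
            total += len(chunk)
    return nulls, total


def find_null_filled(root, min_fraction=1.0):
    """Yield (path, null_bytes) for regular files under root whose share of
    NUL bytes is at least min_fraction (1.0 means nothing but NULs)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, sockets and FIFOs
            try:
                nulls, total = count_nulls(path)
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if total > 0 and nulls / total >= min_fraction:
                yield path, nulls
```

Looping over `find_null_filled("/media/hda2")` and printing `path: count` would give a list in the same shape as the one above. Empty files are skipped on purpose, since a zero-length file says nothing about corruption.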


Except for stuff in /bin, these files are not essential, so it does seem that the system is fixable.

One more question: this backup that I made after the screwup but before rebuild-tree, do you think it is reliable?
I have no prior backups... the *only* information I have came from the filesystem after the screw up...
The "backup" I have was made *after* the screw up but before the rebuild-tree.
I'm asking if this backup is reliable.

I have chosen one of the files that were zeroed in my filesystem and the corresponding file from the backup was OK. So at least in this case, *it was the rebuild-tree that corrupted the file*. This suggests that the *backup is reliable*.

Oh, and I found useful information in the output of rebuild-tree. It turns out the output was huge because it prints status information to the screen and then erases it with ^H (backspace), so what stays on the screen is far smaller than the total amount printed. At http://pastebin.com/m319aa81a you can find the information that stays on the screen. You can see that the corrupted files are mentioned.
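
In case it helps someone with a similar log: reducing the raw output to "what stays on the screen" can be approximated by replaying it through a very simple terminal model. This Python sketch is an assumption about how that could be done, not the method actually used here. It treats backspace and carriage return as cursor moves and lets later characters overwrite earlier ones; the function name is made up.

```python
def render_terminal(text):
    """Return what a simple terminal would still show after printing text.

    Backspace moves the cursor one column left, carriage return moves it to
    column 0, and any other character overwrites whatever is at the cursor.
    """
    lines = []
    line = []    # characters of the line currently being drawn
    cursor = 0
    for ch in text:
        if ch == "\n":
            lines.append("".join(line))
            line, cursor = [], 0
        elif ch == "\b":
            cursor = max(0, cursor - 1)
        elif ch == "\r":
            cursor = 0
        else:
            if cursor < len(line):
                line[cursor] = ch   # overwrite in place
            else:
                line.append(ch)
            cursor += 1
    if line:
        lines.append("".join(line))
    return "\n".join(lines)
```

Something like `render_terminal(open("reiserfsckoutput", errors="replace").read())` would then give the reduced log. Overwriting without clearing the rest of the line would probably also explain leftovers such as "finishede 0, 0 /sec" in the screen output.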

And thank you all for your attention!

Rob1n
l33t

Joined: 29 Nov 2003
Posts: 714
Location: Cambridge, UK

Posted: Thu Sep 13, 2007 8:06 am

jorgepeixoto wrote:
Because of the forums outage, I asked for help on the gentoo-users mailing list. A guy named Volker Armin Hemmann thinks that I can use the partition; there will be no more filesystem damage, and once I fix the corrupted files, the system will be OK. Do you agree with him?

No - unless you know what actually caused the corruption then you can't say it won't happen again. A sudden crash shouldn't cause this sort of damage - at worst the few files/directories being written should be affected (and in most cases there's no real damage at all).

Quote:
Also, it seems to me that there is nothing more I can do to save data from this partition. The zeroed files *are gone*. If there are any more corrupted files (apart from the zeroed ones), *they are gone* too. So I should retrieve what I can from the backup, reemerge the affected packages, and go on with my life. Do you agree?

Yup - the only other way you could do it would be if you knew the content of files, in which case you _might_ be able to dig them out from the disk. This'd take an awful lot of work though.

Quote:
I realized that the damaged files were filled with nulls, so I made a script that detected null-filled files across my filesystem. There are some 250 files like this. The full list is in http://pastebin.com/m60f4ea6e

Except for stuff in /bin, these files are not essential, so it does seem that the system is fixable.

Looks reasonable - there appear to be whole directories affected, so the issues were probably fairly localised.

Quote:
One more question: this backup that I made after the screwup but before rebuild-tree, do you think it is reliable?
I have no prior backups... the *only* information I have came from the filesystem after the screw up...
The "backup" I have was made *after* the screw up but before the rebuild-tree.
I'm asking if this backup is reliable.

I have chosen one of the files that were zeroed in my filesystem and the corresponding file from the backup was OK. So at least in this case, *it was the rebuild-tree that corrupted the file*. This suggests that the *backup is reliable*.

Well, it means that it found some data for the files - whether that's the correct data or not is another matter. You'd have to do a comparison against a known-correct version to be sure. I wouldn't want to rely on the backup anyway, though if it's your only chance to get some files back then you might as well - you're no worse off if they are corrupt.
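
Such a comparison can be as simple as hashing both copies. A minimal Python sketch, assuming you can get a reference copy from somewhere trustworthy (a freshly installed package, another machine); the function names and the idea of a "reference path" are hypothetical:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(backup_path, reference_path):
    """True if the backup copy is byte-for-byte identical to the reference."""
    return sha256_of(backup_path) == sha256_of(reference_path)
```

This only helps for files that have a known-correct counterpart, such as binaries from a package. For personal files with no second copy there is nothing to compare against, which is the real limitation of this backup.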

jorgepeixoto
Apprentice

Joined: 27 Apr 2006
Posts: 218
Location: São José dos Campos, São Paulo, Brasil

Posted: Thu Sep 13, 2007 8:23 am

Rob1n wrote:
jorgepeixoto wrote:
Because of the forums outage, I asked for help on the gentoo-users mailing list. A guy named Volker Armin Hemmann thinks that I can use the partition; there will be no more filesystem damage, and once I fix the corrupted files, the system will be OK. Do you agree with him?

No - unless you know what actually caused the corruption then you can't say it won't happen again. A sudden crash shouldn't cause this sort of damage - at worst the few files/directories being written should be affected (and in most cases there's no real damage at all).


Perhaps I caused the damage when I interrupted the fsck. As I said, I thought it was no crime to interrupt a regular (not rebuild-tree) fsck. It seemed to me that the fsck was only checking, not fixing. Perhaps I was wrong.

Also, this crash was caused by a hardware malfunction. It is possible that, before crashing, the kernel was in a completely unpredictable state and could have done anything to my disk.

The hardware problem has been fixed.
All times are GMT