Gentoo Forums
[Solved] fsck seems to complicate an inode problem...
cgmd
Veteran


Joined: 17 Feb 2005
Posts: 1585
Location: Louisiana

Posted: Thu Sep 27, 2007 5:22 pm    Post subject: [Solved] fsck seems to complicate an inode problem...

Hi...

I hate making a vague post, but I'm unable to recreate some details...

During a routine fsck of /dev/sda3, after 30 unchecked reboots, inode errors were reported. I don't recall the specific errors, but /var/log/messages from that date shows:
Code:

Sep 25 02:31:32 cgmd EXT3-fs error (device sda3): htree_dirblock_to_tree: bad entry in directory #2186980: rec_len % 4 != 0 - offset=20480, inode=858860853, rec_len=14641, name_len=53
Sep 25 03:00:01 cgmd EXT3-fs error (device sda3): htree_dirblock_to_tree: bad entry in directory #2186980: rec_len % 4 != 0 - offset=20480, inode=858860853, rec_len=14641, name_len=53
Sep 25 03:00:01 cgmd EXT3-fs error (device sda3): htree_dirblock_to_tree: bad entry in directory #2186980: rec_len % 4 != 0 - offset=20480, inode=858860853, rec_len=14641, name_len=53
Sep 25 03:00:01 cgmd EXT3-fs error (device sda3): empty_dir: bad entry in directory #2186980: rec_len % 4 != 0 - offset=20480, inode=858860853, rec_len=14641, name_len=53
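
For anyone reading along: every message above names the same directory inode, #2186980, and a rec_len of 14641, which is not a multiple of 4 as ext2/ext3 directory entries require. A minimal sketch for pulling the affected directory inodes out of a syslog file follows. The function names are made up for illustration, and the pattern assumes the message layout shown above.

```shell
# Read syslog text on stdin and print "<count> <directory inode>" for every
# directory named in an EXT3-fs "bad entry" message.
# (bad_dir_inodes is a hypothetical helper name, not part of any tool.)
bad_dir_inodes() {
    sed -n 's/.*EXT3-fs error.*bad entry in directory #\([0-9][0-9]*\):.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Succeed if the given rec_len is a multiple of 4, fail otherwise.
rec_len_aligned() {
    [ $(( $1 % 4 )) -eq 0 ]
}
```

Possible usage: `bad_dir_inodes < /var/log/messages`, followed by `rec_len_aligned 14641 || echo "misaligned rec_len"`.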


I was basically instructed to log in with my root password and run fsck to fix the problems. This process identified many problematic inodes (I don't recall the exact terminology) and asked permission to reassign those files to /lost+found, which (not knowing an alternative) I willingly accepted. I now have:
Code:

# ls -l /lost+found/
total 344
-rw-r--r-- 1 root root  513 Sep 27 11:31 #2187051
-rw-r--r-- 1 root root  513 Sep 27 11:31 #2187054
-rw-r--r-- 1 root root  513 Sep 27 11:31 #2187058
-rw-r--r-- 1 root root  621 Sep 27 11:31 #2187062
-rw-r--r-- 1 root root  354 Sep 27 11:31 #2187070
-rw-r--r-- 1 root root  294 Sep 27 11:31 #2187094
-rw-r--r-- 1 root root  352 Sep 27 11:31 #2187097
-rw-r--r-- 1 root root  956 Sep 27 11:31 #2187099
-rw-r--r-- 1 root root  644 Sep 27 11:31 #2187106
-rw-r--r-- 1 root root  639 Sep 27 11:31 #2187108
-rw-r--r-- 1 root root  644 Sep 27 11:31 #2187109
-rw-r--r-- 1 root root  720 Sep 27 11:31 #2187114
-rw-r--r-- 1 root root  325 Sep 27 11:31 #2187122
-rw-r--r-- 1 root root  935 Sep 27 11:31 #2187126
-rw-r--r-- 1 root root  607 Sep 27 11:31 #2187135
-rw-r--r-- 1 root root  611 Sep 27 11:31 #2187136
-rw-r--r-- 1 root root  748 Sep 27 11:31 #2187139
-rw-r--r-- 1 root root  563 Sep 27 11:31 #2187141
-rw-r--r-- 1 root root  791 Sep 27 11:31 #2187150
-rw-r--r-- 1 root root  383 Sep 27 11:31 #2187154
-rw-r--r-- 1 root root  702 Sep 27 11:31 #2187175
-rw-r--r-- 1 root root  658 Sep 27 11:31 #2187179
-rw-r--r-- 1 root root  479 Sep 27 11:31 #2187193
-rw-r--r-- 1 root root  603 Sep 27 11:31 #2187200
-rw-r--r-- 1 root root  351 Sep 27 11:31 #2187203
-rw-r--r-- 1 root root  334 Sep 27 11:31 #2187209
-rw-r--r-- 1 root root  874 Sep 27 11:31 #2187220
-rw-r--r-- 1 root root  619 Sep 27 11:31 #2187232
-rw-r--r-- 1 root root  529 Sep 27 11:31 #2187234
-rw-r--r-- 1 root root  380 Sep 27 11:31 #2187248
-rw-r--r-- 1 root root  382 Sep 27 11:31 #2187254
-rw-r--r-- 1 root root  507 Sep 27 11:31 #2187256
-rw-r--r-- 1 root root 1353 Sep 27 11:31 #2187257
-rw-r--r-- 1 root root 1038 Sep 27 11:31 #2187260
-rw-r--r-- 1 root root  341 Sep 27 11:31 #2187271
-rw-r--r-- 1 root root 3401 Sep 27 11:31 #2187281
-rw-r--r-- 1 root root 1326 Sep 27 11:31 #2187285
-rw-r--r-- 1 root root 3399 Sep 27 11:31 #2187304
-rw-r--r-- 1 root root 1367 Sep 27 11:31 #2187306
-rw-r--r-- 1 root root 1246 Sep 27 11:31 #2187308
-rw-r--r-- 1 root root 3376 Sep 27 11:31 #2187311
-rw-r--r-- 1 root root  503 Sep 27 11:31 #2187317
-rw-r--r-- 1 root root  412 Sep 27 11:31 #2187321
-rw-r--r-- 1 root root  571 Sep 27 11:31 #2187329
-rw-r--r-- 1 root root  586 Sep 27 11:31 #2187336
-rw-r--r-- 1 root root  555 Sep 27 11:31 #2187342
-rw-r--r-- 1 root root  525 Sep 27 11:31 #2187362
-rw-r--r-- 1 root root  649 Sep 27 11:31 #2187372
-rw-r--r-- 1 root root  470 Sep 27 11:31 #2187376
-rw-r--r-- 1 root root  770 Sep 27 11:31 #2187379
-rw-r--r-- 1 root root  546 Sep 27 11:31 #2187387
-rw-r--r-- 1 root root  404 Sep 27 11:31 #2187391
-rw-r--r-- 1 root root  605 Sep 27 11:31 #2187392
-rw-r--r-- 1 root root 1833 Sep 27 11:31 #2187401
-rw-r--r-- 1 root root  382 Sep 27 11:31 #2187405
-rw-r--r-- 1 root root 1670 Sep 27 11:31 #2187410
-rw-r--r-- 1 root root 1870 Sep 27 11:31 #2187414
-rw-r--r-- 1 root root  500 Sep 27 11:31 #2187418
-rw-r--r-- 1 root root  432 Sep 27 11:31 #2187451
-rw-r--r-- 1 root root  542 Sep 27 11:31 #2187453
-rw-r--r-- 1 root root  307 Sep 27 11:31 #2187459
-rw-r--r-- 1 root root  563 Sep 27 11:31 #2187465
-rw-r--r-- 1 root root  621 Sep 27 11:31 #2187467
-rw-r--r-- 1 root root  616 Sep 27 11:31 #2187474
-rw-r--r-- 1 root root  552 Sep 27 11:31 #2187484
-rw-r--r-- 1 root root  577 Sep 27 11:31 #2187487
-rw-r--r-- 1 root root  280 Sep 27 11:31 #2187492
-rw-r--r-- 1 root root  374 Sep 27 11:31 #2187512
-rw-r--r-- 1 root root  548 Sep 27 11:31 #2187515
-rw-r--r-- 1 root root  406 Sep 27 11:31 #2187518
-rw-r--r-- 1 root root  555 Sep 27 11:31 #2187525
-rw-r--r-- 1 root root 1246 Sep 27 11:31 #2187530
-rw-r--r-- 1 root root  419 Sep 27 11:31 #2187535
-rw-r--r-- 1 root root  527 Sep 27 11:31 #2187570
-rw-r--r-- 1 root root  564 Sep 27 11:31 #2187602
-rw-r--r-- 1 root root  744 Sep 27 11:31 #2187603
-rw-r--r-- 1 root root  498 Sep 27 11:31 #2187694
-rw-r--r-- 1 root root  469 Sep 27 11:31 #2187705
-rw-r--r-- 1 root root  422 Sep 27 11:31 #2187707
-rw-r--r-- 1 root root  443 Sep 27 11:31 #2187726
-rw-r--r-- 1 root root  398 Sep 27 11:31 #2187731
-rw-r--r-- 1 root root  388 Sep 27 11:31 #2187733
-rw-r--r-- 1 root root  537 Sep 27 11:31 #2187750
-rw-r--r-- 1 root root  512 Sep 27 11:31 #2187755
-rw-r--r-- 1 root root  359 Sep 27 11:31 #2187767

...And with that fsck fix, the system seems to be repaired.

Does anyone have an idea what might be going on?

If I tamper with those /lost+found/ files (e.g. move them to a different folder), fsck runs cleanly, but other bad things happen, like inactivating net.eth0, for example. If I restore that group of files to /lost+found, full functionality returns... :?

Would someone please help me troubleshoot and correct this??

Thanks!
_________________
"Primum non nocere" ---Galen


Last edited by cgmd on Sat Oct 06, 2007 2:11 pm; edited 1 time in total
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23193

Posted: Thu Sep 27, 2007 11:11 pm

This looks bad. Have you had any crashes, kernel panics, or other events that could have caused filesystem corruption? It appears that a large number of inodes became orphaned. That is, you ended up with files that had no name. Try to analyze the recovered fragments to determine their relation to each other, as well as their purpose on the system. You may find the file and strings utilities helpful in this task.
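
A quick way to apply file to every recovered fragment is a small loop. This is only a sketch: survey_orphans is a hypothetical helper name, /lost+found is assumed as the default directory, and the file utility must be installed.

```shell
# Print "<name>: <type>" for every regular file in a directory.
# The directory defaults to /lost+found; pass another path as $1 to override.
# (survey_orphans is a hypothetical helper name.)
survey_orphans() (
    dir=${1:-/lost+found}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # file -b prints only the detected type, without the file name.
        printf '%s: %s\n' "${f##*/}" "$(file -b "$f")"
    done
)
```

Piping the result through `cut -d: -f2- | sort | uniq -c` would give a count per detected type, which shows at a glance whether the fragments are all of one kind.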

Does the kernel report any disk I/O errors? If your drive supports S.M.A.R.T. or another monitoring technology, check the drive error counters. Use the drive extensively, then recheck the counters. If they have moved by a significant amount, that may indicate that the disk is failing.
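
To compare the counters before and after heavy use, the raw values can be saved and diffed. The sketch below assumes smartmontools is installed and that smartctl -A prints its usual ten-column attribute table. smart_raw_counters is a hypothetical helper name, and /dev/sda is inferred from the sda3 partition mentioned earlier in the thread.

```shell
# Reduce `smartctl -A` output (read on stdin) to "<attribute name> <raw value>".
# Only rows whose first column is a numeric attribute ID are kept, so the
# header and any surrounding text are skipped.
# (smart_raw_counters is a hypothetical helper name.)
smart_raw_counters() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 { print $2, $10 }'
}

# Possible usage, as root (the device name is an assumption):
#   smartctl -A /dev/sda | smart_raw_counters > before.txt
#   ... use the drive extensively ...
#   smartctl -A /dev/sda | smart_raw_counters > after.txt
#   diff before.txt after.txt
```

An empty diff means no raw counter changed between the two snapshots.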
cgmd

Posted: Fri Sep 28, 2007 1:01 am

Hu wrote:
This looks bad. Have you had any crashes, kernel panics, or other events that could have caused filesystem corruption? It appears that a large number of inodes became orphaned. That is, you ended up with files that had no name. Try to analyze the recovered fragments to determine their relation to each other, as well as their purpose on the system. You may find the file and strings utilities helpful in this task.

Does the kernel report any disk I/O errors? If your drive supports S.M.A.R.T. or another monitoring technology, check the drive error counters. Use the drive extensively, then recheck the counters. If they have moved by a significant amount, that may indicate that the disk is failing.


I have had no crashes or panics, but one possibly significant thing comes to mind...

This Lenovo ThinkPad X60s is about 6 months old. The original kernel I tried to use was sys-kernel/suspend2-sources, but I could not make it suspend properly. I then switched to sys-kernel/gentoo-sources, which I have used ever since. About 2 weeks ago, when cleaning out old kernel versions, I also unmerged suspend2-sources. :(

Might that have created this situation??

I haven't tried S.M.A.R.T., but the BIOS-level "Hard disk drive diagnostics program" of this Lenovo laptop gives a "pass" to both read and speed tests of the drive.

Also, re-checking the drive with fsck came out clean. :?

Any other thoughts?
Hu

Posted: Fri Sep 28, 2007 2:49 am

Unmerging suspend2-sources should not matter, for several reasons. You were not using them. The package only controls the sources, not the kernel built from those sources. Unmerging them only deletes files. The problem fsck reported cannot be caused merely by a user application calling unlink, unless the kernel handling the unlink is buggy.

Try to determine what those recovered files used to be, before they were orphaned. Keep an eye out for disk I/O errors, and maybe try to run fsck more regularly than the automatic checks. Beware that fsck cannot be used safely on a live filesystem. You should bring the system to single user mode or, if possible, use a LiveCD, when you fsck the partition.
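
As a sketch of the precaution described above: refuse to check a partition that is still mounted, and otherwise run a read-only pass first. is_mounted and check_readonly are hypothetical helper names. For e2fsck, -f forces a check even if the filesystem looks clean, and -n opens it read-only and answers "no" to every prompt. One assumption to watch: the root filesystem may appear in /proc/mounts as /dev/root or under another name rather than /dev/sda3, in which case this test would miss it.

```shell
# Succeed if device $1 appears as a mount source in the mounts table.
# The table defaults to /proc/mounts; $2 can name another file.
# (is_mounted and check_readonly are hypothetical helper names.)
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Run a forced, read-only check, but only on an unmounted device.
check_readonly() {
    if is_mounted "$1"; then
        echo "$1 is mounted; use single-user mode or a LiveCD first" >&2
        return 1
    fi
    e2fsck -f -n "$1"
}
```

Possible usage from a LiveCD, as root: `check_readonly /dev/sda3`. Because of -n, this pass only reports problems; a repairing run would be a separate, deliberate step.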
cgmd

Posted: Sun Sep 30, 2007 9:22 pm

Hu wrote:
Unmerging suspend2-sources should not matter, for several reasons. You were not using them. The package only controls the sources, not the kernel built from those sources. Unmerging them only deletes files. The problem fsck reported cannot be caused merely by a user application calling unlink, unless the kernel handling the unlink is buggy.

Try to determine what those recovered files used to be, before they were orphaned. Keep an eye out for disk I/O errors, and maybe try to run fsck more regularly than the automatic checks. Beware that fsck cannot be used safely on a live filesystem. You should bring the system to single user mode or, if possible, use a LiveCD, when you fsck the partition.

OK...

I have seen no I/O errors and fsck continues to run cleanly, without reporting errors.

I had erroneously posted above:
Quote:
If I tamper with those /lost+found/ files (e.g. move them to a different folder), fsck runs cleanly, but other bad things happen, like inactivating net.eth0, for example. If I restore that group of files to /lost+found, full functionality returns...

This is incorrect. I had that impression, but I later learned that a coincidental network failure during a reboot had caused those problems. I can, in fact, remove those orphaned inode files from /lost+found without ill effect.

As suggested above, by Hu, I examined the content of the orphaned files (all of which were ASCII files), and found them all to be of a similar format, exemplified by:
Code:
 
# cat \#2187726
_eclasses_=toolchain-funcs      /usr/portage/eclass     1187346967      multilib        /usr/portage/eclass     1183334777      portability     /usr/portage/eclass     1167690947      eutils /usr/portage/eclass      1188617810
SLOT=0
EAPI=0
DEPEND=virtual/libc
DESCRIPTION=An open-source VLAN management system
LICENSE=GPL-2
PROVIDE=
IUSE=
RESTRICT=
CDEPEND=
PDEPEND=
SRC_URI=mirror://sourceforge/vmps/vmpsd-1.3.tar.gz
KEYWORDS=~x86
HOMEPAGE=http://vmps.sourceforge.net
RDEPEND=virtual/libc

I then listed the related packages which I could discern from the SRC_URI= line in each orphaned inode file:
Code:

#2187051->  chan_sccp-20050922
#2187054->  chan_sccp-20051118
#2187058->  chan_sccp-20050902
#2187062->  rate-engine-0.5.4
#2187070->  autossh-1.2e
#2187094->  bsdwhois-1.43.2.1
#2187097->  cadaver-0.22.3
#2187099->  capi4hylafax_01.03
#2187106->  cfengine-2.1.21
#2187108->  cfengine-2.1.17
#2187109->  cfengine-2.1.22
#2187114->  vpnclient-linux-4.6.0
#2187122->  connect-1.95
#2187126->  curl-7.15.5
#2187135->  dhcdbd-3.0
#2187136->  dhcdbd-2.8
#2187139->  dhcp-3.0.6
#2187141->  dhcpcd-2.0.5                            newer version is installed
#2187150->  dropbear-0.49
#2187154->  elianna-pack_v1.0
#2187175->  gnome-blog-0.9
#2187179->  sitemap_gen-1.4     
#2187193->  gwhois_20061002
#2187200->  howl-0.9.8
#2187203->  hsc-0.935
#2187209->  httptunnel-3.0.5
#2187220->  ICAClient-9.0-1
#2187232->  ifenslave-2.6_1.1.0
#2187234->  ip-sentinel-0.12
#2187248->  ipsorc-2.0.9
#2187254->  ipv6calc-0.51
#2187256->  ipx-1.1
#2187257->  italc-1.0.2
#2187260->  wakeonlan-1.0.0
#2187271->  jwhois-3.2.3
#2187281->  kickpim-0.5.3
#2187285->  knetload-2.9.92
#2187304->  kssh-0.7
#2187306->  ktraynetworker-0.8c
            ktraynetworker_resources_0.2
#2187308->  kvpnc-0.8.8
#2187311->  kwebget-0.8.1
#2187317->  l7-protocols-2007-05-09
#2187321->  l7-protocols-2006-06-03
#2187329->  lksctp-tools-1.0.6
#2187336->  mDNSResponder-98                    newer version is installed
#2187342->  memcached-1.1.11
#2187362->  nemesis-1.4
#2187372->  net-misc/netcomics-cvs-0.14.1
#2187376->  bootpd-2.4
#2187379->  netkit-rsh-0.17                      installed net-misc/netkit-rsh-0.17-r8
            rexec-1.5
            netkit-rsh-0.17-patches-1.0
#2187387->  netkit-timed-0.17
#2187391->  netprofiles-ims-0.1.0134
#2187392->  netsed
#2187401->  NetworkManager-vpnc-0.6.4
#2187405->  nstx-1.1
#2187410->  nx-X11-3.0.0-37
            nxagent-3.0.0-85
            nxproxy-3.0.0-4
            nxauth-3.0.0-6
            nxcompext-3.0.0-18
            nxcompshad-3.0.0-19
            nxcomp-3.0.0-43
#2187414->  linuxterminalserver-1.5.0-server
            linuxterminalserver-1.5.0-common
            linuxterminalserver-1.5.0-client
#2187418->  olsrd-0.5.2
#2187451->  packETH-1.3
#2187453->  pavuk-0.9.34
#2187459->  pipes-1.16
#2187465->  proxytunnel-1.5.0
#2187467->  proxytunnel-1.6.0
#2187474->  putty-0.59
#2187484->  rarpd-1.1
#2187487->  rdate-1.4
#2187492->  redir-2.2.1
#2187512->  shmux-1.0.1
#2187515->  siproxd-0.5.11
#2187518->  sipsak-0.9.5
#2187525->  SJphoneLnx-1.60.2235
#2187530->  smb4k-0.8.4
#2187535->  smbc-1.0.0
#2187570->  sobby-0.3.0
#2187602->  tcpsound-0.3.1
#2187603->  telnet-bsd-1.2
#2187726->  vmpsd-1.3
#2187731->  vncrec-0.2
#2187733->  vncsnapshot-1.1
#2187750->  whois_4.7.12
#2187755->  wput-0.6
#2187767->  xsupplicant-1.0.1

I have none of the listed packages installed, with the exception of the 3 for which a newer version is installed.
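
The mapping above was made by reading the SRC_URI= line of each file. A rough sketch of automating that is below. orphan_sources is a hypothetical name, and the sketch assumes each recovered file is a Portage metadata cache entry in the KEY=value layout shown earlier. It prints the tarball file names rather than the trimmed package-version strings used in the list.

```shell
# For every regular file in a directory (default /lost+found), print
# "<file name>-> <tarball name>" for each URI on its SRC_URI= line.
# (orphan_sources is a hypothetical helper name.)
orphan_sources() (
    dir=${1:-/lost+found}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Split the SRC_URI value into one token per line.
        sed -n 's/^SRC_URI=//p' "$f" | tr ' \t' '\n\n' |
            while read -r uri; do
                case $uri in
                    # Keep only real URIs; skip tokens such as "doc?" or "(".
                    *://*) printf '%s-> %s\n' "${f##*/}" "${uri##*/}" ;;
                esac
            done
    done
)
```

A file whose SRC_URI lists several tarballs produces several lines, which matches the multi-line entries in the list above.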

I would now presume that fsck actually placed only useless orphaned inode files in /lost+found, and that, perhaps, I can safely remove them.

Are there any additional things I need to consider? :?
Hu

Posted: Sun Sep 30, 2007 11:22 pm

You can remove those files, if they do not contain any data you want to save. They appear to be remnants of the metadata for packages you have unmerged. I am still very concerned about what would cause a system to develop dozens of orphaned inodes for no apparent reason, but I doubt you can gain any further information by examining the orphans.
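
Before deleting, it may be worth confirming that none of the recovered entries belongs to an installed package. Portage records installed packages as directories under /var/db/pkg/<category>/<name>-<version>, so a glob is enough for a rough check. installed_versions is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different database directory.

```shell
# Print "<category>/<name>-<version>" for each installed version of package $1.
# The package database defaults to /var/db/pkg; $2 can override it.
# (installed_versions is a hypothetical helper name.)
installed_versions() (
    name=$1
    db=${2:-/var/db/pkg}
    # Requiring a digit after the hyphen keeps "dhcp" from matching "dhcpcd-...".
    for d in "$db"/*/"$name"-[0-9]*; do
        [ -d "$d" ] || continue
        printf '%s\n' "${d#"$db"/}"
    done
)
```

Possible usage: `installed_versions vmpsd` prints nothing if the package is not installed, while a package that is installed prints one line per installed version.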
cgmd

Posted: Sun Sep 30, 2007 11:49 pm

Hu wrote:
You can remove those files, if they do not contain any data you want to save. They appear to be remnants of the metadata for packages you have unmerged. I am still very concerned about what would cause a system to develop dozens of orphaned inodes for no apparent reason, but I doubt you can gain any further information by examining the orphans.

You have been very instructive...

Thank you! :)
All times are GMT