Gentoo Forums
/dev/md0: filesystem corrupted after every reboot
Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Sat Mar 14, 2009 8:05 am    Post subject: /dev/md0: filesystem corrupted after every reboot

Hello,

after every reboot my software RAID is mounted read-only because filesystem errors are found.
I use ext3 and mount it via /etc/fstab.

The fstab entry:
Code:

/dev/md0      /home           ext3            noatime         0 1


What can I do to make this work reliably?

G. R.

tamiko
Developer
Joined: 02 Sep 2006
Posts: 96

Posted: Sat Mar 14, 2009 10:27 am

Are you sure that one of the disks is not defective?

Run a SMART self-test (smartmontools) and check your kernel log for suspicious entries.

Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Sat Mar 14, 2009 10:42 am

Hello, I don't think there are any errors on the disk, or are there?

Code:
smartctl -a /dev/sde
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl open device: /dev/sde failed: No such file or directory
Linuxserver ~ # smartctl -a /dev/sdc
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJDWQ306857
Firmware Version: 1AA01109
User Capacity:    1.000.204.886.016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Sat Mar 14 11:41:06 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (11658) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 195) minutes.
Conveyance self-test routine
recommended polling time:     (  21) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7870
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       154
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1824
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       153
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   078   068   000    Old_age   Always       -       22 (Lifetime Min/Max 14/22)
194 Temperature_Celsius     0x0022   077   068   000    Old_age   Always       -       23 (Lifetime Min/Max 14/24)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       118844112
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


G. R.
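
The full report is long, so it can help to pull out only the attributes that are most often watched for failing media or bad cabling. The following is a minimal sketch, assuming the attribute table layout shown above (ID in the first column, raw value in the tenth). The function name `smart_key_attrs` and the choice of IDs 5, 197, 198 and 199 are illustrative assumptions, not a definitive health check.

```shell
# Filter a smartctl attribute table (read from stdin) down to a few
# attributes commonly watched for media or cabling trouble.
# The ID selection below is an assumption made for illustration.
smart_key_attrs() {
    awk 'NF >= 10 && ($1 == "5" || $1 == "197" || $1 == "198" || $1 == "199") {
             # $2 = attribute name, $10 = first token of RAW_VALUE
             print $2 "=" $10
         }'
}
```

A possible call, run as root, would be `smartctl -A /dev/sdc | smart_key_attrs`. A non-zero raw value is a reason to look closer at that disk, not proof of a defect.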

Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Sat Mar 14, 2009 3:15 pm

So, I have now done the following:

Code:

 dd if=/dev/md0 of=/dev/null
1953519872+0 records in
1953519872+0 records out
1000202174464 bytes (1.0 TB) copied, 10850.7 s, 92.2 MB/s


That means there are no read errors, right?

I found the following in the log:
Code:

Jun 14 10:32:04 Linuxserver ext3_abort called.
Jun 14 10:32:04 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Jun 14 10:32:13 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0
Jun 14 10:32:34 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543164 offset 0
Jun 14 10:32:48 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0
Jun 14 10:33:20 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #10543105 offset 0
Jun 14 10:38:14 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
Jun 14 10:39:24 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
Jun 14 10:39:24 Linuxserver [<ffffffff802d279f>] ext3_count_free_inodes+0x2a/0x43
Jun 14 10:39:24 Linuxserver [<ffffffff802dad40>] ext3_commit_super+0x49/0x65
Jun 14 10:39:24 Linuxserver [<ffffffff802db85c>] ext3_handle_error+0x83/0xaa
Jun 14 10:39:24 Linuxserver [<ffffffff802db967>] ext3_error+0x83/0x90
Jun 14 10:39:24 Linuxserver [<ffffffff802d9186>] ext3_find_entry+0x413/0x5c4
Jun 14 10:39:24 Linuxserver [<ffffffff802daab5>] ext3_lookup+0x31/0x120
Jun 14 10:42:28 Linuxserver EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260862
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261616
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261630
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260837
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260857
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261628
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261627
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261626
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261610
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261624
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260764
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 260768
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 261571
Jun 27 00:02:17 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543206
Jun 27 14:18:02 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543348
Jul  7 10:31:23 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10543249
Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 11010060
Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10567834
Jan 24 15:50:42 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 10567986
Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 8503298
Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 8503297
Feb  1 13:02:41 Linuxserver ext3_orphan_cleanup: deleting unreferenced inode 11010065
Feb 21 21:21:57 Linuxserver ext3_abort called.
Feb 21 21:21:57 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Feb 21 22:04:14 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 21 22:04:14 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 24 09:23:19 Linuxserver ext3_abort called.
Feb 24 09:23:19 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Feb 24 09:23:19 Linuxserver EXT3-fs error (device md0) in ext3_ordered_write_end: IO failure
Feb 24 09:28:17 Linuxserver ext3_abort called.
Feb 24 09:28:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Feb 24 09:29:43 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 24 09:29:43 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 25 07:23:15 Linuxserver ext3_abort called.
Feb 25 07:23:15 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Feb 25 07:25:53 Linuxserver ext3_abort called.
Feb 25 07:25:53 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Feb 25 07:25:56 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 25 07:25:56 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 26 08:59:09 Linuxserver ext3_abort called.
Feb 26 08:59:09 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Feb 26 09:29:13 Linuxserver ext3_abort called.
Feb 26 09:29:13 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Feb 26 09:29:38 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 26 09:29:38 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 28 09:28:01 Linuxserver ext3_abort called.
Feb 28 09:28:01 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Feb 28 09:34:51 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 28 09:34:51 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 28 09:36:54 Linuxserver ext3_abort called.
Feb 28 09:36:54 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  4 22:23:58 Linuxserver ext3_abort called.
Mar  4 22:23:58 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  5 12:57:50 Linuxserver ext3_abort called.
Mar  5 12:57:50 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  5 12:59:44 Linuxserver ext3_abort called.
Mar  5 12:59:44 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar  5 12:59:48 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Mar  5 12:59:48 Linuxserver EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Mar  7 11:08:32 Linuxserver ext3_abort called.
Mar  7 11:08:32 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  7 11:10:10 Linuxserver ext3_abort called.
Mar  7 11:10:10 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar  9 09:54:51 Linuxserver ext3_abort called.
Mar  9 09:54:51 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  9 09:56:17 Linuxserver ext3_abort called.
Mar  9 09:56:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar 12 07:55:35 Linuxserver ext3_abort called.
Mar 12 07:55:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar 12 07:58:32 Linuxserver ext3_abort called.
Mar 12 07:58:32 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar 14 08:56:35 Linuxserver ext3_abort called.
Mar 14 08:56:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar 14 08:58:14 Linuxserver ext3_abort called.
Mar 14 08:58:14 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_truncate: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_orphan_del: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0) in ext3_delete_inode: Journal has aborted
Mar 14 11:50:32 Linuxserver ext3_abort called.
Mar 14 11:50:32 Linuxserver EXT3-fs error (device md0): ext3_journal_st



Does that mean that only my ext3 journal is defective?
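
To see how often the journal was aborted, a log like the one above can be condensed to one line per day. This is a minimal sketch, assuming classic syslog timestamps (month and day as the first two fields) as in the excerpt. The function name `count_ext3_aborts` and the log path in the usage note are assumptions.

```shell
# Count "ext3_abort called." lines per day in a syslog-style file.
# $1: log file to read (the caller chooses the path).
count_ext3_aborts() {
    awk '/ext3_abort called/ {
             day = $1 " " $2                  # e.g. "Mar 14"
             if (!(day in count)) order[++k] = day
             count[day]++
         }
         END {
             # Print the days in the order they first appeared.
             for (i = 1; i <= k; i++) print order[i] ": " count[order[i]]
         }' "$1"
}
```

It could be called as `count_ext3_aborts /var/log/messages`, where the path depends on the local syslog configuration. The output only shows how frequent the aborts are; it says nothing about their cause.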

Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Sat Mar 14, 2009 8:16 pm

I have now run the following:

Code:
fsck.ext3 -p -v -c /dev/md0
/dev/md0: recovering journal

  424357 inodes used (0.70%)
   30345 non-contiguous inodes (7.2%)
         # of inodes with ind/dind/tind blocks: 83306/18000/21
179961114 blocks used (73.70%)
       0 bad blocks
      56 large files

  387304 regular files
   36998 directories
       0 character device files
       0 block device files
       2 fifos
       0 links
      43 symbolic links (43 fast symbolic links)
       1 socket
--------
  424348 files


Now I am checking /dev/md0 for bad blocks...

Does that make sense at all, or should I rather test the individual disks?

Does anyone know about this?

G. R.

tamiko
Developer
Joined: 02 Sep 2006
Posts: 96

Posted: Sat Mar 14, 2009 8:58 pm

Hmm.

First of all: reading successfully from a RAID does not mean that both disks are fine. You would therefore have to run badblocks directly on the individual disks.
By self-test I actually meant
Code:
smartctl -t ...
and then checking the result after x hours. *whistle*
(If this actually reports an error, the disk is dead. If it completes, one can hope that the disk is fine.)

Regarding the error messages: that does not look good.
The filesystem errors are not being corrected because of I/O errors.
Code:
Mar  7 11:08:32 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  7 11:10:10 Linuxserver ext3_abort called.
Mar  7 11:10:10 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar  9 09:54:51 Linuxserver ext3_abort called.
Mar  9 09:54:51 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar  9 09:56:17 Linuxserver ext3_abort called.
Mar  9 09:56:17 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar 12 07:55:35 Linuxserver ext3_abort called.
Mar 12 07:55:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar 12 07:58:32 Linuxserver ext3_abort called.
Mar 12 07:58:32 Linuxserver EXT3-fs error (device md0): ext3_put_super: Couldn't clean up the journal
Mar 14 08:56:35 Linuxserver ext3_abort called.
Mar 14 08:56:35 Linuxserver EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Mar 14 08:58:14 Linuxserver ext3_abort called.

This points to a hardware defect.

/edit:
If you search for "ext3_abort called", you frequently find hardware defects along the lines of "controller defective" or "cable defective". You may want to rule out these possible causes.
In any case, please run a SMART self-test first. I suspect it will not complete successfully on one of the disks.

Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Sat Mar 14, 2009 9:34 pm

Hi, I have had smartd running for about 10 hours now. No findings apart from temperature messages.

Is that enough as well?
Thanks in advance!

G. R.

tamiko
Developer
Joined: 02 Sep 2006
Posts: 96

Posted: Sat Mar 14, 2009 9:48 pm

No, that is not enough.
Do me a favor and start
Code:
smartctl --test=long /dev/sdX
overnight, once for each disk (replace /dev/sdX with the respective device). The next day you can then use
Code:
smartctl -a /dev/sdX
to see how the self-test went.
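
Since smartctl handles one device per call, the two steps can be wrapped in small loops. This is a minimal sketch: the function names and the `SMARTCTL` override variable are assumptions for illustration, and the device names in the usage note are examples that must be adapted.

```shell
# Start an extended SMART self-test on each disk given as an argument.
# SMARTCTL may be overridden (e.g. SMARTCTL=echo) for a dry run.
start_long_tests() {
    for disk in "$@"; do
        "${SMARTCTL:-smartctl}" --test=long "$disk" ||
            echo "could not start self-test on $disk" >&2
    done
}

# Print only the self-test log of each disk once the tests have finished.
show_selftest_logs() {
    for disk in "$@"; do
        "${SMARTCTL:-smartctl}" -l selftest "$disk"
    done
}
```

For example, `start_long_tests /dev/sdc /dev/sdd` in the evening and `show_selftest_logs /dev/sdc /dev/sdd` the next morning, both as root. The test runs inside the drive, so the first command returns immediately.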

Scorpion_DE
n00b
Joined: 22 Apr 2006
Posts: 10

Posted: Sun Mar 15, 2009 9:15 pm

Hi,

either I overlooked it or it has not been mentioned yet: what RAID level is this (0, 1)? How many disks belong to md0, only /dev/sde? The output of "cat /proc/mdstat" would also be of interest to me.

If it is RAID1, then even the failure of one disk should not endanger the integrity of the filesystem on top of it (ext3 in your case). With a stripe (RAID0) and the failure of a disk, or of parts of one, I would rather expect that you could no longer do anything with the filesystem at all.
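
The RAID level and the member devices can be read from /proc/mdstat. The following is a minimal sketch that assumes the common line form `md0 : active raid1 sdd1[1] sdc1[0]`; inactive or auto-read-only arrays use a different layout and are not handled. The function name `md_summary` is an assumption. `mdadm --detail /dev/md0` gives a more complete picture.

```shell
# Print level and member devices of one md array from an mdstat-style file.
# $1: array name (e.g. md0); $2: file to read, defaults to /proc/mdstat.
# Only lines of the form "<name> : active <level> <members...>" are matched.
md_summary() {
    awk -v md="$1" '
        $1 == md && $2 == ":" && $3 == "active" {
            printf "%s level=%s members=", md, $4
            for (i = 5; i <= NF; i++)
                printf "%s%s", $i, (i < NF ? " " : "")
            print ""
        }' "${2:-/proc/mdstat}"
}
```

Calling `md_summary md0` prints a single line with the level and the members, or nothing if the array is not listed as active.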

Other possible causes:

- Temperature problems of the disks (e.g. Raptors or VelociRaptors in poorly ventilated cases)
- Low-quality cables
- Aggressively overclocked system
- Aggressive or exotic CFLAGS
- Other problems with mainboard, memory or CPU
- Wrong driver for the disk controller selected in the kernel

Regards, Scorpion

hitachi
Guru
Joined: 20 Feb 2006
Posts: 478
Location: Freiburg / Deutschland

Posted: Mon Mar 16, 2009 12:12 pm

You can also try the following as root:
Code:
echo check >> /sys/block/md0/md/sync_action

You can watch the progress on a second console with watch cat /proc/mdstat. Afterwards comes the important part:
Code:
cat /sys/block/md0/md/mismatch_cnt

There should be a man page for this somewhere. On my RAID 5 this takes about 8 minutes for 50 GB. You should do this fairly regularly. Unfortunately it takes correspondingly longer on larger partitions, and longer still if the disks are being accessed heavily at the same time.
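
The check can be scripted so that the mismatch counter is read only after the array is idle again. This is a minimal sketch built on the two sysfs files named above. The function names and the `SYSFS_ROOT` override (meant only for dry runs against a scratch directory) are assumptions.

```shell
# Helpers around the md sysfs files sync_action and mismatch_cnt.
# SYSFS_ROOT defaults to /sys; overriding it is only meant for dry runs.
md_dir() {
    echo "${SYSFS_ROOT:-/sys}/block/$1/md"
}

# Ask the kernel to start a consistency check of the array ($1, e.g. md0).
md_start_check() {
    echo check > "$(md_dir "$1")/sync_action"
}

# Block until the array reports "idle" again, polling once a minute.
md_wait_idle() {
    while [ "$(cat "$(md_dir "$1")/sync_action")" != "idle" ]; do
        sleep 60
    done
}

# Print the mismatch counter left behind by the last check.
md_mismatch_count() {
    cat "$(md_dir "$1")/mismatch_cnt"
}
```

Run as root, `md_start_check md0 && md_wait_idle md0 && md_mismatch_count md0` prints the counter once the check has finished.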

Tinitus
Veteran
Joined: 20 Sep 2004
Posts: 1754

Posted: Thu Mar 19, 2009 8:27 am

tamiko wrote:
No, that is not enough.
Do me a favor and start
Code:
smartctl --test=long /dev/sdX
overnight, once for each disk (replace /dev/sdX with the respective device). The next day you can then use
Code:
smartctl -a /dev/sdX
to see how the self-test went.


So, here are the outputs:

Code:
# smartctl -a /dev/sdc
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJDWQ306857
Firmware Version: 1AA01109
User Capacity:    1.000.204.886.016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Thu Mar 19 09:22:56 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (11658) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 195) minutes.
Conveyance self-test routine
recommended polling time:     (  21) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7710
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       158
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       13088
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1933
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       157
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   068   000    Old_age   Always       -       24 (Lifetime Min/Max 21/27)
194 Temperature_Celsius     0x0022   076   068   000    Old_age   Always       -       24 (Lifetime Min/Max 21/28)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       147807954
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1930         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


and the second disk:
Code:

 smartctl -a /dev/sdd
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJDWQ306856
Firmware Version: 1AA01109
User Capacity:    1.000.204.886.016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Thu Mar 19 09:24:23 2009 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (11429) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 191) minutes.
Conveyance self-test routine
recommended polling time:     (  20) minutes.
SCT capabilities:           (0x003f)   SCT Status supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7610
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       153
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10921
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1932
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       152
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   066   000    Old_age   Always       -       25 (Lifetime Min/Max 21/28)
194 Temperature_Celsius     0x0022   075   067   000    Old_age   Always       -       25 (Lifetime Min/Max 21/29)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       61469876
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1926         -
# 2  Extended offline    Aborted by host               90%      1923         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.




What I did notice after the problem recurred, though:
Code:

 fsck.ext3 -p -v -c /dev/md0
/dev/md0: recovering journal
/dev/md0: Superblock last mount time is in the future.  FIXED.
/dev/md0: Updating bad block inode.
/dev/md0: Inode 8, i_blocks is 262416, should be 256272.  FIXED.

  419656 inodes used (0.69%)
   30239 non-contiguous inodes (7.2%)
         # of inodes with ind/dind/tind blocks: 82340/18001/21
181120987 blocks used (74.17%)
       0 bad blocks
      56 large files

  382057 regular files
   37536 directories
       0 character device files
       0 block device files
       2 fifos
       0 links
      51 symbolic links (51 fast symbolic links)
       1 socket
--------
  419647 files


But my system runs ntpd, and the hardware clock is almost identical to internet time:
Code:
 hwclock --show
Thu 19 Mar 2009 09:26:50 CET  -0.537553 seconds


That can't be right, can it?

G. R.
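
Editorial note: the smartctl attribute tables above are long, so a small filter can pull out the sector-related counters. The sketch below is only an illustration: the function name `flag_sector_attrs` is hypothetical, and the choice of attribute IDs (5, 196, 197, 198) is an informal assumption, not a smartmontools rule. It expects `smartctl -A` style output on standard input and relies on the raw value being the tenth column, as in the tables posted here.

```shell
#!/bin/sh
# flag_sector_attrs (hypothetical helper): read `smartctl -A` style output on
# stdin and print ID, name and raw value of sector-related attributes whose
# raw value is nonzero.
# Assumption: the ID list 5/196/197/198 is an informal selection for this sketch.
flag_sector_attrs() {
    awk '$1 ~ /^(5|196|197|198)$/ && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Usage (needs root and smartmontools installed):
#   smartctl -A /dev/sdd | flag_sector_attrs
```

Applied to the /dev/sdd table above, the only line such a filter would report is Reallocated_Sector_Ct with a raw value of 2. The overall health assessment is still PASSED, so this is a data point to watch rather than proof of a failing disk.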
mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Mar 19, 2009 9:31 am

That sounds as if, for some reason, clock (or hwclock, if you are using baselayout2) is not being run before fsck on your system.
Tinitus
Veteran

Joined: 20 Sep 2004
Posts: 1754

Posted: Thu Mar 19, 2009 11:21 am

mv wrote:
That sounds as if, for some reason, clock (or hwclock, if you are using baselayout2) is not being run before fsck on your system.


I'm not using a 2.x version... at least not knowingly!?

What else can I do?
Or rather, how can I verify that?
G. R.
mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Mar 19, 2009 2:02 pm

Quite simple: does the message "Setting System Clock using Hardware Clock" (or whatever it was called in baselayout-1; it's been a long time) appear before it reports that the partition is defective?
Tinitus
Veteran

Joined: 20 Sep 2004
Posts: 1754

Posted: Thu Mar 19, 2009 3:27 pm

mv wrote:
Quite simple: does the message "Setting System Clock using Hardware Clock" (or whatever it was called in baselayout-1; it's been a long time) appear before it reports that the partition is defective?


No, at boot everything is still OK; then suddenly, after a few minutes, the message appears that the filesystem is being remounted read-only.

But there must be some way to check this in the log files, right?


Regards,
R. May
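
Editorial note: one way to look for the moment of the read-only remount is to filter the kernel log for lines that mention ext3, the array, or a read-only remount. The sketch below is an assumption-laden illustration: the function name `find_ro_events` is hypothetical, the search patterns are guesses at what relevant lines contain rather than exact kernel message texts, and the log file path in the usage comment depends on the local syslog setup.

```shell
#!/bin/sh
# find_ro_events (hypothetical helper): filter kernel log text on stdin for
# lines mentioning ext3, md0, or a read-only remount (case-insensitive).
# Assumption: these patterns are illustrative, not exact kernel message strings.
find_ro_events() {
    grep -iE 'ext3|md0|read-only'
}

# Usage:
#   dmesg | find_ro_events
#   find_ro_events < /var/log/messages   # path is an assumption; depends on the syslog setup
```

The lines just before the first match are usually the interesting ones, since they may show an I/O or journal error that triggered the remount.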
Tinitus
Veteran

Joined: 20 Sep 2004
Posts: 1754

Posted: Thu Mar 19, 2009 7:29 pm

Maybe the ext3 in the kernel is buggy? Could that be? Maybe it can't cope with 1 TB partitions? Although it is specified for considerably more...

Maybe I should try a different filesystem? I have used ReiserFS until now... but that is apparently no longer being developed, is it?

G. R.
mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Mar 19, 2009 8:20 pm

Tinitus wrote:
Maybe the ext3 in the kernel is buggy? Could that be?

I consider that unlikely.
Quote:
Maybe it can't cope with 1 TB partitions? Although it is specified for considerably more...

With a 32-bit kernel you should perhaps enable the configuration options LBD (large block device) and LSF (large single file); you will find them under "Enable Block Layer". Supposedly the critical limit for this is 2 terabytes rather than one, but I could imagine that somewhere a calculation was accidentally done with "signed" instead of "unsigned".
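
Editorial note: the 2 terabyte figure and the "signed instead of unsigned" guess can be checked with a little arithmetic. The sketch below assumes 512-byte sectors and a 32-bit sector index, and takes the disk size from the "User Capacity" line of the smartctl output earlier in the thread; the variable names are made up for this illustration.

```shell
#!/bin/sh
# Capacity addressable with a 32-bit sector index, assuming 512-byte sectors.
SECTOR_BYTES=512              # assumption: classic 512-byte sectors
DISK_BYTES=1000204886016      # "User Capacity" from smartctl, separators removed

# Unsigned 32-bit index: 2^32 sectors. Signed 32-bit index: 2^31 sectors.
limit_unsigned=$(( (1 << 32) * SECTOR_BYTES ))
limit_signed=$(( (1 << 31) * SECTOR_BYTES ))

echo "unsigned 32-bit sector index: $limit_unsigned bytes"
echo "signed 32-bit sector index:   $limit_signed bytes"

# Compare inside $(( )) so the shell's integer arithmetic handles the large values.
if [ $(( DISK_BYTES < limit_signed )) -eq 1 ]; then
    verdict="below both limits"
elif [ $(( DISK_BYTES < limit_unsigned )) -eq 1 ]; then
    verdict="between the signed and unsigned limits"
else
    verdict="above both limits"
fi
echo "disk of $DISK_BYTES bytes is $verdict"
```

The unsigned limit works out to 2199023255552 bytes (2 TiB) and the signed one to 1099511627776 bytes (1 TiB). A single disk of 1000204886016 bytes stays below both, so under these assumptions even a signed calculation would not overflow for one such disk; the size of the md0 array itself depends on the RAID level, which the thread does not state.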
hitachi
Guru

Joined: 20 Feb 2006
Posts: 478
Location: Freiburg / Deutschland

Posted: Thu Mar 19, 2009 9:33 pm

Have you looked in dead.letter, or at cat /proc/mdstat, to see what it says about the RAID once the problems have appeared? What about echo check >> /sys/block/md0/md/sync_action && cat /sys/block/md0/md/mismatch_cnt (of course you have to wait with the second command until the first has finished).
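
Editorial note: the "wait until the first command has finished" step can be scripted by polling sync_action until it reads "idle" again. This is a minimal sketch, not a tested admin tool: the function names and the POLL_SECONDS variable are hypothetical, the directory argument is passed in so the functions are not tied to md0, and a plain `>` redirect is used where the command above uses `>>`.

```shell
#!/bin/sh
# Hypothetical helpers around the md sysfs interface.
# Each takes the md sysfs directory, e.g. /sys/block/md0/md.

# Ask the md driver to start a consistency check of the array.
start_check() {
    echo check > "$1/sync_action"
}

# Poll until sync_action reports "idle", i.e. no check or resync is running.
wait_for_idle() {
    while [ "$(cat "$1/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-30}"
    done
}

# Print the mismatch counter left behind by the last check.
read_mismatch() {
    cat "$1/mismatch_cnt"
}

# Usage (as root):
#   md=/sys/block/md0/md
#   start_check "$md" && wait_for_idle "$md" && read_mismatch "$md"
```

Progress of a running check is also visible in /proc/mdstat, so watching that file is an alternative to polling.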
Tinitus
Veteran

Joined: 20 Sep 2004
Posts: 1754

Posted: Fri Mar 20, 2009 7:28 am

hitachi wrote:
Have you looked in dead.letter, or at cat /proc/mdstat, to see what it says about the RAID once the problems have appeared? What about echo check >> /sys/block/md0/md/sync_action && cat /sys/block/md0/md/mismatch_cnt (of course you have to wait with the second command until the first has finished).


Hello,

no problems there. Somehow it keeps having trouble with the journal. I'm now moving everything off the RAID disks... then I'll create a new filesystem on them.

G. R.
All times are GMT