Gentoo Forums
is my hd failing? raid5
oneeyedelf1
Tux's lil' helper


Joined: 04 Feb 2004
Posts: 124

Posted: Sun Oct 09, 2005 8:22 pm    Post subject: is my hd failing? raid5

Well, I have had my computer all set up; all I do is turn it off when I go back home for breaks. Now I get this on boot...
Basically, disk hda is going and protecting part of itself (see the "Host Protected Area detected" lines in the log). When I boot the drive in another computer I don't get this aggravation. Note that no real configuration change has been made since it last worked, other than that I was setting up mdadm to monitor the status of my raid5 set (which is minor and didn't involve touching the set). Is it the disk, my computer, the chipset, the kernel, or a status flag that I can reset? Please help. Oh yeah, the disk is part of a raid5 data set, and separately holds the root and boot partitions. Below is the dmesg output, up to the point where the boot stops and asks for my password because md fails.
Code:

Linux version 2.6.12-rc6 (root@localhost) (gcc version 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)) #3 SMP Mon Jun 20 14:27:18 EDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000002fff0000 (usable)
 BIOS-e820: 000000002fff0000 - 000000002fff8000 (ACPI data)
 BIOS-e820: 000000002fff8000 - 0000000030000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
767MB LOWMEM available.
found SMP MP-table at 000fb9b0
On node 0 totalpages: 196592
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 192496 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
ACPI: RSDP (v000 AMI                                   ) @ 0x000fa800
ACPI: RSDT (v001 AMIINT AMIINI09 0x00000010 MSFT 0x0100000d) @ 0x2fff0000
ACPI: FADT (v001 AMIINT AMIINI09 0x00000011 MSFT 0x0100000d) @ 0x2fff0030
ACPI: MADT (v001 AMIINT AMIINI09 0x00000011 MSFT 0x0100000d) @ 0x2fff00c0
ACPI: DSDT (v001    VIA   VIA_K7 0x00001000 MSFT 0x0100000d) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:4 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 30000000 (gap: 30000000:cec00000)
Built 1 zonelists
Kernel command line: root=/dev/hda3
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 1406.876 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 774116k/786368k available (2888k kernel code, 11740k reserved, 1116k data, 240k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 2768.89 BogoMIPS (lpj=1384448)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0183fbff c1c7fbff 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0183fbff c1c7fbff 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After all inits, caps: 0183fbff c1c7fbff 00000000 00000020 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
CPU0: AMD Athlon(tm) Processor stepping 04
Total of 1 processors activated (2768.89 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
Brought up 1 CPUs
CPU0 attaching sched-domain:
 domain 0: span 01
  groups: 01
  domain 1: span 01
   groups: 01
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfdb01, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:01:00.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 *11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 13 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
Machine check exception polling timer started.
audit: initializing netlink socket (disabled)
audit(1128874355.788:0): initialized
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Initializing Cryptographic API
ACPI: Power Button (FF) [PWRF]
lp: driver loaded but no devices found
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP(,...)]
lp0: using parport0 (interrupt-driven).
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 17 (level, low) -> IRQ 17
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:00:09.0: 3Com PCI 3c905C Tornado at 0xd800. Vers LK1.1.19
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20268: IDE controller at PCI slot 0000:00:0a.0
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18
PDC20268: chipset revision 2
PDC20268: ROM enabled at 0xdffe0000
PDC20268: 100% native mode on irq 18
    ide2: BM-DMA at 0xdc00-0xdc07, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xdc08-0xdc0f, BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hde: Maxtor 6Y160P0, ATA DISK drive
ide2 at 0xec00-0xec07,0xe802 on irq 18
Probing IDE interface ide3...
hdg: WDC WD1600JB-00DUA3, ATA DISK drive
ide3 at 0xe400-0xe407,0xe002 on irq 18
PDC20268: IDE controller at PCI slot 0000:00:0c.0
ACPI: PCI Interrupt 0000:00:0c.0[A] -> GSI 16 (level, low) -> IRQ 16
PDC20268: chipset revision 2
PDC20268: ROM enabled at 0xdffb0000
PDC20268: 100% native mode on irq 16
    ide4: BM-DMA at 0xc400-0xc407, BIOS settings: hdi:pio, hdj:pio
    ide5: BM-DMA at 0xc408-0xc40f, BIOS settings: hdk:pio, hdl:pio
Probing IDE interface ide4...
hdi: WDC WD1600JB-00FUA0, ATA DISK drive
ide4 at 0xd400-0xd407,0xd002 on irq 16
Probing IDE interface ide5...
hdk: WDC WD1600JB-00EVA0, ATA DISK drive
ide5 at 0xcc00-0xcc07,0xc802 on irq 16
VP_IDE: IDE controller at PCI slot 0000:00:11.1
ACPI: PCI Interrupt 0000:00:11.1[A]: no GSI
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci0000:00:11.1
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: probing with STATUS(0x50) instead of ALTSTATUS(0x7f)
hda: Maxtor 6L200P0, ATA DISK drive
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x7f)
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x7f)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: Maxtor 6Y160P0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hde: max request size: 1024KiB
hde: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(100)
hde: cache flushes supported
 hde: hde1
hdg: max request size: 1024KiB
hdg: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdg: cache flushes supported
 hdg: hdg1
hdi: max request size: 1024KiB
hdi: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdi: cache flushes supported
 hdi: hdi1
hdk: max request size: 1024KiB
hdk: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdk: cache flushes supported
 hdk: hdk1
hda: max request size: 1024KiB
hda: Host Protected Area detected.
        current capacity is 398297088 sectors (203928 MB)
        native  capacity is 208391808845824 sectors (106696606129 MB)
hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_no_data_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: 0x37
hda: 398297088 sectors (203928 MB) w/8192KiB Cache, CHS=24792/255/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4
hdc: max request size: 1024KiB
hdc: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(100)
hdc: cache flushes supported
 hdc: hdc1
libata version 1.11 loaded.
ieee1394: raw1394: /dev/raw1394 device initialized
mice: PS/2 mouse device common for all mice
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs     :  1880.000 MB/sec
   8regs_prefetch:  1780.000 MB/sec
   32regs    :  1316.000 MB/sec
   32regs_prefetch:  1388.000 MB/sec
   pII_mmx   :  3760.000 MB/sec
   p5_mmx    :  5044.000 MB/sec
raid5: using function: p5_mmx (5044.000 MB/sec)
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
Advanced Linux Sound Architecture Driver Version 1.0.9rc2  (Thu Mar 24 10:33:39 2005 UTC).
ALSA device list:
  No soundcards found.
oprofile: using NMI interrupt.
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 64Kbytes
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 7, 786432 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
ip_conntrack version 2.1 (6143 buckets, 49144 max) - 220 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
input: AT Translated Set 2 keyboard on isa0060/serio0
ipt_recent v0.3.1: Stephen Frost <sfrost@snowman.net>.  http://snowman.net/projects/ipt_recent/
arp_tables: (C) 2002 David S. Miller
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 240k freed
kjournald starting.  Commit interval 5 seconds
input: ImExPS/2 Generic Explorer Mouse on isa0060/serio1
Adding 1469936k swap on /dev/hda2.  Priority:-1 extents:1
EXT3 FS on hda3, internal journal
md: md0 stopped.
md: raidstart(pid 4118) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
md: autorun ...
md: considering hdk1 ...
md:  adding hdk1 ...
md:  adding hdi1 ...
md:  adding hdg1 ...
md:  adding hde1 ...
md:  adding hdc1 ...
md:  adding hda4 ...
md: created md0
md: bind<hda4>
md: bind<hdc1>
md: bind<hde1>
md: bind<hdg1>
md: bind<hdi1>
md: bind<hdk1>
md: running: <hdk1><hdi1><hdg1><hde1><hdc1><hda4>
raid5: device hdk1 operational as raid disk 5
raid5: device hdi1 operational as raid disk 4
raid5: device hdg1 operational as raid disk 3
raid5: device hde1 operational as raid disk 2
raid5: device hdc1 operational as raid disk 1
raid5: device hda4 operational as raid disk 0
raid5: allocated 6292kB for md0
raid5: raid level 5 set md0 active with 6 out of 6 devices, algorithm 3
RAID5 conf printout:
 --- rd:6 wd:6 fd:0
 disk 0, o:1, dev:hda4
 disk 1, o:1, dev:hdc1
 disk 2, o:1, dev:hde1
 disk 3, o:1, dev:hdg1
 disk 4, o:1, dev:hdi1
 disk 5, o:1, dev:hdk1
md: ... autorun DONE.
flybynite
l33t


Joined: 06 Dec 2002
Posts: 620

Posted: Mon Oct 10, 2005 3:32 am

I am not a RAID expert! I have had disks fail, however. You can use mdadm to mark this disk as failed and then run the RAID array degraded if you need to. Then you can treat this disk like any other.
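A rough sketch of what that could look like, assuming the array is /dev/md0 and the member on the suspect disk is /dev/hda4 (both names are taken from the dmesg output in the first post). The `run` and `fail_member` helpers are made up for this post, and the script only prints the commands unless you set DRY_RUN=0:

```shell
#!/bin/sh
# Sketch only: /dev/md0 and /dev/hda4 come from the dmesg output in this
# thread; adjust them for your own system.
# DRY_RUN defaults to 1, so nothing is changed until you opt in.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: mark one member as faulty, then detach it from the array.
fail_member() {
    array="$1"
    member="$2"
    run mdadm --manage "$array" --fail "$member"
    run mdadm --manage "$array" --remove "$member"
}

fail_member /dev/md0 /dev/hda4
```

Afterwards, `cat /proc/mdstat` or `mdadm --detail /dev/md0` should show the array running with 5 of 6 devices. Keep in mind that a degraded raid5 has no redundancy left, so a second disk failure would lose the data. Since root and boot live on other partitions of hda, removing hda4 from the array does not by itself touch them.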

This is the first I've seen of the Host Protected Area (HPA). This page has some material on HPA, a little way down from the top:
http://www.sleuthkit.org/informer/sleuthkit-informer-17.html


It seems some software can detect it and some can't. That may be why it shows up now, after a kernel or other software upgrade?
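For what it's worth, the two capacities in the "Host Protected Area detected" lines of the dmesg output can be sanity-checked with a bit of shell arithmetic. This is only a sketch: it assumes 512-byte sectors and a shell with 64-bit arithmetic (bash has it), and `sectors_to_mb` is just a helper name made up for this post.

```shell
#!/bin/bash
# Convert a sector count to decimal megabytes.
# Assumption: 512-byte sectors, as is usual for IDE disks of this era.
sectors_to_mb() {
    echo $(( $1 * 512 / 1000000 ))
}

# "current capacity" sector count from the dmesg output
sectors_to_mb 398297088

# "native capacity" sector count from the dmesg output
sectors_to_mb 208391808845824
```

The first figure comes out at about 204 GB, which fits a 200 GB-class drive. The second comes out at roughly 106 petabytes, which no disk of that size could hold. That suggests the reported native capacity is bogus rather than a real hidden area, though the log alone does not say whether the drive, the controller, or the kernel is to blame.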


You need to run smartctl to check the status of the disk. This queries the disk's built-in failure detection (SMART).

* sys-apps/smartmontools
Available versions: 5.33
Installed: 5.33
Homepage: http://smartmontools.sourceforge.net/
Description: control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology



emerge -va smartmontools

Then

smartctl -a /dev/hda

Which will show something like this:

Code:

gate1 ~ # smartctl -a /dev/hdc
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3200822A
Serial Number:    3LJ06ZE9
Firmware Version: 3.01
User Capacity:    200,049,647,616 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Sun Oct  9 22:28:26 2005 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 111) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   051   047   006    Pre-fail  Always       -       72127917
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       90
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       568101779
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9521
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       112
194 Temperature_Celsius     0x0022   044   052   000    Old_age   Always       -       44
195 Hardware_ECC_Recovered  0x001a   051   046   000    Old_age   Always       -       72127917
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6427         -
# 2  Short offline       Completed without error       00%      6185         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
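That is a lot of output to read through. As a rough sketch, the few attributes that are commonly watched for failing sectors can be pulled out with awk. The `smart_sector_summary` name is made up for this post, and the choice of these three attributes is my assumption about what is worth watching, not something smartctl itself prescribes. The column positions assume the attribute table layout shown above.

```shell
#!/bin/sh
# Reads smartctl attribute output on stdin and prints NAME=RAW_VALUE for
# three sector-related attributes. Column 2 is the attribute name and
# column 10 is RAW_VALUE in the table layout shown above.
smart_sector_summary() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Demo using two rows copied from the sample output above.
printf '%s\n' \
    '  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0' \
    '197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0' |
    smart_sector_summary
```

On a real system you would feed it live data, for example `smartctl -A /dev/hda | smart_sector_summary`. Non-zero raw values there are usually a reason to look closer. You can also ask the drive to test itself with `smartctl -t short /dev/hda` and read the result later with `smartctl -l selftest /dev/hda`.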

All times are GMT