Anard Apprentice
Joined: 01 Oct 2020 Posts: 244
|
Posted: Sat Nov 11, 2023 11:42 am Post subject: [ZFS] Error on exiting Suspend to RAM (partially solved) |
|
|
Hi.
I just installed a new hard drive to replace an older one formatted in HFS+. I installed sys-fs/zfs and created a zpool on the new disk. I copied all my data over from the old disk and have stopped using it.
I want to validate this new filesystem before going further. Later, I first want to attach my old disk as a ZFS mirror, and then buy a third disk to build a 3-disk RAID-Z setup. The goal is to protect my data and extend my storage capacity without having to buy very large hard drives.
Anyway, I encounter a problem when my system goes into sleep mode. ZFS sees I/O errors on my drive and stops using it as it can't restore it from another drive. For the moment my zpool contains only one drive.
Before erasing my old HFS+ disk to use it in my ZFS mirror, I'd like to be sure this won't corrupt any data.
Do you know why it doesn't support hibernation, and how to solve this problem?
Thanks for help. _________________ "iMack" : GA-H97M-D3H, Intel i7 4790, 16Go DDR3, Sapphire RX570 4Go, 2x SSD 256Go, HDD 500Go + Zpool 3x2To / Clover - macOS Mojave / Gentoo-Xfce
Last edited by Anard on Sat Nov 11, 2023 7:07 pm; edited 2 times in total |
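For the planned mirror step, a minimal sketch of the command involved: `zpool attach` adds a second device to an existing single-disk vdev and turns it into a mirror. The helper name `attach_mirror_cmd` and the by-id path are hypothetical, and the function only prints the command so it can be reviewed before being run as root. As far as I know, OpenZFS has no in-place conversion from a mirror to RAID-Z, so the later 3-disk step would probably need a new pool.

```shell
#!/bin/sh
# attach_mirror_cmd: hypothetical helper. Prints (does not run) the zpool
# command that attaches NEW to EXISTING in POOL, which makes a mirror out of
# a single-disk vdev.
attach_mirror_cmd() {
    pool=$1
    existing=$2
    new=$3
    printf 'zpool attach %s %s %s\n' "$pool" "$existing" "$new"
}
```

Usage: `attach_mirror_cmd medias_zfs sde /dev/disk/by-id/ata-EXAMPLE`, then run the printed line as root once the device names have been double-checked.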
alamahant Advocate
Joined: 23 Mar 2019 Posts: 3950
|
Posted: Sat Nov 11, 2023 1:58 pm Post subject: |
|
|
Hibernation means sleep to swap, no?
Do you have a swap partition?
Do you have a resume=UUID=<uuid of swap partition> in grub cmdline?
Quote: |
ZFS sees I/O errors on my drive and stops using it as it can't restore it from another drive
|
What kind of errors?
How do you attempt to hibernate?
Anard Apprentice
Joined: 01 Oct 2020 Posts: 244
|
Posted: Sat Nov 11, 2023 2:36 pm Post subject: |
|
|
alamahant wrote: | Hibernation means sleep to swap, no? |
Not sure, but I think so. It's managed via Xfce's power manager (set to sleep after 45 minutes of inactivity). I would say "suspend to RAM", as that is what xfce4-session asks for when I try to enter sleep mode manually.
Quote: | Do you have a swap partition? |
Yes, about the same size as my RAM, on another drive.
Code: | $ swapon -s
Filename Type Size Used Priority
/dev/sdc4 partition 15749116 0 -2
/dev/zram0 partition 4194300 0 10000
$ free
total used free shared buff/cache available
Mem: 15813364 1981776 11561732 325964 2269856 13169988
Swap: 19943416 0 19943416 |
Quote: | Do you have a resume=UUID=<uuid of swap partition> in grub cmdline? |
No.
Code: | #GRUB_CMDLINE_LINUX=""
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" |
Quote: | What kind of errors?
How do you attempt to hibernate? |
Here is the status of the zpool before suspending.
Code: | $ zpool status
pool: medias_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 0 0 0
errors: No known data errors |
I enter sleep mode (neither hybrid nor deep sleep) via Xfce's panel (it asks "do you want to enter sleep to RAM?"), or let Xfce's power manager enter sleep mode automatically. Then I resume from sleep. Note that the ZFS filesystem is not in use when I enter sleep mode.
Nothing has changed:
Code: | $ zpool status
pool: medias_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 0 0 0
errors: No known data errors |
But if I try to open the zpool filesystem to read any document, Thunar or any other tool cannot load anything, and the zpool is suspended:
Code: | $ zpool status -x
pool: medias_zfs
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:
NAME STATE READ WRITE CKSUM
medias_zfs UNAVAIL 0 0 0 insufficient replicas
sde FAULTED 3 0 0 too many errors
errors: List of errors unavailable: pool I/O is currently suspended
errors: 8 data errors, use '-v' for a list |
It seems that the only way to restore the ONLINE status is to reboot the system.
Code: | # zpool clear medias_zfs
cannot clear errors for medias_zfs: I/O error
# zpool export medias_zfs
cannot unmount '/media/Medias_ZFS': pool or dataset is busy |
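To spot this state right after a resume without opening the filesystem in a file manager, the `zpool status` output shown above can be reduced to one line per pool. This is only a sketch: `pool_states` is a hypothetical helper name, and it relies solely on the `pool:` and `state:` labels visible in the listings above.

```shell
#!/bin/sh
# pool_states: hypothetical helper. Reads `zpool status` text on stdin and
# prints "<pool> <state>" for every pool found.
pool_states() {
    awk '$1 == "pool:" { pool = $2 }
         $1 == "state:" { print pool, $2 }'
}
```

Usage: `zpool status | pool_states`, which for the suspended pool above would report the pool name followed by SUSPENDED.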
Hu Administrator
Joined: 06 Mar 2007 Posts: 23100
|
Posted: Sat Nov 11, 2023 4:16 pm Post subject: |
|
|
Anard: you keep referring to hibernation, but as alamahant indicated, "hibernation" is typically understood to mean that the system writes RAM to swap, then turns off power. Later, when power is turned on, a new kernel boots, notices the hibernation image, and copies that into RAM, putting the applications back in the state they were in at hibernate time. Your second post states that you are using suspend-to-RAM, often known as "S3". This is a different power-saving mode, and has different potential failure modes. Therefore, you have a suspend-to-RAM problem, not a hibernation problem.
Your problem looks to me like your drive does not recover after S3, and the kernel passes through that I/O no longer works. Does dmesg after resume agree with that? After this failure, what is the output of dd if=/dev/device-for-zfs of=/dev/null iflag=direct bs=4K count=1? |
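The raw read test suggested here can be wrapped in a small function so it is easy to repeat before and after a suspend. This is a sketch, assuming GNU dd (for `iflag=direct`); `probe_device` is a hypothetical helper name, and the device path passed to it is whatever backs the pool on the system in question.

```shell
#!/bin/sh
# probe_device: hypothetical helper. Reads one 4 KiB block from the given
# path with direct I/O (bypassing the page cache, and therefore ZFS) and
# reports whether the read succeeded.
probe_device() {
    dev=$1
    if LC_ALL=C dd if="$dev" of=/dev/null iflag=direct bs=4K count=1 2>/dev/null; then
        echo "OK: $dev is readable"
    else
        echo "FAIL: cannot read $dev"
        return 1
    fi
}
```

Usage (as root): `probe_device /dev/sde1`. A failure here, outside of ZFS, points at the disk, cable, or controller rather than at the filesystem.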
Anard Apprentice
Joined: 01 Oct 2020 Posts: 244
|
Posted: Sat Nov 11, 2023 4:26 pm Post subject: |
|
|
OK sorry for this approximation (and bad english )
Here it is:
Code: | imack ~ # zpool status
pool: medias_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 0 0 0
errors: No known data errors
________ SUSPEND TO RAM NOW AND WAKE UP ________
imack ~ $ dmesg
[ 138.789505] PM: suspend exit
[ 140.890466] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 143.125994] ata3: link is slow to respond, please be patient (ready=0)
[ 143.129350] ata4: link is slow to respond, please be patient (ready=0)
[ 144.096001] ata11: link is slow to respond, please be patient (ready=0)
[ 144.099346] ata8: link is slow to respond, please be patient (ready=0)
[ 144.102684] ata14: link is slow to respond, please be patient (ready=0)
[ 144.106015] ata9: link is slow to respond, please be patient (ready=0)
[ 144.106024] ata10: link is slow to respond, please be patient (ready=0)
[ 144.112683] ata7: link is slow to respond, please be patient (ready=0)
[ 144.146045] ata12: link is slow to respond, please be patient (ready=0)
[ 144.149355] ata13: link is slow to respond, please be patient (ready=0)
[ 144.489338] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 144.533479] ata3.00: configured for UDMA/133
[ 145.009398] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 145.012731] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 145.022734] ata8: found unknown device (class 0)
[ 145.022754] ata8: SATA link down (SStatus 0 SControl 310)
[ 145.026066] ata10: found unknown device (class 0)
[ 145.026086] ata10: SATA link down (SStatus 0 SControl 310)
[ 145.026112] ata7: found unknown device (class 0)
[ 145.026130] ata7: SATA link down (SStatus 0 SControl 310)
[ 145.052739] ata12: found unknown device (class 0)
[ 145.052760] ata12: SATA link down (SStatus 0 SControl 310)
[ 145.056043] ata13: found unknown device (class 0)
[ 145.056062] ata13: SATA link down (SStatus 0 SControl 310)
[ 145.062691] ata11: found unknown device (class 0)
[ 145.062710] ata11: SATA link down (SStatus 0 SControl 310)
[ 146.385991] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 146.388506] ata4.00: configured for UDMA/133
[ 148.622671] hid-generic 0005:05AC:0220.0004: unknown main item tag 0x0
[ 148.622832] input: XM-KEY-BT4 Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-11/3-11:1.0/bluetooth/hci0/hci0:12/0005:05AC:0220.0004/input/input19
[ 148.623096] hid-generic 0005:05AC:0220.0004: input,hidraw2: BLUETOOTH HID v1.1b Keyboard [XM-KEY-BT4] on 5c:f3:70:65:12:51
[ 150.099367] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[ 150.099388] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[ 150.100484] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 150.100494] ata9.00: revalidation failed (errno=-5)
[ 150.100511] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 150.100522] ata14.00: revalidation failed (errno=-5)
[ 150.416447] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 150.416491] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: erreur de lecture dans '/dev/sde1': Erreur d'entrée/sortie
0+0 enregistrements lus
0+0 enregistrements écrits
0 octet copié, 0,000323672 s, 0,0 kB/s
imack ~ # zpool status
pool: medias_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 0 0 0
errors: No known data errors
__________ TRY TO OPEN DISK IN THUNAR (/media/Medias_ZFS) ________
imack ~ # zpool status
pool: medias_zfs
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 3 8 0
errors: List of errors unavailable: pool I/O is currently suspended
errors: 4 data errors, use '-v' for a list
imack ~ # zpool clear medias_zfs
cannot clear errors for medias_zfs: I/O error
imack ~ # zpool status
pool: medias_zfs
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:
NAME STATE READ WRITE CKSUM
medias_zfs UNAVAIL 0 0 0 insufficient replicas
sde FAULTED 0 0 0 too many errors
errors: List of errors unavailable: pool I/O is currently suspended
errors: 4 data errors, use '-v' for a list
|
Could it be a sector size problem?
I didn't set it explicitly when creating the zpool.
Code: | # fdisk -l
Disk /dev/sde: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0AF8896E-61A5-ED42-8BD6-7BEE8D610DD5
Device Start End Sectors Size Type
/dev/sde1 2048 3907012607 3907010560 1.8T Solaris /usr & Apple ZFS
/dev/sde9 3907012608 3907028991 16384 8M Solaris reserved 1 |
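One way to check the sector-size question: ZFS records the sector size of a vdev as `ashift`, the base-2 logarithm of the size in bytes, so the 4096-byte physical sectors reported above correspond to ashift 12. The sketch below only does that arithmetic; `ashift_for` is a hypothetical helper name and it assumes its argument is a power of two.

```shell
#!/bin/sh
# ashift_for: hypothetical helper. Prints log2 of a power-of-two sector size
# in bytes, i.e. the ashift value that matches that sector size.
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}
```

Usage: `ashift_for "$(cat /sys/block/sde/queue/physical_block_size)"`. The result can then be compared with the value the pool actually uses, which on OpenZFS should appear in the output of `zdb -C medias_zfs` (an assumption about this setup, not something shown in the thread).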
Hu Administrator
Joined: 06 Mar 2007 Posts: 23100
|
Posted: Sat Nov 11, 2023 4:46 pm Post subject: |
|
|
Anard wrote: | OK sorry for this approximation (and bad english ) |
That is fine. I mentioned it with the idea that you should change the thread subject, which I see you have now done.
Anard wrote: | Code: | imack ~ $ dmesg
[ 138.789505] PM: suspend exit
[ 150.099367] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[ 150.099388] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[ 150.100484] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 150.100494] ata9.00: revalidation failed (errno=-5)
[ 150.100511] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 150.100522] ata14.00: revalidation failed (errno=-5)
[ 150.416447] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 150.416491] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300) |
|
This looks worrisome to me, though I do not deal with drive errors routinely, so it is possible that I am being overly cautious about a harmless error.
Anard wrote: | Code: | imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: erreur de lecture dans '/dev/sde1': Erreur d'entrée/sortie |
|
Could you rerun with LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1, so that it is not translated? I can tell that this is an error condition. I can say that I hoped there would be no error. However, I am unsure whether this error is that you invoked it wrong (such as by providing the wrong device for if=) or if the system is in a bad state. I suspect this is Input/output error. If so, then it tells us that you cannot get any data off this drive at this time, even when bypassing ZFS.
Therefore, this is not a ZFS vs suspend-to-RAM problem. It is a problem that suspend-to-RAM makes your disk unusable after resume, regardless of what filesystem you put on the disk. That does not solve your problem, but it does suggest that we need to examine the lower levels first to understand why the disk becomes unusable. |
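To look at the lower levels first, it can help to pull out of the kernel log only the ATA ports that failed revalidation after resume. The filter below is a sketch that matches the "failed to IDENTIFY" wording seen in the dmesg excerpt quoted above; `failed_ata_ports` is a hypothetical helper name.

```shell
#!/bin/sh
# failed_ata_ports: hypothetical helper. Reads kernel log text on stdin and
# prints each ATA port that logged "failed to IDENTIFY", once per port.
failed_ata_ports() {
    sed -n 's/.*\(ata[0-9][0-9]*\)\.[0-9][0-9]*: failed to IDENTIFY.*/\1/p' | sort -u
}
```

Usage: `dmesg | failed_ata_ports` (reading the kernel log may require root). Each port listed is one whose attached device did not answer the identify command after the link came back up.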
Anard Apprentice
Joined: 01 Oct 2020 Posts: 244
|
Posted: Sat Nov 11, 2023 5:15 pm Post subject: |
|
|
OK!
Not completely solved:
The drive was plugged into a PCI SATA expansion card (Marvell 9230), which I use so that I can add more drives in the future.
As I have a free SATA port on my motherboard, I tried moving the disk to that port (the PCI SATA card is not used at all now).
It seems to be OK, but I still have errors in dmesg:
Code: | # LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000249805 s, 16.4 MB/s
__________ SUSPEND AND WAKE UP NOW ___________
$ dmesg
[ 83.474279] PM: suspend exit
[ 85.322688] ata13: found unknown device (class 0)
[ 85.322709] ata13: SATA link down (SStatus 0 SControl 310)
[ 85.322741] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 85.322772] ata8: found unknown device (class 0)
[ 85.322790] ata8: SATA link down (SStatus 0 SControl 310)
[ 85.322809] ata7: found unknown device (class 0)
[ 85.322823] ata7: SATA link down (SStatus 0 SControl 310)
[ 85.322839] ata9: found unknown device (class 0)
[ 85.322852] ata9: SATA link down (SStatus 0 SControl 310)
[ 85.322870] ata11: found unknown device (class 0)
[ 85.322883] ata11: SATA link down (SStatus 0 SControl 310)
[ 85.329361] ata12: found unknown device (class 0)
[ 85.329379] ata12: SATA link down (SStatus 0 SControl 310)
[ 85.372701] ata10: found unknown device (class 0)
[ 85.372723] ata10: SATA link down (SStatus 0 SControl 310)
[ 85.632236] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 87.842646] ata3: link is slow to respond, please be patient (ready=0)
[ 87.842659] ata6: link is slow to respond, please be patient (ready=0)
[ 87.856001] ata4: link is slow to respond, please be patient (ready=0)
[ 89.292651] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 89.338422] ata3.00: configured for UDMA/133
[ 89.339305] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 89.510331] ata6.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[ 89.510337] ata6.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[ 89.511678] ata6.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[ 89.511681] ata6.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[ 89.512143] ata6.00: configured for UDMA/133
[ 90.366050] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[ 90.367153] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 90.367160] ata14.00: revalidation failed (errno=-5)
[ 90.682795] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 91.232720] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 91.235352] ata4.00: configured for UDMA/133
[ 93.712094] hid-generic 0005:05AC:0220.0004: unknown main item tag 0x0
[ 93.712253] input: XM-KEY-BT4 Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-11/3-11:1.0/bluetooth/hci0/hci0:12/0005:05AC:0220.0004/input/input19
[ 93.712393] hid-generic 0005:05AC:0220.0004: input,hidraw2: BLUETOOTH HID v1.1b Keyboard [XM-KEY-BT4] on 5c:f3:70:65:12:51
# LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0128701 s, 318 kB/s
_____ I CAN STILL READ IT IN THUNAR ________
# zpool status
pool: medias_zfs
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
medias_zfs ONLINE 0 0 0
sde ONLINE 0 0 0
errors: No known data errors |
So I understand that my Marvell PCI expander doesn't wake up properly, regardless of the filesystem used.
How should I add more disks to my zpool, given that all the SATA ports on my motherboard are in use?
About the untranslated dd output: sorry, the command no longer produces errors, and I'd rather not force them.
But I can translate the earlier output back into English (the error no longer appears for the moment):
Code: | imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: read error in '/dev/sde1': I/O error |
Hu Administrator
Joined: 06 Mar 2007 Posts: 23100
|
Posted: Sat Nov 11, 2023 5:49 pm Post subject: |
|
|
It may be possible to get the Marvell PCI extender to work, but I cannot tell you how to do so, or even if it can be done.
That "slow to respond" is unfortunate, but if everything else works, I think it can be ignored. Is ata14 the Marvell extender? Even though you have no disks on it, if the card is still connected, it might be getting detected and probed, then triggering this warning when the probe fails.
Omitting the error text reproduction is fine, since you lost access to it by fixing the underlying problem. :) |
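Whether ata14 belongs to the Marvell card can usually be answered from sysfs: on Linux, `/sys/class/ata_port/ataN` is a symlink whose resolved path runs through the PCI device that owns the port. The sketch below extracts that PCI address; the helper names `pci_address_of` and `ata_port_pci` are hypothetical.

```shell
#!/bin/sh
# pci_address_of: hypothetical helper. Extracts the last PCI address
# (domain:bus:device.function) that appears in a sysfs device path.
pci_address_of() {
    printf '%s\n' "$1" \
        | grep -o '[0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]' \
        | tail -n 1
}

# ata_port_pci: hypothetical helper. Resolves /sys/class/ata_port/<port>
# and prints the PCI address of the controller behind it.
ata_port_pci() {
    pci_address_of "$(readlink -f "/sys/class/ata_port/$1")"
}
```

Usage: `lspci -s "$(ata_port_pci ata14)"` prints the controller that owns ata14, which shows whether it is the Marvell card or the onboard Intel controller.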
Anard Apprentice
Joined: 01 Oct 2020 Posts: 244
|
Posted: Sat Nov 11, 2023 6:11 pm Post subject: |
|
|
It's possible. As you said, the extender is still connected, but without any disks attached to it.
Maybe it would be possible to ask my system to wait several seconds before trying to wake up these ports?
Otherwise, I'll try connecting my DVD writer and my /home disk (ext4) to it and see how it works. That would free two SATA ports on my motherboard.
Anyway, thank you both for your help. |