Gentoo Forums
[ZFS] Error on exiting Suspend to RAM (partially solved)
Anard
Apprentice


Joined: 01 Oct 2020
Posts: 244

Posted: Sat Nov 11, 2023 11:42 am    Post subject: [ZFS] Error on exiting Suspend to RAM (partially solved)

Hi.

I just installed a new hard drive to replace an older one formatted as HFS+. I installed sys-fs/zfs and created a zpool on the new disk. I copied all my data from the older disk and have now stopped using it.
I want to validate this new filesystem before going further. In the future, I first want to attach my old disk as a ZFS mirror, and then buy a third disk to build a three-disk RAID-Z system. The goal is to secure my data and extend my pool's capacity without needing to buy very large hard drives.
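
As a rough sketch of the capacity arithmetic behind this plan (assuming equal-sized disks and ignoring ZFS metadata, reserved "slop" space and RAID-Z padding): a two-way mirror gives the capacity of one disk, while single-parity RAID-Z over n disks gives roughly n - 1 disks' worth. The helper name `usable_tb` is hypothetical.

```python
def usable_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Approximate usable capacity in TB, ignoring all ZFS overhead.

    layout: "mirror" (every disk holds an identical copy) or
            "raidz1" (one disk's worth of parity spread across the vdev).
    """
    if disks < 2:
        raise ValueError("need at least two disks for any redundancy")
    if layout == "mirror":
        return disk_tb  # every disk holds a full copy of the data
    if layout == "raidz1":
        return (disks - 1) * disk_tb  # one disk's worth goes to parity
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # The plan in this thread: 2 TB disks, first a mirror, later three disks in RAID-Z.
    print("2-disk mirror:", usable_tb("mirror", 2, 2.0), "TB")  # 2.0 TB
    print("3-disk raidz1:", usable_tb("raidz1", 3, 2.0), "TB")  # 4.0 TB
```

In other words, going from a two-disk mirror to a three-disk RAID-Z roughly doubles usable space while still tolerating one failed disk.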

Anyway, I encounter a problem when my system goes into sleep mode. ZFS sees I/O errors on my drive and stops using it, since it can't restore the data from another drive: for the moment my zpool contains only one drive.
Before erasing my old HFS+ disk to use it in my ZFS mirror, I'd like to be sure this won't corrupt any data.

Do you know why it doesn't survive hibernation, and how to solve this problem?

Thanks for your help.
_________________
"iMack" : GA-H97M-D3H, Intel i7 4790, 16Go DDR3, Sapphire RX570 4Go, 2x SSD 256Go, HDD 500Go + Zpool 3x2To / Clover - macOS Mojave / Gentoo-Xfce


Last edited by Anard on Sat Nov 11, 2023 7:07 pm; edited 2 times in total
alamahant
Advocate


Joined: 23 Mar 2019
Posts: 3950

Posted: Sat Nov 11, 2023 1:58 pm

Hibernation means sleep to swap, no?
Do you have a swap partition?
Do you have resume=UUID=<uuid of swap partition> on the GRUB kernel command line?
Quote:

ZFS sees I/O errors on my drive and stops using it as it can't restore it from another drive

What kind of errors?
How do you attempt to hibernate?
_________________
:)
Anard
Apprentice


Joined: 01 Oct 2020
Posts: 244

Posted: Sat Nov 11, 2023 2:36 pm

alamahant wrote:
Hibernation means sleep to swap,no?

Not sure, but I think so. It's managed by Xfce's power manager (set to sleep after 45 minutes of inactivity). I would call it "suspend to RAM", since that is what xfce4-session asks when I enter sleep mode manually.

Quote:
Do you have a swap partition?

Yes, about the same size as my RAM, on another drive.
Code:
$ swapon -s
Filename                                Type        Size        Used    Priority
/dev/sdc4                               partition   15749116    0       -2
/dev/zram0                              partition   4194300     0       10000
$ free
               total        used        free      shared  buff/cache   available
Mem:        15813364     1981776    11561732      325964     2269856    13169988
Swap:       19943416           0    19943416


Quote:
Do you have a resume=UUID=<uuid of swap partition> in grub cmdline?

No.
Code:
#GRUB_CMDLINE_LINUX=""
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"


Quote:
What kind of errors?
How do you attempt to hibernate?


Here is the zpool status before hibernation:
Code:
$ zpool status
  pool: medias_zfs
 state: ONLINE
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       0     0     0

errors: No known data errors


I enter sleep mode (neither hybrid sleep nor deep sleep mode) via Xfce's panel (it asks "do you want to suspend to RAM?"), or let Xfce's power manager enter sleep mode automatically. Then I resume from sleep. I should point out that the ZFS filesystem isn't in use when I enter sleep mode.
After resuming, nothing has changed:

Code:
$ zpool status
  pool: medias_zfs
 state: ONLINE
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       0     0     0

errors: No known data errors


But if I try to open the zpool filesystem to read any document, neither Thunar nor any other tool can load anything, and the zpool is suspended:
Code:
$ zpool status -x
  pool: medias_zfs
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  UNAVAIL      0     0     0  insufficient replicas
     sde       FAULTED      3     0     0  too many errors
errors: List of errors unavailable: pool I/O is currently suspended

errors: 8 data errors, use '-v' for a list


It seems that the only solution to restore ONLINE status is to reboot the system.
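
To detect this condition after resume without reading the whole report, a small sketch like the following could extract each pool's state from `zpool status` text. It only parses the `pool:` and `state:` lines seen in the outputs above; the format is assumed from these samples rather than from a documented scripting interface, and `pool_states` is a hypothetical name.

```python
import re

# Matches lines such as "  pool: medias_zfs" and " state: SUSPENDED"
_POOL_RE = re.compile(r"^\s*pool:\s*(\S+)\s*$")
_STATE_RE = re.compile(r"^\s*state:\s*(\S+)\s*$")


def pool_states(status_text: str) -> dict[str, str]:
    """Map each pool name in `zpool status` output to its state string."""
    states: dict[str, str] = {}
    current = None
    for line in status_text.splitlines():
        m = _POOL_RE.match(line)
        if m:
            current = m.group(1)  # remember which pool the next state belongs to
            continue
        m = _STATE_RE.match(line)  # "status:" lines do not match "state:"
        if m and current is not None:
            states[current] = m.group(1)
            current = None
    return states


SAMPLE = """\
  pool: medias_zfs
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
"""

if __name__ == "__main__":
    print(pool_states(SAMPLE))  # {'medias_zfs': 'SUSPENDED'}
```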

Code:
# zpool clear medias_zfs
cannot clear errors for medias_zfs: I/O error
# zpool export medias_zfs
cannot unmount '/media/Medias_ZFS': pool or dataset is busy

Hu
Administrator


Joined: 06 Mar 2007
Posts: 23100

Posted: Sat Nov 11, 2023 4:16 pm

Anard: you keep referring to hibernation, but as alamahant indicated, "hibernation" is typically understood to mean that the system writes RAM to swap, then turns off power. Later, when power is turned on, a new kernel boots, notices the hibernation image, and copies that into RAM, putting the applications back in the state they were in at hibernate time. Your second post states that you are using suspend-to-RAM, often known as "S3". This is a different power-saving mode, and has different potential failure modes. Therefore, you have a suspend-to-RAM problem, not a hibernation problem.
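
On Linux, the kernel shows which variant of suspend-to-RAM it will use in `/sys/power/mem_sleep`: the file lists the supported modes and marks the active one in square brackets (for example `s2idle [deep]`, where `deep` corresponds to S3). The sketch below reads that file and returns the bracketed entry; `active_mem_sleep` is a hypothetical helper, and the file may be absent on kernels or platforms without this support.

```python
from pathlib import Path
from typing import Optional


def active_mem_sleep(text: str) -> Optional[str]:
    """Return the mode marked with [brackets] in /sys/power/mem_sleep text."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")
    if path.exists():
        # "deep" means S3 suspend-to-RAM; "s2idle" means suspend-to-idle.
        print("active suspend mode:", active_mem_sleep(path.read_text()))
    else:
        print("/sys/power/mem_sleep not available on this system")
```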

Your problem looks to me like your drive does not recover after S3, and the kernel is passing along the fact that I/O no longer works. Does dmesg after resume agree with that? After this failure, what is the output of dd if=/dev/device-for-zfs of=/dev/null iflag=direct bs=4K count=1?
Anard
Apprentice


Joined: 01 Oct 2020
Posts: 244

Posted: Sat Nov 11, 2023 4:26 pm

OK, sorry for the imprecision (and my bad English :) )

Here it is:

Code:
imack ~ # zpool status
  pool: medias_zfs
 state: ONLINE
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       0     0     0

errors: No known data errors

________ SUSPEND TO RAM NOW AND WAKE UP ________

imack ~ $ dmesg
[  138.789505] PM: suspend exit
[  140.890466] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control rx/tx
[  143.125994] ata3: link is slow to respond, please be patient (ready=0)
[  143.129350] ata4: link is slow to respond, please be patient (ready=0)
[  144.096001] ata11: link is slow to respond, please be patient (ready=0)
[  144.099346] ata8: link is slow to respond, please be patient (ready=0)
[  144.102684] ata14: link is slow to respond, please be patient (ready=0)
[  144.106015] ata9: link is slow to respond, please be patient (ready=0)
[  144.106024] ata10: link is slow to respond, please be patient (ready=0)
[  144.112683] ata7: link is slow to respond, please be patient (ready=0)
[  144.146045] ata12: link is slow to respond, please be patient (ready=0)
[  144.149355] ata13: link is slow to respond, please be patient (ready=0)
[  144.489338] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  144.533479] ata3.00: configured for UDMA/133
[  145.009398] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  145.012731] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  145.022734] ata8: found unknown device (class 0)
[  145.022754] ata8: SATA link down (SStatus 0 SControl 310)
[  145.026066] ata10: found unknown device (class 0)
[  145.026086] ata10: SATA link down (SStatus 0 SControl 310)
[  145.026112] ata7: found unknown device (class 0)
[  145.026130] ata7: SATA link down (SStatus 0 SControl 310)
[  145.052739] ata12: found unknown device (class 0)
[  145.052760] ata12: SATA link down (SStatus 0 SControl 310)
[  145.056043] ata13: found unknown device (class 0)
[  145.056062] ata13: SATA link down (SStatus 0 SControl 310)
[  145.062691] ata11: found unknown device (class 0)
[  145.062710] ata11: SATA link down (SStatus 0 SControl 310)
[  146.385991] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  146.388506] ata4.00: configured for UDMA/133
[  148.622671] hid-generic 0005:05AC:0220.0004: unknown main item tag 0x0
[  148.622832] input: XM-KEY-BT4 Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-11/3-11:1.0/bluetooth/hci0/hci0:12/0005:05AC:0220.0004/input/input19
[  148.623096] hid-generic 0005:05AC:0220.0004: input,hidraw2: BLUETOOTH HID v1.1b Keyboard [XM-KEY-BT4] on 5c:f3:70:65:12:51
[  150.099367] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[  150.099388] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[  150.100484] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  150.100494] ata9.00: revalidation failed (errno=-5)
[  150.100511] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  150.100522] ata14.00: revalidation failed (errno=-5)
[  150.416447] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  150.416491] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: erreur de lecture dans '/dev/sde1': Erreur d'entrée/sortie
0+0 enregistrements lus
0+0 enregistrements écrits
0 octet copié, 0,000323672 s, 0,0 kB/s
imack ~ # zpool status
  pool: medias_zfs
 state: ONLINE
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       0     0     0

errors: No known data errors

__________ TRY TO OPEN DISK IN THUNAR (/media/Medias_ZFS) ________

imack ~ # zpool status
  pool: medias_zfs
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       3     8     0
errors: List of errors unavailable: pool I/O is currently suspended

errors: 4 data errors, use '-v' for a list
imack ~ # zpool clear medias_zfs
cannot clear errors for medias_zfs: I/O error
imack ~ # zpool status
  pool: medias_zfs
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  UNAVAIL      0     0     0  insufficient replicas
     sde       FAULTED      0     0     0  too many errors
errors: List of errors unavailable: pool I/O is currently suspended

errors: 4 data errors, use '-v' for a list


Could it be a sector-size problem?
I didn't set it explicitly when creating the zpool.

Code:
 # fdisk -l
Disk /dev/sde: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0AF8896E-61A5-ED42-8BD6-7BEE8D610DD5

Device          Start        End    Sectors  Size Type
/dev/sde1        2048 3907012607 3907010560  1.8T Solaris /usr & Apple ZFS
/dev/sde9  3907012608 3907028991      16384    8M Solaris reserved 1
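
On the sector-size question: ZFS records the sector size it assumes for a vdev as `ashift`, the base-2 logarithm of that size, so 512-byte sectors correspond to ashift=9 and 4096-byte sectors to ashift=12 (it can be forced at creation time with `zpool create -o ashift=12 ...`). The sketch below only does that arithmetic for the sizes fdisk reports for this disk; `ashift_for` is a hypothetical helper and does not query the pool's actual setting.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return log2(sector_bytes), i.e. the ZFS ashift matching that sector size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    logical, physical = 512, 4096  # from the fdisk output for /dev/sde
    print("logical  ->", ashift_for(logical))   # 9
    print("physical ->", ashift_for(physical))  # 12
```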

Hu
Administrator


Joined: 06 Mar 2007
Posts: 23100

Posted: Sat Nov 11, 2023 4:46 pm

Anard wrote:
OK sorry for this approximation (and bad english :) )
That is fine. I mentioned it with the idea that you should change the thread subject, which I see you have now done.
Anard wrote:
Code:
imack ~ $ dmesg
[  138.789505] PM: suspend exit
[  150.099367] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[  150.099388] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[  150.100484] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  150.100494] ata9.00: revalidation failed (errno=-5)
[  150.100511] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  150.100522] ata14.00: revalidation failed (errno=-5)
[  150.416447] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  150.416491] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
This looks worrisome to me, though I do not deal with drive errors routinely, so it is possible that I am being overly cautious about a harmless error.
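
To list which ATA ports failed to come back after resume, a sketch like this could scan dmesg text for the `qc timeout`, `failed to IDENTIFY` and `revalidation failed` messages quoted above. The patterns are inferred from these log lines, not from a documented kernel format, and `failed_ata_ports` is a hypothetical name.

```python
import re

# Matches e.g. "[  150.100494] ata9.00: revalidation failed (errno=-5)"
_FAIL_RE = re.compile(
    r"\]\s+(ata\d+)(?:\.\d+)?:\s+(?:qc timeout|failed to IDENTIFY|revalidation failed)"
)


def failed_ata_ports(dmesg_text: str) -> list[str]:
    """Return ATA port names (e.g. 'ata9') that logged resume failures, in order."""
    seen: list[str] = []
    for line in dmesg_text.splitlines():
        m = _FAIL_RE.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


SAMPLE = """\
[  150.099367] ata9.00: qc timeout after 5000 msecs (cmd 0xec)
[  150.099388] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[  150.100494] ata9.00: revalidation failed (errno=-5)
[  150.416447] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""

if __name__ == "__main__":
    print(failed_ata_ports(SAMPLE))  # ['ata9', 'ata14']
```

On a live system you would feed it the output of `dmesg`, which may require root when the `kernel.dmesg_restrict` sysctl is enabled.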
Anard wrote:
Code:
imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: erreur de lecture dans '/dev/sde1': Erreur d'entrée/sortie
Could you rerun with LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1, so that it is not translated? I can tell that this is an error condition. I can say that I hoped there would be no error. However, I am unsure whether this error is that you invoked it wrong (such as by providing the wrong device for if=) or if the system is in a bad state. I suspect this is Input/output error. If so, then it tells us that you cannot get any data off this drive at this time, even when bypassing ZFS. Therefore, this is not a ZFS vs suspend-to-RAM problem. It is a problem that suspend-to-RAM makes your disk unusable after resume, regardless of what filesystem you put on the disk. That does not solve your problem, but it does suggest that we need to examine the lower levels first to understand why the disk becomes unusable.
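
This test boils down to whether `dd` manages to read one full block. In the C locale, GNU dd summarizes input as `full+partial records in`, so a sketch like the one below could turn that summary into a pass/fail result. It only parses text (it does not run dd), and `read_ok` is a hypothetical helper.

```python
import re

# GNU dd summary line in the C locale, e.g. "1+0 records in"
_RECORDS_IN = re.compile(r"^(\d+)\+(\d+) records in$", re.MULTILINE)


def read_ok(dd_stderr: str, expected_full: int = 1) -> bool:
    """True if dd reported at least `expected_full` complete input records."""
    m = _RECORDS_IN.search(dd_stderr)
    return m is not None and int(m.group(1)) >= expected_full


GOOD = "1+0 records in\n1+0 records out\n4096 bytes (4.1 kB, 4.0 KiB) copied\n"
BAD = "dd: error reading '/dev/sde1': Input/output error\n0+0 records in\n0+0 records out\n"

if __name__ == "__main__":
    print(read_ok(GOOD), read_ok(BAD))  # True False
```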
Anard
Apprentice


Joined: 01 Oct 2020
Posts: 244

Posted: Sat Nov 11, 2023 5:15 pm

OK!!
Not completely solved, though:

The drive was plugged into a PCI SATA expansion card (Marvell 9230), meant to let me add drives in the future.
As I have a free SATA port on my motherboard, I just moved the disk to that port (the PCI SATA expander is not used at all now).
It seems to be OK, but I still have errors in dmesg:

Code:
# LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000249805 s, 16.4 MB/s

__________ SUSPEND AND WAKE UP NOW ___________

$ dmesg
[   83.474279] PM: suspend exit
[   85.322688] ata13: found unknown device (class 0)
[   85.322709] ata13: SATA link down (SStatus 0 SControl 310)
[   85.322741] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   85.322772] ata8: found unknown device (class 0)
[   85.322790] ata8: SATA link down (SStatus 0 SControl 310)
[   85.322809] ata7: found unknown device (class 0)
[   85.322823] ata7: SATA link down (SStatus 0 SControl 310)
[   85.322839] ata9: found unknown device (class 0)
[   85.322852] ata9: SATA link down (SStatus 0 SControl 310)
[   85.322870] ata11: found unknown device (class 0)
[   85.322883] ata11: SATA link down (SStatus 0 SControl 310)
[   85.329361] ata12: found unknown device (class 0)
[   85.329379] ata12: SATA link down (SStatus 0 SControl 310)
[   85.372701] ata10: found unknown device (class 0)
[   85.372723] ata10: SATA link down (SStatus 0 SControl 310)
[   85.632236] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control rx/tx
[   87.842646] ata3: link is slow to respond, please be patient (ready=0)
[   87.842659] ata6: link is slow to respond, please be patient (ready=0)
[   87.856001] ata4: link is slow to respond, please be patient (ready=0)
[   89.292651] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   89.338422] ata3.00: configured for UDMA/133
[   89.339305] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   89.510331] ata6.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[   89.510337] ata6.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[   89.511678] ata6.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[   89.511681] ata6.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[   89.512143] ata6.00: configured for UDMA/133
[   90.366050] ata14.00: qc timeout after 5000 msecs (cmd 0xa1)
[   90.367153] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   90.367160] ata14.00: revalidation failed (errno=-5)
[   90.682795] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   91.232720] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   91.235352] ata4.00: configured for UDMA/133
[   93.712094] hid-generic 0005:05AC:0220.0004: unknown main item tag 0x0
[   93.712253] input: XM-KEY-BT4 Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-11/3-11:1.0/bluetooth/hci0/hci0:12/0005:05AC:0220.0004/input/input19
[   93.712393] hid-generic 0005:05AC:0220.0004: input,hidraw2: BLUETOOTH HID v1.1b Keyboard [XM-KEY-BT4] on 5c:f3:70:65:12:51

# LC_ALL=C dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0128701 s, 318 kB/s

_____ I CAN STILL READ IT IN THUNAR ________

# zpool status
  pool: medias_zfs
 state: ONLINE
config:

   NAME        STATE     READ WRITE CKSUM
   medias_zfs  ONLINE       0     0     0
     sde       ONLINE       0     0     0

errors: No known data errors


So I understand that my Marvell PCI expander doesn't wake up properly, regardless of the filesystem used.
How should I add more disks to my zpool, given that all the SATA ports on my motherboard are in use?

About the dd command translation: sorry, there are no more errors, and I'd rather not provoke them :P
But I can back-translate the previous output (which no longer appears for now):
Code:
 imack ~ # dd if=/dev/sde1 of=/dev/null iflag=direct bs=4K count=1
dd: error reading '/dev/sde1': Input/output error

Hu
Administrator


Joined: 06 Mar 2007
Posts: 23100

Posted: Sat Nov 11, 2023 5:49 pm

It may be possible to get the Marvell PCI extender to work, but I cannot tell you how to do so, or even if it can be done.

That "slow to respond" is unfortunate, but if everything else works, I think it can be ignored. Is ata14 the Marvell extender? Even though you have no disks on it, if the card is still connected, it might be getting detected and probed, then triggering this warning when the probe fails.

Omitting the error text reproduction is fine, since you lost access to it by fixing the underlying problem. :)
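
One way to check whether ata14 belongs to the Marvell card is to resolve its sysfs link: on current kernels `/sys/class/ata_port/ataN` points into the PCI device tree, and the last PCI address before the `ataN` component is the controller. That address can then be compared with `lspci -D` output. The sketch assumes this path layout; `controller_of` and `controller_for_port` are hypothetical helpers, and the example path is made up for illustration.

```python
import os
import re
from typing import Optional

# A PCI address component looks like "0000:03:00.0" (domain:bus:device.function)
_PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def controller_of(resolved_path: str) -> Optional[str]:
    """Return the last PCI address found in a resolved sysfs path, if any."""
    pci = [part for part in resolved_path.split("/") if _PCI_ADDR.match(part)]
    return pci[-1] if pci else None


def controller_for_port(port: str) -> Optional[str]:
    """Resolve /sys/class/ata_port/<port> on the running system (Linux only)."""
    link = os.path.join("/sys/class/ata_port", port)
    if not os.path.exists(link):
        return None
    return controller_of(os.path.realpath(link))


if __name__ == "__main__":
    # Hypothetical resolved path, for illustration only.
    example = "/sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/ata14/ata_port/ata14"
    print(controller_of(example))  # 0000:03:00.0
    print(controller_for_port("ata14"))
```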
Anard
Apprentice


Joined: 01 Oct 2020
Posts: 244

Posted: Sat Nov 11, 2023 6:11 pm

It's possible. As you said, the extender is still connected, but with no disks attached to it.
Maybe it's possible to make the system wait several seconds before trying to wake up these ports?
Otherwise, I'll try connecting my DVD writer and my /home disk (ext4) to the card and see how that works, which would free two SATA ports on my motherboard.
Anyway, thank you both for your help.
All times are GMT