gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Tue Dec 08, 2020 9:28 pm Post subject: PI SDCard reliability |
|
|
Not really specific to Gentoo, but does anyone have any experience with SD card reliability? I'm thinking specifically of the Pi.
I've burned out two new 64GB SanDisk Ultras in as many months on my Raspberry Pis running Gentoo. Note, I'm fairly careful with the write load on these cards: if I ever do a source emerge on the actual Pi, I make sure /var/tmp is NFS mounted to a spinning rust server. (Most of the time, I chroot / binfmt my Pi portage ebuilds on my x86 box, only going native for those builds that fail in the chroot, and then I use distcc too.)
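A small guard along these lines could enforce that rule before a native emerge. It is only a sketch, not the poster's setup: it assumes GNU coreutils `stat` (where `-f -c %T` prints the filesystem type name, e.g. `nfs`) and Portage's default build location under /var/tmp. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: refuse to start a native emerge unless the build
# directory lives on NFS, so compile temp files never land on the SD card.
# Assumes GNU coreutils stat (-f -c %T prints the filesystem type name).

fs_type() {
    stat -f -c %T "$1"
}

is_nfs() {
    case "$(fs_type "$1")" in
        nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}

guard_build_dir() {
    dir=${1:-/var/tmp}
    if is_nfs "$dir"; then
        echo "ok: $dir is on NFS"
    else
        echo "refusing: $dir is on $(fs_type "$dir"), not NFS" >&2
        return 1
    fi
}
```

Used as `guard_build_dir /var/tmp && emerge --oneshot <package>`, the emerge only starts when /var/tmp really is the NFS mount.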
But, as mentioned, I've had two of these cards go dead in as many months, one while I was doing a backup of the system! The failure mode is that the card (silently) becomes read-only. Writes appear to work, but don't actually happen. After a reboot the system is in some static state from the past, from when the "read only" mode somehow got stuck; no new data persists across a reboot. Reformatting/repartitioning the card is a no-go (again, the operation appears to work, but doesn't survive a reboot).
For the first failure, I wasn't being careful with heat on the system. I was creating a swapfile (dd if=/dev/zero ...), but noticed things were acting weird. I checked the temperature: it was in the upper 50s C. And then things died. I'm more careful with temperature now.
For the second failure, it was in a temperature-controlled box with a fan. The only load was the backup of the system.
I use f2fs as my root disk format. As I recall, when researching this years ago, this was supposed to be the "safe" file system for flash storage.
Anyone have any ideas/suggestions for more reliable brands?
My most stable Pi (I think a Pi2 - BCM2835) is one I set up with an NFS-mounted root. Only /boot was used on the SD card. That box has been running for many years, with no troubles at all... |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Tue Dec 08, 2020 10:41 pm Post subject: |
|
|
gtwrek,
f2fs is good on raw FLASH storage lacking wear levelling.
SD cards have provided wear levelling and trim for some time now, so it would not be my choice.
Since the Pi1, when trashing the SD filesystem was an everyday hazard, I've not had any SD issues.
How does f2fs do garbage collection, and what happens to the free space?
How and when is it erased by the memory controller ready for reuse?
I have a SSD connected to my Pi via a USB to SATA bridge.
If you reboot in the middle of it trimming, it will not go ready until the trim has completed. That can be half an hour or so.
So, are your SD cards trimming rather than failed?
Leave them powered up but doing nothing, at least overnight. They may be OK in the morning.
Erase is a very slow operation and the memory controller will try to move things around for wear levelling, so it will erase a lot more than just the free space. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
Etal Veteran
Joined: 15 Jul 2005 Posts: 1932
|
Posted: Tue Dec 08, 2020 10:51 pm Post subject: |
|
|
NeddySeagoon wrote: | f2fs is good on raw FLASH storage lacking wear levelling.
SD cards have provided wear levelling and trim for some time now, so it would not be my choice. |
That's not true; F2FS was designed precisely with things like SD cards in mind.
https://f2fs.wiki.kernel.org wrote: | The motive for F2FS was to build a file system that, from the start, takes into account the characteristics of NAND flash memory-based storage devices (such as solid-state disks, eMMC, and SD cards), which are widely used in computer systems ranging from mobile devices to servers. |
https://lwn.net/Articles/518988/ wrote: | As the FTL typically uses a log-structured design to provide the wear-leveling and write-gathering that flash requires, this means that there are two log structures active on the device — one in the firmware and one in the operating system. f2fs is explicitly designed to make use of this fact and leaves a number of tasks to the FTL while focusing primarily on those tasks that it is well positioned to perform. So, for example, f2fs makes no effort to distribute writes evenly across the address space to provide wear-leveling. |
|
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Tue Dec 08, 2020 11:16 pm Post subject: |
|
|
NeddySeagoon wrote: |
I have a SSD connected to my Pi via a USB to SATA bridge.
If you reboot in the middle of it trimming, it will not go ready until the trim has completed. That can be half an hour or so.
So, are your SD cards trimming rather than failed.
Leave them powered up but doing nothing, at least overnight. They may be OK in the morning.
Erase is a very slow operation and the memory controller will try to move things around for wear levelling, so it will erase a lot more than just the free space. |
One wonders: I suppose the filesystem needs to be mounted for this (trimming?) operation to work? Is there any indication that it's occurring?
(I'm already starting down the rabbit hole of googling variations of "trim|sdcard|gentoo"...) |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Tue Dec 08, 2020 11:56 pm Post subject: |
|
|
gtwrek,
trim is indeed a rabbit hole.
discard is supposed to be a background operation for FLASH based devices.
When blocks are discarded, it's up to the device what it does and when it does it.
Badly implemented discard treats it as a command and does the erase immediately. This gives rise to a lot of unnecessary 'write amplification' that is bad for the media life.
Well implemented systems make notes and anticipate the need to do the erase, so that they minimise the wear-levelling work.
You can mount filesystems with the discard option, which passes discard information to the block device with every filesystem change.
I advise against that now, as it's not possible to tell which discard model an individual device follows.
The alternative is fstrim, run on a regular basis as a cron job. This tells the block device about all the blocks freed since it was last run.
If it's run monthly or even weekly, it will still be often enough to stop you waiting for erase cycles.
It can be run manually any time you like too.
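A minimal sketch of that periodic job, assuming a cron daemon that runs scripts from a weekly directory such as /etc/cron.weekly (adjust to your cron). `fstrim -v` is the real util-linux command; the `trim_mounts` wrapper and the `FSTRIM` override (handy for a dry run) are illustrative only.

```shell
#!/bin/sh
# Sketch of a weekly trim job (location such as /etc/cron.weekly/ is an
# assumption). Runs fstrim per mountpoint instead of mounting with -o discard.
# FSTRIM can be overridden, e.g. for a dry run; it defaults to fstrim.

FSTRIM=${FSTRIM:-fstrim}

trim_mounts() {
    status=0
    for mp in "$@"; do
        # -v makes fstrim report how many bytes were trimmed
        if ! "$FSTRIM" -v "$mp"; then
            echo "fstrim failed on $mp" >&2
            status=1
        fi
    done
    return $status
}
```

The cron script would then end with something like `trim_mounts / /boot`. Plain `fstrim -av`, which trims every mounted filesystem that supports it, is the simpler alternative; the wrapper only adds a per-mountpoint failure message.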
There is no indication that trim/wear levelling is in progress. Erase is power hungry, so I suspect that the drive's power consumption will be high during trimming, but I've not tried to measure it.
I have a 256G drive that takes up to half an hour, judging from the delay in reboots, and a 512G drive in my main PC that has taken up to 2 days to sort out the mess after a power fail.
I've almost given them up as scrap on a few occasions, as they do not appear in /dev while this is going on. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Wed Dec 09, 2020 12:01 am Post subject: |
|
|
I tried a manual fstrim (found from another post you made!):
Code: | % fstrim -v /mnt/pi
fstrim: /mnt/pi: the discard operation is not supported |
(Edit to add: googling around, this may be because I'm doing this from a USB->SD card reader. There are some indications that this may interfere with fstrim...)
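One way to see where the chain breaks is to ask the kernel what it thinks each device can do. A sketch, assuming the usual sysfs layout: /sys/block/<dev>/queue/discard_max_bytes reads 0 when discards cannot be passed down (a USB reader typically shows up as an sdX device, the Pi's own slot as mmcblk0). The `SYSFS` override and the function names are hypothetical, there only so the check can be tried against a fake tree.

```shell
#!/bin/sh
# Check whether the kernel thinks a block device accepts discard/TRIM.
# /sys/block/<dev>/queue/discard_max_bytes is 0 when the device (or the
# USB bridge in front of it) cannot pass discards down.
# SYSFS is overridable so the functions can be tried on a fake tree.

SYSFS=${SYSFS:-/sys}

discard_max_bytes() {
    f="$SYSFS/block/$1/queue/discard_max_bytes"
    [ -r "$f" ] || { echo "no such device: $1" >&2; return 2; }
    cat "$f"
}

supports_discard() {
    max=$(discard_max_bytes "$1") || return 2
    [ "$max" -gt 0 ]
}
```

`lsblk --discard` shows the same information for all devices at once in its DISC-GRAN / DISC-MAX columns; zeros there typically go together with fstrim's "the discard operation is not supported".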
Still need to do more reading.
Hard to believe such a critical operation has NO indications that it's occurring? But I can't judge too much here - it's way above my head. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Wed Dec 09, 2020 10:35 am Post subject: |
|
|
gtwrek,
Trim has to be supported all down the chain. That includes the USB/SD interface too.
Trim can only work with USB3. USB2 uses 'bulk' mode which is rather like PIO mode in early IDE drives.
It can't pass the trim command.
This page is a good read.
It's written around USB SSDs, but it's equally applicable to any FLASH block storage device. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Wed Dec 09, 2020 7:10 pm Post subject: |
|
|
Thanks for the pointers. I think I've got a 10,000ft level understanding now of what's going on with TRIM. Unfortunately it doesn't really help me at the moment. I'd like to see if those cards are still recoverable, but it doesn't appear that I have any means to let them try to finish the trim (as you've suggested). I don't have other devices that support TRIM, other than the Pi itself. And since I've successfully restored my SD card image from backup (yay for backups!) and written it to a new card, the Pi's now running again and unavailable for this experiment.
I'm just now left with 2 SanDisk SD cards that appear dead. They're cheap; I'm not interested in recovering them for monetary reasons. I'm solely interested in creating reliable systems, and understanding this failure mechanism would be helpful.
I've added another SD card reader to my wish list; they're cheap. But it's hard to tell from online reviews, or even manufacturer documentation, whether any particular device would support trimming... (at least USB3, which, as you've indicated, is required).
I'm beginning to question the viability of just relying on the SDcard on my PIs. I have the one successful data point of making the root disk of my PI NFS - but it's in an application that for me, performance doesn't matter (it's running my sprinkler system).
I think I've got things fairly well setup that I can quickly convert my new Pi4 to NFS root (and go back). I may go back and forth and try some benchmarks....(emerge rust! - no that's too long... Have to find a package that natively takes < 1 hour) |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Wed Dec 09, 2020 8:37 pm Post subject: |
|
|
gtwrek,
After trim is instructed, all the SD card needs is power. It's an internal operation.
Any USB to SD converter will do for power.
It needs to be USB3 to issue trim commands.
Power up the card in whatever USB to SD converter that's not in use and wait ...
== Edit ==
I've just tested my Sandisk USB3 to microSD converter for trim support.
Trim is not supported. |
|
ShorTie Tux's lil' helper
Joined: 12 Feb 2006 Posts: 101
|
Posted: Wed Dec 09, 2020 10:30 pm Post subject: |
|
|
Are you sure you have no power problems??
That is the #1 killer of sdcards.
I've hammered sdcard's hard for years with compiling stuff, with no problems.
The only trashed sdcards I've made is either from power problems or not shutting down properly.
But an SSD is truly worth it speed-wise.
Unpacking a stage3 tarball is like 20 minutes vs. 2 hours.
And yes, that is 2 hours of solid green light, no blinking. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Thu Dec 10, 2020 12:09 am Post subject: |
|
|
My Pi is powered by a 3 A USB-C supply, as recommended and sold by my reseller (pishop.us). One hopes that its 15 W would be sufficient, but I've not done a detailed power analysis...
One of the reasons I liked the Pi was its nice compact form. Most of my applications for the Pi will be headless, so the only connections will be Ethernet and power (PoE infrastructure is too $$$). Somehow bolting on a USB SSD drive ruins the elegance of the solution, at the very least. (I understand there are some Pi HATs with custom enclosures that might make a still elegant solution...)
I've no desire to strive for performance just for the sake of performance. If the solution fits my needs, then I'm usually done. My main PI4 serving for my media player (media coming in via NFS) works fine, as is, running from the SD card.
I don't mind waiting some time for portage world updates. I'll set things up such that these things happen automatically, and in the middle of the night.. (And be binhost supported from a beefier machine..)
Back to the SD card diagnostics. I've surfed too many websites today searching for some sort of solution that would enable me to start a "trim" on my SD card from my x86 box. From all the research I've done, it appears none of the USB->SD card adapters out there support this. I might even be willing to purchase some sort of internal-bay device that would do the trick, but darned if I can find any documentation indicating what sort of device supports this. Even with the relatively open documentation for the Pi, I'm not sure what would indicate that SD card trim is supported. The Pi is currently the only (non-SATA) device I know of that does support it. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Thu Dec 10, 2020 10:55 am Post subject: |
|
|
gtwrek,
If you have /boot on the SD card and everything else on say NFS, so that the Pi SD card is not used after root is mounted, you can swap SD cards in the Pi. Do ensure that all filesystems on the SD card are unmounted before you swap cards.
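A quick check before pulling the card, as a sketch: it lists mounts whose source is a partition of the card. The device name mmcblk0 and reading /proc/mounts are assumptions (the usual Pi layout); the function names are hypothetical.

```shell
#!/bin/sh
# Before swapping the SD card on an NFS-root Pi, list anything still
# mounted from it. Device name (mmcblk0) and mounts file (/proc/mounts)
# are defaults that can be overridden as arguments.

sd_mounts() {
    dev=${1:-mmcblk0}
    mounts=${2:-/proc/mounts}
    # Print source and mountpoint for every /dev/<dev>* mount.
    awk -v dev="/dev/$dev" 'index($1, dev) == 1 { print $1, $2 }' "$mounts"
}

safe_to_swap() {
    if [ -n "$(sd_mounts "$@")" ]; then
        echo "still mounted from the SD card:" >&2
        sd_mounts "$@" >&2
        return 1
    fi
    echo "nothing mounted from the SD card"
}
```

For example, `umount /boot && safe_to_swap` before pulling the card.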
It's unlikely to be a power problem. If it were, it wouldn't just be you. There were any number of SD card issues with early Pi 1s; so much so that it was a feature. There is nothing like that with later Pis.
It's possible that the Pi or PSU is faulty, but that would just be unlucky. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Sat Dec 12, 2020 4:15 pm Post subject: Another SDCard fails |
|
|
Ok, another SDcard just failed. Very similar in environment (I was running a backup again). On the same PI4 where it failed last time.
This time, instead of grabbing a new SDcard and imaging from backup, I'm going to let this one just run for a while as Neddy suggested, and see if a background "trim" operation could actually resurrect things.
I already see many messages in /var/log/messages:
Code: | Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 36213472 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4461148, 4461148, 515) failed, ret: -5
Dec 12 08:09:20 clove kernel: error 0 requesting status 0x80900
Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 36219368 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4461885, 4461885, 2667) failed, ret: -5
Dec 12 08:09:20 clove kernel: error 0 requesting status 0x80900
Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 38051472 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4690898, 4690898, 484) failed, ret: -5
Dec 12 08:09:20 clove kernel: error 0 requesting status 0x80900
Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 38242120 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: error 0 requesting status 0x80900
Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 38620072 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: error 0 requesting status 0x80900
Dec 12 08:09:20 clove kernel: blk_update_request: I/O error, dev mmcblk0, sector 35949960 op 0x3:(DISCARD) flags 0x0 phys_seg 1 prio class 0
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4428209, 4428209, 384) failed, ret: -5
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4714729, 4714729, 371) failed, ret: -5
Dec 12 08:09:20 clove kernel: F2FS-fs: Issue discard(4761973, 4761973, 301) failed, ret: -5
|
I've not issued an fstrim as yet. I'm going to just let it go for a while... |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Sat Dec 12, 2020 4:41 pm Post subject: |
|
|
gtwrek,
discard is another word for trim.
It looks like the card is trying to do some trimming but there is a problem.
It's not clear to me if that's a filesystem problem or an underlying block device problem.
Code: | dev mmcblk0, sector 36213472 | is a location on the block device.
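The two kinds of numbers in the log can be cross-checked with a little arithmetic. Assuming F2FS's 4 KiB blocks (8 sectors of 512 bytes), F2FS block numbers counted from the start of the partition, and `blk_update_request` sectors counted from the start of the whole mmcblk0 device, every pair in the log lines up with the same offset of 524288 sectors (256 MiB), presumably where the root partition starts: 4461148 * 8 + 524288 = 36213472, and 4461885 * 8 + 524288 = 36219368. That would suggest the failing block-device requests are exactly the discards F2FS issued, rejected further down by the card or its driver. The helper names are hypothetical.

```shell
#!/bin/sh
# Relate F2FS "Issue discard(lstart, start, len)" block numbers to the
# "blk_update_request ... sector N" numbers in the same log.
# Assumptions: 4 KiB F2FS blocks = 8 x 512-byte sectors; F2FS blocks are
# partition-relative; the blk_update_request sector is disk-relative.

SECTORS_PER_F2FS_BLOCK=8

# Disk sector of an F2FS block, given the partition's start sector.
f2fs_blk_to_sector() {
    echo $(( $1 * SECTORS_PER_F2FS_BLOCK + $2 ))
}

# Infer the partition start sector from one matching pair of log lines.
partition_offset() {
    echo $(( $2 - $1 * SECTORS_PER_F2FS_BLOCK ))
}
```

For example, `partition_offset 4461148 36213472` gives 524288, and `f2fs_blk_to_sector 4428209 524288` gives 35949960, matching the sector in the later log line.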
Was the SD card being read or written?
Reads are free, writes may force erase cycles. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Sat Dec 12, 2020 5:10 pm Post subject: |
|
|
Quote: | Was the SD card being read or written?
Reads are free, writes may force erase cycles. |
Well, at the time of the failure a backup was running. Curiously, I had edited my backup script just before starting it.
After the failure (the main backup was successful, but the following prune operation died), I was forced to reboot.
The edited backup script is NOT on the disk now - the script contains the version before my changes, but it ran with the edits in place.
So it must have run from some sort of cache, and the persistent data never made it to the card's main storage.
The main backup operation is read heavy of course, but there's going to be temp data. (The backup is over ssh to an archive server)
At this point, I've just rebooted the machine. The card is root, so there's going to be (at least) writes happening to /var/log.
I'm not actively doing anything other than a few sudo tail -100 /var/log/messages... |
|
erm67 l33t
Joined: 01 Nov 2005 Posts: 653 Location: EU
|
Posted: Sat Dec 12, 2020 9:15 pm Post subject: |
|
|
In case you decide to buy a SATA-USB adapter and a cheap SSD, consider instead an NVMe-USB3.1 adapter ($15 shipped from China) and a cheap old/slow NVMe drive. The USB3 on the raspi is "slow", probably ~800MB/s, but a lot faster than an SSD. It's apparently well supported on the raspi as well. _________________ Ok boomer
True ignorance is not the absence of knowledge, but the refusal to acquire it.
Ab esse ad posse valet, a posse ad esse non valet consequentia
My fediverse account: @erm67@erm67.dynu.net |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Sat Dec 12, 2020 10:44 pm Post subject: |
|
|
Thanks for the recommendation, erm67. If I end up looking for an alternative to the SD card, the NVMe adapter looks like a very reasonable solution.
The current bad card is still pushing out IO errors at 12/minute. None of the addresses match, so it seems to be working through a list? Anyway, I'm going to let it go overnight. |
|
ShorTie Tux's lil' helper
Joined: 12 Feb 2006 Posts: 101
|
|
erm67 l33t
Joined: 01 Nov 2005 Posts: 653 Location: EU
|
Posted: Sun Dec 13, 2020 6:43 am Post subject: |
|
|
The ADATA is a USB 3.2 adapter, so it is capable of 20Gb/s, while the USB3 port of the raspi can't reach the 10Gb/s of USB 3.1. For the price it is great, of course, but a 10Gb/s or a 20Gb/s adapter will work exactly the same on most ARM boards for the moment.
I read that the next raspi will expose the PCIe lanes provided by most ARM SoCs and provide an NVMe socket... that would be great. |
|
Etal Veteran
Joined: 15 Jul 2005 Posts: 1932
|
Posted: Sun Dec 13, 2020 1:34 pm Post subject: |
|
|
As far as SD cards go, I heard "industrial" SD cards are an option. They're small and expensive but supposedly much more reliable. Never tried one though.
I also would suggest not buying SD Cards from Amazon. Counterfeit SD Cards are very common, and due to Amazon's inventory commingling there's no way to know if you're getting a genuine one. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Sun Dec 13, 2020 3:58 pm Post subject: |
|
|
Well, the next day it's still running, still generating similar errors from the SD card. I think the error rate has slowed a bit, but the error count is still climbing. (Sector addresses are mostly unique, so it's either working through a list or slowly chewing up the whole card.)
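Both observations (the rate and the uniqueness of sectors) can be tallied from the log rather than eyeballed. A sketch, assuming the syslog line layout from the earlier post (`Mon DD HH:MM:SS host kernel: ...`) and the `I/O error, dev mmcblk0, sector N` wording; the function names are hypothetical.

```shell
#!/bin/sh
# Summarise mmcblk0 I/O errors in a syslog-style file.
# Assumes lines like:
#   Dec 12 08:09:20 host kernel: blk_update_request: I/O error, dev mmcblk0, sector N ...

# One output line per minute: "Mon DD HH:MM count"
errors_per_minute() {
    awk '/I\/O error, dev mmcblk0/ { n[$1 " " $2 " " substr($3, 1, 5)]++ }
         END { for (m in n) print m, n[m] }' "$1" | sort
}

# Number of distinct sectors that produced an error
distinct_sectors() {
    sed -n 's|.*I/O error, dev mmcblk0, sector \([0-9][0-9]*\).*|\1|p' "$1" |
        sort -u | awk 'END { print NR }'
}
```

Run as `errors_per_minute /var/log/messages` and `distinct_sectors /var/log/messages`; if the distinct count keeps growing about as fast as the total, the card is not just retrying the same few sectors.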
I'm thinking of letting it go for another day just for curiosity sake.
The note on possible counterfeit cards from Amazon is an interesting take; these were Amazon purchases. The SanDisk packaging looks legit (the darned thing had so much packaging it was difficult to get the SD card out without worrying about physically damaging it...). But faking packaging probably isn't all that hard for miscreants...
Anyway, for my next experiment, I'm ditching f2fs. Neddy seems to think it's not a good idea; Etal states the counterpoint. But something unique to my setup is eating these SD cards up. Time to remove variables. I'll just format the next one to ext4 when I pull from backup, then start a lot of backup operations to see if I can get another failure... |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Mon Dec 14, 2020 11:50 pm Post subject: |
|
|
For anyone still following along...
Leaving the SD card in for 2.5+ days: lots of I/O errors continued, but nothing improved.
The card is still bad, and a reboot "turned the clock back" to right when the failure occurred, i.e. writes are silently failing to reach the persistent memory on the SD card.
I don't think just letting it do its thing is going to improve anything. I now have a stack of (3) bad SanDisk SD cards.
Pulled my image from backup. Reformatted root to ext4, edited a few /etc and /boot files to reflect this change, and started the Pi4 back up. Runs fine.
I've restarted my backup operation. I'm going to see if I can entice a failure with root formatted as ext4 instead of f2fs. The backup is, obviously, a heavy read operation. The prune operation follows the backup (which is actually where things failed both times, I believe). After a few manual iterations, I may just stick the thing in a loop and let it rip for a day to see if I can reproduce a failure.
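Such a loop might look like the sketch below. `BACKUP_CMD` stands in for the (unshown) backup-plus-prune script, and its default path is hypothetical, as is `MAX_RUNS`. The loop stops as soon as new `I/O error, dev mmcblk0` lines appear in the log, the message seen with the last failed card.

```shell
#!/bin/sh
# Hypothetical "make-it-fail" loop: rerun the backup until the kernel
# logs a new mmcblk0 I/O error, or until MAX_RUNS is reached.
# BACKUP_CMD's default path is a placeholder, not the poster's script.

BACKUP_CMD=${BACKUP_CMD:-/usr/local/bin/backup.sh}
MAX_RUNS=${MAX_RUNS:-24}

card_errors() {
    # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
    grep -c 'I/O error, dev mmcblk0' "$1" || true
}

make_it_fail() {
    log=$1
    before=$(card_errors "$log")
    i=1
    while [ "$i" -le "$MAX_RUNS" ]; do
        echo "run $i"
        $BACKUP_CMD || echo "backup run $i exited non-zero" >&2
        now=$(card_errors "$log")
        if [ "$now" -gt "$before" ]; then
            echo "card errors appeared after run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no card errors in $MAX_RUNS runs"
}
```

It would be run as a user that can read the log, e.g. `make_it_fail /var/log/messages`.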
Temp seems to be hovering around 48 C while the backup is running - on the hot side, but should be ok. (The system has heat sinks installed on critical components, and a fan on top blowing out of the box..) |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54747 Location: 56N 3W
|
Posted: Tue Dec 15, 2020 12:25 am Post subject: |
|
|
gtwrek,
48C is fine. My Pi4 has the PoE HAT and the fan is set for 60C. The fan never runs, even when building stuff.
It goes into thermal throttling at 80C so you won't damage it.
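For keeping an eye on that from a script, the kernel's thermal zone reports the SoC temperature in millidegrees Celsius. A sketch assuming zone 0 is the SoC (the usual case on a Pi, but worth checking) and using the 80C throttle point given above; `vcgencmd measure_temp` is the Raspberry Pi-specific alternative where it is installed. The function names are hypothetical.

```shell
#!/bin/sh
# Report the SoC temperature relative to the throttle point.
# THERMAL (zone 0 = SoC) and THROTTLE_C (80) are assumptions to check.

THERMAL=${THERMAL:-/sys/class/thermal/thermal_zone0/temp}
THROTTLE_C=${THROTTLE_C:-80}

soc_temp_c() {
    # The sysfs value is in millidegrees; integer-divide to whole degrees C.
    echo $(( $(cat "$THERMAL") / 1000 ))
}

temp_report() {
    t=$(soc_temp_c)
    if [ "$t" -ge "$THROTTLE_C" ]; then
        echo "${t}C: at or above ${THROTTLE_C}C, expect throttling"
    else
        echo "${t}C: below ${THROTTLE_C}C"
    fi
}
```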
The flash translation layer in solid state storage is not a constant. It's been developed to do more and more over the years.
Using f2fs would worry me in case I had two flash translation layers fighting it out due to overlaps.
I tend to use ext4 without a journal on small SD cards, and with a journal on bigger ones, as there are more blocks to share the wear levelling.
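That rule of thumb can be written down as a dry run that only prints the `mkfs.ext4` command. The 32 GiB cut-off is a made-up example (the post gives no number), and the partition name is the usual Pi one, not taken from this thread. `-O ^has_journal` is the standard e2fsprogs way to create ext4 without a journal; `tune2fs -O ^has_journal` removes it from an existing, unmounted filesystem.

```shell
#!/bin/sh
# Print (not run) an mkfs.ext4 command: no journal on small cards,
# journal on bigger ones. SMALL_CARD_GIB=32 is a hypothetical cut-off.

SMALL_CARD_GIB=${SMALL_CARD_GIB:-32}

ext4_mkfs_cmd() {
    part=$1        # e.g. /dev/mmcblk0p2 (must not be mounted)
    size_gib=$2    # card size in GiB
    if [ "$size_gib" -lt "$SMALL_CARD_GIB" ]; then
        echo "mkfs.ext4 -O ^has_journal $part"
    else
        echo "mkfs.ext4 $part"
    fi
}
```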
That saves the journal writes in exchange for a full fsck in the face of an unclean shutdown. |
|
gtwrek Tux's lil' helper
Joined: 10 Mar 2017 Posts: 112 Location: San Jose, CA
|
Posted: Tue Dec 15, 2020 1:56 am Post subject: |
|
|
Quote: | The flash translation layer in solid state storage is not a constant. It's been developed to do more and more over the years.
Using f2fs would worry me in case I had two flash translation layers fighting it out due to overlaps. |
That's kind-of what I'm thinking - whatever f2fs is doing to be "flash friendly" is competing with some other, automatic "flash friendly" things happening somewhere else. Note to those that know these things better than I - this is just coming from someone with a 10,000 ft understanding of the underlying issues. The good news is, it seems I have a (fairly reliable) way to make it fail. I can now play with variables (hopefully one at a time) to see what affects the failure rates. (Might need to order more SD cards. I'm thinking I can go with smaller cards - my "heavy" gentoo installs are only running around 12 GB. But changing card sizes introduces another variable...)
(On iteration 4 of my "make-it-fail" backup operation now using ext4 (journal turned on) - no failures yet...) |
|
Back to top |
|
|
Irre Guru
Joined: 09 Nov 2013 Posts: 434 Location: Stockholm
|
Posted: Tue Dec 15, 2020 9:27 am Post subject: |
|
|
I bought a "SanDisk Extreme microSD card" with an unlimited warranty! However, in some countries it's limited to ONLY 30 years!
(It is fast, but I prefer to have the root file system mounted on faster external USB3 disks.) |
|