Gentoo Forums
Are SSD drives suitable for RAID-1 use these days?
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Wed Aug 07, 2024 7:53 pm    Post subject: Are SSD drives suitable for RAID-1 use these days? Reply with quote

On my Gentoo main system, I have four 2TB mechanical hard drives configured into two RAID-1 mirrors. I then add the physical drive partitions to Logical Volume manager, which I can use to create partitions as and when needed, as well as change partition sizes easily.

This configuration has served me well over many years. Copied from common practice where I worked - provides good resilience to single point failure in the storage systems.

The time to consider replacements is coming around again, and SSD drives have caught my eye. Last time I looked, the price premium for 2TB SSD was too high, now it is in the region of price parity with 2TB mechanical drives.

But - I recall that I also formed an opinion (some years ago) that SSD drives would not work well with a Gentoo system configured like mine, on account of the SSD drives having a very limited lifespan if written to regularly.

How do people view this these days? Have the manufacturers mitigated the problem with high write rates to SSD drives?

As an off the wall follow up question - what would people think of an SSD-mechanical pairing in a RAID-1 mirror? Stupid idea, or some merit for faster read times?

Thanks in advance for any insights.

(Not sure this is the correct forum for this sort of discussion - hopefully the mods will move the post if not)
(Edit: RAID-0 to RAID-1, correcting a senior moment, sorry)


Last edited by ipic on Fri Aug 09, 2024 11:41 am; edited 1 time in total
szatox
Advocate
Advocate


Joined: 27 Aug 2013
Posts: 3407

PostPosted: Wed Aug 07, 2024 8:53 pm    Post subject: Reply with quote

Quote:
But - I recall that I also formed an opinion (some years ago) that SSD drives would not work well with a Gentoo system configured like mine, on account of the SSD drives having a very limited lifespan if written to regularly.

Dunno, my Gentoo is doing fine. The vast majority of compilation happens in RAM anyway, and hitting swap once in a while is not a big deal.

Quote:
As an off the wall follow up question - what would people think of a SSD-Mechanical pairing in a RAID-0 mirror? Stupid idea, or some merit for faster read times?
It should give you faster reads, but write performance will be limited by hdd.
Depending on the internal optimizations, I can imagine this setup failing slow HDD out of the raid set during a big write (or at any time really), so you'd have to really really stress test it.
Bcache and similar probably won't survive the backing device failing, even if cache is big enough to keep all data forever.
Maybe you could abuse DRBD in some creative way to flag HDD as write-mostly, if you're -brave-enough- totally insane. I mean, this idea definitely sucks, but I am now curious if it's possible :lol:

One thing that did actually work well when mixing HDD and SSD was LVM snapshots. Full snapshots on a single HDD degrade performance by something like a factor of 10. LVM allows you to specify the physical volume where a logical volume should be created. Having both an HDD and an SSD in one volume group allows you to put an LV on the HDD and then snapshot it to the SSD, which avoids the write amplification penalty.
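
A minimal sketch of that placement, assuming a volume group called vg0 that contains an HDD-backed PV /dev/sdb1 and an SSD-backed PV /dev/sdc1 (all names and sizes here are made up for illustration). It only prints the commands unless you set DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: keep the origin LV on the HDD and put its snapshot on the SSD.
# vg0, /dev/sdb1 (HDD PV), /dev/sdc1 (SSD PV) and the sizes are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

snapshot_on_ssd() {
    vg=$1; hdd_pv=$2; ssd_pv=$3
    # Trailing PV arguments restrict where lvcreate allocates extents.
    run lvcreate -L 100G -n data "$vg" "$hdd_pv"
    run lvcreate -s -L 20G -n data_snap "$vg/data" "$ssd_pv"
}

snapshot_on_ssd vg0 /dev/sdb1 /dev/sdc1
```

The copy-on-write traffic then lands on the SSD instead of competing with the origin LV for the HDD's heads.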
_________________
Make Computing Fun Again
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 8:43 am    Post subject: Reply with quote

szatox wrote:
Quote:
But - I recall that I also formed an opinion (some years ago) that SSD drives would not work well with a Gentoo system configured like mine, on account of the SSD drives having a very limited lifespan if written to regularly.

Dunno, my Gentoo is doing fine. The vast majority of compilation happens in RAM anyway, and hitting swap once in a while is not a big deal.


Sounds promising. Do you use the SSDs in a RAID?

I'd be interested to know which manufacturers you have found to be reliable.
Thanks
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 8:57 am    Post subject: Reply with quote

szatox wrote:

Quote:
As an off the wall follow up question - what would people think of a SSD-Mechanical pairing in a RAID-0 mirror? Stupid idea, or some merit for faster read times?
It should give you faster reads, but write performance will be limited by hdd.
Depending on the internal optimizations, I can imagine this setup failing slow HDD out of the raid set during a big write (or at any time really), so you'd have to really really stress test it.
Bcache and similar probably won't survive the backing device failing, even if cache is big enough to keep all data forever.
Maybe you could abuse DRBD in some creative way to flag HDD as write-mostly, if you're -brave-enough- totally insane. I mean, this idea definitely sucks, but I am now curious if it's possible :lol:

One thing that did actually work well when mixing HDD and SSD was LVM snapshots. Full snapshots on a single HDD degrade performance by something like a factor of 10. LVM allows you to specify the physical volume where a logical volume should be created. Having both an HDD and an SSD in one volume group allows you to put an LV on the HDD and then snapshot it to the SSD, which avoids the write amplification penalty.


My main reason for thinking of this stupid setup is the way I replace drives in the RAID-1. The process is:
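- Remove the drive to be replaced from the RAID array.
- Partition the new drive by copying the partition table from an existing drive:
Code:
# Copy the partition table from /dev/sdX (existing drive) to /dev/sdY (new drive)
sgdisk /dev/sdX -R /dev/sdY
# Give the new drive fresh random disk and partition GUIDs
sgdisk -G /dev/sdY

- Join the new disk to the array, and let mdadm recovery do its thing.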
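
Put together, the whole sequence might look like the sketch below. The array and device names (/dev/md0, /dev/sda as the healthy member, /dev/sdb1 as the outgoing member, /dev/sdc as the new disk) are placeholders rather than my real layout, and nothing is executed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch of the replacement steps. All device names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

replace_member() {
    md=$1; good_disk=$2; old_part=$3; new_disk=$4; new_part=$5
    # Mark the outgoing member as failed and take it out of the array.
    run mdadm "$md" --fail "$old_part" --remove "$old_part"
    # Replicate the partition table, then randomise the GUIDs on the copy.
    run sgdisk "$good_disk" -R "$new_disk"
    run sgdisk -G "$new_disk"
    # Add the new partition; md starts the recovery on its own.
    run mdadm "$md" --add "$new_part"
    # Recovery progress shows up here.
    run cat /proc/mdstat
}

replace_member /dev/md0 /dev/sda /dev/sdb1 /dev/sdc /dev/sdc1
```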

Your comment about the HDD being dropped because of slow writes *looks* like it will be OK in this, one, specific, scenario - but I guess it could fail immediately afterwards, before I have a chance to reverse the process?
lars_the_bear
Guru
Guru


Joined: 05 Jun 2024
Posts: 512

PostPosted: Fri Aug 09, 2024 9:01 am    Post subject: Re: Are SSD drives suitable for RAID-0 use these days? Reply with quote

ipic wrote:

How do people view this these days? Have the manufacturers mitigated the problem with high write rates to SSD drives?


My understanding is that writes are less of a problem these days, with wear levelling and so forth. Seagate claims a total write life of 1200 terabytes for their 'Barracuda' SSD range. I don't know if that's been independently verified but, even if it's a hundred times smaller in practice, it's still long enough for my purposes. Even if it's only 12 TB, that's still more than 6 GB per day, every day, for five years.
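
To put numbers on that, here is a small sketch in plain shell plus awk. It uses decimal units (1 TB = 1000 GB), and the function name is just something I made up for illustration:

```shell
#!/bin/sh
# Daily write budget in GB/day for a given endurance (TB written) spread
# evenly over a service life in years. Decimal units: 1 TB = 1000 GB.
daily_budget_gb() {
    awk -v tbw="$1" -v years="$2" \
        'BEGIN { printf "%.1f\n", tbw * 1000 / (years * 365) }'
}

daily_budget_gb 1200 5   # the rated endurance quoted above
daily_budget_gb 12 5     # the pessimistic "hundred times smaller" case
```

The rated figure works out to several hundred GB per day over five years, and the pessimistic case to about 6.6 GB per day.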

These days, I think I would be quite happy to use decent-quality SSDs in RAID, in the same circumstances where I would use them singly. But the sizes I want are still not really affordable in SSD.

BR, Lars.
szatox
Advocate
Advocate


Joined: 27 Aug 2013
Posts: 3407

PostPosted: Fri Aug 09, 2024 10:09 am    Post subject: Reply with quote

Quote:
I have four 2TB mechanical hard drives configured into two RAID-0 mirrors
Oh, before we continue... You meant RAID1, didn't you?
RAID0 is striping without any redundancy

Quote:
Sounds promising. Do you use the SSDs in a RAID?
I used to have a bunch of Debian servers with SSDs in RAID1 in my care, including a pretty big and damn busy integration database. It worked.
I don't know what SSDs they were stuffed with. You should be good with anything built to be installed inside the case (as hot storage). Removable devices (cold storage, like memory cards and pendrives) are manufactured for low duty and will burn holes like crazy.

My personal Gentoo is just sitting on a single NVMe drive. After a few years of daily use, it reports 2% wear with 0 cells burnt out. Looks like it's going to last me a lifetime.

Quote:
Your comment about the HDD being dropped because of slow writes *looks* like it will be OK in this, one, specific, scenario - but I guess it could fail immediately afterwards, before I have a chance to reverse the process?
I don't know how it's going to behave.
Depending on how defensively it's coded, you may be fine, or adding the new, fast device might kick the device with the data on it out of the raid, leaving you in an unusable state.
Honestly I'd rather take that machine down and copy data manually with dd instead. 2 TB is not _that_ much, standard 7200RPM disks should let you image copy them within 2-3 hours.
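
For a back-of-the-envelope check of the copy time, a tiny sketch. The sustained rates are assumptions (a HDD slows down towards the inner tracks, so the average sits below the headline figure), and the function name is made up:

```shell
#!/bin/sh
# Hours needed to copy a disk of the given size (in TB) at a sustained
# rate in MB/s, decimal units. The rate is an assumption, not a measurement.
clone_hours() {
    awk -v tb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

clone_hours 2 150   # conservative average for a 7200RPM disk
clone_hours 2 200   # optimistic average
```

So 2-3 hours is the optimistic end; budget closer to 4 hours if the disk averages around 150 MB/s.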
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 11:44 am    Post subject: Reply with quote

szatox wrote:
Quote:
I have four 2TB mechanical hard drives configured into two RAID-0 mirrors
Oh, before we continue... You meant RAID1, didn't you?
RAID0 is striping without any redundancy


Yeah, senior moment, sorry. I've edited the heading and original post. RAID-1 mirrors.
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 12:02 pm    Post subject: Reply with quote

szatox wrote:
Quote:
Your comment about the HDD being dropped because of slow writes *looks* like it will be OK in this, one, specific, scenario - but I guess it could fail immediately afterwards, before I have a chance to reverse the process?
I don't know how it's going to behave.
Depending on how defensively it's coded, you may be fine, or adding the new, fast device might kick the device with the data on it out of the raid, leaving you in an unusable state.
Honestly I'd rather take that machine down and copy data manually with dd instead. 2 TB is not _that_ much, standard 7200RPM disks should let you image copy them within 2-3 hours.


Thanks for the feedback, it's very useful. Good to know your RAIDs work with SSDs.

I've gone back to my motherboard manual, and I have 6x SATA-6 connectors to play with, and a free 5.25inch drive bay. With adapter trays I can fit 2 SSD drives into the 5.25 bay.

So, a plan is:
- Boot with rescue image
- dd an HDD to one of the SSDs
- make new mirror with other SSD
- shutdown
- remove HDDs and replace with SSDs (to use original drive designations)
- reboot with SSDs now as original mirror
- pray, then test

If it all goes spectacularly wrong, I still have the original HDDs to go back to.

Another thought - would using different drive manufacturers for the two mirror SSDs be a way to spread risk a bit? I looked after a system at work a while back which had identical components in two redundant nodes - and they both failed at exactly the same time, due to the identical manufacturer error. So, burnt into my memory.
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 12:09 pm    Post subject: Re: Are SSD drives suitable for RAID-0 use these days? Reply with quote

lars_the_bear wrote:
ipic wrote:

How do people view this these days? Have the manufacturers mitigated the problem with high write rates to SSD drives?


My understanding is that writes are less of a problem these days, with wear levelling and so forth. Seagate claims a total write life of 1200 terabytes for their 'Barracuda' SSD range. I don't know if that's been independently verified but, even if it's a hundred times smaller in practice, it's still long enough for my purposes. Even if it's only 12 TB, that's still more than 6 GB per day, every day, for five years.

These days, I think I would be quite happy to use decent-quality SSDs in RAID, in the same circumstances where I would use them singly. But the sizes I want are still not really affordable in SSD.

BR, Lars.


Thanks for the feedback, appreciated.
As you say, even a whole order of magnitude lower would still make the drives good for probably the rest of my lifetime :-).
I'll assume that the mean time to failure for other reasons (manufacturing process error etc) would be as good as or probably better than mechanical - since assembling very small moving parts carries its own risks.
lars_the_bear
Guru
Guru


Joined: 05 Jun 2024
Posts: 512

PostPosted: Fri Aug 09, 2024 12:45 pm    Post subject: Re: Are SSD drives suitable for RAID-0 use these days? Reply with quote

ipic wrote:

I'll assume that the mean time to failure for other reasons (manufacturing process error etc) would be as good as or probably better than mechanical - since assembling very small moving parts carries its own risks.


Seagate claims MTBF of 1.8 million hours, which is about 200 years. So, again, even if you're at the worst end of the practical range, the lifetime is still likely to be tens of years.
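
One caveat worth spelling out: MTBF is a statistic over a large population of drives, not the expected lifetime of any single drive. A rough sketch of the conversion, assuming a constant failure rate (the usual exponential model, which is my assumption and not something Seagate states); the function names are made up:

```shell
#!/bin/sh
# Convert an MTBF in hours to years, and to an annualised failure rate (AFR)
# in percent, assuming a constant failure rate (exponential model).
mtbf_years() {
    awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 8760 }'
}

afr_percent() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", (1 - exp(-8760 / h)) * 100 }'
}

mtbf_years 1800000    # MTBF expressed in years
afr_percent 1800000   # chance of a given drive failing within one year, in %
```

Read that way, the figure says that roughly one drive in two hundred would be expected to fail in any given year, which is still a comfortable number for a mirror.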

Having said that, I've not seen any independent verification of these claims. To be frank, I'm not sure how they can easily be tested. You'd need to run thousands of drives side-by-side for several years to stand a reasonable probability of seeing a failure. In my working life I've seen mechanical drives fail, even when treated well. I suspect that the bearings just run dry eventually. But I've never seen a Seagate SSD fail, even after installing dozens of them over 10+ years. Even the cheap, unbranded ones don't usually fail -- they're usually defective when they arrive.

I don't think there's anything special about Seagate -- it's just that they actually publish service lifetime stuff, and I'm not sure manufacturers usually do.

BR, Lars.
szatox
Advocate
Advocate


Joined: 27 Aug 2013
Posts: 3407

PostPosted: Fri Aug 09, 2024 1:18 pm    Post subject: Reply with quote

Quote:
So, a plan is:
- Boot with rescue image
- dd an HDD to one of the SSDs
- make new mirror with other SSD
- shutdown
- remove HDDs and replace with SSDs (to use original drive designations)
- reboot with SSDs now as original mirror
- pray, then test
I think you're overcomplicating things.
You can clone both HDDs to respective SSDs at the same time and avoid resyncing RAID afterwards. Just make sure to remove old disks before you activate raid, mdraid identifies members by their superblocks.

If new and old devices are exactly the same size, you can even clone them whole, including partition tables. With slightly bigger drives and GPT partitions, you'll need to repair the backup partition table afterwards, but it doesn't affect superblocks, so your raid will still assemble just fine; only smaller devices would require more care.
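
A minimal sketch of the whole-disk route, run from a rescue system with the array stopped. /dev/sdX (old HDD) and /dev/sdY (new SSD) are placeholders, and the dd options assume GNU dd. Commands are only printed unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: image-copy an old disk onto a new one, then move the backup GPT
# to the end of the (possibly larger) new disk. Device names are placeholders.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

clone_disk() {
    src=$1; dst=$2
    # Whole-device copy: partition table, md superblocks and data all at once.
    run dd if="$src" of="$dst" bs=4M conv=fsync status=progress
    # Relocate the backup GPT header to the real end of the new disk.
    run sgdisk -e "$dst"
}

clone_disk /dev/sdX /dev/sdY
```

Double-check which device is which before turning the dry run off, since dd will happily overwrite the wrong disk.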

Quote:
Another thought - would using different drive manufacturers for the two mirror SDDs be a way to spread risk a bit?
Yes.
Companies like buying stuff in bulk, so they put disks with consecutive serial numbers in the raid because that's what they have at hand. In theory it is also good for performance, but in practice comes with increased risk of common cause failures. Your story is not surprising.
pa4wdh
l33t
l33t


Joined: 16 Dec 2005
Posts: 881

PostPosted: Fri Aug 09, 2024 5:26 pm    Post subject: Reply with quote

I have a very similar setup to yours in my server. It used to be 2 mechanical drives, but I've replaced them with SSDs by replacing them one at a time and letting RAID do its magic.

As for writes, I do monitor them, but the drives' rated endurance is so high I'll never reach its theoretical maximum. What I did do is replace one SSD with a different model after a few years of use. My RAID array now consists of two different drives with an age difference of 4 years, so the chance of them failing at the same time should be quite low.

I made a simple script to calculate the amount of written data per day and estimate how long it will take until you reach the specified TBW for the drive. This is the script:
Code:
#!/bin/bash
# Uses bash arrays, so it needs bash rather than a plain POSIX sh

DEVICE[0]="/dev/sda"
START[0]="2019-08-10"
TBW[0]="600"

DEVICE[1]="/dev/sdb"
START[1]="2023-09-02"
TBW[1]="600"

human_size()
{
 local INPUT=$1
 local SUFFIX="B"
 local FACTOR="1000"
 local LIMIT=$(($FACTOR*20))

 if [ "$INPUT" -gt "$LIMIT" ]
 then
  INPUT=$((INPUT/$FACTOR))
  SUFFIX="KB"
 fi
 if [ "$INPUT" -gt "$LIMIT" ]
 then
  INPUT=$((INPUT/$FACTOR))
  SUFFIX="MB"
 fi
 if [ "$INPUT" -gt "$LIMIT" ]
 then
  INPUT=$((INPUT/$FACTOR))
  SUFFIX="GB"
 fi
 if [ "$INPUT" -gt "$LIMIT" ]
 then
  INPUT=$((INPUT/$FACTOR))
  SUFFIX="TB"
 fi

 echo "$INPUT $SUFFIX"
}

for COUNT in ${!DEVICE[@]}
do
# Get sectors written from device
 SECTORS=`smartctl -A ${DEVICE[$COUNT]} | grep "^241 " | awk '{ print $NF }'`

 if [ -z "$SECTORS" ]
 then
  echo "${DEVICE[$COUNT]}: No write statistics available"
  continue
 fi

# Writes are reported by sectors, multiply by 512
 BW=$((SECTORS*512))
# Guaranteed value is TBW so multiply by 1 TB
 ESTBW=$((${TBW[$COUNT]}*1000000000000))

# Calculate the days the device has been in use
 STARTUNIX=`date +'%s' -d "${START[$COUNT]}"`
 NOWUNIX=`date +'%s'`
 DAYS=$((($NOWUNIX-$STARTUNIX)/86400))
 if [ "$DAYS" -lt "1" ]
 then
  DAYS=1
 fi

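# Calculate the daily write-rate
 DAYRATE=$(($BW/$DAYS))
# Avoid dividing by zero below when nothing has been written yet
 if [ "$DAYRATE" -lt "1" ]
 then
  echo "${DEVICE[$COUNT]}: No writes recorded yet"
  continue
 fi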

# Calculate the number of days left given the current rate
 ESTDAYS=$((($ESTBW/$DAYRATE)-$DAYS))
 ESTDATE=`date +"%d-%m-%Y" -d "$ESTDAYS days"`

# Report
 echo "${DEVICE[$COUNT]}: Written `human_size $BW` in $DAYS days, `human_size $DAYRATE`/day, $ESTDAYS days left: $ESTDATE"
done

The arrays at the start are the place to specify the drive's device node, the date you started using it, and the TBW specified by the manufacturer. On my system this reports:
Code:
/dev/sda: Written 6324 GB in 1826 days, 3463 MB/day, 171406 days left: 24-11-2493
/dev/sdb: Written 1400 GB in 342 days, 4096 MB/day, 146134 days left: 15-09-2424

So at least in theory i don't have to worry about them :)

To select the right drives I followed this advice: https://raid.wiki.kernel.org/index.php/Choosing_your_hardware,_and_what_is_a_device%3F#TLER_and_SCT.2FERC
So far I've seen Samsung consistently supporting SCT/ERC in its EVO series, so my drives are an 860 EVO and an 870 EVO.
Be aware that this feature needs activation at boot; this is in my /etc/local.d:
Code:
#!/bin/sh

# Set SCT Error Recovery Control to 7 seconds
smartctl -l scterc,70,70 /dev/sda
smartctl -l scterc,70,70 /dev/sdb

_________________
The gentoo way of bringing peace to the world:
USE="-war" emerge --newuse @world

My shared code repository: https://code.pa4wdh.nl.eu.org
Music, Free as in Freedom: https://www.jamendo.com


Last edited by pa4wdh on Sat Aug 10, 2024 6:31 am; edited 3 times in total
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Fri Aug 09, 2024 6:26 pm    Post subject: Reply with quote

pa4wdh wrote:
I have a very similar setup to yours in my server. It used to be 2 mechanical drives, but I've replaced them with SSDs by replacing them one at a time and letting RAID do its magic.


Many thanks for the super useful post. I have now pulled the trigger and have started ordering hardware.

Re your comment: "replacing them one at a time and letting RAID do its magic"
Was there a time when you were using a RAID array with an HDD and an SSD active at the same time?
Or was that just for the time while you converted the drives from HDD to SSD?
pa4wdh
l33t
l33t


Joined: 16 Dec 2005
Posts: 881

PostPosted: Sat Aug 10, 2024 6:26 am    Post subject: Reply with quote

Quote:
Was there a time when you were using a RAID array with an HDD and an SSD active at the same time?
Or was that just for the time while you converted the drives from HDD to SSD?

For about a week I ran with 1 HDD and 1 SSD. The difference in speed was huge (1TB 2.5" 5400rpm disk vs. 860 EVO), but that didn't affect the array; just don't expect a performance increase during that time.
lars_the_bear
Guru
Guru


Joined: 05 Jun 2024
Posts: 512

PostPosted: Sat Aug 10, 2024 6:26 pm    Post subject: Reply with quote

This isn't an issue for everybody, I guess: a huge advantage for me, if I could afford SSDs as large as I need, would be that they are silent in operation. Right now, with my stack of 7200rpm magnetic disks, I have to choose between having a constant drone from them, or letting them spin up and spin down all the time.

Actually, my wife makes that choice for me ;)

Personally, I would use SSDs if I could, even if it meant that individual drives won't last as long as the magnetic drives I currently use. But, as I said, I suspect they will.

As soon as the price of SSDs falls to about a quarter their current level, I'll be using them in my RAID for sure.

BR, Lars.
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Wed Aug 21, 2024 1:34 pm    Post subject: Reply with quote

Having taken delivery of a pair of 4TB SSD drives, I decided to try out the 'hot' replacement route.


    Physically add the two new drives to the case
    Use mdadm --grow to increase the RAID size to 4
    Add each new SSD in turn, and wait for resync to finish
    Use mdadm --fail and then --remove to remove the 'old' HDD drives
    Use mdadm --grow to return RAID size to 2
    Make sure that grub is installed on the new SSDs and that the EFI partitions and boot entries are good
    Shut down PC and physically remove the 'old' HDD drives, then move the new SSDs to the bays and connectors the HDDs used
    Start PC
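
For anyone wanting to repeat this, the mdadm part of those steps can be sketched as below. The array and partition names are placeholders rather than my actual devices, the same sequence has to be repeated for each mirror, and nothing runs unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch of the 'hot' migration of one two-way mirror onto two new drives.
# /dev/md0 and the partition names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

hot_migrate() {
    md=$1; new1=$2; new2=$3; old1=$4; old2=$5
    # Temporarily turn the two-way mirror into a four-way mirror.
    run mdadm --grow "$md" --raid-devices=4
    # Add the new members one at a time, waiting for each resync to finish.
    run mdadm "$md" --add "$new1"
    run mdadm --wait "$md"
    run mdadm "$md" --add "$new2"
    run mdadm --wait "$md"
    # Drop the old members, then shrink back to a two-way mirror.
    run mdadm "$md" --fail "$old1" --remove "$old1"
    run mdadm "$md" --fail "$old2" --remove "$old2"
    run mdadm --grow "$md" --raid-devices=2
}

hot_migrate /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sda1 /dev/sdb1
```

The nice property of this order is that the array never drops below two in-sync members at any point.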


Boot up went fine - Raids were assembled correctly, LVMs found and mounted - everything as before.

Very pleased with everything loading faster now :-)

If I remember, I'll come back here from time to time with reliability reports.
ipic
Guru
Guru


Joined: 29 Dec 2003
Posts: 400
Location: UK

PostPosted: Wed Sep 11, 2024 12:04 pm    Post subject: Reply with quote

I was digging around the smartctl output for interesting data, and I found this:
Code:

# smartctl -x /dev/sdk
.....
Device Statistics (GP Log 0x04)
...
0x01  0x010  4           28202  ---  Power-on Hours
0x01  0x018  6     71355063635  ---  Logical Sectors Written
...

So, I wrote a Perl script to dig through the smartctl output, and out of curiosity used it on one of the old Western Digital HDDs that I replaced with an SSD.
Lo and behold, I got this:
Code:

Power on for: 3.2 years
/dev/sdk: WDC WD20EZBX-00AYRA0 Written: 36.53 TB

That drive was part of my main system RAID mirror, so it was running a whole Gentoo system, which is maintained daily, and has a kernel build whenever kernel.org publishes a stable branch version (via the gentoo-sources package).
Machine also used for programming, WWW browsing, and playing games via Steam.

It's one data point, but in my case it shows a typical data write rate of a little more than 10TB a year for a largish Gentoo system (> 1500 packages installed).
Thought this may be of interest to people thinking of using SSDs for a main Gentoo system.
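
The arithmetic itself is simple enough to redo by hand. This is not my Perl script, just a shell/awk stand-in that takes the two values from the smartctl output above, and it assumes 512-byte logical sectors and decimal terabytes:

```shell
#!/bin/sh
# Turn "Logical Sectors Written" and "Power-on Hours" into TB written,
# power-on years and TB/year. Assumes 512-byte logical sectors.
write_stats() {
    awk -v sectors="$1" -v hours="$2" 'BEGIN {
        tb = sectors * 512 / 1e12
        years = hours / 8760
        printf "%.2f TB written, %.1f years, %.1f TB/year\n", tb, years, tb / years
    }'
}

write_stats 71355063635 28202
```

For the values shown above this gives 36.53 TB over 3.2 years, or about 11.3 TB per year.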
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54577
Location: 56N 3W

PostPosted: Wed Sep 11, 2024 12:41 pm    Post subject: Reply with quote

lars_the_bear,

If you can connect the extra drive without disconnecting a working member of the raid set, use mdadm --replace ...

It will populate the new volume from available data from any member of the raid. So if you have a drive fail during the replace, it still works.
See the mdadm man page.

I have an NVMe and SSD in raid1, with the SSD being write-mostly.
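
A minimal sketch of both ideas, with made-up array and partition names. Check the mdadm man page against your version before relying on it; commands are only printed unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: hot-replace a member while it is still in service, and add a
# slower device flagged write-mostly. All device names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

hot_replace() {
    md=$1; old=$2; new=$3
    # The replacement has to be a spare in the array first.
    run mdadm "$md" --add "$new"
    # Rebuild onto the spare while the old member keeps serving data.
    run mdadm "$md" --replace "$old" --with "$new"
}

add_write_mostly() {
    md=$1; slow=$2
    # RAID1 only: md avoids reading from write-mostly members if possible.
    run mdadm "$md" --add --write-mostly "$slow"
}

hot_replace /dev/md0 /dev/sda1 /dev/sdc1
add_write_mostly /dev/md1 /dev/sdd1
```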

As long as you don't use SMR HDDs in raid, not ever, mix and match should be OK.
Stay away from Samsung SSDs. Their warranty process is horrible and I had both halves of a mirror fail with bad blocks just a few hours apart.

You can do your own wearout calculations. When I looked at one of mine a year or so ago, it should last 180 years.
I won't care for several reasons. :)
Stay away from QLC SSDs if write life is a real concern. They only have half the write life of other SSDs.

smartctl -x on one of your HDDs will tell all about data read/written.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.