linux_os2 Apprentice
Joined: 29 Aug 2018 Posts: 240 Location: Zedelgem Belgium
Posted: Sun Mar 10, 2024 3:13 pm Post subject: mdadm software RAID 5: extremely slow write speed on hard disks
Slow write speed on raid 5 (mdadm)
read test:
Code: | dd if=/home/test of=/dev/null bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 1.97621 s, 5.3 GB/s |
write test:
Code: | dd of=/home/test if=/dev/zero bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 34.5883 s, 303 MB/s |
setup:
motherboard: ASUS Z10PE-D16 WS
CPUs: 2 × Intel Xeon E5-2640 v4 (14 nm, 2.4–3.4 GHz, 25 MB cache, 10 cores each)
memory: 128 GB
hard disks: 6 ×
Code: | === START OF INFORMATION SECTION ===
Model Family: Toshiba MG09ACA... Enterprise Capacity HDD
Device Model: TOSHIBA MG09ACA18TE
Serial Number: Y1L0A03RFJDH
LU WWN Device Id: 5 000039 b48d9d5ec
Firmware Version: 0104
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Mar 10 14:49:51 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled |
kernel: vmlinuz-6.6.13-gentoo-x86_64
Code: | cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md124 : active raid1 sdk[2] sdj[1]
9766304768 blocks super 1.2 [2/2] [UU]
bitmap: 0/73 pages [0KB], 65536KB chunk
md125 : active raid5 sdi1[6] sdh1[4] sdc1[2] sdb1[1] sda1[0] sdg1[3]
87890972160 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 5/131 pages [20KB], 65536KB chunk
md126 : active raid1 sde[1] sdd[0]
975585280 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sde[1](S) sdd[0](S)
2354608 blocks super external:ddf
unused devices: <none>
|
md125 is the raid 5 array
During the test one thread runs at 100% for about 12 seconds, then CPU usage drops to one or two percent above normal use.
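The busy thread is presumably the md parity thread; a one-shot thread view like this (a sketch, assuming the usual mdXXX_raidN kernel-thread naming) shows it near the top of the list:
Code: | top -bH -n1 -o %CPU | head -n 20 |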
dumpe2fs /dev/md125:
Code: | Filesystem volume name: <none>
Last mounted on: /home
Filesystem UUID: 05eeba2c-9ab4-4bb2-93ca-85c29bc9d852
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1373296640
Block count: 21972743040
Reserved block count: 1098637152
Overhead clusters: 87727706
Free blocks: 20163879481
Free inodes: 1372448771
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2048
Inode blocks per group: 128
RAID stride: 128
RAID stripe width: 640
Flex block group size: 16
Filesystem created: Tue Jul 19 00:11:28 2022
Last mount time: Sun Mar 10 11:52:29 2024
Last write time: Sun Mar 10 14:59:26 2024
Mount count: 17
Maximum mount count: 200
Last checked: Sun Mar 3 13:37:20 2024
Check interval: 2592000 (1 month)
Next check after: Tue Apr 2 14:37:20 2024
Lifetime writes: 20 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
First orphan inode: 1008698824
Default directory hash: half_md4
Directory Hash Seed: ea36baf3-7a46-4859-9868-5c8345c490d4
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x28defea2
Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
Total journal size: 1024M
Total journal blocks: 262144
Max transaction length: 262144
Fast commit length: 0
Journal sequence: 0x0030792f
Journal start: 175397
Journal checksum type: crc32c
Journal checksum: 0xc51a21c5 |
----------------------------------------
The figures for md124, RAID 1 on two 10 TB drives (Western Digital Gold WDC WD101KRYZ-01JPDB1), connected via USB 3, each in its own USB enclosure:
read: Code: | dd if=/mntbackup/backup_partition/test of=/dev/null bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 2.22996 s, 4.7 GB/s |
write: Code: | dd of=/mntbackup/backup_partition/test if=/dev/zero bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 9.73622 s, 1.1 GB/s |
Can better performance be expected?
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Sun Mar 10, 2024 3:59 pm
linux_os2,
dd is a horrible speed test.
With zoned rotating-rust HDDs, the speed near the spindle is about 1/3 of the speed at the edge: there are fewer sectors per track near the spindle, but the platter rotates at the same 7,200 RPM.
Code: | dd if=/home/test of=/dev/null bs=10M count=1000
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 1.97621 s, 5.3 GB/s | is too fast to be true, so it's probably not.
You might get 180MB/sec sustained read speed near the edge of the platter and 40MB/sec near the spindle. That's the head/platter data rate limit for one drive. Caching in the drive and in the kernel can make it appear much faster, as transactions are reported complete once the data is in the cache.
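If you want dd to tell you something closer to the truth, take the caches out of the loop. A minimal sketch (assuming GNU dd; drop_caches needs root, and O_DIRECT wants bs to be a multiple of the physical sector size):
Code: | # flush the page cache, then test with O_DIRECT and a final fdatasync
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/home/test of=/dev/null bs=10M count=1000 iflag=direct
dd of=/home/test if=/dev/zero bs=10M count=1000 oflag=direct conv=fdatasync |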
RAID should make it faster. Both RAID1 and RAID5.
303 MB/s is not too shabby, depending on where on the drive surface it's being written.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
linux_os2 Apprentice
Joined: 29 Aug 2018 Posts: 240 Location: Zedelgem Belgium
Posted: Sun Mar 10, 2024 6:38 pm
Thanks Neddy,
I was expecting an answer from you...
So the backup and restore of the huge RAID will take some time.
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Sun Mar 10, 2024 7:28 pm
linux_os2,
Yep. My one 18TB drive takes 36h for the long test, and you have 6 :)
What does smartctl -a ... say about the polling time for the long test?
That's how long Toshiba thinks a full surface scan will take.
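Something like this pulls out just that figure (a quick sketch; point it at each array member in turn):
Code: | smartctl -a /dev/sda | grep -A 1 'Extended self-test' |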
-- edit --
I would expect 36h (based on my drive) if you can work all 6 drives concurrently and keep the heads busy.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9815 Location: almost Mile High in the USA
Posted: Sun Mar 10, 2024 7:57 pm
Yeah, my meager 2TB x 3 array takes forever to backup and ... resilver ... which is the danger of these huge disks, even if my 2T disks are "not" (large, that is).
Not sure when I'll have to move to RAID6 and forget about speed, just worrying about uptime even if a disk fails.
Not sure when I'll get 18T disks... but yeah, it's depressing that they only read at 180MB/sec. I'm stuck with 2T disks (mine do anywhere from 90MB/sec to 180MB/sec) at the moment, just because they were cheap, I don't have much of a data hoard, and I have a pile of them ready to replace if one goes.
_________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Sun Mar 10, 2024 8:07 pm
eccerr0r,
A few years ago, the limit was about 6TB/drive for raid5; beyond that, recalculating the data for a failed drive from the remains of the raid5 was thought to be a bit iffy.
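Back of an envelope, assuming the datasheet figure of one unrecoverable read error per 1e15 bits: rebuilding a failed drive in a 6x18TB raid5 means reading the five survivors, 5 x 18TB = 90TB = 7.2e14 bits, so the odds of hitting a URE somewhere during the rebuild are roughly even. That's the iffy part.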
I only discovered that after setting up 4x8TB drives in raid5 ... Oops.
I know I should add another drive for raid6 ...
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
linux_os2 Apprentice
Joined: 29 Aug 2018 Posts: 240 Location: Zedelgem Belgium
Posted: Mon Mar 11, 2024 2:34 pm
I was able to do a backup to the USB RAID 1 on the WD Gold 10TBs.
I will get another six drives of 18 or 20 TB when I win EuroMillions.
Here is the log: Code: | --------------------------------------
2024-03-10 20:29:36 backup of the partition starts
--------------------------------------
Opening User Interface mode.
Partclone v0.3.27 http://partclone.org
Starting to clone device (/dev/md125) to image (-)
Reading Super Block
memory needed: 2748690036 bytes
bitmap 2746592880 bytes, blocks 2*1048576 bytes, checksum 4 bytes
Calculating bitmap... Please wait...
Total Time: 00:12:21, Ave. Rate: 0.00byte/min, 100.00% completed!
done!
File system: EXTFS
Device size: 90.0 TB = 21972743040 Blocks
Space in use: 7.4 TB = 1808859127 Blocks
Free Space: 82.6 TB = 20163883913 Blocks
Block size: 4096 Byte
Total block 21972743040
Total Time: 18:04:06, Ave. Rate: 6.83GB/min, 100.00% completed!
Syncing... OK!
Partclone successfully cloned the device (/dev/md125) to the image (-)
--------------------------------------
2024-03-11 14:51:56 backup of the partition was succesfull
--------------------------------------
--------------------------------------
2024-03-11 14:51:56 Wrote fdisk of the drive
-------------------------------------- |
Not so bad, I think.
My backup script pipes the output from partclone to zstd; it runs under LFS.
The 7.4 TB compressed down to 6.9 TB.
The compression ratio is so low because the partition contains mostly FLACs and the backups of the other partitions, 5.4 TB together.
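The pipeline has roughly this shape (paths and zstd options here are illustrative, not the actual script):
Code: | partclone.extfs -c -s /dev/md125 -o - | zstd -T0 -o /mntbackup/backup_partition/md125.img.zst |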
Neddy, the tests with smartctl will be done later, when the system can be spared for some time (and when there is enough sun; my system consumes about 250 W, so 36 hours means 9 kWh).
Marc.
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Mon Mar 11, 2024 2:49 pm
linux_os2,
I just intended you to read the smartctl output. For an 8TB drive I get:
Code: | # smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.7.6-gentoo] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba N300/MN NAS HDD
Device Model: TOSHIBA HDWG480
Serial Number: 71R0A0NWFA3H
LU WWN Device Id: 5 000039 b08e0f122
Firmware Version: 0601
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5577
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 11 14:41:25 2024 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 690) minutes. |
The Code: | Extended self-test routine
recommended polling time: ( 690) minutes | is what I wanted to see. It only takes seconds :)
That's the time it takes the drive to do a surface scan with no data passing over the external data interface.
Your 6.83GB/min is about 114MB/sec, which is not too bad.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
linux_os2 Apprentice
Joined: 29 Aug 2018 Posts: 240 Location: Zedelgem Belgium
Posted: Mon Mar 11, 2024 3:32 pm
Here it is: Code: | # smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.13-gentoo-x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba MG09ACA... Enterprise Capacity HDD
Device Model: TOSHIBA MG09ACA18TE
Serial Number: Y1A0A031FJDH
LU WWN Device Id: 5 000039 b48d08a0c
Firmware Version: 0104
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 11 16:28:08 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1484) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 8765
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 2673
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 3945
10 Spin_Retry_Count 0x0033 100 100 030 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2551
23 Helium_Condition_Lower 0x0023 100 100 075 Pre-fail Always - 0
24 Helium_Condition_Upper 0x0023 100 100 075 Pre-fail Always - 0
27 MAMR_Health_Monitor 0x0023 100 100 030 Pre-fail Always - 331287
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 21
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 282
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 3084
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 27 (Min/Max 13/33)
196 Reallocated_Event_Count 0x0033 100 100 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
220 Disk_Shift 0x0002 100 100 000 Old_age Always - 69337098
222 Loaded_Hours 0x0032 091 091 000 Old_age Always - 3760
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0
224 Load_Friction 0x0022 100 100 000 Old_age Always - 0
226 Load-in_Time 0x0026 100 100 000 Old_age Always - 679
240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 12189486532
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 91336331000
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The above only provides legacy SMART information - try 'smartctl -x' for more |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9815 Location: almost Mile High in the USA
Posted: Mon Mar 11, 2024 3:52 pm
BTW, I use rsync to backup, and risk a bit of bit rot (though scrubbing the arrays helps).
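For reference, the sort of thing I mean, sketched with illustrative paths and md device: -aHAX keeps hardlinks, ACLs and xattrs, and the echo kicks off an md consistency scrub.
Code: | rsync -aHAX --delete /home/ /mnt/backup/home/
echo check > /sys/block/md0/md/sync_action |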
It was kind of funny: I chanced upon some 10Gb Ethernet cards for cheap and was thinking I should start upgrading to 10GbE, but with the HDD bottleneck it probably doesn't make a whole lot of sense... On random reads even my RAID5 can't saturate 1GbE, and of course the single-disk machines (other than the machines with SSDs) can't saturate 1GbE either.
_________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Mon Mar 11, 2024 3:56 pm
linux_os2,
Code: | Extended self-test routine
recommended polling time: (1484) minutes. |
Or just under 25 hours for a surface scan. You won't be able to read the entire drive any faster than that.
The good news is that raid sets are accessed in parallel.
That's an average speed of 18,000,207,937,536 bytes / 1484 minutes, or about 202MB/sec.
The data sheet says
Quote: | Data Transfer Speed
(Sustained)(Typ.) 268MiB/s |
which hints at reading data from several heads concurrently. I've not seen that since magnetic drum storage.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
linux_os2 Apprentice
Joined: 29 Aug 2018 Posts: 240 Location: Zedelgem Belgium
Posted: Mon Mar 11, 2024 4:06 pm
Speaking of drum storage: remember the days when we painted magnetic material back on after a crash?
I also remember magnetic ring storage; it was nonvolatile too.
We had units of 16KB in the 360-115, wow.
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54572 Location: 56N 3W
Posted: Mon Mar 11, 2024 4:26 pm
linux_os2,
I know Code: | magnetic ring storage | as core store.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.