Gentoo Forums
NVMe and emerge compile
Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Tue May 09, 2023 5:37 pm    Post subject: NVMe and emerge compile

Two years ago, I got myself a new computer, this time with an NVMe drive mounted as the root drive (/). I also have regular HDDs set up in RAID1 for my /home. I did not create a separate partition for /var, so it's on the same NVMe drive as /.

That PC is on 24/7. At some point, I rebooted, and the BIOS complained about that drive failing the SMART test. I have now set up /var/tmp as a tmpfs, taking 12GB out of the system's total 32GB of RAM. I also went looking on the net to see whether letting emerge compile on an NVMe drive is bad. I came across a reddit page where people say it's not an issue, one example being someone with "38TB written in 7000 hours" on a drive with a warrantied TBW of 1200TB.
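
For reference, the sort of fstab entry that sets this up looks roughly like this - the 12G size matches what I described above, and the exact options are illustrative rather than my verbatim config:
Code:
tmpfs   /var/tmp   tmpfs   size=12G,mode=1777,noatime   0 0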

In my case, the drive is a Kingston SA2000M8250G - from what I find online, the TBW limit is 150TB. From what I can tell, I'm waaaaay past that, at 1.03PB (in almost 2 years - see below). Kingston's warranty is also apparently void since the "percentage used" is now 100%. So yeah, that drive is a failure waiting to happen (it still runs, despite the SMART test failure - I'm currently using this computer to post this message).

So my question is: is letting Portage compile on an NVMe drive killing such drives?

Code:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    301,659 [154 GB]
Data Units Written:                 2,014,358,159 [1.03 PB]
Host Read Commands:                 11,206,812
Host Write Commands:                7,973,648,047
Controller Busy Time:               90,160
Power Cycles:                       46
Power On Hours:                     16,711
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Tue May 09, 2023 5:54 pm

Today's TLC and QLC drives just don't have the endurance anymore, but for most uses they are fine. 1PB written, however, is a LOT - what are you doing to the disk? Using it as a bittorrent dump?

I have machines on 24/7 but they're mostly idle. One of them is a PVR, and it has accumulated ~65TB written since the last mkfs (it's a mechanical HDD, however) over more than 10 years. I don't constantly do updates on it, but it definitely gets an emerge @world once in a while - the vast majority of the writes, though, are from downloading OTA TV programming.

Granted, my Gentoo boxes typically use tmpfs when I can, but being RAM-limited I cannot always use it. I do have a 180G SATA SSD with 23TB written according to SMART, but it has a minimum estimated endurance of 540TBW.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?


Last edited by eccerr0r on Tue May 09, 2023 5:55 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54331
Location: 56N 3W

PostPosted: Tue May 09, 2023 5:54 pm

Black

Code:

Available Spare:                    100%
Data Units Read:                    301,659 [154 GB]
Data Units Written:                 2,014,358,159 [1.03 PB]
Power On Hours:                     16,711


I'm not sure I believe those numbers. 1.03 PB in 16,711 hours is 60GB an hour. That's about 16MB/sec - Portage is not doing that.
The drive also has not used any of its spare capacity, which would be way down at end of life.

The data set is not self-consistent.
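
A quick back-of-the-envelope check - NVMe "data units" are 1,000 × 512-byte blocks (512,000 bytes), which is where smartctl's 1.03 PB figure comes from:
Code:
$ echo '2014358159 * 512000' | bc                    # total bytes written
1031351377408000
$ echo '1031351377408000 / 16711 / 1000000000' | bc  # GB per power-on hour
61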
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Tue May 09, 2023 6:00 pm

Definitely not Gentoo doing that usage, but I can't say it's unbelievable - we don't know what else is using the disk. One thing that is suspicious is that the read/write ratio is oddly skewed to writes - meaning that it's written and never read back...

I found that I (accidentally) took a big chunk out of some of my SSDs by thrash swapping to them, and with an NVMe interface this can add up fast.

I do have to say that there are firmware bugs out there that lie about usage. One of my SSDs, according to its POH, says it was made when Edison made his first light bulb...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Tue May 09, 2023 6:19 pm

eccerr0r wrote:
Today's TLC and QLC drives just don't have the endurance anymore, but for most uses they are fine. 1PB written, however, is a LOT - what are you doing to the disk? Using it as a bittorrent dump?

I have machines on 24/7 but they're mostly idle. One of them is a PVR, and it has accumulated ~65TB written since the last mkfs (it's a mechanical HDD, however) over more than 10 years. I don't constantly do updates on it, but it definitely gets an emerge @world once in a while - the vast majority of the writes, though, are from downloading OTA TV programming.

Granted, my Gentoo boxes typically use tmpfs when I can, but being RAM-limited I cannot always use it. I do have a 180G SATA SSD with 23TB written according to SMART, but it has a minimum estimated endurance of 540TBW.


No, that PC is mostly idle. It's my desktop - it's running 24/7, but the only server I'm running is Samba for my local network, and the files it serves are on /home, so not on the NVMe. Portage is definitely the most disk-intensive activity on that PC - when I run it, which is at most once a day, and not every day.

The swap partition is also on the NVMe, but with 32GB of RAM it's not getting much use. In hindsight, I should have put it on the HDD, but I don't think it is a factor.

Running iotop shows Google Chrome as the main I/O process, but, again, /home isn't on the NVMe. And iotop's "Current DISK WRITE" is at or close to 0, with bursts in the 300 K/s range.
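
For reference, iotop can keep a running per-process total with something along these lines:
Code:
iotop -ao    # -a: accumulate I/O since iotop started, -o: only show processes actually doing I/O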

htop:
Code:
    0[|                         0.7%]   3[|                         0.7%]   6[||                        2.0%]   9[||                        3.3%]
    1[||                        1.3%]   4[                          0.0%]   7[||                        1.3%]  10[                          0.0%]
    2[||                        1.3%]   5[||                        2.0%]   8[                          0.0%]  11[|                         0.7%]
  Mem[|||||||||||||||||||||||||||||||||||||||||              1.83G/31.2G] Tasks: 101, 452 thr, 142 kthr; 1 running
  Swp[||                                                     7.90M/32.0G] Load average: 1.66 1.95 2.04
                                                                          Uptime: 72 days, 20:31:53


@NeddySeagoon you're right, 60GB/hour is rather high for a PC that's mostly idle.

@eccerr0r I think you might be on to something with firmware bugs...

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Thu May 11, 2023 6:25 am

Can these newer nvme SSDs sustain 1GB/sec written?
Writing through 2PB would take less than 1 month...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Mon May 15, 2023 3:07 am

eccerr0r wrote:
One thing that is suspicious is that the read/write ratio is oddly skewed to writes - meaning that it's written and never read back...


/var/log ?

Hu
Administrator

Joined: 06 Mar 2007
Posts: 21726

PostPosted: Mon May 15, 2023 3:43 am

Yes, logs are written and often not read, but typical logs should not be nearly large enough for that to be noticeable at this scale.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Mon May 15, 2023 6:13 am

The only things that could do this are:

- Backups (unless you verify)... I had one hard drive that I only wrote backups to (it also kept getting dropped from the array due to electrical problems, so it kept getting resilvered, and that's all writes)
- killer endurance testing
- sabotage...

The mystery continues...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Thu Jun 08, 2023 12:36 pm

I have a new NVMe "drive" on my desk, and I will switch in the near future, but in the meantime, here's some interesting data. I don't think it shows anything (other than that it doesn't match). I have run smartctl twice, at a 24-hour interval. iotop has been running (in accumulation mode) for that same period. md127 is a RAID1 array of spinning rust - so not the NVMe. Chrome, running under user "black", should be writing to the home folder, which is not on the NVMe (it's on md127 - the spinning rust). Syncthing's folders are also on md127.

/var/tmp/portage has been in tmpfs for a month now and doesn't appear to make a difference. I have included the relevant fstab line, in case I made a newbie mistake there.

Code:
Every 2.0s: smartctl -A /dev/nvme0                           blackphoenix: Wed Jun  7 08:08:57 2023

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.12-gentoo] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    311,084 [159 GB]
Data Units Written:                 2,127,156,742 [1.08 PB]
Host Read Commands:                 11,576,860
Host Write Commands:                8,419,516,333
Controller Busy Time:               95,656
Power Cycles:                       47
Power On Hours:                     17,402
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0


Every 2.0s: smartctl -A /dev/nvme0                           blackphoenix: Thu Jun  8 07:52:55 2023

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.12-gentoo] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    311,797 [159 GB]
Data Units Written:                 2,131,061,845 [1.09 PB]
Host Read Commands:                 11,581,631
Host Write Commands:                8,434,917,922
Controller Busy Time:               95,837
Power Cycles:                       47
Power On Hours:                     17,426
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0


Code:
Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
  PID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                             
 1161 be/3 root          0.00 B   1629.28 M  0.00 %  0.00 % [jbd2/md127-8]
  168 be/3 root          0.00 B    697.21 M  0.00 %  0.00 % [jbd2/nvme0n1p3-8]
22754 be/4 black       692.00 K    105.75 M  0.00 %  0.13 % chrome --profile-directory=Default --disable-async-dns
22799 be/4 black        52.00 K     56.47 M  0.00 %  0.07 % chrome --type=utility --uti~,13347491690870620486,262144
31748 ?dif syncthin      6.39 M     53.09 M  0.00 %  0.02 % syncthing -no-browser -home~ddress=http://127.0.0.1:8384
 3120 be/4 black       136.00 K     44.25 M  0.00 %  0.00 % liferea
 1567 be/4 root          0.00 B     28.40 M  0.00 %  0.20 % syslogd -F -m 0 -s -s
22811 be/4 black         4.00 K     23.26 M  0.00 %  0.05 % chrome --type=utility --uti~,13347491690870620486,262144
 3136 be/4 black       928.00 K      3.07 M  0.00 %  0.00 % WebKitNetworkProcess 7 18
22939 be/4 black         0.00 B      2.49 M  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
32232 be/4 black         0.00 B   1916.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
22882 be/4 black         0.00 B   1244.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
 1465 be/4 black         0.00 B    972.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
19559 be/4 root          0.00 B    624.00 K  0.00 %  0.00 % nmbd -D
23088 be/4 black         0.00 B    604.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
25183 be/4 black         0.00 B    276.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,26214


Relevant line of /etc/fstab:
Code:
PARTUUID=8ca208e8-2e44-454a-b4ec-51e76d3acdab      /      ext4      noatime      0 1
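
For what it's worth, the difference between the two smartctl dumps above works out to roughly 2 TB written in about 24 hours (again counting 512,000 bytes per NVMe data unit):
Code:
$ echo '(2131061845 - 2127156742) * 512000' | bc   # bytes written between the two dumps
1999412736000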

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Thu Jun 08, 2023 2:02 pm

You're still "writing" 100TB/month somehow!

Does anything funky show up in your dmesg?

What happens if you mount the disk from a livecd (R/W) and wait out a similar period? Or at least an hour?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Thu Jun 08, 2023 3:48 pm

Filtering out the UFW (Uncomplicated Firewall) messages - I started using it about 2 or 3 months ago, so long after this excessive writing started - I get the dmesg output below.

I just found this page for Arch Linux stating there is an issue with that exact same drive and that exact same firmware revision. I don't have the exact same symptoms - the drive does not become unresponsive after a while, and at one point I ran it for around 300 days without rebooting. I'll have to try updating the firmware, or at least passing the kernel parameter to set a maximum latency, to see if it changes anything.
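
For the record, the parameter usually suggested for that A2000 issue is the NVMe power-state latency limit, something along these lines on the kernel command line (0 disables the deeper APST power states entirely; I haven't tried it on this drive yet):
Code:
nvme_core.default_ps_max_latency_us=0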

Thanks for the livecd suggestion, I'll give that one a try as well.

Code:
[1494969.021227] nvme nvme0: I/O 704 (I/O Cmd) QID 4 timeout, aborting
[1494969.021256] nvme nvme0: I/O 512 (I/O Cmd) QID 6 timeout, aborting
[1494969.021274] nvme nvme0: I/O 513 (I/O Cmd) QID 6 timeout, aborting
[1494969.021283] nvme nvme0: I/O 514 (I/O Cmd) QID 6 timeout, aborting
[1494969.021290] nvme nvme0: I/O 515 (I/O Cmd) QID 6 timeout, aborting
[1494999.229229] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1494999.229272] nvme nvme0: I/O 704 QID 4 timeout, reset controller
[1495061.697461] nvme nvme0: Abort status: 0x371
[1495061.697464] nvme nvme0: Abort status: 0x371
[1495061.697465] nvme nvme0: Abort status: 0x371
[1495061.697466] nvme nvme0: Abort status: 0x371
[1495061.697466] nvme nvme0: Abort status: 0x371
[1495061.717464] nvme nvme0: 12/0/0 default/read/poll queues
[1495526.205328] nvme nvme0: I/O 0 (I/O Cmd) QID 11 timeout, aborting
[1495526.205358] nvme nvme0: I/O 1 (I/O Cmd) QID 11 timeout, aborting
[1495526.205377] nvme nvme0: I/O 2 (I/O Cmd) QID 11 timeout, aborting
[1495526.205385] nvme nvme0: I/O 3 (I/O Cmd) QID 11 timeout, aborting
[1495526.205392] nvme nvme0: I/O 4 (I/O Cmd) QID 11 timeout, aborting
[1495556.221333] nvme nvme0: I/O 0 QID 11 timeout, reset controller
[1495556.801271] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[1495618.755318] nvme nvme0: Abort status: 0x371
[1495618.755328] nvme nvme0: Abort status: 0x371
[1495618.755333] nvme nvme0: Abort status: 0x371
[1495618.755337] nvme nvme0: Abort status: 0x371
[1495618.755341] nvme nvme0: Abort status: 0x371
[1495618.777427] nvme nvme0: 12/0/0 default/read/poll queues
[1495799.485337] nvme nvme0: I/O 64 (I/O Cmd) QID 3 timeout, aborting
[1495799.485366] nvme nvme0: I/O 65 (I/O Cmd) QID 3 timeout, aborting
[1495799.485385] nvme nvme0: I/O 66 (I/O Cmd) QID 3 timeout, aborting
[1495799.485393] nvme nvme0: I/O 67 (I/O Cmd) QID 3 timeout, aborting
[1495799.485401] nvme nvme0: I/O 68 (I/O Cmd) QID 3 timeout, aborting
[1495829.693302] nvme nvme0: I/O 64 QID 3 timeout, reset controller
[1495831.743273] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1495893.185661] nvme nvme0: Abort status: 0x371
[1495893.185665] nvme nvme0: Abort status: 0x371
[1495893.185667] nvme nvme0: Abort status: 0x371
[1495893.185668] nvme nvme0: Abort status: 0x371
[1495893.185672] nvme nvme0: Abort status: 0x371
[1495893.205196] nvme nvme0: 12/0/0 default/read/poll queues
[1496162.493336] nvme nvme0: I/O 768 (I/O Cmd) QID 8 timeout, aborting
[1496162.493365] nvme nvme0: I/O 769 (I/O Cmd) QID 8 timeout, aborting
[1496162.493385] nvme nvme0: I/O 770 (I/O Cmd) QID 8 timeout, aborting
[1496162.493397] nvme nvme0: I/O 771 (I/O Cmd) QID 8 timeout, aborting
[1496162.493418] nvme nvme0: I/O 772 (I/O Cmd) QID 8 timeout, aborting
[1496192.701334] nvme nvme0: I/O 768 QID 8 timeout, reset controller
[1496193.213322] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1496253.633570] nvme nvme0: Abort status: 0x371
[1496253.633573] nvme nvme0: Abort status: 0x371
[1496253.633574] nvme nvme0: Abort status: 0x371
[1496253.633575] nvme nvme0: Abort status: 0x371
[1496253.633575] nvme nvme0: Abort status: 0x371
[1496253.654974] nvme nvme0: 12/0/0 default/read/poll queues
[1496343.229376] nvme nvme0: I/O 128 (I/O Cmd) QID 3 timeout, aborting
[1496343.229404] nvme nvme0: I/O 129 (I/O Cmd) QID 3 timeout, aborting
[1496343.229424] nvme nvme0: I/O 130 (I/O Cmd) QID 3 timeout, aborting
[1496343.229433] nvme nvme0: I/O 131 (I/O Cmd) QID 3 timeout, aborting
[1496343.229440] nvme nvme0: I/O 132 (I/O Cmd) QID 3 timeout, aborting
[1496373.437379] nvme nvme0: I/O 128 QID 3 timeout, reset controller
[1496374.973334] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1496433.863355] nvme nvme0: Abort status: 0x371
[1496433.863371] nvme nvme0: Abort status: 0x371
[1496433.863378] nvme nvme0: Abort status: 0x371
[1496433.863383] nvme nvme0: Abort status: 0x371
[1496433.863389] nvme nvme0: Abort status: 0x371
[1496433.885341] nvme nvme0: 12/0/0 default/read/poll queues
[1497018.561425] nvme nvme0: I/O 896 (I/O Cmd) QID 1 timeout, aborting
[1497018.561455] nvme nvme0: I/O 897 (I/O Cmd) QID 1 timeout, aborting
[1497018.561475] nvme nvme0: I/O 898 (I/O Cmd) QID 1 timeout, aborting
[1497018.561484] nvme nvme0: I/O 899 (I/O Cmd) QID 1 timeout, aborting
[1497018.561491] nvme nvme0: I/O 900 (I/O Cmd) QID 1 timeout, aborting
[1497048.765429] nvme nvme0: I/O 13 QID 0 timeout, reset controller
[1497048.765474] nvme nvme0: I/O 896 QID 1 timeout, reset controller
[1497109.698422] nvme nvme0: Abort status: 0x371
[1497109.698434] nvme nvme0: Abort status: 0x371
[1497109.698438] nvme nvme0: Abort status: 0x371
[1497109.698442] nvme nvme0: Abort status: 0x371
[1497109.698445] nvme nvme0: Abort status: 0x371
[1497109.717281] nvme nvme0: 12/0/0 default/read/poll queues
[1498502.845492] nvme nvme0: I/O 704 (I/O Cmd) QID 2 timeout, aborting
[1498502.845523] nvme nvme0: I/O 705 (I/O Cmd) QID 2 timeout, aborting
[1498502.845542] nvme nvme0: I/O 706 (I/O Cmd) QID 2 timeout, aborting
[1498502.845551] nvme nvme0: I/O 707 (I/O Cmd) QID 2 timeout, aborting
[1498502.845558] nvme nvme0: I/O 708 (I/O Cmd) QID 2 timeout, aborting
[1498533.053493] nvme nvme0: I/O 704 QID 2 timeout, reset controller
[1498534.077495] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[1498596.546319] nvme nvme0: Abort status: 0x371
[1498596.546331] nvme nvme0: Abort status: 0x371
[1498596.546335] nvme nvme0: Abort status: 0x371
[1498596.546339] nvme nvme0: Abort status: 0x371
[1498596.546347] nvme nvme0: Abort status: 0x371
[1498596.572858] nvme nvme0: 12/0/0 default/read/poll queues

Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6103
Location: Dallas area

PostPosted: Thu Jun 08, 2023 7:35 pm

That's an insane amount of disk writes for that short a time.

I suppose it could happen if you "emerge -e" several times a day, or are using part of the NVMe for swap space and keep running out of memory, or have several things constantly writing to /var/log/<something>.

Edit to add: from the Kingston site about the SMART data
Quote:
For the NVM command set, logical blocks written as part of Write operations shall be included
in this value. Write Uncorrectable commands shall not impact this value.


Not sure what constitutes a logical block (as opposed to a physical block), but it might explain the high write amount if logical blocks are much larger than physical ones.
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland

toralf
Developer

Joined: 01 Feb 2004
Posts: 3925
Location: Hamburg

PostPosted: Thu Jun 08, 2023 8:38 pm

I have had similar experiences with the tinderbox.

From the smartctl values, about 1.4 PB of data was written over the last 2 years to a BTRFS filesystem spanning 2 partitions on 2 NVMe drives.
That is about 24 MiB/sec. The Grafana metric node_disk_written_bytes_total (which I have been using for 2 months) tells me the same.
What is interesting is that this value has dropped to 9-10 MiB/sec since kernel 6.3.x, and nothing else was changed on the server.

Emerges are done using a tmpfs for /var/tmp/portage.

FWIW, the nightly housekeeping process here - which deletes about 10-100 GB of old data on that filesystem - shows up in node_disk_written_bytes_total too. So there's a big discrepancy between the housekept space and the reported written space; the factor is still 10x or 20x.
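
For cross-checking without Grafana, essentially the same counter can be read from the kernel directly - field 7 of /sys/block/<dev>/stat is sectors written, at 512 bytes per sector (device name here assumes the whole-disk nvme0n1 node):
Code:
awk '{printf "%.1f GB written since boot\n", $7 * 512 / 1e9}' /sys/block/nvme0n1/stat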

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Thu Jun 08, 2023 8:44 pm

Based on the 24-hour sample, the iotop written bytes and the drive's written bytes actually seem to correspond (assuming 512-byte logical blocks), but that 24-hour sample of ~700MB/day, if sustained, would only be 22GB/month - nowhere near the 100TB that was measured.

Is your ext4 filesystem formatted with 512-byte or 4096-byte blocks? 512-byte blocks, or perhaps partition alignment problems, could cause some extraneous writes.
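
One way to check, assuming the root filesystem is the nvme0n1p3 seen in the iotop output:
Code:
tune2fs -l /dev/nvme0n1p3 | grep -i 'block size'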

Did you even expect 700MB of writes that day?

I never took a look at how much I write per day...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Thu Jun 08, 2023 8:57 pm

eccerr0r wrote:
Did you even expect 700MB of writes that day?


No, all I'm doing is browsing the net and watching a few YouTube videos. Syncthing is for my LAN; not much is happening there. The PC takes backups at night, but it sends them to another PC, so that's just reading, not writing. /var/log doesn't seem to move much (if at all) when I look at it - though UFW seems to write in bursts. I just turned it off and reran smartctl; I'll check again tomorrow to see if there's a change.

Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6103
Location: Dallas area

PostPosted: Thu Jun 08, 2023 9:43 pm

I suppose you could have buggy firmware.

If interested, you could find the firmware revision and check Google for that model and firmware version to see if there are reported problems.
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Thu Jun 08, 2023 9:47 pm

It might be interesting to take a log snapshot every day and see if there's some anomalous behavior, but a missing day of 40MB/sec writes all day is kind of hard to make up - so it's probably not demand writes, more likely firmware or consequential writes going on.
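
A daily snapshot could be as simple as a cron entry along these lines (path and schedule are just an example):
Code:
# /etc/crontab: log the SMART counters once a day at midnight
0 0 * * * root /usr/sbin/smartctl -A /dev/nvme0 >> /var/log/nvme-smart.log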
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6103
Location: Dallas area

PostPosted: Thu Jun 08, 2023 9:50 pm

Do you have discard turned off, and do you run fstrim periodically?
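
To check, something along these lines works:
Code:
findmnt -no OPTIONS /    # look for a "discard" mount option on the root filesystem
fstrim -v /              # one-off manual trim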
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Fri Jun 09, 2023 12:57 am

Anon-E-moose wrote:
Do you have discard turned off, and do you run fstrim periodically?


Unless there's something specific needed to disable discard, I did not willingly turn it on - the "discard" option is not in my fstab. I actually didn't even know about it until about a month ago, when I made the first post of this thread.

As for the fstrim command, I tried running it once as a dry run, and it said it didn't have anything to do:

Code:
blackphoenix / # fstrim -n -v /
/: 0 B (dry run) trimmed

Goverp
Advocate

Joined: 07 Mar 2007
Posts: 2014

PostPosted: Fri Jun 09, 2023 9:15 am

I've sometimes wondered if an overzealous combination of logging (or writing anything) and syncing can cause this sort of problem - the zeal being to sync after every line rather than let the kernel do its thing. Not syncing means the writes would be buffered, and there's a danger of losing the last buffer(s) if there's a power outage, but the cost of syncing on SSD or NVMe is a write (= a new block allocated and written, for some value of "block", for every single line...). I presume databases and other transactional mechanisms have some way around this for their journals; alternatively, just ensure the journal is on spinning rust.

FWIW, I have a 5-disk RAID 10 array that I use for /home and /var/tmp, and I run emerges in a chroot in /home/packager/chroot to create binary packages, then install the binpkgs into the root filesystem on the NVMe, so all the compilation work happens on spinning disks.
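
A rough sketch of that arrangement, in case it helps - the feature and command names are standard Portage ones, and the paths are just the ones mentioned above, not necessarily an exact config:
Code:
# In the chroot (/home/packager/chroot) make.conf: build a binary package for every emerge
FEATURES="buildpkg"
PKGDIR="/var/cache/binpkgs"

# On the host, with the chroot's PKGDIR shared or bind-mounted, install without compiling:
emerge --usepkgonly <package>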
_________________
Greybeard

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Fri Jun 09, 2023 2:51 pm

Just another data point: I discovered the inotifywait command, so I'm running it on /tmp.
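
Something along these lines (the flags are illustrative: -m keeps it watching, -r recurses, -e limits the event types):
Code:
inotifywait -m -r -e create,modify,delete /tmp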

In the last 2.5 hours, Google Chrome is the only process that has written anything in /tmp - 740 events, either creating, modifying, or deleting a file there. Most of the time, Chrome is actually just sitting there, since I'm working away on another PC. That doesn't mean it's all Chrome's fault, but it sure doesn't help. I guess I should run inotifywait on the entire partition.

I also ran inotifywait on /var/log, and only 75 events were recorded in the same time period.

Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6103
Location: Dallas area

PostPosted: Fri Jun 09, 2023 3:12 pm

Do you have nvme-cli installed? It has lots of useful options for nvme investigation.
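
A few starting points, assuming the device is /dev/nvme0 as in the earlier output:
Code:
nvme list                                   # enumerate NVMe devices
nvme smart-log /dev/nvme0                   # same health counters smartctl shows
nvme id-ctrl /dev/nvme0 | grep -i '^fr '    # controller info; "fr" is the firmware revision
nvme error-log /dev/nvme0                   # controller error log entries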
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland

Black
Apprentice

Joined: 10 Dec 2002
Posts: 158
Location: Québec, Canada

PostPosted: Fri Jun 09, 2023 3:41 pm

Anon-E-moose wrote:
Do you have nvme-cli installed? It has lots of useful options for nvme investigation.


Yes, I do, but I haven't used it before. Any hints as to which commands to look at?

Thank you (and everyone else)!

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9691
Location: almost Mile High in the USA

PostPosted: Fri Jun 09, 2023 3:50 pm

Searching the web, I get a lot of hits on Kingston SSDs having this behavior...

Currently I only have Intel, Samsung (mPCIe), Patriot (mPCIe), HP, and Micron/Crucial SSDs... they don't seem to exhibit this behavior, though the Samsung is one I accidentally swap-stormed and ate a chunk of its life...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?