dystopic_utopia n00b
Joined: 17 Sep 2019 Posts: 12
|
Posted: Sun Mar 29, 2020 4:39 pm Post subject: [Solved] System freeze. possible hard drive failure? |
|
|
I am posting this thread here since it didn't really seem to fit into any of the other categories.
Yesterday my system froze up, which is a rare occurrence in my experience. I mean the type of freeze where Ctrl+Alt+F1-F6 would not bring up a console or cause any other noticeable change in my system. After giving it around 10 minutes to see if it would right itself, I did a hard power-down of my laptop. The problem occurred while I was running emerge on seamonkey-2.53-r1, which I wouldn't expect to be the cause, since I have compiled the prior version as well as most of the other packages on my system. I have booted from a systemrescuecd image (I know it is Arch based now, but the copy of the minimal installation image I used didn't seem to have smartctl on it) and have run smartctl on the hard drive that came with this laptop.
Here is the output of smartctl --all /dev/sda after running smartctl --test=long: http://dpaste.com/3E37YV8
Edit: removed period.
My only assumption is that any errors from a storage device mean "prepare for imminent failure", never mind 612 of them. Could these have been the cause of my system hanging, especially during a build?
If it is at all relevant, my Gentoo partition is a LUKS->LVM->ext4 setup.
I haven't booted into Gentoo, or the OS that came with this laptop, since the freeze. I am thinking about decrypting my root partition, activating the volume group, running fsck.ext4 on it and then mounting it read-only, so I can see if anything was logged before the freeze and at least back up anything in /home/. I will post back if I find anything that might be useful.
The laptop is a refurbished (I know, but I am on a budget) Acer Nitro 5. Additional specs to be posted if they become relevant. My concern, of course, is whether I got a faulty hard drive, as well as the system freeze itself.
Though the laptop is still within the 90 day warranty period, I am going to assume that installing any non-Windows OS automatically voids the warranty. I will of course deal with that issue myself if I decide to contact Acer about it. If need be, I could put the hard drive that the original Gentoo install came from into this machine.
Anyway, thanks for reading, and as always, thanks in advance for any input from the Gentoo community.
[Moderator edit: fixed url. Forum auto-linking considers trailing periods to be part of the URL. [Fix made after Goverp posted.] -Hu]
Last edited by dystopic_utopia on Tue Mar 31, 2020 1:25 pm; edited 2 times in total |
|
Goverp Advocate
Joined: 07 Mar 2007 Posts: 2014
|
Posted: Sun Mar 29, 2020 5:10 pm Post subject: |
|
|
A) The URL you gave includes the trailing '.', so it doesn't work.
B) When I use the right one, the report says the drive is healthy. According to some gurus I found with Google, the 610 CRC errors are data transmission errors between the drive and the motherboard, i.e. a bad cable or socket. I wonder if it's something to do with unexpected power-offs (it is a laptop). More interesting is that the drive has done 2037 hours, say about 18 months' use at 5 hours a day, 5 days a week, and has been powered up nearly 900 times. I suspect you can ignore the CRC errors; they're only about one every 3 hours, and the hardware will have retried them anyway. The one to worry about is reallocated sectors, and there aren't any.
C) I've found that sort of lockup when I've run out of RAM running a big compilation, such as qtwebengine (=chromium), err, firefox (=seamonkey) and their ilk. Nothing works for me, as the box has to page software in and out to find the program to handle the interrupt. Before blaming the hardware, I'd check you aren't overstressing the system. Typical snafus include "-jtoomany", "--jobs=toomany", not having enough swap space, using tmpfs for portage temp disk and leaving nothing over for the compiler, and so forth. _________________ Greybeard |
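The back-of-envelope figures in point B can be checked with a short Python sketch. The 2037 power-on hours and 610 CRC errors come from the SMART report discussed above, and the 5 hours a day, 5 days a week pattern is the assumption stated in the post. The function names and the 52/12 weeks-per-month conversion are illustrative assumptions, not anything smartctl reports.

```python
def months_of_use(power_on_hours, hours_per_day=5.0, days_per_week=5.0):
    """Convert power-on hours to calendar months under an assumed usage pattern."""
    weeks = power_on_hours / (hours_per_day * days_per_week)
    weeks_per_month = 52.0 / 12.0  # assumption: average month length
    return weeks / weeks_per_month


def hours_per_error(power_on_hours, error_count):
    """Average number of power-on hours between logged errors."""
    return power_on_hours / error_count


if __name__ == "__main__":
    # Figures quoted in the thread: 2037 power-on hours, 610 CRC errors.
    print(f"roughly {months_of_use(2037):.1f} months of use")
    print(f"roughly one CRC error every {hours_per_error(2037, 610):.1f} hours")
```

Changing hours_per_day or days_per_week shows how sensitive the "18 months" estimate is to the assumed usage pattern.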
|
dystopic_utopia n00b
Joined: 17 Sep 2019 Posts: 12
|
Posted: Sun Mar 29, 2020 9:24 pm Post subject: |
|
|
Thank you Goverp, for your response.
I did not find anything helpful in the logs. I also opened up the case and checked that the SATA connector was firmly attached to the hard drive. I do not have any spare cables of that type, nor could I open the case far enough to see whether it is hard-wired to the motherboard. I ended up booting Gentoo, and all seems normal so far.
18 months of usage? Could the hard drive they stuck in this model have been a "gently used" component? I do not see how I could have racked up that much time after having it only a little over two months.
I will just have to keep an eye on it for now. As for seamonkey, I will probably try sorting that out one more time. Otherwise, there is always the binary release. |
|
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
|
Posted: Sun Mar 29, 2020 10:47 pm Post subject: |
|
|
The hard disk is fine. CRC errors being caught means they're doing their job (preventing silent corruption). They're infrequent enough that it's probably just a slightly flaky cable.
If it's hanging during emerge you either have an overheating issue or OOM problems. Heat is far more likely for a laptop. |
|
dystopic_utopia n00b
Joined: 17 Sep 2019 Posts: 12
|
Posted: Tue Mar 31, 2020 1:24 pm Post subject: |
|
|
Thank you Ant P. for the extra reassurance about the health of my hard drive.
It seems that Goverp's suggestion about not having enough swap space might have been my problem. Since my last post, I have resized my swap partition and was able to emerge seamonkey successfully. There was a brief moment of anxiety during the emerge, when trying to switch to another xfce-terminal tab that top was running in caused a momentary hang. However, it switched after a second and showed typical resource usage for a running emerge. I even ran sensors from the lm_sensors package, and it showed that none of the cores was at its peak temperature.
I guess the lesson from this situation is to never underestimate the need for swap space, even on modern systems that have 8GB+ of RAM. |
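For anyone wanting to check RAM and swap before a big build, here is a minimal Python sketch that reads the MemTotal and SwapTotal fields from /proc/meminfo. The function names are made up for illustration, and the figure of roughly 2 GiB of RAM per compile job is a commonly quoted rule of thumb that is assumed here, not something measured in this thread.

```python
import os


def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo-style text (most are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def suggested_jobs(mem_total_kib, cpu_threads, gib_per_job=2):
    """Smaller of CPU threads and RAM / gib_per_job, never below 1.

    gib_per_job=2 is an assumed rule of thumb; large C++ builds may need more.
    """
    by_memory = mem_total_kib // (gib_per_job * 1024 * 1024)
    return max(1, min(cpu_threads, by_memory))


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
        threads = os.cpu_count() or 1
        print("MemTotal  (kB):", info.get("MemTotal"))
        print("SwapTotal (kB):", info.get("SwapTotal"))
        print("suggested -j value:", suggested_jobs(info.get("MemTotal", 0), threads))
```

The suggested value is only a starting point to compare against the -j setting in MAKEOPTS; it says nothing about how much swap a particular package needs.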
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54317 Location: 56N 3W
|
Posted: Tue Mar 31, 2020 1:37 pm Post subject: |
|
|
dystopic_utopia,
As it's a laptop and it's an interface issue,
Code: | ID# ATTRIBUTE_NAME       FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count 0x000a 200   200   000    Old_age Always  -           610 |
it's worth removing the HDD and replacing it to 'wipe' the connector pins.
This will fix it if it's not seated properly for some reason too.
You won't have a data cable that you can replace. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
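As a rough illustration of how to read an attribute row like the UDMA_CRC_Error_Count line quoted above, the following Python sketch splits one row into named fields. The function names and dictionary keys are invented for this example. The failure check assumes the usual SMART convention that an attribute has failed when its normalized VALUE is less than or equal to THRESH.

```python
COLUMNS = ["id", "name", "flag", "value", "worst", "thresh",
           "type", "updated", "when_failed", "raw_value"]


def parse_smart_attribute(line):
    """Split one attribute row of smartctl output into a dict of named fields."""
    # Split on whitespace at most 9 times so any extra text stays in raw_value.
    parts = line.strip().split(None, len(COLUMNS) - 1)
    if len(parts) != len(COLUMNS):
        raise ValueError("unexpected attribute row: %r" % line)
    row = dict(zip(COLUMNS, parts))
    for key in ("id", "value", "worst", "thresh"):
        row[key] = int(row[key])
    return row


def is_failing(row):
    """Assumed convention: failed when normalized value <= threshold."""
    return row["value"] <= row["thresh"]


if __name__ == "__main__":
    line = "199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 610"
    row = parse_smart_attribute(line)
    print(row["name"], "raw:", row["raw_value"], "failing:", is_failing(row))
```

For the quoted row, the normalized value of 200 is well above the threshold of 000, which fits the view in this thread that the raw count of 610 points at the interface and not at a failing disk.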
|
JustAnother Apprentice
Joined: 23 Sep 2016 Posts: 186
|
Posted: Tue Aug 25, 2020 3:31 am Post subject: |
|
|
With a laptop, always put it on one of those fan bases, and always put spacers under the fan base feet to get the fan base 1/2 inch off the underlying surface. I cut up some erasers to make the spacers.
My Toshiba laptop acted up (i.e., mysterious shutdowns) until I did the above. My sister gave it to me because it always shut down. Finally I asked her how she held it. She put it on her legs.
Also, hit the fan section in the laptop with a blast of canned air, but stick a pencil in there first to keep the fan from spinning up. The back EMF from a spinning fan can blow things up. |
|