How can I find the reason my PC crashes?

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · edit-2 1 year ago

How can I find the reason my PC crashes?

MNByChoice ( @MNByChoice@midwest.social ) · 1 year ago

Boot with a live CD from a different distro. This will help separate a hardware issue from a software issue: if it still crashes there, hardware is the more likely culprit.

InputZero ( @InputZero@lemmy.ml ) · 1 year ago

This! Try this! Don’t go taking your computer apart until you try this. It’s great advice.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

thanks! nice idea, i’ll try it

Atemu ( @Atemu@lemmy.ml ) · 1 year ago

Not really. Distros usually build the same software slightly differently. If the bug is in a piece of software used by all distros such as the Linux kernel, it won’t make a difference.

CapnElvis ( @CapnElvis@kbin.social ) · 1 year ago

Test your RAM. I had a machine doing this a few years ago - turns out I had a stick of RAM with a 128k block somewhere in the middle that was dead.

That machine worked fine as long as I didn’t get it doing anything too intensive, then it would crash. A new stick of RAM solved the issue.

Avid Amoeba ( @avidamoeba@lemmy.ca ) · 1 year ago

This is the most likely issue. To add: run 3-4 passes of Memtest86+. The first pass is shorter and meant for finding egregious RAM problems, so RAM that passes it can still fail on the later full passes. I had RAM pass the 1st pass and then fail on the 3rd or 4th. Errors can even be caused by the platform not coping with the amount of RAM installed. For example, in my case the AMD platform handled 2x 8GB sticks of this RAM with no issues. Insert 4x 8GB and it starts producing errors, even if each individual stick passes with flying colors.

lemmyvore ( @lemmyvore@feddit.nl ) · edit-2 1 year ago

Seconded. I’d been having issues (random freezes, crashes) for a while but I had attributed them to a lack of RAM. So I bought some more RAM at some point, ran memtest on all the RAM together and saw errors. Those bastards, they sold me dodgy RAM, right? Tested the new sticks individually, they were clean. Turns out I had a bad 64 KB area on one of my old sticks.

You can tell the kernel to not use the bad area btw if it’s all in one place, so don’t necessarily rush out to replace the bad stick.
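A rough sketch of what that could look like, assuming memtest reported errors inside a single 64 KB region. The start address 0x12345000 is made up for illustration and memmap_param is a hypothetical helper, not an existing tool. The kernel's memmap=nn[KMG]$ss[KMG] boot parameter marks a range as reserved so it never gets handed out:

```shell
#!/bin/sh
# Hypothetical helper: build a memmap= kernel parameter that reserves
# SIZE of RAM starting at START so the kernel never allocates it.
# Usage: memmap_param START SIZE   (e.g. memmap_param 0x12345000 64K)
memmap_param() {
    start=$1
    size=$2
    # Kernel command line format: memmap=nn[KMG]$ss[KMG]
    # Single quotes keep the $ literal.
    printf 'memmap=%s$%s\n' "$size" "$start"
}

# Example with a made-up address; substitute the range memtest reports.
memmap_param 0x12345000 64K
```

The printed string goes on the kernel command line (for example via GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub on Debian/Ubuntu). The $ usually needs escaping in that file, so check /proc/cmdline after rebooting to confirm the parameter arrived intact, then rerun memtest-style load to see whether the crashes stop.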

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

thanks. I ran memtest for about an hour with no errors. I’ll let it run longer if nothing else shows any progress

mnmalst ( @mnmalst@lemmy.zip ) · 1 year ago

I was in a similar situation not too long ago and couldn’t find anything to fix it either at first. One thing that was high on my list was changing my PSU, since a defective or weak one often seems to be the problem in such cases. Besides that, it could be a general hardware failure of course, and then it could be anything really: motherboard, RAM, GPU, PSU. PSU is the easiest to switch tho, so if you go that route I would try that first.

Anyways, I never had to do this cause in my case, believe it or not, a BIOS update fixed my problem. I am still not 100% sure what happened but I think the update fixed the GPU voltage distribution or something similar.

Hope that helps at least a little bit.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

good idea about the PSU. I hadn’t thought of that. The PSU is not a high-performance/high-quality unit and is already 5 years old. Being unable to provide the required voltage may be a possibility if we accept that performance degrades over time. (It was working without issues for 5 years in the same PC configuration).

I think I’ll start by removing the extra HDDs to reduce the load and check again. Thanks for your input

dylanmorgan ( @dylanmorgan@slrpnk.net ) · 1 year ago

If your processor/MB has onboard video, it would probably be easier to pull the gpu and test. If you still suspect power management, pulling other components like additional HDDs after adding the gpu back would confirm it.

mortrek ( @mortrek@lemmy.ml ) · 1 year ago

Can you be more specific when you say “plays videos”?

Like in vlc, or YouTube, or something else? What videos? Like, 4k hevc videos, or literally anything?

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

mostly on youtube, usually at 720p30fps. I think if I go to 60fps it crashes even faster. Also I’ve tried watching on freetube and on firefox + mpv, but it can crash in all combinations

ourob ( @ourob@discuss.tchncs.de ) · 1 year ago

If you have another pc, ssh from it to the problem machine and run sudo dmesg -w. That should show kernel messages as they are generated and won’t rely on them being written to disk.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

i will try it, but I’m quite confident that it will be unresponsive/not reachable. If the kernel was still listening, it would respond to Alt + PrtSc + REISUB by unmounting the drives, and I would see that when I examine the logs afterwards
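One caveat worth checking before relying on REISUB: many distros restrict the magic SysRq keys through the kernel.sysrq bitmask, so some of those letters may be ignored even while the kernel is still alive. A small sketch to decode the current value, with bit meanings taken from the kernel's sysrq documentation (sysrq_allows is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel.sysrq bitmask permits a
# given function. Bit values follow the kernel's sysrq documentation.
# Usage: sysrq_allows MASK BIT   -> prints "yes" or "no"
sysrq_allows() {
    m=$1
    b=$2
    # 0 disables everything, 1 enables everything, other values are bitmasks.
    if [ "$m" -eq 1 ] || [ $(( m & b )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

# Read the live setting; fall back to 0 if the file is unreadable.
mask=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo 0)
echo "kernel.sysrq = $mask"
echo "keyboard control (R):    $(sysrq_allows "$mask" 4)"
echo "signal processes (E, I): $(sysrq_allows "$mask" 64)"
echo "sync (S):                $(sysrq_allows "$mask" 16)"
echo "remount read-only (U):   $(sysrq_allows "$mask" 32)"
echo "reboot (B):              $(sysrq_allows "$mask" 128)"
```

With a mask of 176 (which I believe is the Ubuntu default, but treat that as an assumption and check your own), S, U and B are honoured while E and I are not. Running sudo sysctl kernel.sysrq=1 enables all functions until the next reboot.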

ourob ( @ourob@discuss.tchncs.de ) · 1 year ago

To be clear, dmesg -w should be run before you do anything to cause the crash. It will continuously print kernel output until you press ctrl+c or the kernel crashes.

In my experience, a crashing kernel will usually print something before going unresponsive, even though it never gets to flush the log to disk.
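A sketch of how that could be captured so the last lines survive on the healthy machine. user@problem-pc and the log file name are placeholders, and watch_kernel_log is a hypothetical wrapper; --follow is the long form of -w:

```shell
#!/bin/sh
# Hypothetical wrapper: stream kernel messages from the unstable machine
# over SSH and keep a copy locally, so the last lines printed before a
# hang are saved on the healthy machine.
# Usage: watch_kernel_log USER@HOST LOGFILE
watch_kernel_log() {
    target=$1
    logfile=$2
    # -t allocates a terminal so sudo can prompt for a password.
    # tee shows the stream live and appends it to the local file.
    ssh -t "$target" 'sudo dmesg --follow' | tee -a "$logfile"
}

# Example (placeholders, not real hosts):
#   watch_kernel_log user@problem-pc crash-dmesg.log
```

Start it before doing whatever triggers the crash. When the machine locks up the SSH session will stall, and whatever the kernel managed to print last is at the end of the local log file.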

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

from now on I’m keeping it open and shown on the screen, so if it happens again I’ll see it. thanks

Possibly linux ( @possiblylinux127@lemmy.zip ) · 1 year ago

Try this

https://memtest.org/

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

thanks. I ran memtest for about an hour with no errors. I’ll let it run longer if nothing else shows any progress

Possibly linux ( @possiblylinux127@lemmy.zip ) · 1 year ago

It should say “PASS” if your RAM’s good. It can take a while depending on your RAM speed and amount.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

it reached the point where it says “Pass complete, no errors, press Esc to exit” but the test keeps running. So yes, I will re-do it for the additional passes, but at first look it seemed fine.

Possibly linux ( @possiblylinux127@lemmy.zip ) · 1 year ago

In that case I’m pretty sure you have an issue with your GPU. Have you tried reseating it in the PCIe slot? If you have a second GPU you could test your setup with it instead.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

yeah, I will try these today…

MiddledAgedGuy ( @MiddledAgedGuy@beehaw.org ) · 1 year ago

I think a live boot CD or trying an integrated GPU, if available, (both of which I saw already suggested) are better steps, but you could also try blacklisting nvidia and using nouveau. If it works OK with nouveau, that would point to the nvidia drivers.

plasticcheese ( @plasticcheese@lemmy.one ) · edit-2 1 year ago

I had a similar problem a while back and it turned out to be my Asus motherboard’s “AI” frequency control hard locking the system. Took me days of troubleshooting and headaches to figure this out. Ended up switching it off in BIOS and everything is stable now. Just my 2c.

Hubi ( @Hubi@feddit.de ) · edit-2 1 year ago

Have you checked /var/log/syslog?

If not, see if there’s anything around the time of the crash there that indicates a GFX problem, like “GPU has fallen off the bus”.
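Something like this could do the search for you. It is only a sketch: gpu_errors is a hypothetical helper and the patterns are examples of messages GPU drivers are known to log, not a complete list:

```shell
#!/bin/sh
# Hypothetical helper: list lines in a log file that look like GPU driver
# failures. The patterns are examples, not a complete list.
# Usage: gpu_errors LOGFILE
gpu_errors() {
    # -i: case-insensitive, -E: extended regex with alternation
    grep -iE 'fallen off the bus|NVRM: Xid|GPU hang' "$1"
}

# Example: scan the current syslog if it exists and is readable,
# showing only the most recent matches.
if [ -r /var/log/syslog ]; then
    gpu_errors /var/log/syslog | tail -n 20
fi
```

No output just means none of these patterns matched, not that the GPU is in the clear. Older entries may have been rotated into /var/log/syslog.1.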

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

yes. As I wrote

After I reboot I have tried checking all logs available and I cannot find anything logged right before the incident. Last entries are always different and not indicating anything.

Urist ( @Urist@lemmy.ml ) · 1 year ago

Just to be certain, you have checked journalctl too?

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · edit-2 1 year ago

isn’t this a unified way to present logs that also exist in /var/log? I mean if the logs are saved in /var/log I’ve checked them. If there is a possibility that journalctl has more entries, then I need to check this too.

Urist ( @Urist@lemmy.ml ) · 1 year ago

Seems you might know more than me. When I had an obscure crash related to my pc going into C-sleep state, I managed to find a pattern viewing the logs in reverse from the time of the crash.

axzxc1236 ( @axzxc1236@lemm.ee ) · edit-2 1 year ago

On my system there are no traditional log files like kern.log or messages (not sure about Ubuntu 22.04), so I would say it’s worth a try.

Try journalctl --boot -1 -xe or journalctl --boot -1 -xep3
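To narrow that down to just the kernel's last words before the crash, a sketch along these lines might help (last_boot_kernel_tail is a hypothetical helper name). It assumes the journal is kept across reboots, which is only the case if /var/log/journal exists or Storage=persistent is set in journald.conf:

```shell
#!/bin/sh
# Hypothetical helper: show the tail of the kernel messages from the
# previous boot, i.e. the boot that ended in the crash.
# Usage: last_boot_kernel_tail [LINES]
last_boot_kernel_tail() {
    lines=${1:-50}
    # -k: kernel messages only, -b -1: previous boot,
    # --no-pager: plain output that can be piped
    journalctl -k -b -1 --no-pager | tail -n "$lines"
}

# To see which boots the journal knows about, run:
#   journalctl --list-boots
# A single entry suggests logs are not being kept across reboots.
```

Call it right after rebooting from a crash, e.g. last_boot_kernel_tail 100. If journalctl complains that it has no data for the previous boot, the journal is probably volatile and persistence needs to be enabled first.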

Lubuntu

cobra89 ( @cobra89@beehaw.org ) · 1 year ago

Maybe for some logs but no, systemd logs are stored in binary format and can only be accessed with journalctl. I would definitely give that command a shot.

gohixo9650 ( @gohixo9650@discuss.tchncs.de ) · 1 year ago

will check this too then. Thanks

qjammer ( @qjammer@lemmy.ml ) · 1 year ago

I read you mentioned firefox. I had a similar experience a while ago, related to this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1704774#c13

CuttingBoard ( @CuttingBoard@sopuli.xyz ) · 1 year ago

Reseat your RAM modules.