[OT] PC Hardware Problem

It's been ages since I kept current with PC technology, so I wanted to run this by some of you, to see if it lights any bulbs.

One of my boxen runs for a while, then (in Linux at least) kernel panics and resets (in Windows it resets, but I haven't stood over it to know if Windows notices the problem). My kid and I were working on it today to reinstall Ubuntu on the theory that the software was just royally screwed, which is when I noticed the kernel panicking.

It acts like a thermal problem -- leave it off for a long time and it takes a long time to have a problem, use it a lot and it happens much more often. All the fans work, and at one point I was able to monitor the various system temperatures which showed OK, so it's not something simple like the processor overheating.

At this point I'm about ready to start swapping parts, but part-swapping costs $$, so I thought I'd ask the group if these symptoms sound familiar, and if you found out anything specific to go with them.

Tim Wescott

I had the same problem. A few weeks ago my computer started freezing up, mainly after I had left it on (say, overnight). It turned out that the heatsink compound had dried up. I fixed that and it's been running fine ever since (about 2 weeks).

Of course it could have been something else, but that seems to have been the issue. The thermal compound was relatively dry and, I guess, wasn't making good enough contact, which would eventually cause the thermal sensor to trip (most modern CPUs have a shutdown mode to prevent damage).

I was monitoring the temperature too, but since it almost always happened when I was away from the machine (except the last few times), I never knew what was going on. I imagined it couldn't be overheating when I wasn't using it, since it was basically idle. After replacing the compound, though, no issues at all.
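For crashes that happen while nobody is watching, one option is to log temperatures to disk so the last reading before the reset survives. Below is a minimal Python sketch, assuming a Linux box that exposes its sensors under /sys/class/thermal (not every board does, and zone names vary). The function names, the log file name, and the 10-second interval are my own placeholders, not anything from a particular tool.

```python
import glob
import os
import time

SYSFS_THERMAL = "/sys/class/thermal"  # usual Linux sysfs location; an assumption for your box


def read_zone_temps(base=SYSFS_THERMAL):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius as an integer
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that refuse reads or return junk
    return temps


def format_row(timestamp, temps):
    """Build one log line: epoch seconds, then zone=temp pairs."""
    fields = ["%d" % timestamp]
    fields += ["%s=%.1f" % (zone, t) for zone, t in sorted(temps.items())]
    return ",".join(fields)


def log_temps(logfile="temps.log", interval_s=10, samples=None, base=SYSFS_THERMAL):
    """Append a reading every interval_s seconds; samples=None runs until killed."""
    count = 0
    with open(logfile, "a") as out:
        while samples is None or count < samples:
            out.write(format_row(time.time(), read_zone_temps(base)) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before a possible reset
            count += 1
            time.sleep(interval_s)
```

Calling log_temps() and leaving it running overnight should leave a file whose last few lines show whether anything was climbing before the machine went down. The fsync after every line is deliberate: without it the final readings can still be sitting in a buffer when the kernel panics.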

Anyways, it's worth a look...

It could be the memory or PS... usually one of those is the issue (which is why I figured it was my memory, since I have a monster PS).

Jon Slaughter

Memory SIMMs can cause that...or something else. Take all cards out, clean the contacts, and reseat them and all the cables. If there is more than one SIMM, try to run on one or the other and move it to different slots. Disconnect all devices and drives to see if it will act up with just the MB, processor and a SIMM. Examine all the capacitors on the MB with a magnifying glass to look for bulges and splits. Do you smell anything? Pull the cover and clean out the power supply. I've never seen heatsink compound fail, but it's a good idea to redo it. Sometimes if you just give it a ride in the trunk of a car for a few days it will work fine.

Buerste

Have you tried running memtest86? It's usually an option when you're in grub. Also clean the heatsink fins if you haven't. Mine were packed, but you couldn't tell till you removed the fan.

kfvorwerk

Definitely run memtest86 for a few hours. We had a similar problem at work, turned out to be memory. It did not show up until we ran memtest86 for a few hours.

i
Ignoramus15530

Others have made good suggestions regarding the PSU and the memory. Very often it's just the contacts rather than a defective piece of hardware. I suggest you remove the RAM sticks and clean the contacts, taking the usual precautions against ESD. Brush and blow the slots too. You didn't say how old the computer is. If it's using a PATA HDD, check the Molex power connector by pushing and pulling at the wires _individually_ to see if one or more are a bit loose. I've come across numerous cases in which symptoms like the one you're having now are caused by either of these two possibilities.

pimpom

Could you tell if the problem was a poor connection? I've found that wiggling connectors solves the majority of electronic problems. When I was testing and repairing field returns over half the boards had no problem although they came from competent techs at major semiconductor manufacturers. We assumed that swapping the board cleaned the contacts, and they sent the old one back as a precaution. Those boards almost never came back a second time.

I suspected that ammonia from floor cleaning compounds was attacking the copper underneath the gold fingers. I couldn't convince management to pay to have that tested, but they did confirm that silicone from candy bar wrappers contaminated boards and caused poor solder joints. It's applied to make them slide out of vending machines better.

jsw

Jim Wilkins

My first thought on seeing the kernel panic was Memtest. Under Memtest it quietly locks up and resets.

Tim Wescott

That sounds like bad electrolytics in the power supply for the CPU. If you keep using it, it will reach a point where it won't even boot. There are about a half dozen low ESR electrolytics in parallel. The AC current through them causes them to heat up. As the ESR rises, the capacitor dries up, causing more heat until it fails. If you are good with a soldering iron I would replace the electrolytics. Make sure to use brand name low ESR 105 °C parts.
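To put rough numbers on that heating: the power a capacitor dissipates from ripple current is the RMS current squared times the ESR. The 2 A ripple current and both ESR values below are made-up illustrative figures, not measurements from any particular board.

```latex
P = I_{\mathrm{rms}}^{2} \cdot \mathrm{ESR}
\qquad
P_{\text{new}} = (2\,\mathrm{A})^{2} \times 0.02\,\Omega = 0.08\,\mathrm{W}
\qquad
P_{\text{aged}} = (2\,\mathrm{A})^{2} \times 0.2\,\Omega = 0.8\,\mathrm{W}
```

Under those assumptions, a tenfold rise in ESR means ten times the heat in the same small can, which fits the runaway failure described here.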

Some motherboards have a couple extra sets of holes where they either went with a higher capacitance, or cut corners.

Michael A. Terrell

Have you taken a good look at the caps on the motherboard??? Mine had started misbehaving about a month back and got to the point it sometimes would not boot. 5 33000 mfd caps had "popped".

Take a look at the top of the cans - they have 3 lines scored in them. If they are at all convex they are shot. I replaced all 5 caps last night and it is working fine.

clare

Thanks to all. We've been reminded of stuff we knew, used your responses to order our attack, and had the obvious (dust in the heat sink) pointed out.

And there _was_ enough dust in the CPU heat sink to insulate a house. I don't know if that was the problem, but it sure could have been.

Tim Wescott

While not necessarily applicable here, I thought I'd mention it. A friend's daughter, 14, was complaining of her laptop BSODing regularly. We investigated and found no problems but suspected a heat-related cause. In the end it seems she likes to use the laptop on the carpet or her lap, and both situations blocked the CPU cooling, causing the issue. When used on a suitable flat surface that didn't block the cooling, all was fine.

David Billington

Even a small hair between the CPU heatsink and the CPU can cause a thermally induced system crash. It could also be a defective hard drive or, in the case of some motherboards, leaking and swelling electrolytic capacitors.

RFI-EMI-GUY

Also suspect the power supply. Some PSUs are such POSs that they will try to kill the mombo when (not if) the e-caps dry out. I won't buy a PSU without OVP.

Best regards, Spehro Pefhany

Spehro Pefhany

What did you use for heat sink compound? Just the usual white silicone goo like I may find in my 30-year-old tube? This has the magic melting elastomeric stuff that came with the CPU.

We've replaced the power supply, and drives, and played 'swap the memory' games -- still does it. The caps on the mobo look good, so either it's a bad cap that's not visibly bad, or it's a CPU fit issue.

If I have good heat sink compound I think I'll give that a go.

Tim Wescott

Ok, Tim, here is how to locate the problem...

Make a cone of paper that will fit over a component to be tested. Big end up - little end fits the device to be tested. Printer paper and tape work fine. You'll probably wind up with several odd shaped cones for computer parts.

Use a hair dryer to blow WARM air onto the part for a few seconds to try to fail a part.

Use a freon can the same way to try to recover a part.

Try to control your spray area carefully so as to affect only one part at a time.

With computers this is more difficult because once a computer goes crazy it must usually be cooled and restarted before it will run right. So you probably want to plan an attack that keeps parts cool to prevent the crazies rather than cause them.

Old light aircraft autopilots and other avionics are about the only thing expensive enough these days to warrant fixing rather than just replacing. That's where I learned this trick.

cavelamb

(...)

(...)

Yup. We used to call Freon 11 "Tech-in-a-can".

--Winston

Winston

This could be the dreaded capacitor problem, prevalent on many motherboards made about 2001. Dell had a $100 million charge against earnings for the warranty work on this mess. There are guys on eBay who sell kits of replacement caps for about $20, but you have to ID the right cap values for your specific mobo, and it takes some serious soldering know-how to replace them without frying the board. Look at the mobo, near the CPU, for bulging, burst, discolored caps or those with a yellow crusty mess oozing from them.

I repaired a machine at work this way. I'm pretty sure it exhibited your exact symptoms before going completely dead.

Jon

Jon Elson

The only problem with freeze spray is that in high humidity you end up frosting good circuitry. When the frost melts, those parts start flaking out and may even develop corrosion and conductive tracks, and then you waste time figuring out what just happened. I am not big on that method here in humid Florida.

RFI-EMI-GUY

We always kept the lab comfortable WRT both temperature and humidity. Never saw an issue caused by condensation, probably because it dried quickly and there wasn't anything it could pick up that would've made it conductive.

I feel lucky that I was out of that business by the time that lead-free solder became prevalent. I don't like that stuff. :)

--Winston

Winston
