STRESS TESTING YOUR COMPUTER
BACKGROUND
----------
Today's computers are not perfect. Even brand new systems from major
manufacturers can have hidden flaws. If any of several key components such
as CPU, memory, cooling, etc. are not up to spec, it can lead to incorrect
calculations and/or unexplained system crashes.
Overclocking is the practice of increasing the speed of the CPU and/or
memory to make a machine faster at little cost. Typically, overclocking
involves pushing a machine past its limits and then backing off just a
little bit.
For these reasons, both non-overclockers and overclockers need programs
that test the stability of their computers. This is done by running
programs that put a heavy load on the computer. Though not originally
designed for this purpose, this program is one of a few programs that
are excellent at stress testing a computer.
RESOURCES
---------
This program is a good stress test for the CPU, memory, L1 and L2 caches,
CPU cooling, and case cooling. The torture test runs continuously, comparing
your computer's results to results that are known to be correct. Any
mismatch and you've got a problem! Note that the torture test sometimes
reads from and writes to disk but cannot be considered a stress test for
hard drives.
You'll need other programs to stress video cards, PCI bus, disk access,
networking and other important components. In addition, this is only one
of several good programs that are freely available. Some people report
finding problems only when running two or more stress test programs
concurrently. You may need to raise prime95's priority when running two
stress test programs so that each gets about 50% of the CPU time.
Forums are a great place to learn about available stability test programs
and to get advice on what to do when a problem is found.
The currently popular stability test programs are (sorry, I don't have
web addresses for these):
Prime95 (this program's torture test)
3DMark2001
CPU Stability test
Sisoft sandra
Quake and other games
Folding@Home
Seti@home
Genome@home
Several useful websites for help (look for overclocking community or forum):
http://www.overclockers.com
http://www.arstechnica.com
http://www.hardocp.com
http://www.anandtech.com
http://www.tomshardware.com
http://www.sharkyextreme.com
Also try the alt.comp.hardware.overclocking Usenet newsgroup.
Utility programs you may find useful (I'm sure there are others - look around):
Motherboard monitor from
http://mbm.livewiredev.com
Memtest86 from
http://www.memtest86.com
Cpuburn by redelm:
http://pages.sbcglobal.net/redelm/
TaskInfo2002 from
http://www.iarsn.com/
WHAT TO DO IF A PROBLEM IS FOUND?
---------------------------------
The exact cause of a hardware problem can be very hard to find.
If you are not overclocking, the most likely cause is an overheating CPU
or memory DIMMs that are not quite up to spec. Another possibility is
you might need a better power supply. Try running MotherBoard monitor
and browse the forums above to see if your CPU is running too hot.
If so, make sure the heat sink is properly attached, fans are operational,
and air flow inside the case is good. For isolating memory problems, try
swapping memory DIMMs with a co-worker's or friend's machine. If the errors
go away, then you can be fairly confidant that memory was the cause of
the trouble. A power supply problem can often be identified by a significant
drop in the voltages when prime95 starts running. Once again the overclocker
forums are a good resource for what voltages are acceptable.
If you are overclocking then try increasing the core voltage, reduce the
CPU speed, reduce the front side bus speed, or change the memory timings
(CAS latency). Also try asking for help in one of the forums above - they
may have other ideas to try.
CAN I IGNORE THE PROBLEM?
-------------------------
Ignoring the problem is a matter of personal preference. There are
two schools of thought on this subject.
Most programs you run will not stress your computer enough to cause a
wrong result or system crash. If you ignore the problem, then video games
may stress your machine resulting in a system crash. Also, stay away from
distributed computing projects where an incorrect calculation might cause
you to return wrong results. Bad data will not help these projects!
In conclusion, if you are comfortable with a small risk of an occasional
system crash then feel free to live a little dangerously! Keep in mind
that the faster prime95 finds a hardware error the more likely it is that
other programs will experience problems.
The second school of thought is, "Why run a stress test if you are going
to ignore the results?" These people want a guaranteed 100% rock solid
machine. Passing these stability tests gives them the ability to run
CPU intensive programs with confidence.
FREQUENTLY ASKED QUESTIONS
--------------------------
Q) My machine is not overclocked. If I'm getting an error, then there must
be a bug in the program, right?
A) The torture test is comparing your machines results against
KNOWN CORRECT RESULTS. If your machine cannot generate correct
results, you have a hardware problem. HOWEVER, if you are failing
the torture test in the SAME SPOT with the SAME ERROR MESSAGE
every time, then ask for help at
http://mersenneforum.org - it is
possible that a recent change to the torture test code may have
introduced a software bug.
Q) How long should I run the torture test?
A) I recommend running it for somewhere between 6 and 24 hours.
The program has been known to fail only after several hours and in
some cases several weeks of operation. In most cases though, it will
fail within a few minutes on a flaky machine.
Q) Prime95 reports errors during the torture test, but other stability
tests don't. Do I have a problem?
A) Yes, you've reached the point where your machine has been
pushed just beyond its limits. Follow the recommendations above
to make your machine 100% stable or decide to live with a
machine that could have problems in rare circumstances.
Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!. What do you make of that?"
or
"We had a server at work that ran for 2 MONTHS straight, without a reboot
I installed Prime95 on it and ran it - a couple minutes later I get an error.
You are going to tell me that the server wasn't stable?"
A) These users obviously do not subscribe to the 100% rock solid
school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS.
But since they are not presently running any programs that reveal
the hardware problem, the machines are quite stable. As long as
these machines never run a program that uncovers the hardware problem,
then the machines will continue to be stable.