Intro to Computer Systems

Chapter 10: Equipment Failure

Reliability Strategies

Most failure modes of computer components are well understood, and as designs mature they are usually modified and enhanced to remove or reduce the risk of failure.

Chassis and Cooling Failure

The main solution to chassis and cooling failure is to build safety margins into the engineering of the case, and to use better-quality discrete components.

A redundant power supply.

Chip Failure

There are a number of strategies that can reduce or prevent catastrophic failure of processors and other semiconductors.

A system designer can also be proactive by underclocking the component, that is, running the chip at a speed lower than that rated by the manufacturer. This is a common technique in mobile phone design, where it limits power consumption and heat generation.
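To see why a lower clock speed reduces power draw and heat, a common first-order model for the dynamic power of CMOS logic is P = a × C × V² × f (activity factor, switched capacitance, supply voltage, clock frequency). The sketch below applies that model. The capacitance, voltage and frequency figures are made-up illustrative values, not data for any real chip, and the model ignores static (leakage) power.

```python
def dynamic_power(switched_capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order CMOS dynamic power estimate in watts: P = a * C * V^2 * f."""
    return activity * switched_capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical chip: 1 nF of switched capacitance running at 1.0 V.
rated = dynamic_power(1e-9, 1.0, 2.0e9)         # run at an assumed rated 2.0 GHz
underclocked = dynamic_power(1e-9, 1.0, 1.5e9)  # underclocked to 1.5 GHz

saving = 1 - underclocked / rated
print(f"rated: {rated:.2f} W, underclocked: {underclocked:.2f} W, saving: {saving:.0%}")
```

In this model power falls in direct proportion to frequency. In practice a lower clock often allows a lower supply voltage as well, and because voltage enters the formula squared, that would increase the saving further.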

Storage Device Failure

Mechanical Mass Storage

Mechanical failure modes of storage devices can be mitigated through the use of quality components and newer technologies, such as fluid bearings instead of ball bearings. Active data-safety features, such as automatic head parking and accelerometer-controlled head parking, guard against most incidental causes of head crashes.
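As an illustration of how accelerometer-controlled head parking might decide when to act: an accelerometer at rest measures about 1 g (gravity), while one in free fall measures close to 0 g. The sketch below parks the heads once several consecutive samples look like free fall. The threshold, the sample count and the function names are assumptions chosen for illustration, not taken from any drive's firmware.

```python
import math

FREE_FALL_THRESHOLD_G = 0.3  # assumed value; real drives use vendor-specific tuning


def is_free_fall(ax, ay, az, threshold_g=FREE_FALL_THRESHOLD_G):
    """Return True when the total measured acceleration (in g) is close to zero."""
    return math.sqrt(ax * ax + ay * ay + az * az) < threshold_g


def should_park(samples, required_consecutive=3):
    """Return True once enough consecutive samples indicate free fall.

    Requiring a run of samples avoids parking on a single noisy reading.
    """
    run = 0
    for ax, ay, az in samples:
        run = run + 1 if is_free_fall(ax, ay, az) else 0
        if run >= required_consecutive:
            return True
    return False


resting = [(0.0, 0.0, 1.0)] * 5
dropped = [(0.0, 0.0, 1.0), (0.05, 0.0, 0.1), (0.0, 0.1, 0.05), (0.02, 0.02, 0.02)]
print(should_park(resting), should_park(dropped))
```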

A sample of SMART data.

Hard disks also have much greater diagnostic abilities thanks to the SMART (Self-Monitoring, Analysis and Reporting Technology) system. SMART monitors a number of telltale signs of a hard disk's health, such as the time taken to spin up to speed, the number of read or write errors at the head, and how quickly the drive can seek to information. It checks these for any degradation in performance, which may indicate that a physical component will soon fail.
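The idea of watching for degradation can be sketched as a comparison between recent readings and an earlier baseline. The example below is a simplified illustration only: the attribute names, the readings and the 10% tolerance are invented, and it does not reproduce the actual SMART attribute format or any vendor's thresholds.

```python
from statistics import mean


def degraded(history, recent_count=3, tolerance=0.10):
    """Flag a metric whose recent average is more than `tolerance` worse than baseline.

    Assumes higher values are worse (for example, a longer spin-up time).
    """
    if len(history) <= recent_count:
        return False  # not enough data to form a baseline
    baseline = mean(history[:-recent_count])
    recent = mean(history[-recent_count:])
    return recent > baseline * (1 + tolerance)


# Hypothetical readings collected over time, oldest first.
readings = {
    "spin_up_time_ms": [4100, 4120, 4090, 4110, 4600, 4750, 4900],
    "seek_time_ms": [8.9, 9.0, 9.1, 9.0, 9.0, 9.1, 8.9],
}

warnings = [name for name, values in readings.items() if degraded(values)]
print(warnings)
```

Here the spin-up time has drifted well above its earlier average and is flagged, while the seek time has stayed steady and is not.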

Modern high-speed CD and DVD drives are able to sense an imbalance in the disc being spun, which is a warning sign that the disc is not coping well with the rotational energy. When this is detected, it triggers an algorithm in the drive which spins the disc back down to a lower speed, removing the risk of disc shattering.
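A minimal sketch of this kind of spin-down logic is shown below. It assumes the drive can measure some vibration level at each speed and steps down through a fixed list of speeds until the vibration is acceptable. The speed steps, the vibration limit and the `vibration_at` callback are hypothetical, since real drive firmware is proprietary.

```python
SPEED_STEPS = [52, 40, 32, 24, 16, 8]  # hypothetical speed multipliers, fastest first


def choose_speed(vibration_at, limit=1.0):
    """Return the fastest speed whose measured vibration is within `limit`.

    `vibration_at` is a callable taking a speed and returning a vibration level.
    If no speed is acceptable, fall back to the slowest one.
    """
    for speed in SPEED_STEPS:
        if vibration_at(speed) <= limit:
            return speed
    return SPEED_STEPS[-1]


def unbalanced_disc(speed):
    """Toy model of an unbalanced disc: vibration grows with speed."""
    return 0.03 * speed


print(choose_speed(unbalanced_disc))
```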

Solid-State Storage

The wear on flash memory cells can be minimised through a number of management techniques that are implemented by the solid-state disk controller.

Display Failure

The industry solution to failure-prone CCFL backlight bulbs and inverters has been to invest in LED-based backlight technology. As a solid-state light source that needs no inverter, an LED backlight is much simpler and more reliable.

There are no 'solutions' to dead or stuck pixels other than improvements in fabrication technology that reduce the incidence of manufacturing errors. Nowadays it is quite uncommon to find LCD panels with dead pixels. Their presence is often treated as an early-life ("infant mortality") failure, which typically makes the panel eligible for immediate return and replacement.