
What happens when a fault first appears.

Let’s consider a system – say, my car – that works perfectly when it is new. It has no design flaws. It goes on working for some considerable time, and then some part breaks. The car now has a problem, a fault, that should be repaired by replacing the broken part. But I don’t yet know there is a fault.

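[Chart: firstError. The part breaks at time A; the blue line covers the period when nothing seems wrong; the red line begins at time B, when the first error is detected.]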

In the chart, the part breaks at time A. But I don’t recognise there is a problem, because I’ve not yet noticed anything wrong. As far as I am concerned, the car is still working fine. The blue line on the chart represents this period. Eventually, though, the broken part causes an error that I can detect. In my case, the engine stopped when I expected it to be rotating. The red line represents this event, which occurred at time B, some time after time A.

I then realise that the car has a fault. However, I don’t get it fixed straight away, so the car goes on being faulty for some while – the horizontal part of the red line. I can still get the car to start, and drive it around – carefully – but I know it is broken.

In order to know that there is a fault, I need to be able to detect an error, and I won’t be able to do that until the fault is exercised to produce an error I can notice. A fault which isn’t exercised doesn’t produce any errors.

A car headlamp bulb is a good example of a fault that’s not detected until it’s exercised. The bulb can break at any time, whether the lamp is on or off. When the headlights are switched off, even if there is a broken bulb, there is no error. I won’t know until I switch on the lights – of course, when it’s dark and I really need them. That’s when I exercise the fault to produce an error: the lamp not producing light when I expect it to.
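For readers who think in code, here is a minimal sketch of the headlamp example. The `Headlamp` class and its field names are my own invention for illustration, not part of any real library. It only shows the distinction between a fault (hidden state) and an error (observed behaviour that differs from what I expect).

```python
from dataclasses import dataclass


@dataclass
class Headlamp:
    """Hypothetical model: `broken` is the fault, a dark lamp when switched on is the error."""
    broken: bool = False       # the fault, hidden from the driver
    switched_on: bool = False  # whether the fault is being exercised

    def emits_light(self) -> bool:
        # A lamp gives light only if it is switched on and the bulb is intact.
        return self.switched_on and not self.broken

    def error_observed(self) -> bool:
        # An error is a mismatch between expected and actual behaviour.
        # With the lamp off, no light is expected, so a broken bulb shows no error.
        expected_light = self.switched_on
        return expected_light != self.emits_light()


lamp = Headlamp()
lamp.broken = True                        # time A: the fault appears
fault_hidden = not lamp.error_observed()  # lights still off, nothing to notice
lamp.switched_on = True                   # time B: the fault is exercised
fault_detected = lamp.error_observed()    # now the error is visible
```

Between the two assignments the bulb is already broken, but `error_observed()` has nothing to report. That gap is the blue line on the chart.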

Is It Fixed Yet?

This is the first in a series of posts about a calculation that worried me for years – about fifteen years. The question occurred to me when my car, a newish company car, broke. It stopped going along. Eventually it started again and I took it to the garage. They changed some parts, presented a big bill, and I drove away. Next week, the car stopped again. After a few minutes, it started. I took it back to the garage, they changed some more parts, and I paid another big bill. Next week… you get the idea. It was while driving away from the garage for the third time that I wondered, “How do I work out whether the car is really repaired?” How much testing do I have to do before I trust the car again?

It took me 15 years to work out the answer. I tried collaborating with four different people, and couldn’t solve it. Then, one day, Eureka.

This all happened when I was working for Sun Microsystems, so if there is any intellectual property in the idea, it belongs to them, or now to Oracle. I don’t think a patent was ever applied for – it was in a chaotic time close to when I left Sun, and it’s not in this list. But it’s old enough now that I don’t think anyone will mind if I publish it. I’ve never seen the result anywhere else, although I’m a bit out of touch with the field now.

The eventual result is applicable to things that break, and also new products with design flaws. It seems quite general.

Future posts here will explain the idea and (if I get my act together) present an online calculator for working out the chance that a problem has been fixed by a repair, given a pattern of test failures and passes both before and after the repair. But I haven’t written that yet.