Say we have a microcontroller with limited memory, and say it performs real-time control of something. How do we write software for it that, in addition to its normal operation (controlling something), also checks from time to time whether it is still working correctly? How can a program test itself? Can someone suggest an intelligent method (other than a watchdog)?

That's called a 'watchdog' timer and is standard in most microcontrollers. It's basically a countdown timer which the program running on the microcontroller needs to reload x times per second to prevent it from reaching zero. When it reaches zero, the microcontroller is reset. So when a program 'hangs', it stops reloading the watchdog countdown timer and the microcontroller is reset.
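To make the countdown idea concrete, here is a minimal sketch in C that models the watchdog in software. It is only a simulation for illustration: on a real part the counter lives in a peripheral and the reset is done by hardware. The names `wdt_sim_t`, `wdt_init`, `wdt_kick` and `wdt_tick` are made up for this example and do not belong to any vendor's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a hardware watchdog timer. */
typedef struct {
    uint32_t reload;      /* value loaded on every kick, in timer ticks */
    uint32_t counter;     /* current countdown value */
    bool     reset_fired; /* set once the countdown has reached zero */
} wdt_sim_t;

/* Start the watchdog with a timeout of reload_ticks ticks. */
void wdt_init(wdt_sim_t *w, uint32_t reload_ticks)
{
    w->reload = reload_ticks;
    w->counter = reload_ticks;
    w->reset_fired = false;
}

/* Called by the application while it is healthy: restart the countdown. */
void wdt_kick(wdt_sim_t *w)
{
    w->counter = w->reload;
}

/* Models one tick of the watchdog clock.
   Returns true once the countdown has reached zero (reset requested). */
bool wdt_tick(wdt_sim_t *w)
{
    if (w->counter > 0) {
        w->counter--;
    }
    if (w->counter == 0) {
        w->reset_fired = true;
    }
    return w->reset_fired;
}
```

As long as the main loop calls `wdt_kick` more often than the timeout, `wdt_tick` never reports a reset. If the program hangs and the kicks stop, the counter runs down and the reset fires.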

"Am I still working okay?" asked the micro controller...


Guillaume 22 years ago

Besides: redundancy still isn't a good reason not to use watchdogs.

You may have 4 redundant devices, but what if they all fail at the same time (which could happen under extreme, unplanned conditions)? What if only one of them fails, but there is another unexpected failure that prevents redundancy from functioning as expected (that is, you have 3 working devices, but the whole system fails to notice there is something wrong with the 4th)? Well, you get the idea.

If fighter planes were perfect, pilots were perfect and conditions were perfect, guaranteed 100% of the time, we wouldn't need to design ejection seats. But we still design them, and once in a while, they are actually useful and save a life. That's exactly the same thing. Who cares whose fault it is when an unexpected event occurs? It's useful to be able to retrieve detailed info about failures, but right when it happens, nobody cares: the system has to recover in the quickest way possible. Period.

As a basic rule of thumb, I'd just say that watchdogs are good for dealing with transient, temporary, unexpected failures. Redundancy is used more with a long-term (or complete) failure of one or several devices in mind. Of course, if designed in a sensible manner, they can complement one another and even interact with one another. That's when things get interesting.


Paul Keinanen 22 years ago

But how does the WDT tell the difference between a transient failure and the hardware falling apart?

The self-test routines after reset may or may not detect a permanent failure. The self-test routine itself could go crazy due to permanent hardware problems, and then the WDT kicks in again.

Now we have another interesting situation, which has not been discussed so far. If there is a permanent hardware/software error and the WDT triggers over and over again, this can also cause a lot of damage (e.g. due to repeated large startup currents in some big loads). Thus, the WDT should be allowed to kick in only a predefined number of times, after which the whole system is disabled until manual intervention.
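A minimal sketch of such a reset limiter in C could look like the following. Everything here is an assumption for illustration: the limit `MAX_WDT_RESETS`, the `reset_cause_t` enum, and the idea that a power cycle counts as the manual intervention. On real hardware the state would have to live somewhere that survives a watchdog reset (for example no-init RAM or EEPROM), and the reset cause would come from the part's reset status register.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WDT_RESETS 3u /* hypothetical limit on consecutive WDT resets */

/* Hypothetical reset causes, as read from a reset status register. */
typedef enum { RESET_POWER_ON, RESET_WATCHDOG } reset_cause_t;

/* State that must survive a watchdog reset. */
typedef struct {
    uint8_t wdt_resets; /* watchdog resets seen since the last power-on */
    bool    locked_out; /* true once the limit has been exceeded */
} boot_state_t;

/* Call once at start-up. Returns true if normal operation may start,
   false if the system must stay disabled until manual intervention. */
bool boot_may_start(boot_state_t *s, reset_cause_t cause)
{
    if (cause == RESET_POWER_ON) {
        /* Assumption: a power cycle is the manual intervention. */
        s->wdt_resets = 0;
        s->locked_out = false;
        return true;
    }
    if (s->locked_out) {
        return false;
    }
    s->wdt_resets++;
    if (s->wdt_resets > MAX_WDT_RESETS) {
        s->locked_out = true;
        return false;
    }
    return true;
}
```

With the limit set to 3, the first three watchdog resets are allowed to restart the system, and the fourth one latches the lockout until the next power-on.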

Paul


Jim Granville 22 years ago

I have also noticed a trend for some newer WDOG devices to have quite long timeout options (minutes to even hours). This can have merit, as examples given in another thread show the problems with designing too close to a WDOG's poorly defined timebase. Other WDOGs I've seen have a longer FIRST trigger window, to allow more elasticity in POST/boot modes, until the operational SW proper starts working.

It would be a good idea to check for annoyance/damage modes in the case where a failure makes the WDOG fire continually.

-jg


Paul E. Bennett 22 years ago

The answer to that is you DO NOT turn on any outputs until your system can determine for itself that it is able to function within its design parameters. You can count the number of watchdog kicks once you have completed the POST routines to ensure that a minimum number of correct kicks have happened before you enable the outputs to be turned on.
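One way to sketch this output gating in C is shown below. The names (`output_gate_t`, `gate_on_kick`, `MIN_GOOD_KICKS`) and the threshold of 10 are hypothetical. Treating a late kick as a reason to restart the count and drop the outputs again is an extra assumption of this sketch, not something stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_GOOD_KICKS 10u /* hypothetical number of correct kicks required */

typedef struct {
    bool     post_passed;     /* power-on self test completed successfully */
    uint16_t good_kicks;      /* consecutive on-time watchdog kicks since POST */
    bool     outputs_enabled; /* outputs may only be driven when true */
} output_gate_t;

/* Outputs start disabled and stay so until the gate opens. */
void gate_init(output_gate_t *g)
{
    g->post_passed = false;
    g->good_kicks = 0;
    g->outputs_enabled = false;
}

/* Record the POST result; counting of kicks starts from here. */
void gate_post_done(output_gate_t *g, bool passed)
{
    g->post_passed = passed;
    g->good_kicks = 0;
}

/* Call each time the main loop kicks the watchdog.
   on_time is false if the kick came outside its expected schedule. */
void gate_on_kick(output_gate_t *g, bool on_time)
{
    if (!g->post_passed) {
        return; /* never enable outputs without a passed POST */
    }
    if (!on_time) {
        /* Assumption of this sketch: a late kick restarts the count. */
        g->good_kicks = 0;
        g->outputs_enabled = false;
        return;
    }
    if (g->good_kicks < MIN_GOOD_KICKS) {
        g->good_kicks++;
    }
    if (g->good_kicks >= MIN_GOOD_KICKS) {
        g->outputs_enabled = true;
    }
}
```

The output drivers would then check `outputs_enabled` before switching anything, so a system that resets over and over never gets far enough to energize its loads.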


Mel Wilson 22 years ago

As colleague DW said, " ... idiot proof. It proves we're idiots." He was kidding, of course.

Regards. Mel.


Eric Bohlman 22 years ago

The Artist Formerly Known as Kap'n Salty wrote in news: snipped-for-privacy@corp.supernews.com:

Amen to #4. I remember reading a story about a company that, when hiring salesmen, would always ask the prospective salesman about the major accounts that he had *lost*. If he had never lost a customer, he didn't get hired, because that meant that he had never "played in the major leagues."

Part of being a geek is having a tendency to grossly overestimate the role that personal ability plays in the success of one's work. The reality is that the highest levels of intelligence (or its correlates) that have been observed in human beings are *far, far* away from the levels that would guarantee perfection. Any business process that relies on humans being omniscient is, by definition, a failure. There is *no* way to guarantee that Mr. Murphy will never pay you a visit. There are practices that will make him feel distinctly unwelcome (and there are practices that amount to buying him a first-class plane ticket and putting him up in the penthouse suite of the most expensive hotel in town), but none of them will offer you absolute certainty.
