"Am I still working okay?" asked the micro controller...

wrote in


Amen to #4. I remember reading a story about a company that, when hiring salesmen, would always ask the prospective salesman about the major accounts that he had *lost*. If he had never lost a customer, he didn't get hired, because that meant that he had never "played in the major leagues."
Part of being a geek is having a tendency to grossly overestimate the role that personal ability plays in the success of one's work. The reality is that the highest levels of intelligence (or its correlates) that have been observed in human beings are *far, far* away from the levels that would guarantee perfection. Any business process that relies on humans being omniscient is, by definition, a failure. There is *no* way to guarantee that Mr. Murphy will never pay you a visit. There are practices that will make him feel distinctly unwelcome (and there are practices that amount to buying him a first-class plane ticket and putting him up in the penthouse suite of the most expensive hotel in town), but none of them will offer you absolute certainty.

Besides: redundancy still isn't a good reason not to use watchdogs.
You may have 4 redundant devices, but what if they all fail at the same time (which could happen under extreme, unplanned conditions)? What if only one of them fails, but another unexpected failure prevents the redundancy from working as expected (that is, you have 3 working devices, but the system as a whole fails to notice that something is wrong with the 4th)? Well, you get the idea.
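To make the second scenario concrete: a voter that merely outvotes a bad channel hides the fault. Below is a minimal C sketch, assuming four hypothetical redundant channels that each deliver an integer reading and an assumed agreement tolerance. `NUM_CHANNELS`, `TOLERANCE` and `vote()` are illustrative names, not from any library. The voter returns the majority value and also reports which channel disagreed, so the silent failure of the 4th device is noticed rather than absorbed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CHANNELS 4   /* hypothetical: four redundant sensors/processors */
#define TOLERANCE 2      /* hypothetical: max difference still counted as agreement */

typedef struct {
    int value;           /* majority value, valid only when ok is true */
    bool ok;             /* true if a strict majority of channels agreed */
    unsigned fault_mask; /* bit i set if channel i disagreed with the majority */
} vote_result_t;

static vote_result_t vote(const int readings[NUM_CHANNELS])
{
    vote_result_t r = { 0, false, 0u };
    assert(readings != NULL);

    /* Find the first channel that a strict majority (including itself) agrees with. */
    for (size_t i = 0; i < NUM_CHANNELS && !r.ok; i++) {
        size_t agree = 0;
        for (size_t j = 0; j < NUM_CHANNELS; j++) {
            if (abs(readings[i] - readings[j]) <= TOLERANCE) {
                agree++;
            }
        }
        if (agree * 2 > NUM_CHANNELS) {
            r.value = readings[i];
            r.ok = true;
        }
    }

    /* Flag every channel that disagrees, so a silent failure is reported
       instead of just being outvoted. */
    if (r.ok) {
        for (size_t i = 0; i < NUM_CHANNELS; i++) {
            if (abs(readings[i] - r.value) > TOLERANCE) {
                r.fault_mask |= 1u << i;
            }
        }
    }
    return r;
}
```

If `ok` is false there is no majority at all, which is the "they all fail at the same time" case; in this sketch that outcome would have to be routed to the same safe-state handling that a watchdog reset leads to.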
If fighter planes were perfect, pilots were perfect and conditions were perfect, guaranteed 100% of the time, we wouldn't need to design ejection seats. But we still design them, and once in a while they are actually used and save a life. Watchdogs are exactly the same. Who cares whose fault it is when an unexpected event occurs? Being able to retrieve detailed information about failures is useful, but at the moment the failure happens nobody cares about that: the system has to recover as quickly as possible. Period.
As a basic rule of thumb, I'd say that watchdogs are good for dealing with transient, temporary, unexpected failures, whereas redundancy is designed more with the long-term (or complete) failure of one or several devices in mind. Of course, if designed sensibly, they can complement each other and even interact with one another. That's when things get interesting.
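A watchdog only catches transient hangs if it is kicked when the whole program is demonstrably making progress, not unconditionally from a timer interrupt. A minimal C sketch of one common arrangement follows. The watchdog is modelled in software because the real registers differ per MCU, and `watchdog_t`, `supervisor_service()` and the per-task check-in scheme are assumptions for illustration, not something proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS 3u   /* hypothetical number of supervised tasks */

/* Software model of a watchdog timer; on real hardware this is a peripheral
   and "expired" means the MCU gets reset. */
typedef struct {
    uint32_t timeout_ticks;
    uint32_t remaining;
    bool expired;
} watchdog_t;

/* Bit i is set when task i has completed a full cycle since the last kick. */
typedef struct {
    uint32_t alive_mask;
} supervisor_t;

static void wdt_init(watchdog_t *w, uint32_t timeout_ticks)
{
    assert(w != NULL && timeout_ticks > 0);
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->expired = false;
}

static void wdt_kick(watchdog_t *w)
{
    if (!w->expired) {
        w->remaining = w->timeout_ticks;
    }
}

/* Called once per tick (from a timer interrupt on a real target). */
static void wdt_tick(watchdog_t *w)
{
    if (w->expired) {
        return;
    }
    if (--w->remaining == 0) {
        w->expired = true;   /* real hardware would reset the MCU here */
    }
}

static void task_report_alive(supervisor_t *s, unsigned task_id)
{
    assert(task_id < NUM_TASKS);
    s->alive_mask |= 1u << task_id;
}

/* Kick only when every task has checked in, so one hung task is enough to
   let the watchdog expire. */
static void supervisor_service(supervisor_t *s, watchdog_t *w)
{
    const uint32_t all_alive = (1u << NUM_TASKS) - 1u;
    if (s->alive_mask == all_alive) {
        wdt_kick(w);
        s->alive_mask = 0;
    }
}
```

The design choice here is that no single task can keep the watchdog happy on its own: a task stuck by a transient fault stops checking in, the kicks stop, and the reset follows within one timeout period.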
On Fri, 28 May 2004 20:57:21 GMT, "Christopher X. Candreva"

But how does the WDT tell the difference between a transient failure and the hardware falling apart?
The self-test routines after reset may or may not detect a permanent failure. The self-test routine itself could also go crazy because of the permanent hardware problem, and then the WDT kicks in again.
Now we have another interesting situation, which has not been discussed so far. If there is a permanent hardware or software error and the WDT triggers over and over again, this can also cause a lot of damage (e.g. from repeated large start-up currents in big loads). Thus, the WDT should be allowed to kick in only a predefined number of times, after which the whole system is disabled until manual intervention.
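The restart limit described above needs a counter that survives the reset itself. A minimal C sketch, assuming a hypothetical reset-cause value read from the MCU's reset status register and a small block of RAM that the start-up code leaves uninitialised. `MAX_WDT_RESETS`, `persist_t`, `boot_decide()` and `mark_healthy()` are illustrative names, and clearing the counter after a healthy run is an added assumption, not part of the original suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WDT_RESETS 3u          /* hypothetical: watchdog restarts tolerated */
#define PERSIST_MAGIC 0x5AFEC0DEu  /* marks the persistent block as initialised */

typedef enum { RESET_POWER_ON, RESET_WATCHDOG, RESET_MANUAL } reset_cause_t;
typedef enum { BOOT_RUN, BOOT_LOCKOUT } boot_decision_t;

/* Hypothetical: on a real MCU this would sit in RAM that start-up code does
   not clear (a .noinit section) or in a battery-backed register. */
typedef struct {
    uint32_t magic;
    uint32_t wdt_resets;
} persist_t;

/* Decide at boot whether to start normally or stay in a safe lockout. */
static boot_decision_t boot_decide(persist_t *p, reset_cause_t cause)
{
    assert(p != NULL);
    if (p->magic != PERSIST_MAGIC || cause != RESET_WATCHDOG) {
        /* Power-on (contents undefined) or a deliberate manual reset,
           i.e. the "manual intervention": start counting afresh. */
        p->magic = PERSIST_MAGIC;
        p->wdt_resets = 0;
        return BOOT_RUN;
    }
    if (p->wdt_resets <= MAX_WDT_RESETS) {
        p->wdt_resets++;   /* saturates at MAX_WDT_RESETS + 1 */
    }
    return (p->wdt_resets > MAX_WDT_RESETS) ? BOOT_LOCKOUT : BOOT_RUN;
}

/* Optional assumption: forgive earlier resets once the system has run
   correctly for some application-defined period. */
static void mark_healthy(persist_t *p)
{
    assert(p != NULL);
    p->wdt_resets = 0;
}
```

In the `BOOT_LOCKOUT` branch the firmware would keep every output off and wait for an operator, which prevents the repeated start-up currents mentioned above. Whether the watchdog must still be serviced while locked out depends on whether the particular part lets it be disabled once started.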
Paul
Paul Keinanen wrote: <snip>

I have also noticed a trend for some newer WDOG devices to offer quite long timeout options (minutes or even hours). This can have merit, as the examples given in another thread show the problems with designing too close to a WDOG's poorly defined timebase. Other WDOGs I've seen have a longer FIRST trigger window, allowing more elasticity in POST/boot modes until the operational SW proper starts running.
It would be a good idea to check for annoyance/damage modes in the case of a continually firing WDOG.
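The longer first window can be modelled as two timeout values, one of which applies only until the first kick. A minimal sketch; real parts implement this in hardware with their own register layout, so `wdog2_t` and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a watchdog with a longer window before the first kick. */
typedef struct {
    uint32_t first_timeout; /* ticks allowed for POST/boot before the first kick */
    uint32_t run_timeout;   /* ticks allowed between kicks once running */
    uint32_t elapsed;       /* ticks since start or since the last kick */
    bool started;           /* set by the first kick */
    bool expired;           /* latched: would reset the MCU on real hardware */
} wdog2_t;

static void wdog2_init(wdog2_t *w, uint32_t first_timeout, uint32_t run_timeout)
{
    assert(w != NULL && first_timeout > 0 && run_timeout > 0);
    w->first_timeout = first_timeout;
    w->run_timeout = run_timeout;
    w->elapsed = 0;
    w->started = false;
    w->expired = false;
}

static void wdog2_kick(wdog2_t *w)
{
    if (!w->expired) {
        w->elapsed = 0;
        w->started = true;   /* from now on the shorter run_timeout applies */
    }
}

static void wdog2_tick(wdog2_t *w)
{
    if (w->expired) {
        return;
    }
    w->elapsed++;
    const uint32_t limit = w->started ? w->run_timeout : w->first_timeout;
    if (w->elapsed >= limit) {
        w->expired = true;
    }
}
```

The point of the split is that a slow POST does not force the operational timeout to be stretched; the tight window only starts once the application has shown it is alive.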
-jg
Paul Keinanen wrote:

The answer to that is: you DO NOT turn on any outputs until your system can determine for itself that it is able to function within its design parameters. Once the POST routines have completed, you can count the watchdog kicks to ensure that a minimum number of correct kicks have happened before you allow the outputs to be turned on.
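A minimal C sketch of that output gate, assuming a hypothetical windowed notion of a "correct" kick (neither too early nor too late) and hypothetical constants for the window and the required count. `output_gate_t` and `gate_on_kick()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REQUIRED_GOOD_KICKS 10u /* hypothetical: on-time kicks needed after POST */
#define KICK_MIN_TICKS 8u       /* hypothetical window: a kick earlier than this ... */
#define KICK_MAX_TICKS 12u      /* ... or later than this is not "correct" */

typedef struct {
    bool post_passed;
    uint32_t good_kicks;
    bool outputs_enabled;
} output_gate_t;

static void gate_init(output_gate_t *g)
{
    assert(g != NULL);
    g->post_passed = false;
    g->good_kicks = 0;
    g->outputs_enabled = false;   /* outputs start off */
}

static void gate_post_result(output_gate_t *g, bool passed)
{
    g->post_passed = passed;
    g->good_kicks = 0;
    g->outputs_enabled = false;
}

/* Called on every watchdog kick with the ticks elapsed since the previous one. */
static void gate_on_kick(output_gate_t *g, uint32_t interval_ticks)
{
    if (!g->post_passed) {
        return;   /* kicks before a passed POST do not count */
    }
    const bool on_time = interval_ticks >= KICK_MIN_TICKS &&
                         interval_ticks <= KICK_MAX_TICKS;
    if (!on_time) {
        /* Design choice for this sketch: a bad kick restarts the count and
           drops the outputs again. */
        g->good_kicks = 0;
        g->outputs_enabled = false;
        return;
    }
    if (g->good_kicks < REQUIRED_GOOD_KICKS) {
        g->good_kicks++;
    }
    if (g->good_kicks >= REQUIRED_GOOD_KICKS) {
        g->outputs_enabled = true;
    }
}
```

Only a passed POST followed by an unbroken run of on-time kicks enables the outputs; dropping them again on a later bad kick goes beyond what the post describes and is included here as one possible choice.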
--
********************************************************************
Paul E. Bennett ....................<email://peb@a...>
Paul Keinanen wrote:

Double or triple redundancy is not always the answer for Safety Critical Systems. Sometimes just a different logical processor (or even a relay based interlocking scheme) will provide the protection. Sometimes you have to even consider fully mechanical interlocking as part of the system. Whatever mitigation scheme you need to use should be based on the risk assessment arising from a fully discovered HAZOP study.
Having watched a lot of the responses, I am in the camp that aims to get the code as correct as you possibly can before you begin to worry about turning the watchdog on. However, I also use a separate Pulse Maintained Relay circuit that has to be kept energised by a correctly responding system. This relay automatically signals "unhealthy" if it de-energises, whether because the system failed to kick it properly or because of a failure in its own circuitry (see my Reading and Writing the World articles on my website).
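The fail-safe behaviour of such a relay can be illustrated with a small software model. This is not the circuit described in the articles mentioned (that design is hardware); it is a hedged sketch assuming the relay is held in by a charge pump fed from a toggling processor output, with `pulse_relay_t` and the tick-based hold-up time as hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a pulse-maintained relay: it stays energised only while
   the drive line keeps changing state. A drive line stuck high or stuck low
   lets it drop out, which is the fail-safe "unhealthy" indication. */
typedef struct {
    uint32_t hold_ticks;  /* hold-up time after the last edge (models the capacitor) */
    uint32_t charge;      /* remaining hold-up; 0 means de-energised */
    bool last_level;      /* previous level of the drive line */
} pulse_relay_t;

static void relay_init(pulse_relay_t *r, uint32_t hold_ticks)
{
    assert(r != NULL && hold_ticks > 0);
    r->hold_ticks = hold_ticks;
    r->charge = 0;          /* starts de-energised, i.e. unhealthy */
    r->last_level = false;
}

/* Sample the drive line once per tick. */
static void relay_sample(pulse_relay_t *r, bool level)
{
    if (level != r->last_level) {
        r->charge = r->hold_ticks;   /* an edge tops the charge back up */
    } else if (r->charge > 0) {
        r->charge--;                 /* no edge: the charge drains away */
    }
    r->last_level = level;
}

static bool relay_energised(const pulse_relay_t *r)
{
    return r->charge > 0;
}
```

The property that matters is the asymmetry: only a continuously toggling drive keeps the relay in, while any static failure of the processor or of the drive line ends with the relay de-energised. A failure inside the relay circuitry itself, which is also covered by the scheme described above, is outside what this model can show.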
--
********************************************************************
Paul E. Bennett ....................<email://peb@a...>

Polytechforum.com is a website by engineers for engineers. It is not affiliated with any of manufacturers or vendors discussed here. All logos and trade names are the property of their respective owners.