# Safety Shutdown Systems: Design, Analysis, and Justification by Paul Gruhn & Harry L. Cheddie

I couldn't find anywhere to report this mistake, so I'll report it here. The version of this book published in 1998 appears to have the wrong formula for the 1oo2 (1 out of 2) case on page 87, section 8.7.1, formula set 2. The error jumps out at you because P(a,b)=P(a)*P(b) if a and b are independent. Thus the probability of failure for the 1oo2 case should be roughly equal to the square of the 1oo1 case, and we suspect that the coefficient in front of the 1 out of 2 case should be 1 and not 2.
The formula for the 1oo1 case is very intuitive: the unavailability is roughly equal to the MTTR (mean time to repair) divided by the MTBF (mean time between failures). The book says these equations are simplifications of results derived from Markov models in the book Reliability, Maintainability, and Risk by David J. Smith. http://www.knovel.com/knovel2/Toc.jsp?BookIDQ9
The exact formulas given by Gruhn are:

    1oo1: lambda_d*(MTTR+(TI_a/2))
    1oo2: 2*(lambda_d)^2*(MTTR+(TI_a/2))^2
    2oo2: 2*(lambda_d)*(MTTR+(TI_a/2))
    2oo3: 6*(lambda_d)^2*(MTTR+(TI_a/2))^2

where TI_a = automatic diagnostic interval, MTTR = mean time to repair, lambda = failure rate (1/MTBF), and d = dangerous (inhibiting) failure.
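As a quick numerical check of the argument above, the following Python sketch evaluates formula set 2 as quoted (reading the first entry as the 1oo1 case) and compares the 1oo2 result with the square of the 1oo1 result. The function name and the sample values for lambda_d, MTTR and TI_a are illustrative assumptions, not figures from the book.

```python
def formula_set_2(lambda_d, mttr, ti_a):
    """Formula set 2 as quoted in this post, one entry per architecture."""
    t = mttr + ti_a / 2.0  # MTTR plus half the automatic diagnostic interval
    return {
        "1oo1": lambda_d * t,
        "1oo2": 2 * lambda_d**2 * t**2,
        "2oo2": 2 * lambda_d * t,
        "2oo3": 6 * lambda_d**2 * t**2,
    }


# Hypothetical sample values, chosen only for illustration
values = formula_set_2(lambda_d=1e-5, mttr=8.0, ti_a=1.0)

# For two independent channels one would expect this ratio to be about 1
ratio = values["1oo2"] / values["1oo1"] ** 2

print(f"1oo1            = {values['1oo1']:.3e}")
print(f"1oo2            = {values['1oo2']:.3e}")
print(f"1oo2 / (1oo1)^2 = {ratio:.3f}")
```

The ratio does not depend on the sample values chosen: the quoted 1oo2 formula is always twice the square of the 1oo1 formula, which is the discrepancy described above.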
If we look at section 8.1 of Smith, table 8.6, we see that all of the formulas agree with Gruhn's formulas except for the 1oo2 case. We also see that our intuition is correct for the 1oo2 case. The other difference is that Smith does not include a term for the automatic diagnostic time. However, in section 8.1.4 Smith addresses the case where repairs do not start instantaneously but only at some periodic manual test interval T. This is closely analogous to the automatic diagnostic time.
Gruhn also provides formulas for this case in formula set 3, but there he calls the interval the manual test interval rather than the automatic diagnostic time. This misses the fact that, as the MTTR approaches zero, the formulas given above should approach the formulas for the case where the manual test interval is much greater than the MTTR, with the automatic diagnostic time replaced by the manual test interval.
The formulas given by Gruhn for the case where the manual test interval TI dominates the mean time to repair are:

    1oo1: lambda_d*(TI/2)
    1oo2: ((lambda_d)^2*(TI)^2)/3
    2oo2: lambda_d*TI
    2oo3: (lambda_d)^2*(TI)^2

These formulas agree with the formulas given by Smith in table 8.8 in section 8.1. Thus, without derivation, a reasonable conjecture for a general equation set is:

    1oo1: lambda_d*(MTTR+(TI_a/2))
    1oo2: 1*(lambda_d)^2*(MTTR+(TI_a/3))^2
    2oo2: 2*(lambda_d)*(MTTR+(TI_a/2))
    2oo3: 6*(lambda_d)^2*(MTTR+(TI_a/2))^2

This of course ignores the nuance that some failures could be detected automatically while other types of failure would need manual testing. For these cases, proper expressions or simulated results from Markov models would be needed.
John
Have copied this to Paul, whose email I found on another list. Am sure he will be in touch, either directly or via this list.
Best Regards
Steve Yates MTL Instruments New Blog from MTL's Chris Towle: www.mtlblog.com
JohnCreighton snipped-for-privacy@hotmail.com wrote:

Steve Y wrote:

Okay, thanks :)
I made a slight mistake in the corrected formulas. The corrected formulas should be:

    1oo1: lambda_d*(MTTR+(TI_a/2))
    1oo2: 1*(lambda_d)^2*(MTTR+(TI_a/sqrt(3)))^2
    2oo2: 2*(lambda_d)*(MTTR+(TI_a/2))
    2oo3: 6*(lambda_d)^2*(MTTR+(TI_a/sqrt(6)))^2

I missed that the part of some of the expressions that contains the TI is squared. So the biggest correction is in the 1oo2 formula, with a minor correction in the 2oo3 formula. I am sure Paul would pick up the mistake once he knows that one is there. I am not sure whether there are newer versions of this book or whether these formulas are revised in them.
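The claim that the corrected formulas reduce to formula set 3 when the MTTR goes to zero can be checked numerically. The sketch below is only an illustration: the function names and the sample values for lambda_d and TI are assumptions, and the first entry of each set is read as the 1oo1 case.

```python
import math


def corrected_set(lambda_d, mttr, ti):
    """Conjectured general formulas from this post."""
    return {
        "1oo1": lambda_d * (mttr + ti / 2),
        "1oo2": lambda_d**2 * (mttr + ti / math.sqrt(3)) ** 2,
        "2oo2": 2 * lambda_d * (mttr + ti / 2),
        "2oo3": 6 * lambda_d**2 * (mttr + ti / math.sqrt(6)) ** 2,
    }


def formula_set_3(lambda_d, ti):
    """Formula set 3 as quoted earlier (test interval dominates MTTR)."""
    return {
        "1oo1": lambda_d * ti / 2,
        "1oo2": (lambda_d**2 * ti**2) / 3,
        "2oo2": lambda_d * ti,
        "2oo3": lambda_d**2 * ti**2,
    }


# Hypothetical sample values, chosen only for illustration
lambda_d, ti = 1e-6, 8760.0

limit = corrected_set(lambda_d, mttr=0.0, ti=ti)
target = formula_set_3(lambda_d, ti)

# Compare the two sets architecture by architecture
matches = {
    arch: math.isclose(limit[arch], target[arch], rel_tol=1e-12)
    for arch in target
}
for arch, ok in matches.items():
    print(f"{arch}: corrected={limit[arch]:.6e} set3={target[arch]:.6e} match={ok}")
```

This only confirms the MTTR = 0 limit. It says nothing about whether the formulas are right when MTTR and TI are of similar size, which would need the Markov derivation.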

I think I found another mistake in the 1998 version of Paul's book. Whether it is consequential or not is subjective. In section 8.11 Paul gives an example of a triplicated PLC system (TMR) where if one leg fails the system fails. In his example the CPU MTTF is 10 years and the IO MTTF is 50 years. He combines the failure rates, in failures per hour, by addition, which is more or less correct:

    P(a+b) = P(a)+P(b)-P(a,b) ~ P(a)+P(b)-P(a)*P(b) ~ P(a)+P(b)

If the possibility of simultaneous failure were worth considering, it would probably be a sign that the system is too unsafe to begin with, so up to this point I agree with Paul. He says that 40% of CPU failures and 25% of IO failures are dangerous, so the failure rate he gets for dangerous failures in one leg of the triplicated system is:
lambda=5.70E-6 failures per hour
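The approximation used above when adding the two failure rates can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the hourly failure rate is treated as a per-hour failure probability, only the two MTTF values from the example are used, and the dangerous-failure fractions are left out, so it is not an attempt to reproduce the 5.70E-6 figure.

```python
HOURS_PER_YEAR = 8760

# Per-hour failure probabilities, approximated by the hourly failure rates
p_cpu = 1 / (10 * HOURS_PER_YEAR)  # CPU MTTF of 10 years
p_io = 1 / (50 * HOURS_PER_YEAR)   # IO MTTF of 50 years

# Probability that at least one fails, assuming independence
exact = p_cpu + p_io - p_cpu * p_io
# Simple addition, neglecting the simultaneous-failure term
approx = p_cpu + p_io

rel_error = (approx - exact) / exact
print(f"exact     = {exact:.6e}")
print(f"approx    = {approx:.6e}")
print(f"rel error = {rel_error:.2e}")
```

The neglected product term is several orders of magnitude smaller than either probability, which is why simple addition is adequate here.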
He says the mean time to repair (MTTR) for the system is four hours while the manual test interval (TI) is 8760 hours. He says that 99% of the failures are caught by automatic diagnostics while 1% are caught by manual testing. His calculation of the average probability of failure on demand is:

    PFD_avg = [(5.70e-6*0.99)^2*4^2]        (failure rate of one leg, diagnostic coverage, repair time)
            + [(5.70e-6*0.01)^2*(8760/2)^2] (failure rate of one leg, no diagnostic coverage, manual test interval)

Assuming he is using the 2oo3 formula from formula set 2 for the first term and the 2oo3 formula from formula set 3 for the second term, the first term should be multiplied by 6, and in the second term the manual test interval (8760) should not be divided by 2.
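To see how much the two suggested changes matter, the calculation can be sketched in Python. The assumptions are mine: the undetected fraction is taken as 1 - 0.99, following the 99%/1% split stated above, the automatic diagnostic interval is neglected in the first term (as in the quoted calculation), and all variable names are illustrative.

```python
lam = 5.70e-6     # dangerous failure rate of one leg, per hour
coverage = 0.99   # fraction of failures caught by automatic diagnostics
mttr = 4.0        # mean time to repair, hours
ti = 8760.0       # manual test interval, hours

lam_detected = lam * coverage
lam_undetected = lam * (1 - coverage)

# Terms as in the quoted calculation
term1_quoted = lam_detected**2 * mttr**2
term2_quoted = lam_undetected**2 * (ti / 2) ** 2

# Terms using the 2oo3 entries of formula sets 2 and 3 as quoted in this thread
term1_set2 = 6 * lam_detected**2 * mttr**2
term2_set3 = lam_undetected**2 * ti**2

pfd_quoted = term1_quoted + term2_quoted
pfd_formula_sets = term1_set2 + term2_set3

print(f"quoted calculation : {pfd_quoted:.3e}")
print(f"from formula sets  : {pfd_formula_sets:.3e}")
print(f"ratio              : {pfd_formula_sets / pfd_quoted:.2f}")
```

With these inputs the undetected-failure term dominates both sums, so the overall result is driven mainly by whether the test interval is halved.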
Recall that his formula set two is given by:

    1oo1: lambda_d*(MTTR+(TI_a/2))
    1oo2: 2*(lambda_d)^2*(MTTR+(TI_a/2))^2
    2oo2: 2*(lambda_d)*(MTTR+(TI_a/2))
    2oo3: 6*(lambda_d)^2*(MTTR+(TI_a/2))^2

where TI_a = automatic diagnostic interval, MTTR = mean time to repair, lambda = failure rate (1/MTBF), and d = dangerous (inhibiting) failure.
His formula set three is given by:

    1oo1: lambda_d*(TI/2)
    1oo2: ((lambda_d)^2*(TI)^2)/3
    2oo2: lambda_d*TI
    2oo3: (lambda_d)^2*(TI)^2

where TI = manual test interval.
I claimed in a previous post that the following formula set correctly encompasses the correct versions of the above formula sets.

Steve Y wrote:

I got a response from Paul. He says he is not aware of any errors and that he dropped formula set 2 in later versions of his book, since in all the cases he has seen formula set 3 overwhelms formula set 2. Nonetheless, formula set 2 is still of academic interest to me, whether or not it is practical in engineering.
I have a question about section 8.7.3, which discusses the increased risk during testing:
"During the test, a simplex system must be off line (to prevent an actual trip due to testing), a dual system is reduced to simplex, a triplicated system is reduced to dual"
The formulas it gives are:

    1oo1: MTD/MTI
    1oo2: 2*MTD*lambda_d*(((MTI/2)+MTTR)/MTI)
    2oo2: 2*(MTD/MTI)
    2oo3: 6*MTD*lambda_d*(((MTI/2)+MTTR)/MTI)

The first formula is obvious, in that the system is unavailable for the time the 1oo1 loop is off line. The second formula looks like the 1oo1 case for when the system is not undergoing a test, multiplied by 2*MTD/MTI. I am not sure where the factor of 2 comes from; I'll have to check the original paper to see if this is explained. I am also not sure why the 1oo2 case and 2oo2 case are different, or where the factor of 6 in the 2oo3 case comes from. Maybe it will be clearer when I check the original paper.

case can you realistically shut down part of your safety system and test it without affecting the rest of the safety system or process? What must you put in place so this can be done? Are any unknown risks introduced by this testing procedure? I think at times it might also be possible to test part of a safety system without shutting anything down. For instance, an operator could check how the measurements of a level gauge compare with the measurements of a level transmitter. Switches are of course different. Checking the output of a transmitter or a switch is also not a thorough test of a safety loop. For example, if a PLC is used to implement the SIS, we would want to make sure the PLC actually sees the correct transmitter output, and not just that the local indicator correctly reads the output. Is removing an instrument from a system and testing it in a lab a suitable test procedure? Are there any risks introduced by it not being installed back correctly?