CAN bus reply problems


Hi folks!

We are developing a system that uses a CAN bus as the network
connecting several nodes. A PC requests data (the node status) from
the nodes, which have to answer each request immediately.
To ask a node for its status we send a "remote frame" message with a
specific ID on the CAN bus. The addressed node has to answer with the
relevant data in a "data frame" message.
Each node sits in a while loop, reading a buffer and sending back data
when necessary. Usually everything goes well, but sometimes one of the
nodes does not answer the PC's request, even though the request is on
the bus (another node sees it, and it is visible on an oscilloscope
connected to the CAN bus lines). It seems the node does not see the
message, as if it misses the interrupt that updates the buffer...
We usually send a sequence of "remote frame" messages, waiting each
time for the answer: send, wait for answer, send, wait, ... Even if we
insert a sleep between one send and the next, a node sometimes misses
a message...
We changed the baud rate (from 500 kbit/s to 20 kbit/s) but that did
not solve the problem.
We are using a T89C51CC03 microcontroller from Atmel.

Have you ever experienced this problem? Any suggestions?

Thank you in advance for any help!

Cheers,
Ska

Re: CAN bus reply problems



Ska wrote:


1:  This is either a problem with your microprocessor or with your code.
2:  I have no experience with Atmel & CAN.
2a: The TMS320F2812 has been rock solid for me.
3:  No protocol should trust external nodes 100% to receive something --
     you should always have a timeout & retry mechanism.
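
A minimal sketch of such a timeout and retry loop on the polling side, written in Python for brevity. The names `poll_node`, `send_request` and `read_reply` are hypothetical stand-ins for whatever the real CAN driver offers, and the timeout and retry values are arbitrary assumptions.

```python
import time

def poll_node(send_request, read_reply, node_id, timeout_s=0.05, max_retries=3):
    """Ask one node for its status; return the reply, or None if every attempt times out."""
    for _attempt in range(1 + max_retries):
        send_request(node_id)                  # e.g. transmit the remote frame
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            reply = read_reply(node_id)        # assumed non-blocking: None if nothing yet
            if reply is not None:
                return reply
            time.sleep(0.001)                  # do not spin at full speed while waiting
    return None                                # caller decides what a dead node means
```

Returning `None` instead of raising keeps the decision about a silent node with the caller, which can move on to the next node and come back later.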

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Re: CAN bus reply problems



Ska wrote:


I cannot answer your specific question; in other words, I don't know which
part of your software or hardware is responsible. It could be the
driver, a misconfiguration of the CAN controllers, or the
cabling.
But you should consider switching your node monitoring from the master/slave
principle you are using now to something else.
Your current implementation looks exactly like the _old_ CANopen Node
Guarding mechanism. CANopen switched to Heartbeat years ago, where each
node is an autonomous Heartbeat producer and can be monitored by every
node that wishes to do so. The benefits are more flexibility and reduced
bandwidth for node monitoring.
Even so, a Heartbeat consumer can miss a
Heartbeat from one of the producers. In that case, increase the rate or
accept that one or more heartbeats go missing.
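
The consumer side of such a scheme could be sketched as follows. This is not the CANopen API; `HeartbeatMonitor` and its parameters are hypothetical, and times are plain integer milliseconds supplied by the caller.

```python
class HeartbeatMonitor:
    """Tracks when each producer was last heard and reports the ones that went quiet."""

    def __init__(self, period_ms, tolerated_misses=2):
        self.period_ms = period_ms
        self.tolerated_misses = tolerated_misses   # heartbeats we accept losing
        self.last_seen = {}                        # node id -> time of last heartbeat

    def on_heartbeat(self, node_id, now_ms):
        self.last_seen[node_id] = now_ms

    def missing_nodes(self, now_ms):
        # A node is only reported once more than the tolerated number
        # of consecutive heartbeats has failed to arrive.
        limit = self.period_ms * (self.tolerated_misses + 1)
        return sorted(n for n, t in self.last_seen.items() if now_ms - t > limit)
```

Raising `tolerated_misses` is the "accept that one or more are missing" option; shortening `period_ms` is the "increase the rate" option.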

Regards
  Heinz
--

with best regards / mit freundlichen Grüßen

   Heinz-Jürgen Oertel
+===================================================================
| Heinz-Jürgen Oertel  port GmbH  http://www.port.de
| mailto:oe@port.de
| phone +49 345 77755-0     fax   +49 345 77755-20
| Regensburger Str. 7b,     D-06132 Halle/Saale,  Germany
| CAN Wiki    http://www.CAN-Wiki.info
| Newsletter: http://www.port.de/engl/company/content/abo_form.html
+===================================================================

Re: CAN bus reply problems



Hello Tim, hello Heinz, hello everybody

Thank you for your mails.

What you are both saying is that "No protocol should trust external
nodes 100% to receive something -- you should always have a timeout &
retry mechanism"!
This is exactly what we are doing now, but it is something I don't
like so much... :(
We set a maximum number of retry messages (say 10) and it sometimes
happens that the retries exceed this threshold! In this case we reset
and restart the CAN bus but, as I said, it is something we don't
like so much...

...mmm...

Regards,
Ska



Re: CAN bus reply problems



[Note: F'up2 cut down to one group --- should have been done by OP.]



[Massive quote without actual referral snipped.  Please don't do that.]

What you're observing appears to be a rate of failure to receive CAN
messages that is quite a lot beyond expectations of the protocol,
unless you were operating in a pathologically noisy environment ---
but you didn't mention anything like that.

What this hints at is a genuine bug in the receiving end, but I'm
afraid you didn't reveal enough of its details for anybody out here to
be able to remote-diagnose it more precisely.  So I'll just bombard
you with some questions:

Did you test this with only two nodes on the bus, and check if the
receiving one ACKs the transmission?  

What *is* the rate of failure, anyway, i.e. one in how many messages
gets lost?  What is the rate of transmissions with CRC or other
failures, on the same network?

Do you have any way of debugging into the receiving CAN controller's
register banks after a failed reception, to distinguish whether the message
actually failed to arrive in the message box, or just failed to raise
the IRQ it's configured to?  (There's a bug like that in another 8051
derivative with integrated CAN...)

Do you have a storage scope that would let you record the exact
signalling up to the point of failure, so you could go look for any
differences between successful and failing transmissions, on physical
level?

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems


Actually, this is not necessary for CAN. The beginning of the frame contains a
node ID that possible recipients filter through their match/accept registers.
Active receivers calculate the CRC as the frame bytes clock in and then compare
it to the CRC at the frame end. If they match, the accepting receiver drives
the bus active (low) for one bit in a designated trailing window. This lets the
master, or sender of the frame, know that someone received it.

Use your scope to look at the bus for this ACK bit. If you see it, but the
receiver doesn't process the frame, you've missed the interrupt. If you don't
see the ACK bit, then the receiver didn't match the node ID or the CRC, or it's
in Bus Off mode for error containment.

Also be sure you have both ends properly terminated; I've seen wild behavior on
DeviceNet packets at 125, 250 and 500 kb/s.

Dan


Re: CAN bus reply problems



node ID

"Node ID" is only meaningful for some higher level protocols, such as
CANopen, but it makes no sense in plain CAN bus systems,
which rely entirely on message identifiers.


Unless the receiver is in the "bus off" or "error passive" mode, _all_
receivers should monitor the bus and signal ACK or error frame
accordingly.


accepting receiver drives the bus

The ACK bit is sent by _any_ active device, including "non-addressed"
ones. Also, if _any_ receiver detects a CRC or other error, it sends an
error flag, which destroys the message so that no device accepts it.


This is only usable with just two devices (sender and receiver) on the
bus. With more than two devices, someone else will acknowledge the frame.
Instead of using an oscilloscope, you should also be able to tell from
the transmitter status registers whether someone ACKed the transmitted frame.


Or you have configured the mask registers incorrectly.


The identifier match should not affect the appearance of the ACK.

It should be possible to determine from the _transmitter_ status
registers, if the frame was ACKed or an error flag generated by the
receiving device.
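
To illustrate how a wrong mask makes a node ignore a frame that was nevertheless ACKed on the bus, here is a sketch of the usual code/mask comparison. It assumes a mask bit of 1 means "this bit must match"; some controllers use the opposite polarity, so check the datasheet. The function name and the example identifiers are made up.

```python
def accepts(frame_id, code, mask):
    """True if frame_id equals `code` on every bit position where `mask` is 1."""
    return ((frame_id ^ code) & mask) == 0

# Code 0x120 with mask 0x7F0 accepts identifiers 0x120 to 0x12F.
# With mask 0x7FF only 0x120 itself passes, so a request sent with
# identifier 0x125 is dropped by the filter even though the frame was
# received without error.
print(accepts(0x125, 0x120, 0x7F0), accepts(0x125, 0x120, 0x7FF))
```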

Paul


Re: CAN bus reply problems


[... CAN ACK mechanism...]

No.  It only lets the sender know that someone *could* have received
it, if he had been interested in it.  The crux being that ACK is
flagged even by nodes who won't actually do anything with this
message, because it wasn't meant for them.


... or the ID didn't match the mask set in the receiver.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems


Having read a number of articles and threads recently on this subject,
it seems to me that despite CAN's excellent hardware based
acknowledgement & retry system, the above statement is probably true
once you have three or more processors on the bus and certain types of
message being sent.

Consider a system with processors P1, P2 and P3.

P1 wants to send a message to P2. The message is not one of the often
quoted "nice" CAN bus examples whereby P1 is constantly spewing out
repeated readings of a sensor so that P2 or anyone else may "consume"
them; the loss of a message in this scenario isn't so important as the
next reading will usually suffice. Instead, the message is an
instruction for P2 to perform something, such as turn an I/O line on, or
write some data to an LCD, and it is therefore 100% essential that P2
receives this message or the product fails.

So, P1 sends the message, and gets the hardware ACK. But the ACK came
from P3, who isn't interested in consuming the message.

From what I understand, although P2 "should" generate an error to
destroy the ACK if it detects an error, there are a number of
circumstances where it may not and P2 may "lose" a message.

1. A software bug in P2.
2. A receive overflow in P2.
3. Errata in the P2 CAN controller.
4. P2 has gone error-passive or bus-off.
5. Are there any other reasons?

Admittedly, (1) should be fixed and would be a problem even in a two-
node system, but (2) may be unavoidable on certain smaller CAN
controllers with limited FIFOs, (3) is unavoidable unless you change to
another processor/CAN device, and (4) is actually designed to happen.
I'd truly like to know if there is a (5).

So, it would seem in this situation that despite the hardware based ACK
system present in the CAN controllers, you must still produce a high
level protocol which provides a software based mechanism for
acknowledge, timeout and retry.

Such lost messages may only be one in a billion, but if my product sends
a billion or more messages per week and it doesn't include a high-level
acknowledge, timeout and retry mechanism, then I'll have a product MTBF
of a week or less which is totally unacceptable.

I'd be interested in the opinion of others here. I'm in the process of
firmware development on my first CAN based system and only have one of
the nodes up and running in loopback mode for now so I can't assess
reliability on a three-or-more-node system. But based on the fact that
the possibility of a message going missing isn't completely zero, I'm
taking the view that I must implement the additional high-level ACK
mechanism. The general view I sense from reading CAN articles is that
although CAN's error mechanism is extremely robust, it's not 100%, and
stuff does occasionally go missing.

Re: CAN bus reply problems

says...

One thing to watch for that hasn't been pointed out is that a CAN node
may receive multiple valid copies of the same message.  This has two
consequences. The first is that toggling a state based on message
receipt is a bad idea.  The second is that any acknowledge/retry scheme
has to be able to recognize and discard duplicates if necessary.
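
Both points can be sketched in a few lines. The sequence-number field and the `CommandReceiver` class are assumptions for illustration, not part of CAN itself: the sender would have to put a counter into the payload and reuse it on retries.

```python
class CommandReceiver:
    """Applies each command once, but acknowledges every copy it sees."""

    def __init__(self):
        self.last_seq = {}       # sender id -> sequence number last acted on
        self.output_on = False

    def handle(self, sender, seq, set_on):
        applied = self.last_seq.get(sender) != seq
        if applied:
            self.output_on = set_on      # absolute state, never "toggle"
            self.last_seq[sender] = seq
        # The ack is returned for duplicates too, in case the first ack was lost.
        return seq, applied
```

Because the command carries the target state rather than "toggle", acting on a duplicate would be harmless anyway; the sequence check matters for commands that are not idempotent.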

Robert

Re: CAN bus reply problems

On Fri, 8 Apr 2005 15:51:33 -0400, R Adsett


Apart from some strange networks with multiple store-and-forward
repeaters, it is hard to imagine how such situations could
happen.

Basically this would require that the transmitter has recognised an
error (missing ACK or error frame) and thus resends the message.
However, your node did not detect that something was wrong and
accepted the message at the first time.

A properly working receiver should check the CRC, the ACK fields _and_
check that at least six recessive bits are received in the End Of
Frame field.

If your receiver is satisfied as soon as the frame you are interested in
has passed the CRC check and immediately accepts the message, without
checking the ACK and EOF fields, you are going to get duplicates if
the transmitter works according to the standard. Another node may
have generated an error frame, which the transmitter detects before it
retransmits, but your receiver is content with the first copy.

Paul
 

Re: CAN bus reply problems


That's indeed what happens, if a bit error hits exactly the wrong bit
in the CAN message: the last bit of the end-of-frame field.  This bit
is checked by the transmitter, but not by the receiver(s).  So, if
this bit is struck by an error, the transmitter will detect this as a
"form error", and re-send, but the receiver will not have noticed any
problem.


No.  That's not the actual definition of a properly working receiver.
A proper CAN receiver will *not* look at the last bit of the EOF
field.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems

aachen.de says...

I've seen it happen.  Not frequently but far more than could be ignored
even if you were inclined to do so.

Robert


Re: CAN bus reply problems



At least you'll be forewarned.  Non-systematic errors will supposedly
hit any bit of a CAN message randomly, with equal probability.  So this
particular error will occur at most 1/50 as often as the other types
of error, which both transmitter and receiver notice --- less if you
use longer CAN messages.

Keeping an eye on overall error-induced frame retransmission rates
thus provides a handle on how often to expect this particular error.
Combined with the requirements of the communication at hand, one can
design the amount of countermeasures to match the risk.
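
As a rough worked version of that estimate, assuming standard 11-bit identifiers, ignoring stuff bits, and taking the equal-probability assumption at face value (the function names are made up):

```python
def frame_bits(data_bytes):
    """Bits in a standard CAN data frame from SOF through EOF, without stuff bits."""
    # SOF 1 + ID 11 + RTR 1 + control 6 + CRC 15 + CRC delimiter 1 + ACK 2 + EOF 7 = 44
    return 44 + 8 * data_bytes

def expected_duplicates(retransmissions, data_bytes):
    """Share of observed retransmissions that would stem from a hit on the last EOF bit."""
    return retransmissions / frame_bits(data_bytes)

print(frame_bits(0), frame_bits(8))   # shortest and longest data frame
```

One bit out of 44 for an empty frame is the same order as the 1/50 quoted above, and the share drops to one in 108 for eight data bytes. Stuff bits lengthen real frames and lower it a little further.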

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems

[Robotics removed from F'up2 list --- should have been done much
earlier...]



You understand that incorrectly.  No CAN node can possibly "destroy"
an ACK being flagged by some other node.  And "generating an error"
(by which I assume you mean "sending an error frame") for reasons not
already diagnosed by the CAN protocol itself would be a layer model
violation.  Application layer errors have no business generating
transport/link layer errors.  That's also the reason why CAN
controllers typically don't support sending error frames on purpose:
if an error frame needs to be sent, the controller will do that all by
itself.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems



Ska wrote:


If it does not respond within a set timeout, simply use the last
read info, go on to the next node, and come back to the problem one when
its turn comes again. Each time you get no
response, increment an error byte, and decrement the byte when
the node answers correctly.
   If the error byte reaches, let's say, 5, you can assume that
the node may have a serious problem.

   It's very possible for a CAN node to get into a critical state where it cannot
answer the request at that moment.
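
A minimal sketch of that error counter. The class name and the choice to floor the counter at zero are assumptions; the threshold of 5 is the example value from the text.

```python
class NodeHealth:
    """Per-node error counter: up on a missed reply, down on a good one."""

    def __init__(self, fault_threshold=5):
        self.fault_threshold = fault_threshold
        self.errors = 0
        self.last_status = None

    def record(self, reply):
        """Feed in the poll result (None for a timeout); returns the status to use."""
        if reply is None:
            self.errors += 1                        # keep using the last good status
        else:
            self.last_status = reply
            self.errors = max(0, self.errors - 1)   # assumed: never go below zero
        return self.last_status

    @property
    def suspect(self):
        return self.errors >= self.fault_threshold
```

Counting down on each good reply means a node that only occasionally misses a request never reaches the threshold, while one that is mostly silent does.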



