CAN bus reply problems

Question

Hi folks!

We are developing a system that uses the CAN bus as the network connecting different nodes. We have a PC that needs to request some data (the node status) from the nodes, which have to answer the request immediately.

To ask each node for its status, we send a "remote frame" message on the CAN bus with a specific ID. The relevant node has to answer with the relevant data using a "data frame" message.

Each node sits in a while loop, reading a buffer and sending back data when necessary. Usually everything goes well, but sometimes one of the nodes does not answer the PC's request, even though the request is sent on the bus (it is seen by another node, and it can be seen with an oscilloscope connected to the CAN bus lines). It seems the node does not see the message; it misses the interrupt for updating the buffer...

We usually send a sequence of "remote frame" messages, waiting every time for the answer: send, wait for answer, send, wait, ... Even if we insert a sleep between one send and the next, messages are sometimes missed by a node...

We changed the baud rate (from 500 kbit/s to 20 kbit/s) but the problem is not solved.

We are using a T89C51CC03 microcontroller by Atmel.

Have you ever experienced this problem? Any suggestions?

Thank you in advance for any help!

Cheers,
Ska

Tim Wescott · Accepted Answer

1: This is either a problem with your microprocessor or with your code.
2: I have no experience with Atmel & CAN.
2a: The TMS320F2812 has been rock solid for me.
3: No protocol should trust external nodes 100% to receive something -- you should always have a timeout & retry mechanism.
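A timeout & retry mechanism of the kind mentioned in point 3 might look roughly like the following. This is only a minimal sketch in C: the names `request_t`, `request_start`, `request_on_reply` and `request_poll` are made up for illustration, and it assumes the application has a free-running millisecond tick and does the actual frame transmission itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request tracker; none of these names come from a real driver. */
typedef enum { REQ_IDLE, REQ_WAITING, REQ_DONE, REQ_FAILED } req_state_t;

typedef struct {
    req_state_t state;
    uint32_t    sent_at_ms;  /* tick when the current attempt was sent */
    uint32_t    timeout_ms;  /* how long to wait for each attempt */
    uint8_t     tries_left;  /* attempts remaining, including the current one */
} request_t;

/* Start tracking a request. The caller transmits the remote frame itself. */
void request_start(request_t *r, uint32_t now_ms,
                   uint32_t timeout_ms, uint8_t max_tries)
{
    r->state      = REQ_WAITING;
    r->sent_at_ms = now_ms;
    r->timeout_ms = timeout_ms;
    r->tries_left = max_tries;
}

/* Call when the matching data frame has been received. */
void request_on_reply(request_t *r)
{
    if (r->state == REQ_WAITING) {
        r->state = REQ_DONE;
    }
}

/* Call periodically. Returns true when the caller should retransmit. */
bool request_poll(request_t *r, uint32_t now_ms)
{
    if (r->state != REQ_WAITING) {
        return false;
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - r->sent_at_ms) < r->timeout_ms) {
        return false;
    }
    if (r->tries_left > 1u) {
        r->tries_left--;
        r->sent_at_ms = now_ms;
        return true;            /* timed out, but retries remain */
    }
    r->tries_left = 0u;
    r->state = REQ_FAILED;      /* retries exhausted */
    return false;
}
```

Because nothing here blocks, the master can keep polling other nodes while one request is still pending.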

Heinz-Jürg · Answer

I cannot answer your specific question; in other words, I don't know which part of your software or hardware is responsible. It could be the driver, a misconfiguration of the CAN controllers, or the cabling. But you should consider switching your node monitoring from the master/slave principle you are using now to something else. Your current implementation looks exactly like the _old_ CANopen Node Guarding mechanism. CANopen switched to Heartbeat years ago: each node is an autonomous Heartbeat producer and can be monitored by every node that wishes to do so. The benefits are more flexibility and reduced bandwidth for node monitoring. It can still happen that one of the Heartbeat consumers misses a Heartbeat from one of the producers. In that case, increase the rate or accept that one or more heartbeats go missing.

Regards Heinz
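On the consumer side, the heartbeat idea described above could be sketched as follows. This is an illustration only, not the CANopen object dictionary mechanism: `hb_monitor_t` and both functions are hypothetical names, and the rule "alive while no more than `tolerated` heartbeats in a row are overdue" is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-producer monitor kept by each consumer. */
typedef struct {
    uint32_t last_seen_ms;  /* tick of the most recent heartbeat */
    uint32_t period_ms;     /* producer's nominal heartbeat period */
    uint8_t  tolerated;     /* consecutive missed heartbeats to accept */
} hb_monitor_t;

/* Call from the receive path whenever this producer's heartbeat arrives. */
void hb_on_heartbeat(hb_monitor_t *m, uint32_t now_ms)
{
    m->last_seen_ms = now_ms;
}

/* True while the producer is still considered alive. */
bool hb_is_alive(const hb_monitor_t *m, uint32_t now_ms)
{
    uint32_t elapsed = (uint32_t)(now_ms - m->last_seen_ms); /* wrap-safe */
    uint32_t limit   = m->period_ms * ((uint32_t)m->tolerated + 1u);
    return elapsed <= limit;
}
```

Setting `tolerated` to 1 corresponds to "accept that one heartbeat is missing"; shortening `period_ms` corresponds to "increase the rate".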

Jamie · Answer

If it does not respond within a set timeout, simply use the last-read info, go on to the next node, and come back to the problem one when its turn comes. Each time you get no response, increment an error byte, and decrement the byte when the response is OK. If the error byte reaches, let's say, 5, you can assume that the node may have a serious problem.

It's very possible for a CAN node to get into a critical state where it can't answer the request at that moment.
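The error-byte scheme above could be sketched like this. The threshold of 5 is the figure from the post; the names `node_health_t`, `node_record` and `node_is_faulty` are hypothetical, and saturating the counter at the threshold is an assumption added so the byte cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

#define NODE_FAULT_THRESHOLD 5u  /* "let's say 5" from the post */

/* Hypothetical per-node health record kept by the master. */
typedef struct {
    uint8_t errors;  /* rises on a missed reply, falls on a good one */
} node_health_t;

/* Record the outcome of one poll of this node. */
void node_record(node_health_t *n, bool replied)
{
    if (replied) {
        if (n->errors > 0u) {
            n->errors--;
        }
    } else if (n->errors < NODE_FAULT_THRESHOLD) {
        n->errors++;  /* saturate at the threshold */
    }
}

/* True once the node has missed enough replies to look seriously broken. */
bool node_is_faulty(const node_health_t *n)
{
    return n->errors >= NODE_FAULT_THRESHOLD;
}
```

An occasional missed reply then decays away on its own, while a node that keeps failing is flagged after a handful of polls.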

Ska · Answer

Hello Tim, hello Heinz, hello everybody,

Thank you for your mails.

What you both are telling me is that "No protocol should trust external nodes 100% to receive something -- you should always have a timeout & retry mechanism"!

This is exactly what we are doing now, but it is something I don't like so much... :(

We set a maximum number of retry messages (say 10) and it sometimes happens that the retries exceed this threshold! In this case we reset and restart the CAN bus but, as I said, it is something we don't like so much...

...mmm...

Regards,
Ska

Hans-Bernhard Broeker · Answer

[Note: F'up2 cut down to one group --- should have been done by OP.]

[Massive quote without actual referral snipped. Please don't do that.]

What you're observing appears to be a rate of failure to receive CAN messages that is quite a lot beyond expectations of the protocol, unless you were operating in a pathologically noisy environment --- but you didn't mention anything like that.

What this hints at is a genuine bug in the receiving end, but I'm afraid you didn't reveal enough of its details for anybody out here to be able to remote-diagnose it more precisely. So I'll just bombard you with some questions:

Did you test this with only two nodes on the bus, and check if the receiving one ACKs the transmission?

What *is* the rate of failure, anyway, i.e. one in how many messages gets lost? What is the rate of transmissions with CRC or other failures, on the same network?

Do you have any way of debugging into the receiving CAN controller's register banks after a failed receival, to distinguish if...

Dan Danknick · Answer

Actually, this is not necessary for CAN. The beginning of the frame contains a node ID that possible recipients filter through their match/accept registers. Active receivers calculate CRC as the frame bytes clock in and then compare it to the CRC at the frame end. If they match, the accepting receiver drives the bus active (low) for one bit in a designated tailing window. This lets the master, or sender of the frame, know that someone received it.

Use your scope to look at the bus for this ACK bit. If you see it, but the receiver doesn't process the frame, you've missed the interrupt. If you don't see the ACK bit, then the receiver didn't match the node ID or the CRC, or it's in Bus Off mode for error containment.

Also be sure you have both ends properly terminated; without that, I've seen wild behavior on DeviceNet packets at 125, 250 and 500 kb/s.

Dan

Paul Keinanen · Answer

"Node ID" is only meaningful for some higher-level protocols, such as CANopen; it does not make any sense in simple CAN bus systems, which rely entirely on message identifiers.

Unless the receiver is in the "bus off" or "error passive" mode, _all_ receivers should monitor the bus and signal ACK or error frame accordingly.

accepting receiver drives the bus

The ACK bit is sent by _any_ active (also "nonaddressed") device. Also if _any_ receiver detects a CRC or other error, it will send the error flag, which mutilates the message and no device will accept it.

This is only usable with exactly two devices (sender and receiver) on the bus. With more than two devices, someone else will acknowledge it. Instead of using an oscilloscope, you should also be able to tell from the transmitter status registers whether someone ACKed the transmitted frame.

Or you have configured the mask registers incorrectly.

The identifier match should not affect the appearance of the ACK.

It should be possible to determine from the _transmitter_ status registers, if the frame was ACKed or an error flag generated by the receiving device.

Paul
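Since a wrongly configured mask is one of the suspects raised above, here is a minimal sketch of how identifier/mask acceptance is commonly evaluated for 11-bit identifiers. The function name is hypothetical, and the polarity (mask bit 1 means "this bit must match") is an assumption; some controllers use the opposite convention, so the datasheet of the actual part decides.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical acceptance check for 11-bit identifiers.
 * Assumed polarity: mask bit 1 = compare this bit, 0 = don't care.
 */
bool can_id_accepted(uint16_t rx_id, uint16_t filter_id, uint16_t mask)
{
    /* XOR leaves a 1 wherever the IDs differ; the mask keeps only the
       bits that are required to match. */
    return ((rx_id ^ filter_id) & mask & 0x7FFu) == 0u;
}
```

Note that this filtering only decides whether the frame reaches the application; as stated above, it should not affect whether the controller drives the ACK bit.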

Hans-Bernhard Broeker · Answer

Wrong.

[... CAN ACK mechanism...]

No. It only lets the sender know that someone *could* have received it, if he had been interested in it. The crux being that ACK is flagged even by nodes who won't actually do anything with this message, because it wasn't meant for them.

... or the ID didn't match the mask set in the receiver.

Stephen · Answer

In article , Ska writes

Having read a number of articles and threads recently on this subject, it seems to me that despite CAN's excellent hardware based acknowledgement & retry system, the above statement is probably true once you have three or more processors on the bus and certain types of message being sent.

Consider a system with processors P1, P2 and P3.

P1 wants to send a message to P2. The message is not one of the often quoted "nice" CAN bus examples whereby P1 is constantly spewing out repeated readings of a sensor so that P2 or anyone else may "consume" them; the loss of a message in this scenario isn't so important as the next reading will usually suffice. Instead, the message is an instruction for P2 to perform something, such as turn an I/O line on, or write some data to an LCD, and it is therefore 100% essential that P2 receives this message or the product fails.

So, P1 sends the message, and gets the hardware ACK. But the ACK came from P3, who isn't interested in consuming the...

Hans-Bernhard Broeker · Answer

[Robotics removed from F'up2 list --- should have been done much earlier...]

You understand that incorrectly. No CAN node can possibly "destroy" an ACK being flagged by some other node. And "generating an error" (by which I assume you mean "sending an error frame") for reasons not already diagnosed by the CAN protocol itself would be a layer model violation. Application layer errors have no business generating transport/link layer errors. That's also the reason why CAN controllers typically don't support sending error frames on purpose: if an error frame needs to be sent, the controller will do that all by itself.

R Adsett · Answer

One thing to watch for that hasn't been pointed out is that a CAN node may receive multiple valid copies of the same message. This has two consequences. The first is that toggling a state based on message receipt is a bad idea. The second is that any acknowledge/retry scheme has to be able to recognize and discard duplicates if necessary.

Robert
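Both consequences can be illustrated with a small sketch. It assumes, purely for the example, that the sender puts a one-byte sequence number in each command (incremented per new command, not per retry) and sends the absolute target state instead of "toggle". The types and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command payload. */
typedef struct {
    uint8_t seq;        /* incremented once per new command, not per retry */
    bool    output_on;  /* absolute target state, never "toggle" */
} command_t;

/* Hypothetical receiver-side state. */
typedef struct {
    bool     have_last;  /* false until the first command arrives */
    uint8_t  last_seq;   /* sequence number of the last applied command */
    bool     output_on;  /* current output state */
    uint16_t applied;    /* number of commands actually applied */
} receiver_t;

/* Returns true if the command was new and applied,
   false if it was discarded as a duplicate. */
bool receiver_handle(receiver_t *rx, const command_t *cmd)
{
    if (rx->have_last && cmd->seq == rx->last_seq) {
        return false;  /* same command seen again: ignore it */
    }
    rx->have_last = true;
    rx->last_seq  = cmd->seq;
    rx->output_on = cmd->output_on;
    rx->applied++;
    return true;
}
```

In an acknowledge/retry scheme the receiver would normally still re-send its acknowledgement for a discarded duplicate, since the duplicate may mean the first acknowledgement was lost.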
