Motor Controller timeout but then motion completed 14 minutes later

We were running some extended testing over the weekend and had one failure we don’t fully understand.
The following sequence of events took place.
At 16:21:45 we issued an absolute move and got event 11, indicating the start of motion.
At 16:21:47 we got event 4, indicating a position error.
At 16:21:48 we got event 18, axis at target.
At 16:27:48 our code reported a Motor Controller timeout because we hadn’t received the completion message.
At 16:35:41 we got event 1, amplifier fault.
At 16:35:41 we got event 10, motion done.
Our code showed the encoders were at the correct position.
Our system went into an EStop at this point because of the amplifier fault. Upon inspection this morning the axis appeared to be in the correct place and was operational.

Any thoughts on what might have happened here?

I forgot to mention we are using version 8.1.5.

@patrick

It looks like the move was interrupted due to a Position Error. When you say the Axis was in the correct place, do you mean that the Axis appears to have completed the original move?

Assuming Start Position A and Destination B, I’d expect you to be at A (if it never moved) or A’ (including a partial motion) rather than B. It is possible that you could end up basically at B if you happened to get the position error during the deceleration or had an unusual StopTime.

Do you have any additional details about the Amp Fault event?
What was the Position Error Action?

Jacob,
unfortunately I didn’t get to visually inspect the system myself this morning, but when our code gets the motion completed event it reads the encoder, and the value matched the requested move position, so I think it did end up at B. Like you, I would have expected it to be at A or A’. I will see what additional information I can get.

Jacob,

The Position Error Action is Abort.
We don’t collect any additional information on the Amp Fault event. What would be useful to collect on this event? We can add it to the code.

Patrick

@patrick

It should only reach B if the error limit was hit at the end of the motion and momentum carried the axis the rest of the way. (Seems unlikely.) Possibly a very unrealistic, effectively instantaneous deceleration rate (a very large number)?

Can you dig up more information about the Amplifier fault? Do you get any additional information? Can you add a SourceGet call after detecting an Amplifier Fault?
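One way to structure the extra logging is sketched below in Python. This is only an illustration under stated assumptions: `get_source` is a hypothetical callable standing in for whatever wraps the controller's SourceGet call, and `describe_event` and `fake_source` are made-up names, not part of RapidCode. The event number 1 for an amplifier fault is taken from the log in this thread.

```python
from typing import Callable

AMP_FAULT_EVENT = 1  # event number reported for an amplifier fault in this thread


def describe_event(event_type: int, motion_id: int, axis_number: int,
                   get_source: Callable[[int], str]) -> str:
    """Build a log line; for amplifier faults, append the reported source.

    get_source is a HYPOTHETICAL stand-in for the controller's SourceGet
    call: it takes the event type and returns a description string.
    """
    line = f"got event {event_type} for motion ID {motion_id} on axis {axis_number}"
    if event_type == AMP_FAULT_EVENT:
        # Only query the source for the event we care about.
        line += f"; amplifier fault source: {get_source(event_type)}"
    return line


# Stand-in used only for this example; a real system would query the axis.
def fake_source(event_type: int) -> str:
    return "example source text"


print(describe_event(1, 24, 0, fake_source))   # amplifier fault: source appended
print(describe_event(10, 24, 0, fake_source))  # motion done: no source lookup
```

Passing the lookup in as a callable keeps the logging code testable without a controller attached.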

I wonder what could have caused it when the drive had been sitting there disabled for 8 minutes.

The amplifier fault was actually 14 minutes after the “axis at target” event. The “timeout” in between was in this case just a warning message from our code, which did not cause the drive to be disabled.
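The gaps between the logged events can be checked directly from the timestamps in the original post. A minimal Python sketch follows; the event labels are paraphrased from that post and the helper names are made up for illustration.

```python
from datetime import datetime

# Timestamps from the original post (all on the same day).
EVENTS = [
    ("16:21:45", "event 11: motion start"),
    ("16:21:47", "event 4: position error"),
    ("16:21:48", "event 18: axis at target"),
    ("16:27:48", "application timeout warning"),
    ("16:35:41", "event 1: amplifier fault"),
    ("16:35:41", "event 10: motion done"),
]


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def format_gap(seconds: int) -> str:
    return f"{seconds // 60} min {seconds % 60} s"


fault_time = "16:35:41"
for timestamp, label in EVENTS[:4]:
    gap = seconds_between(timestamp, fault_time)
    print(f"{label} -> amplifier fault: {format_gap(gap)}")
```

This gives 13 min 53 s from “axis at target” to the amplifier fault, and 7 min 53 s from the timeout warning to the fault, which is where the two different figures (14 minutes and 8 minutes) come from.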

The PositionErrorLimit should have Aborted the Axis. The following Amplifier Fault shouldn’t have been related to anything that took place in the motion controller with the information we have right now. Perhaps whatever ultimately caused the AmpFault was affecting the axis earlier and caused the PositionErrorLimit to trigger then.

Alternatively, there could have been some additional calls which weren’t logged. Consider something physically preventing motion and triggering the error limit. If some process then re-enabled the axis while it was still against the physical object, it would be reasonable to expect some torque limit in the drive to trigger an AmpFault.

We need to know more about the flavor of Amp Fault to make a good guess as to the cause.

Jacob, we will be adding code to collect the AmpFault information in the future, but unfortunately for this error we don’t know the flavor of the fault. I have attached an image which shows the events at the time of the failure (after I removed the unrelated output). One thing that caught my eye is that there are two motion IDs.

The motion id is wrong on the event with the Amplifier fault. Is there a chance you collect something else for your log for that event type? (Amplifier Fault - 1) It shows the correct motion id 10 samples later for the motion done event.

Are you using MotionIdSet anywhere in your code?

Yes, we call MotionIdSet just before we initiate each motion. So that would have happened (with an argument of 58142) just before printing out that line “commanding .AbsoluteMove …”. That is the only time we call it, and we separate all the motion IDs we set by 6 (starting with 2), to try to keep them distinct from ones generated by the MotionController itself. So ID 2072 is one that we might conceivably have set (a long time before this code executed), but 2073 could not be one of ours.
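The numbering scheme described above (IDs start at 2 and are spaced by 6) makes it easy to check whether a logged motion ID could have been set by the application. A small Python sketch of that check, using the three IDs mentioned in this post (the function name is made up for illustration):

```python
FIRST_ID = 2  # first application-assigned motion ID, per the post
STEP = 6      # spacing between application-assigned motion IDs, per the post


def is_application_id(motion_id: int) -> bool:
    """True if motion_id fits the pattern 2, 8, 14, ... used by the application."""
    return motion_id >= FIRST_ID and (motion_id - FIRST_ID) % STEP == 0


for motion_id in (58142, 2072, 2073):
    origin = "could be ours" if is_application_id(motion_id) else "not ours"
    print(f"{motion_id}: {origin}")
```

58142 and 2072 both fit the pattern, while 2073 does not, which matches the reasoning above.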

As Patrick mentioned, we were not capturing or logging other info about amp faults at that time.

We haven’t had a repeat of the problem, so it will remain a mystery for now. We did add a call to SourceGet for when we get an amplifier fault, and I got one on my test setup. The output looked like this:
got event 1: Amplifier Fault for motion ID 24 on axis 0 at 2019-10-03T00:21:34.507Z
Amplifier fault source: Possible sources:

  • RapidCode command
  • RapidSetup command

Is this the type of information you were looking for if we get an Amplifier fault on the real system? It doesn’t look very helpful to me.

@patrick

Sometimes the source of an event is well understood (Amp Faults, Home, ErrorLimit, End of Travels, Software boundaries, Feedback Faults, or Out of Frames errors). SourceGet helps us determine whether any of these was the cause of the situation you are in. The next most likely case for an event is that the user commanded it by pushing a Stop button or by a thread calling Axis->Stop(), EStop(), etc.

“RapidCode command” and “RapidSetup command” basically indicate that it was an action by the user.

The other information you collected indicates it is an Amp Fault, though, likely signaled through the Status Word. There may be some drive-specific Service channel requests we can add to find out more information.