User Limits and Axes' Commanded Positions: Runaway

I tried capturing both these values, but I don’t know which ones to compare with the bits defined in RSIFirmwareStatus.

The docs only repeat what’s in the comments in rsi.h. Can you advise me?

I was hoping to find some error bit coming on just before the MultiAxis did its bad things.

I don’t know what we’ll do with those, but if you are recording both, let’s look at AxisAddressTypeStatus (firmware axis object status) and MultiAxisAddressTypeMOTION_STATUS around the time of the anomaly.

In the meantime I’m adding more to my test app to reproduce.

Can you try something else, such as changing the ActionAxis number of the UserLimit to 0? It should propagate to the MultiAxis either way and I’m curious if you can make any changes that resolve the problem to give us more clues.

I’ll capture RSIAxisAddressTypeSTATUS (per axis) and RSIMultiAxisAddressTypeMOTION_STATUS and get back to you.

I’ll also start logging the user limit configuration each time, just to be sure that we’re not doing anything bad.

Currently, I create a user limit for each axis. Only the first one has a non-None action. Each of them specifies the corresponding (global) axis index as the actionAxis (so 0, 1, and 2 in this scenario).
Do you want me to set them all to 0 or something else? It makes sense (for an experiment) to try setting them all to the same value. All the runs I’ve done previously have configured the user limit for axis 0 to be the one with the action. It sounds like what you suggested is what I’m already doing. How would you like me to vary things?

@scott, Can you think of other things that an RSI app could do that would have any influence on commanded positions being changed (or invalidated)? I’ve got a complex scenario that I can’t simply record and play back, and you’ve tried simplistic reproductions of the behavior under investigation and found nothing. It’s conceivable that something that’s happening elsewhere is resulting in this problem, when the timing is just right. My question is, what things could I be doing that could have any bearing on this?

I’m fairly confident that I’m not setting some value that would immediately invalidate all my objects (e.g. axis count, user limit count, recorder count, …).
Are there other things that I could be doing that I can check on?

No, I can’t imagine what you could do to produce this behavior, which is why I’m trying to reproduce it as well. I’ve run thousands and thousands of loops. I’m using different threads to load motion and to trigger the TRIGGERED_MODIFY action. I’ll switch to a UserLimit for the TRIGGERED_MODIFY now.

What are you using for your PT motion empty_count? Can you check FramesToExecute() for Axis 0? Your most recent plot looks like it was ~60ms without calling MovePT(). Is it possible the motion supervisor runs out of frames to execute, leaving you in an undefined state?

We have emptyCount hardcoded to 10.

I can’t elucidate or justify this decision. As far as I know, we copied it out of an example and haven’t changed it since.

Ok that’s good, I just wanted to be sure it wasn’t set to -1 (disabled). At the default sample rate of 1000 Hz, that’s 10ms.
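As a quick sanity check on that arithmetic, here is a minimal sketch. It assumes, as the figure above does, that the empty count is measured in controller samples; the function name is made up for illustration and is not part of the RSI API.

```python
def empty_count_window_ms(empty_count: int, sample_rate_hz: float) -> float:
    """Convert an empty-count threshold (in samples) to milliseconds.

    Assumption: the count is in controller samples. A negative count is
    treated as "disabled", so there is no window to compute.
    """
    if empty_count < 0:
        raise ValueError("empty count is disabled (negative); no window to compute")
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return empty_count * 1000.0 / sample_rate_hz


print(empty_count_window_ms(10, 1000.0))  # prints 10.0
```

If the sample rate were ever changed from the default, the same count would cover a different amount of time, which is worth keeping in mind when comparing it to gaps measured in milliseconds.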

I updated my test code so that a UserLimit causes a TRIGGERED_MODIFY on Axis 1 while 1ms points are streamed to an XYZ MultiAxis (keeping X at 0, Y incrementing, and Z fixed at a large value). So I think I have most of what you described. I have never seen a failure; my X position is always at 0.

I’m deliberately keeping all axes away from zero and each other so that anything resembling an assignment to anything that exists will produce a noticeable change.

Ok I’m now forcing X to a non-zero value and very different from Z.

Here are some more pictures.


The second plot is the new stuff. It breaks apart the status words and tracks them, bit by bit.
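For anyone wanting to do the same kind of bit-by-bit breakdown offline, a rough sketch follows. The bit positions in EXAMPLE_BITS are hypothetical placeholders; the real layout is whatever rsi.h defines for the status word being recorded.

```python
from typing import Dict, List

# Hypothetical bit positions, for illustration only.
# Substitute the actual positions from rsi.h for the word you recorded.
EXAMPLE_BITS: Dict[str, int] = {
    "AT_TARGET": 0,
    "TRIGGERED_MODIFY": 1,
    "ESTOP": 2,
}


def split_status_bits(words: List[int], bits: Dict[str, int]) -> Dict[str, List[int]]:
    """Turn a series of status words into one 0/1 trace per named bit."""
    return {name: [(w >> pos) & 1 for w in words] for name, pos in bits.items()}


# Four consecutive samples of a made-up status word.
traces = split_status_bits([0b000, 0b001, 0b011, 0b111], EXAMPLE_BITS)
print(traces["TRIGGERED_MODIFY"])  # prints [0, 0, 1, 1]
```

Each trace can then be plotted or searched independently of the others.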

Here’s the second one, zoomed in to the instant the command position jumped.

The command position jump happened at sample #2766977.
I don’t really see anything new. What I do notice is that the Triggered Modify bit went high on all three axes and the multiaxis within the same sample. Also, the estop/error state went high 2ms later on all three axes and the multiaxis.

I was half hoping to see one of those bits come on first, but since everything’s running way faster inside the RTOS than 1 kHz, the scope data just can’t illustrate it.
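To put numbers on "which bit came on first", something like the sketch below can be run over the per-bit traces. The trace names and data are made up to mimic the shape of the observation; an offset of 0 only means "same sample", so ordering inside a single sample stays invisible at this capture rate.

```python
from typing import Dict, List, Optional


def first_rising_edge(trace: List[int]) -> Optional[int]:
    """Index of the first 0 -> 1 transition, or None if there is none."""
    for i in range(1, len(trace)):
        if trace[i - 1] == 0 and trace[i] == 1:
            return i
    return None


def edge_offsets(traces: Dict[str, List[int]], reference: str) -> Dict[str, Optional[int]]:
    """Samples between each trace's first rising edge and the reference's."""
    ref = first_rising_edge(traces[reference])
    if ref is None:
        raise ValueError(f"{reference} never rises")
    offsets: Dict[str, Optional[int]] = {}
    for name, trace in traces.items():
        edge = first_rising_edge(trace)
        offsets[name] = None if edge is None else edge - ref
    return offsets


# Made-up traces: the modify bit rises on an axis and the MultiAxis in the
# same sample, and the error bit rises 2 samples later.
example = {
    "axis0.TRIGGERED_MODIFY": [0, 0, 1, 1, 1, 1],
    "multi.TRIGGERED_MODIFY": [0, 0, 1, 1, 1, 1],
    "axis0.ESTOP": [0, 0, 0, 0, 1, 1],
}
print(edge_offsets(example, "axis0.TRIGGERED_MODIFY"))
```

At a 1 kHz capture rate each sample of offset corresponds to 1ms.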

@scott, Andrea and I were speculating about the order in which things ought (?) to happen. Should I expect otherwise simultaneous events (like three user limits being triggered by an input) to be processed in:

  • axis (ID) order (e.g. 0, 1, 2, …)
  • user limit number order
  • some other prioritized ordering?

Yes, all the firmware objects are processed in order, starting from 0.

Any idea why you have such a long gap without calling MovePT before the TRIGGERED_MODIFY occurs?

Also, it appears that you are getting AT_TARGET on Axis 1 before the TRIGGERED_MODIFY, which means its commanded motion is complete?

One reason is that I added some checks before invoking MovePT() so as not to call it if any of the stopping action bits were set. (This hasn’t always been the case.)

When a probe strikes, we attempt to transition the MultiAxis motion state to STOPPED/IDLE while we clean up. I suspect this accounts for the long gaps around the probe strikes.
So, the AT_TARGET probably indicates that we’ve sent a stop point and nothing’s MOVING any more. We normally send motion every 10ms.

Here’s the same data for one of the other thousands of strikes.

It looks like there’s still a gap preceding the TriggeredModify of maybe 35ms (good scenario) and 66ms (bad scenario).
Nothing I’ve said previously would account for that.
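One way to measure those gaps directly, rather than reading them off a plot, is to log a timestamp at every MovePT() call and scan the log afterwards. This is only a sketch: the timestamps are invented, and the 15ms threshold is an arbitrary choice a bit above the normal 10ms cadence.

```python
from typing import List, Tuple


def find_gaps(call_times_ms: List[float], max_gap_ms: float) -> List[Tuple[float, float]]:
    """Return (start_time, gap_length) for each interval longer than max_gap_ms."""
    gaps = []
    for prev, cur in zip(call_times_ms, call_times_ms[1:]):
        if cur - prev > max_gap_ms:
            gaps.append((prev, cur - prev))
    return gaps


# Made-up log: calls every 10 ms, then a 66 ms pause before streaming resumes.
calls = [0.0, 10.0, 20.0, 30.0, 96.0, 106.0]
print(find_gaps(calls, max_gap_ms=15.0))  # prints [(30.0, 66.0)]
```

Lining the reported gap start times up against the scope sample numbers would show whether every long gap coincides with a probe strike.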

On the good strikes, do you ever get the AT_TARGET bit going high before the TRIGGERED_MODIFY?

No. Now, I don’t have a lot of data to work with. The scope only captures 10 seconds of data by default (I only just now turned it up to 500 seconds).

Here is the latest test, filtering for AT_TARGET and TRIGGERED_MODIFY.

Here’s a closer view of the anomaly.

I’ll check some of my more recent data files to see if the answer to this question is always the same.

UPDATE: I only have one other data file with all the axis status words in it, but it looks just like the one illustrated here. This certainly could be a reliable pattern, @scott
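To check that pattern across many strikes instead of eyeballing plots, a filter along these lines could be run over each recording. It is a sketch that assumes the two bits have already been extracted as 0/1 traces of equal length; "already high" here means AT_TARGET was 1 on the sample just before the TRIGGERED_MODIFY rising edge.

```python
from typing import List


def modify_edges_after_at_target(at_target: List[int], triggered_modify: List[int]) -> List[int]:
    """Sample indices where TRIGGERED_MODIFY rises while AT_TARGET was already high."""
    if len(at_target) != len(triggered_modify):
        raise ValueError("traces must have the same length")
    hits = []
    for i in range(1, len(triggered_modify)):
        rising = triggered_modify[i - 1] == 0 and triggered_modify[i] == 1
        if rising and at_target[i - 1] == 1:
            hits.append(i)
    return hits


# Made-up "good" strike: the modify fires while motion is still in progress.
print(modify_edges_after_at_target([0, 0, 0, 0], [0, 0, 1, 1]))  # prints []

# Made-up "bad" strike: AT_TARGET is already high when the modify fires.
print(modify_edges_after_at_target([0, 1, 1, 1], [0, 0, 0, 1]))  # prints [3]
```

An empty result for every good strike and a hit for every bad one would support the idea that the anomaly only shows up when the modify fires after the commanded motion has finished.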

Ok, in my testing, I never reach AT_TARGET because I’m always asynchronously triggering the TRIGGERED_MODIFY while the PT motion is running.

So in your case, you must be calling a MovePT(final=true) and the command trajectory completes before the TRIGGERED_MODIFY?

TRIGGERED_MODIFY should not do anything if you trigger it and the Axis/MultiAxis is not in motion.

This particular type of test that we’re running is attempting to shake out timing bugs when the probe triggers right at (or very close to) the end of the move.

The commanded stop is something we do before “disarming” the probe (a form of cleanup). We put the multi axis in a STOPPED/IDLE state (and wait for it before proceeding) so that we can be certain nothing is moving while we’re changing internal states and sending commands to the drives. That procedure is also inhibiting further calls to MovePT() until cleanup is finished.

Here’s the anomaly again, but only looking at the Axes’ STATUS.AT_TARGET, MultiAxis.MOTION_STATUS.TriggeredModify, and Probe input, armed/success bits.

It looks like what you’ve described (slightly augmented by me) is what’s happening here.

Suppose the probe triggers just after the end of the commanded decel, but before we’ve disabled the user limit.
What would you expect to happen then?