User Limits and Axes' Commanded Positions: Runaway

      if (frames_left < lowLimit)
          Status |= MS_FAULT;

So it seems like you’re not keeping the frame buffer full enough in those instances. That said, lowLimit is an internal value, set to 128, and it more commonly comes into play if you are sending a huge quantity of PT points to MovePT.

The re-send was something we added in an attempt to recover from the motion buffer running dry. We’re running on Windows, and occasionally our thread(s) get denied enough run time that the motion buffer can be exhausted. Sometimes this is due to problems we created, but other times, it seems like it’s an OS thing. Anyway, until recently, we were trying to keep the multiaxis in the streaming state as much as possible. We may not need the re-send behavior any longer.

I’m a little hesitant to change it now, just because we’re planning to release a new version of our software, and making this kind of change would necessitate a lot more testing. I’ll take it under advisement that we probably don’t need to do what we’re doing, but unless that’s actually the culprit of this issue, I’d rather wait a little while before revamping the motion loop.

What happens to the (already) buffered motion when each of the stopping actions takes place? The Stopping Actions document doesn’t explicitly address my question, but it sounds like only Stop would keep the buffered motion (since it can be resumed). Is that correct?

I’m trying to have informed expectations about using MovePT(). If all buffered motion will be discarded, and I know the call to MovePT() will be ineffectual (because of the current MultiAxis state), then skipping the call probably isn’t any sort of drastic change to the motion loop.

Per your earlier message, agreed on not making major changes now.

As for the actions, any of the “modify” actions will modify your PT motion and load new frames with the programmed deceleration. The already buffered motion would be ignored.

For STOP and ESTOP, the internal feedrate is ramped to zero in the programmed time. In order for these options to work, you’ll need to be sure the motion buffer has enough motion remaining to cover the programmed stop/estop time.
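To make the buffer requirement for STOP and ESTOP concrete, here is a minimal C++ sketch of the check. It is not RapidCode API: `bufferedMicroseconds`, `bufferCoversStop`, and the idea of keeping a host-side list of each unexecuted PT point's duration are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a RapidCode call): total duration, in microseconds,
// of the PT points the host believes are still buffered and unexecuted.
std::int64_t bufferedMicroseconds(const std::vector<std::int64_t>& pointDurationsUs) {
    std::int64_t total = 0;
    for (std::int64_t d : pointDurationsUs) {
        assert(d >= 0);  // a PT point cannot have a negative duration
        total += d;
    }
    return total;
}

// True when the remaining buffered motion is at least as long as the
// programmed stop (or estop) time, so the feedrate ramp has room to finish.
bool bufferCoversStop(const std::vector<std::int64_t>& pointDurationsUs,
                      std::int64_t stopTimeUs) {
    return bufferedMicroseconds(pointDurationsUs) >= stopTimeUs;
}
```

With 1 ms points, for example, a 50 ms stop time needs at least 50 points still buffered when the stop begins.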

For TRIGGERED_MODIFY, I worry that you are not waiting for TRIGGERED_MODIFY to complete before attempting to add new PT motion. I would advise trying to use ESTOP_MODIFY instead, so that your MovePT calls have no chance of interfering with the ESTOP_MODIFY action.

I’m going to experiment with skipping MovePT() if the MultiAxis has any of the stopping condition bits set. I’ll let you know what happens.

In this experiment, I added two things:

  • MovePT() isn’t called (it’s just skipped for that FSM iteration) if any of the “action flags” are set in the MultiAxis firmware status.
  • I increment a counter (a user buffer) every time I call MovePT().

The hope is that this will permit us to know if a very badly-timed call to MovePT() might be a contributor to the issue I’m seeing.
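The two changes in this experiment can be sketched roughly as below. This is only an illustration: `StreamGate`, `shouldSend`, and `movePtCalls` are hypothetical names, the status word and mask are plain integers supplied by the caller, and the actual bit values would have to come from `RSIFirmwareStatus` in rsi.h. The real MovePT() call and the user-buffer write are left to the caller.

```cpp
#include <cstdint>

// Hypothetical gate for one FSM iteration; not part of RapidCode.
struct StreamGate {
    std::uint32_t actionMask;   // OR of the "action flag" bits to watch (caller-supplied)
    std::uint64_t movePtCalls;  // mirrors the counter written to a user buffer

    explicit StreamGate(std::uint32_t mask) : actionMask(mask), movePtCalls(0) {}

    // Returns true when the caller should go ahead and call MovePT() this
    // iteration. Returns false (and leaves the counter alone) when any
    // watched action bit is set in the MultiAxis firmware status.
    bool shouldSend(std::uint32_t multiAxisStatus) {
        if ((multiAxisStatus & actionMask) != 0) {
            return false;  // skip MovePT() for this iteration
        }
        ++movePtCalls;     // counted only when MovePT() will actually be called
        return true;
    }
};
```

Because the counter advances only on iterations that actually send points, a flat stretch in the recorded counter means no MovePT() call was made during that window.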

The anomaly (when axis 0 command position changed to something very wrong) this time was that axis 0 was commanded to something very close to the axis 1 cmd/actual position. (Actual: 563077, Cmd: 562464, Axis0_Cmd: 561017). I don’t know quite where that number came from or what it might correspond to. These motors generally flicker 1000-5000 counts while they’re holding still, so this could have been the actual position at some (sub-millisecond) moment in time, though it doesn’t correspond to any actual position axis 1 was in during this brief window.

Here’s a visual from (~500ms) around the moment of the anomaly.

Items of interest:

  • The MovePT_Counter doesn’t change for 60 ms before the anomaly.
    • I don’t have any data for MovePT() calls before I added the skip-calling-MovePT-if-action-bits-are-set behavior.
    • I can’t offer reasonable speculation about the timing of MovePT() calls previously.
    • In this instance, it doesn’t appear that a call to MovePT() happened while something “interesting” was happening (e.g. TriggeredModify).
  • Axis 0 was commanded to move to the Axis 1 position (I frequently see it get sent to Axis 2).
  • The anomaly happened after the probe input (part of the condition of the user limit) changed. (This has always been the case, I just wanted to mention it again.)
    • The command position changed back to the correct one 4 ms after my next call to MovePT(), at which time the TriggeredModify and TriggeredModify_EStop happened the next millisecond.

Are there other runtime things I should be monitoring, like limit error (or other axis/multiaxis error bits)?

Does this bring to mind other things I can do to isolate the problem?

If I wanted to monitor sources of stopping action, could I? Would I look at the same bits (RSIFirmwareStatus) in RSIAxisAddressTypeSTATUS or should I use RSIAxisAddressTypeMOTION_STATUS?

I tried capturing both these values, but I don’t know which ones to compare with the bits defined in RSIFirmwareStatus.

The docs only repeat what’s in the comments in rsi.h. Can you advise me?

I was hoping to find some error bit coming on just before the MultiAxis did its bad things.

I don’t know what we’ll do with those, but if you are recording both, let’s look at AxisAddressTypeStatus (firmware axis object status) and MultiAxisAddressTypeMOTION_STATUS around the time of the anomaly.

In the meantime I’m adding more to my test app to reproduce.

Can you try something else, such as changing the ActionAxis number of the UserLimit to 0? It should propagate to the MultiAxis either way and I’m curious if you can make any changes that resolve the problem to give us more clues.

I’ll capture RSIAxisAddressTypeSTATUS (per axis) and RSIMultiAxisAddressTypeMOTION_STATUS and get back to you.

I’ll also start logging the user limit configuration each time, just to be sure that we’re not doing anything bad.

Currently, I create a user limit for each axis. Only the first one has a non-None action. Each of them specifies the corresponding (global) axis index as the actionAxis (so 0, 1, and 2 in this scenario).
Do you want me to set them all to 0 or something else? It makes sense (for an experiment) to try setting them all to the same value. All the runs I’ve done previously have configured the user limit for axis 0 to be the one with the action. It sounds like what you suggested is what I’m already doing. How would you like me to vary things?

@scott, Can you think of other things that an RSI app could do that would have any influence on commanded positions being changed (or invalidated)? I’ve got a complex scenario that I can’t simply record and play back, and you’ve tried simplistic reproductions of the behavior under investigation and found nothing. It’s conceivable that something that’s happening elsewhere is resulting in this problem, when the timing is just right. My question is, what things could I be doing that could have any bearing on this?

I’m fairly confident that I’m not setting some value that would immediately invalidate all my objects (e.g. axis count, user limit count, recorder count, …).
Are there other things that I could be doing that I can check on?

No, I can’t imagine what you could do to produce this behavior, which is why I’m trying to reproduce it as well. I’ve run thousands and thousands of loops. I’m using different threads to load motion and to trigger the TRIGGERED_MODIFY action. I’ll switch to a UserLimit for the TRIGGERED_MODIFY now.

What are you using for your PT motion empty_count? Can you check FramesToExecute() for Axis 0? Your most recent plot looks like it was ~60ms without calling MovePT(). Is it possible the motion supervisor runs out of frames to execute, leaving you in an undefined state?

We have emptyCount hardcoded to 10.

I can’t elucidate or justify this decision. As far as I know, we copied it out of an example and haven’t changed it since.

Ok that’s good, I just wanted to be sure it wasn’t set to -1 (disabled). At the default sample rate of 1000 Hz, that’s 10ms.

I updated my test code to have a UserLimit cause a TRIGGERED_MODIFY here on Axis 1, while 1ms points are streamed to an XYZ MultiAxis (keeping X at 0, Y incrementing, and Z fixed at a large value). So I think I have most of what you described. I have never seen a failure; my X position is always at 0.

I’m deliberately keeping all axes away from zero and each other so that anything resembling an assignment to anything that exists will produce a noticeable change.

Ok I’m now forcing X to a non-zero value and very different from Z.

Here are some more pictures.

The second plot is the new stuff. It breaks apart the status words and tracks them, bit by bit.

Here’s the second one, zoomed in to the instant the command position jumped.

The command position jump happened at sample #2766977.
I don’t really see anything new. What I do notice is that the Triggered Modify bit went high on all three axes and the multiaxis within the same sample. Also, the estop/error state went high 2ms later on all three axes and the multiaxis.

I was half hoping to see one of those bits come on first, but since everything’s running way faster inside the RTOS than 1 kHz, the scope data just can’t illustrate it.
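For anyone doing the same bit-by-bit comparison, here is a small C++ sketch of how the first rising edge of each status bit could be located in recorded data. It assumes the recorder data has already been exported as one 32-bit status word per sample; `firstRisingEdge` and `kNever` are hypothetical names, not RapidCode API. As noted above, it cannot separate events that land in the same recorded sample.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for "this bit never rose within the recorded window".
constexpr std::ptrdiff_t kNever = -1;

// For each of the 32 bits, returns the index of the first sample in which the
// bit is set while it was clear in the previous sample, or kNever if no such
// transition was recorded.
std::array<std::ptrdiff_t, 32> firstRisingEdge(const std::vector<std::uint32_t>& samples) {
    std::array<std::ptrdiff_t, 32> first;
    first.fill(kNever);
    for (std::size_t i = 1; i < samples.size(); ++i) {
        // Bits that are 1 now and were 0 one sample earlier.
        const std::uint32_t rose = samples[i] & ~samples[i - 1];
        for (int bit = 0; bit < 32; ++bit) {
            if (first[bit] == kNever && ((rose >> bit) & 1u) != 0) {
                first[bit] = static_cast<std::ptrdiff_t>(i);
            }
        }
    }
    return first;
}
```

Running this on each axis's status series and on the MultiAxis series, then comparing the returned indices for the same bit, shows which object saw a bit first at the recorder's resolution. Equal indices mean the ordering is not resolvable from this data.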

@scott, Andrea and I were speculating about the order in which things ought (?) to happen. Should I expect otherwise simultaneous events (like three user limits being triggered by an input) to be processed in:

  • axis (ID) order (e.g. 0, 1, 2, …)
  • user limit number order
  • some other prioritized ordering?