SyncInterruptWait() Taking Longer than Expected?

todd_mm · January 24, 2023, 6:56pm

[RMP 10.3.8]

Symptoms

I have a thread that calls MotionController::SyncInterruptWait() in a loop.
MotionController::SyncInterruptPeriodSet(7) was called earlier during a configuration step.

The vast majority of the time, this function returns according to the configured period.
Sometimes, it doesn’t.

I logged elapsed time before calling SyncInterruptWait() and after it returns. Here’s a histogram of the times I recorded.

 Wait (ms)         Count
 ---------    ----------
     1.055         1,261
     8.280    24,955,263
    15.506            18
    22.732             4
    29.957             1
    37.182             1
    44.408             2
    51.633             0
    58.859             1
    66.085             1

These are the long ones (≥ 20 ms):

Wait Duration
-------------
       20.867
       22.358
       23.233
       23.974
       25.807
       26.664
       36.357
       39.477
       46.567
       46.940
       65.014
       73.310
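
For readers who want to collect similar numbers, here is a minimal sketch of the measurement, not the code used above. The helper names `TimeWaitMs` and `Histogram` are hypothetical, and the `wait` callable stands in for the real blocking call (for example, a lambda wrapping SyncInterruptWait()). It times one call with a monotonic clock and counts the results into fixed-width buckets.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Time one blocking call and return the elapsed time in milliseconds.
// `wait` is a stand-in for the real call (hypothetical wrapper).
double TimeWaitMs(const std::function<void()>& wait) {
    const auto start = std::chrono::steady_clock::now();
    wait();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Count samples into fixed-width buckets.
// Key = bucket index, i.e. floor(sample / bucketWidthMs).
std::map<long, std::size_t> Histogram(const std::vector<double>& samplesMs,
                                      double bucketWidthMs) {
    std::map<long, std::size_t> counts;
    for (double ms : samplesMs) {
        const long bucket = static_cast<long>(std::floor(ms / bucketWidthMs));
        ++counts[bucket];
    }
    return counts;
}
```

Using `std::chrono::steady_clock` keeps the measurement immune to wall-clock adjustments. Storing the raw samples and bucketing afterwards keeps the timed loop itself as cheap as possible.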

Question

What things could cause (or influence the behavior of) SyncInterruptWait() to take a long time to return?

I don’t really understand the interaction between the Windows process and the RTA.
I presume that if the CPU were very busy, Windows might not schedule the calling thread for a time. How likely is that? I can’t prove that the CPU is under long-term heavy load (it’s generally < 75%).

Can you explain to me how this function works? Are there external things that will influence the behavior I’m seeing?

scott · January 24, 2023, 7:15pm

The short answer is that Windows is not an RTOS. Windows is not designed to guarantee a specific response time for events or tasks.

Even if Windows doesn’t have many processes running, there are still many factors that can affect the performance of a program trying to perform an operation every 7 milliseconds. Some possible reasons for missing the 7ms deadline include:

- Scheduling: The Windows scheduler is designed to handle a wide range of tasks and processes, not to provide the real-time guarantees of an RTOS. It may prioritize other tasks over your program, causing it to miss its 7ms deadline.
- Interrupt Latency: Windows uses interrupts to handle input/output operations and other events. These interrupts can briefly delay your program’s execution, which could cause it to miss the 7ms deadline.
- Context Switching: When the Windows scheduler switches between tasks, it needs to save and restore the context of each task, which can cause additional delays.
- Power Management: Windows has various power management features that can put the CPU into lower power states, which can delay your program’s execution.
- Background Tasks: Windows runs various background tasks, such as security scans, software updates, and file indexing. These tasks can consume CPU resources and delay your program’s execution.
- Hardware: The hardware, such as the CPU and memory, may not be able to keep up with the demands of your program, causing it to miss the 7ms deadline.

SyncInterruptWait() is waiting for a signal (semaphore) from the RMP. Since the RMP is running in an RTOS, I think it is probably signaling Windows every 7ms. You could try increasing your Windows thread’s priority to see if you can improve your worst-case latencies.
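
As a rough sketch of the priority suggestion, the snippet below raises the calling thread to the highest priority level within its process priority class using the Win32 `SetThreadPriority` call. The wrapper name `RaiseCurrentThreadPriority` and the `kIsWindows` constant are hypothetical, and the non-Windows branch is only there so the sketch compiles elsewhere. Whether this helps the worst-case latency on a given machine is something to measure, not assume.

```cpp
#ifdef _WIN32
#include <windows.h>
constexpr bool kIsWindows = true;
#else
constexpr bool kIsWindows = false;
#endif

// Try to raise the calling thread's priority.
// Returns true on success; always false on non-Windows builds.
bool RaiseCurrentThreadPriority() {
#ifdef _WIN32
    // THREAD_PRIORITY_TIME_CRITICAL is the highest level within the
    // process's current priority class.
    return SetThreadPriority(GetCurrentThread(),
                             THREAD_PRIORITY_TIME_CRITICAL) != 0;
#else
    return false;
#endif
}
```

Call it once at the top of the thread that loops on SyncInterruptWait(), then compare the histogram before and after.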

todd_mm · January 24, 2023, 11:46pm

Thanks for the response, Scott.

Trying to minimize what Windows is doing is a hobby for us that never grows old (since Windows is always changing), and so I realize how much Windows controls (and sometimes offers no control over) performance. You’ve pointed out some typical performance points, some of which we’ve already done our best to deal with. My question was more about things that my app might be doing (wrong/poorly) or that could be happening outside the app that might particularly impact performance in this way. Regarding what you’ve said, I have a couple of questions.

Interrupt Latency
- Is there any way to measure or track this metric? How would I know if this were an issue? Is there a way to “see” the latency in some way?
- Are there categories of things/behaviors that generate lots of (CPU?) interrupts that I could look into (e.g. types of adapter cards, software behaviors) as possible sources of generating too many interrupts?

Can you provide some detail about the synchronization primitives available to/shared by both the Windows library code (running in user space) and the RTOS/RTA? I know, in the abstract, what a semaphore is, but what is actually used? Is this an INtime invention or adaptation of a Windows thingy?

todd_mm · January 24, 2023, 11:47pm

@scott
Also, should I expect to see the numbers in the histogram above resemble the histogram that the INtime Graphical Jitter test displays (presuming that I align the buckets)?

scott · January 25, 2023, 7:24pm

In the case of the SyncInterrupt, it’s not a true hardware interrupt as it used to be when the firmware was running on a PCI board with a hardware interrupt.

I don’t know how you track down things in Windows that are hogging the CPU (interrupts or otherwise). CPU usage % is probably of some relevance, but since your thread seemed choked out for up to 60+ milliseconds (a totally normal thing in Windows, which is not an RTOS), I don’t know where you can look.

Your recordings of elapsed times are giving you the latency. In the RMP firmware, there is a semaphore that is released every 7 samples (assuming 1kHz sample rate). In Windows, SyncInterruptWait is waiting for it to be released. It’s an INtime semaphore, created in INtime and therefore real time.

The INtime jitter buckets are 7 microseconds. The semaphore might be released early/late by up to the INtime jitter (likely less than 100us on most systems), so even if there were 500us of jitter (there isn’t), you’d barely be able to tell with your 7ms period.

You could try looking at the value returned from SyncInterruptWait(), which is the RMP sample counter just before it returns. I suspect these will match up with your elapsed times.
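
A minimal sketch of that check, assuming the returned sample counter can be held in a 64-bit integer and that the sync period is 7 samples as configured above. The function name `MissedPeriods` is hypothetical. It converts the difference between two consecutive returned counters into the number of whole sync periods the Windows thread slept through.

```cpp
#include <cassert>
#include <cstdint>

// Whole sync periods skipped between two consecutive wake-ups.
// prevSample / currSample: sample counters from consecutive returns.
// periodSamples: the configured sync period in samples (7 in this thread).
std::int64_t MissedPeriods(std::int64_t prevSample,
                           std::int64_t currSample,
                           std::int64_t periodSamples) {
    assert(periodSamples > 0);
    const std::int64_t delta = currSample - prevSample;
    if (delta <= periodSamples) {
        return 0;  // woke up on time (or early)
    }
    return delta / periodSamples - 1;  // full periods beyond the expected one
}
```

At a 1kHz sample rate one sample is one millisecond, so a counter delta of 21 would correspond to roughly a 21 ms wait and two missed periods.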

You could also sandbox a simple project that tries to do a 7ms loop in Windows. I suspect it would have similar behavior… fine most of the time and 60+ms late sometimes.
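
A minimal sketch of such a sandbox in standard C++, with no RMP or INtime involved. The function name `WorstLatenessMs` is hypothetical. It sleeps until a fixed-period deadline on each cycle and reports how late the worst wake-up was.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Run `iterations` cycles of a fixed-period loop and return the worst
// wake-up lateness (actual wake time minus deadline) in milliseconds.
double WorstLatenessMs(std::chrono::milliseconds period, int iterations) {
    using clock = std::chrono::steady_clock;
    auto deadline = clock::now();
    double worstMs = 0.0;
    for (int i = 0; i < iterations; ++i) {
        deadline += period;  // absolute deadlines, so lateness does not accumulate
        std::this_thread::sleep_until(deadline);
        const double lateMs =
            std::chrono::duration<double, std::milli>(clock::now() - deadline).count();
        worstMs = std::max(worstMs, lateMs);
    }
    return worstMs;
}
```

Calling `WorstLatenessMs(std::chrono::milliseconds(7), 1000000)` runs for roughly two hours, which may be needed to catch rare outliers like the ones in the histogram above.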