Starting the Network _Too Frequently_?

todd_mm · June 21, 2021, 6:13pm

[RMP 8.3.1]

Is it possible to try to start the EtherCAT network too often or too soon after a recent failure to start?

The Problem

I’m trying to make my application respond gracefully to situations where the EtherCAT network disappears (e.g., turning off power to the drives). In this scenario, I unplug the Ethernet cable from node 0, wait a while, and plug it back in.

Currently, my app tries to start the network, fails, tries again pretty soon (maybe 100 ms later), and repeats, hoping that sooner or later it will work. I have observed that at some point in this impatient, relentless attempt to get everything working again, the RTA seems to get into a state where it stops working correctly (without raising any sort of exception that INtime would complain about).

This happens somewhere around attempt #10. All attempts thereafter fail (in my app) or stall (in RapidSetup).

It just gets stuck in discovery/starting and never gets out of it.

Note: the firmware time seems a little large. Perhaps I shouldn’t expect anything meaningful during this phase, but I did. Also, the sample counter is incrementing much more slowly than normal: roughly 0.5 to 1.0 seconds per tick.

I have to restart the RTA to get out of this state.

The Question

Is it possible to try too frequently to start the network? I’m pretty sure that I’m not making multiple concurrent attempts to start, but I am trying to start pretty soon after the last failure.
Is there a practical limit for this? Should I not attempt to restart for a certain amount of time?

jacob · June 22, 2021, 1:41pm

Hi Todd,

NetworkStart has an overload that takes a timeout; internally it is called with 30 seconds. Don’t reissue the command without waiting for the result of the previous call. There is a built-in 4.2-second start period, required to deal with rollover issues on nodes that don’t have a 64-bit clock. Beyond that, every network topology differs in how long it takes to reach Operational. Testing with RapidSetup can help you determine how long the timeout should be if you are not using the default.

The RTOS INtimeRestart is there to help if you find yourself in a fubar state. The sample counter moving at anything other than your sample rate is a good example. In the case you describe, you are seeing the firmware semaphores being released by their timeouts. Something has gone wrong with the processes communicating internally.

Generally speaking, we expect customers to have a specific system in mind: “We are producing machine ABC123, which has topology XYZ.” If you find yourself missing Z, we don’t expect an application to Discover, Generate a new ENI, and get the network operational again as soon as possible. We expect the missing Z node to be a show-stopping event that needs to be evaluated immediately. I don’t recommend putting Generate ENI in your normal code flows. As developers, we can see a lot of value in being highly flexible; I’m constantly changing the size and composition of my networks. But we expect people to be hesitant to automate this or to make an easy recovery button for the factory-floor user. All too many people will just try to click through a software problem without understanding the machine ramifications.

I’m available if you’d like to do a remote session to work through any errors you are seeing come up regularly.

todd_mm · June 22, 2021, 5:09pm

“Consider this scenario…”

So, suppose the master loses connectivity to slave 0 (e.g. unplugging the network cable or the slave is powered off)…

What would you expect to happen the first time

MotionController::NetworkStart(RSINetworkStartModeOPERATIONAL, RSINetworkStartupMethodNORMAL, network_start_timeout_sec*1000)

is called? In particular, is the runtime still waiting for something to complete?

I’m giving it a timeout. I get an exception from the RMP runtime after the timeout I specified (15 seconds) has elapsed.

Timed out waiting for network to start. (Error 1000000022) (RSI::RapidCode::Impl::MotionController::NetworkStart) (Object 0) (File …..\source\motioncontroller.cpp) (Line 2938) (Version 8.3.1 for 04.04.02.RMP)

I wait 5 seconds to attempt to start it again, but the network state is now RSINetworkStateSTARTING. I go ahead and attempt to start the network again (same call as above), it times out after 15 seconds, I wait 5 seconds, and repeat.

Now, eventually, I plug the network cable back in, connecting the master and node 0, but RMP never reports the network state as anything other than “starting.” I have to restart the INtime node in order to be able to start the network again. Even RapidSetup was unable to get the network started. (I might be able to do less, but nothing else I’ve thought of trying has produced results.)

Proper Behavior

What should I do under these circumstances? I want to be able to recover from the network dying. Should I never try to start if it’s already starting? Could my attempting to do so cause the RTA to run amok and get into an irrecoverable state?

scott · June 22, 2021, 7:56pm

In these circumstances, I think you’ll probably need to restart the RMP, and possibly INtime. You might be able to get more information (for your log) using NetworkLogMessageGet(). See our C# helper function, StartTheNetwork(), for details.

NetworkStart() will fail if you were previously running (with a valid, discovered topology), then unplug one node and call NetworkStart() again. In that case it expects the previously discovered topology and fails. I expect NetworkLogMessageGet() would help confirm the result.

You should not try to start the network if the state is STARTING. This could cause the RTA to run amok and get into an unrecoverable state. If it’s stuck in the STARTING state, it seems like it is already unrecoverable.
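
For illustration, the advice so far (wait for each attempt to finish, never issue a start while the state is STARTING, and treat a stuck STARTING state as something to escalate) could be sketched as the retry loop below. It is written against a hypothetical `Network` interface so that it stands alone; `Network`, `NetState`, `StartOutcome`, and `start_with_retries` are illustrative names and not part of RapidCode. A real implementation would presumably wrap `NetworkStart` and the controller's network-state query behind that interface.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins for the controller's network states; not RapidCode types.
enum class NetState { Uninitialized, Starting, Operational, Failed };

// Hypothetical interface an application would implement on top of its controller.
struct Network {
    virtual ~Network() = default;
    virtual NetState state() = 0;
    // Blocks until the start succeeds or the timeout elapses; true on success.
    virtual bool start(std::chrono::milliseconds timeout) = 0;
};

enum class StartOutcome { Started, StuckStarting, GaveUp, Cancelled };

// Retries a blocking start, but never issues a new start while the
// network still reports Starting.
StartOutcome start_with_retries(Network& net,
                                int max_attempts,
                                std::chrono::milliseconds timeout,
                                std::chrono::milliseconds pause,
                                const std::function<bool()>& cancelled) {
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (cancelled()) return StartOutcome::Cancelled;
        if (net.state() == NetState::Starting) {
            // A previous attempt has not finished. Report it to the caller
            // instead of stacking another start on top of it.
            return StartOutcome::StuckStarting;
        }
        if (net.start(timeout)) return StartOutcome::Started;
        std::this_thread::sleep_for(pause);  // give the previous attempt time to settle
    }
    return StartOutcome::GaveUp;
}
```

On `StuckStarting`, the caller would log the situation and escalate (for example, to an RMP or INtime restart) rather than keep retrying. The `cancelled` callback lets the loop stop when the application is closing.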

todd_mm · June 22, 2021, 9:17pm

How much of a topology change will necessitate generating the ENI again?

change of node order (supposing something more than two identical nodes are switched)
change of revision number (suppose I update the firmware on a drive…)
station alias?

In RMP 8.3.1, is it possible to generate a new ENI from RapidCode? I see where I can probe, start, or probe+start, but I don’t see anything about generating an ENI.

Is there a way to generate a new ENI based solely on what’s on the network (rather than have an XML file and give it to rsiconfig) in an automated fashion? Is it available in C# for 8.3.1?

todd_mm · June 22, 2021, 9:38pm

I see some references to RapidENI in the .NET assemblies, but I can’t find anything resembling documentation or examples.

What I want is to be able to generate the ENI from the most recent probe/discovery.

todd_mm · June 23, 2021, 1:25pm

My chief goal here is to be able to Gracefully Recover™ when the physical network goes away temporarily as well as respond to superficial changes in network topology (e.g. switching an axis with an I/O block—in short, something the application knows how to handle).

If the network goes away, I need to be able to detect that somehow and attempt to restart the network until it succeeds or our application is closed.

If the topology changes in a way that my app knows how to handle (e.g. adding a new I/O slice to an existing block of slices), I would rather just generate a new ENI and proceed. Our customers don’t use RapidSetup directly because our application doesn’t generally need them to and it would just be one more thing for them to have to become familiar with. The only thing RapidSetup does that we can’t do is produce an ENI file. (It’s useful for troubleshooting, but it’s not useful for normal operations, where they want to run G code, etc.) If I could automate the generation of an ENI file, we wouldn’t need to jump into RapidSetup only to click the “Generate ENI” button, then leave the app. AFAICT, RapidSetup isn’t automatable.
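
One way to keep automatic regeneration limited to changes the application knows how to handle is to compare the expected node list with the discovered one and accept only differences on an allowlist. The sketch below is a simplified illustration: `NodeId`, `TopologyChange`, and `classify` are hypothetical names, the only accepted change is extra I/O nodes appended after the expected ones, and reading the node identities from the controller is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical identity record for one discovered node; fields are illustrative.
struct NodeId {
    std::uint32_t vendor;
    std::uint32_t product;
    bool is_io;  // true for I/O slices, false for drives
    bool operator==(const NodeId& o) const {
        return vendor == o.vendor && product == o.product && is_io == o.is_io;
    }
};

enum class TopologyChange { None, AppendedIo, Unsupported };

// Classifies the difference between the expected and the discovered node lists.
TopologyChange classify(const std::vector<NodeId>& expected,
                        const std::vector<NodeId>& found) {
    if (found.size() < expected.size()) return TopologyChange::Unsupported;  // a node is missing
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (!(found[i] == expected[i])) return TopologyChange::Unsupported;  // reordered or replaced
    }
    if (found.size() == expected.size()) return TopologyChange::None;
    for (std::size_t i = expected.size(); i < found.size(); ++i) {
        if (!found[i].is_io) return TopologyChange::Unsupported;  // an extra drive
    }
    return TopologyChange::AppendedIo;
}
```

Only `AppendedIo` would lead to generating a new ENI; `Unsupported` would stop and ask for a person to look at the machine.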

todd_mm · June 23, 2021, 1:34pm

Having a programmatic way to accomplish what clicking the “Generate ENI” button does is what I want.

RapidSetup is a useful app, but my expectation is that I ought to be able to do everything I need using the API (or command-line tools).

jacob · June 23, 2021, 1:58pm

Hi Todd,

Noting again that I recommend against automatically generating an ENI file, here are the tools for troublemaking!

In earlier versions you will need to reference RSI.System.dll or RSI.System64.dll. However, we rolled this code into RapidCode64.Net.dll in 10.2.0.

RSI.RapidCode.dotNET.RapidENILib.RapidENI is the namespace & class that you want to use.

RapidENIResult result = RapidENI.GenerateENIFile(RapidENI.NetworkDetailsFromMotionController(mc));

is the most basic example of its use. You will want the overload that lets you specify overwriting the previous ENI file.

Here are the possible results:
public enum RapidENIResult
{
    Success,                  // Aren’t you glad you aren’t dealing with one of the below?
    MissingESIFiles,          // Add ESI file with correct Vendor, Product, Revision to the ESI folder.
    ConfigNotDetected,        // Failed Configuration creation or 0 slaves to add to it. Did you move your ESI folder?
    ProductCodeLookupError,   // This enum represents that an error(s) has been found in NodeInfo.xml.
    UnknownFailure,           // Fall through case. Contact RSI with details about your changes to the various xmls.
    ENIFileAlreadyExists,     // Use overload which overwrites or be glad we didn’t just delete your old file.
    FailedToCreateENIBuilder, // Did not create ENIBuilder internal object.
}
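
As a rough illustration of how an application might act on these results, the sketch below maps each one to a next step, following the comments in the enum. It mirrors the C# enum in C++ only so that it compiles on its own; `EniResult`, `may_start_network`, and `next_step` are hypothetical names, and real code would switch on `RapidENIResult` directly.

```cpp
#include <string>

// Mirror of the result codes listed above, reproduced only so the sketch stands alone.
enum class EniResult {
    Success, MissingESIFiles, ConfigNotDetected, ProductCodeLookupError,
    UnknownFailure, ENIFileAlreadyExists, FailedToCreateENIBuilder
};

// Only a successful generation should be followed by a network start.
bool may_start_network(EniResult r) { return r == EniResult::Success; }

// Suggested next step for each result, taken from the enum comments.
std::string next_step(EniResult r) {
    switch (r) {
        case EniResult::Success:
            return "start the network";
        case EniResult::MissingESIFiles:
            return "add the matching ESI file to the ESI folder";
        case EniResult::ConfigNotDetected:
            return "check the ESI folder location and that nodes were discovered";
        case EniResult::ProductCodeLookupError:
            return "check NodeInfo.xml";
        case EniResult::ENIFileAlreadyExists:
            return "use the overwriting overload, or keep the old file";
        case EniResult::FailedToCreateENIBuilder:
        case EniResult::UnknownFailure:
            return "stop and contact RSI";
    }
    return "stop and contact RSI";  // defensive default for out-of-range values
}
```

Treating every non-success result as a stop keeps a failed generation from being followed by a start against a stale or missing ENI file.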

todd_mm · June 23, 2021, 4:13pm

Thanks, Jacob!

“Noting again that I recommend against automatically generating an ENI file”

Noted.
Out of curiosity, why so strong a reservation? Is there real risk of breaking something under the covers, or is it too easy to foul things up, or something else?

NetworkDetailsFromMotionController

I can’t find this function anywhere in the object browser (and the compiler complains) (for 8.3.1). It looks like GenerateENI just takes the motion controller object (interface) itself. Is that correct?

jacob · June 23, 2021, 8:15pm

Hi Todd,

I worry about the users of the machines after they are out of the hands of developers. It is easy for me to trust you, and to trust you using your code as intended. I worry about any feature leveraging it being used outside of your intent. I have a great deal of faith in most people just trying to click through warnings to get software to work. I do it! I figure many others do too.

Ah yes, we did make some changes to the arguments to make it more testable. I gave you the 10.2.2 call rather than the older style. I believe your method works, but requires a real motion controller and network data.

todd_mm · June 23, 2021, 8:49pm

Good point. I’ll add some logic to mitigate this possibility.

todd_mm · June 23, 2021, 8:56pm

Are there other (implicit) preconditions?

When I didn’t set the CWD to the RSI dir, I got ProductCodeLookupError.

When I do set it, I get MissingESIFiles. I know the ESI files are in RSI_DIR\ESI.

todd_mm · June 24, 2021, 2:13pm

If I move the executable into the RSI directory, it succeeds.

I would have expected that changing the CWD would be enough. Do you know why it isn’t?