Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding simple autorecover - hopefully all of my build errors are fixed #249

Open
wants to merge 9 commits into
base: kinetic-devel
Choose a base branch
from

Conversation

watters-torc
Copy link

@watters-torc watters-torc commented Sep 12, 2017

Maintainers: This is a feature we needed for a project, originally we were under the impression the socketcan_bridge and socketcan_interface were under BSD, it seems like this project (ros_canopen) has a bunch of mixed licenses. I don't necessarily see this as a production ready fix but it works for what we need. I'm able to make changes or take direction from the maintainers if you'd like work to further be done here.

Proposed fix / gap stop for #244

Copy link
Member

@mathias-luedtke mathias-luedtke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First all: Thanks for your contribution!
Please find my immediate change requests below.

In ros_canopen auto-recovering should not be done by default.
So please add a parameter to enable this feature explicitly.
It would be great to have a Trigger service that could be called e.g. in non-auto mode.

PS: You can add changes in this PR, it will update automatically. There is no need to open new PRs.

@@ -1,4 +1,5 @@
cmake_minimum_required(VERSION 2.8.3)
add_compile_options(-std=c++11)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++11 shall not be used!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh. It's 2018 i.e. 2 more standards later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c++14 is now the default for melodc 🎉

${catkin_LIBRARIES}
)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a .clang-format file which can be used to avoid this kind of error?

socketcan_to_topic
socketcan_to_topic_node
topic_to_socketcan
topic_to_socketcan_node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

@@ -131,4 +151,6 @@ if(CATKIN_ENABLE_TESTING)
${catkin_LIBRARIES}
)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace, please keep the new line ending

{
state_pub_ = nh->advertise<can_msgs::CanState>("can_state", 1, true);

timer_ = nh->createWallTimer(ros::WallDuration(5), [this](const ros::WallTimerEvent& event) {recover();}, true, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please migrate the lambda into a dedicated function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to make the timeout configurable.

}
else
{
timer_.stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to stop a one-shot timer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If status has been updated before the callback fires, we we're giving a little time for this system to "maybe" fix itself

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void ros::WallTimer::stop ()
  | Stop the timer. Once this call returns, no more callbacks will be called. Does nothing if the timer is already stopped.

ie, if status became ready we don't want the callback to fire

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way that the state can change from bad to good without intervention (?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if that statement would hold true in future versions of socketcan_interface

*/
void SocketCANRecoverCtrl::recover()
{
timer_.stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to stop here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stop is explicitly required, you will have to forgive me I don't remember why, either it made the timer hang holding a reference preventing from being able to simple Ctrl-C the app, or it was needed to call start below, ill look into my reasoning here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it might block at ctrl+c.
But fixing it here is just handling the symptom.

default:
state_msg.driver_state = can_msgs::CanState::CLOSED;
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to just copy it: state_msg.driver_state = state.driver_state;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can::State and can_msgs::CanState are incompatible, i think

@@ -149,7 +149,7 @@ class SocketCANInterface : public AsioDriver<boost::asio::posix::stream_descript
ret = true;
}
if( internal_error & CAN_ERR_RESTARTED){
str += "ontroller restarted;";
str += "controller restarted;";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*/
void SocketCANRecoverCtrl::stateCallback(const can::State & state) {
publishStatus(state);
if(!state.isReady())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the driver should get recovered only if it is not closed.

@mathias-luedtke
Copy link
Member

@watters-torc: Do you have time for working on it?

If not, I will see if I can find some time to implement the follwing base on part of your code:

  • diagnostics output for all bridge nodes (instead of the custom message)
  • recover service
  • opt-in auto-recover

@redkite
Copy link

redkite commented May 24, 2018

Is this still work in progress? We also receive error frames which kill the bridge all the time.

@koenlek
Copy link

koenlek commented Jun 4, 2018

Works like a charm for us. Would be nice if this could make it into an official release

@Erumeldor
Copy link

Would be still a good thing to have a auto recovery when having errors. For my case, when disconnecting the can network from our hardware and reconnect it again, there is no time the can messages will be send again. It always says: "[ERROR] [...]: Failed to send message: 8...#00....."

@tonybaltovski
Copy link

Anything more needed to get this merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants