Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Synchronize autoevent start and stop operation to prevent fatal segmentation faults in device services #444

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

TuhinDasgupta-eaton
Copy link

@TuhinDasgupta-eaton TuhinDasgupta-eaton commented Feb 25, 2023

BREAKING CHANGE: The concept is to have a proper synchronization mechanism using two level semaphores for device autoevents stop and start operation whenever there is some device profile update... This mechanism prevents the chances of segmentation faults in device services which actually results to device service start failure and instability...

Signed-off-by: Tuhin Dasgupta [email protected]

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/device-sdk-c/blob/main/.github/Contributing.md

PR Checklist

Please check if your PR fulfills the following requirements:

  • I am not introducing a breaking change (if you are, flag in conventional commit message with BREAKING CHANGE: describing the break)
  • I am not introducing a new dependency (add notes below if you are)
  • I have added unit tests for the new feature or bug fix (if not, why?)
  • I have fully tested (add details below) this the new feature or bug fix (if not, why?)
  • I have opened a PR for the related docs change (if not, why?)

Testing Instructions

New Dependency Instructions (If applicable)

@TuhinDasgupta-eaton TuhinDasgupta-eaton changed the title Synchronize autoevent start and stop operation to prevent fatal segmentation faults in device services feat!: Synchronize autoevent start and stop operation to prevent fatal segmentation faults in device services Feb 25, 2023
@TuhinDasgupta-eaton TuhinDasgupta-eaton marked this pull request as draft February 25, 2023 07:31
Copy link
Member

@iain-anderson iain-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things on this so far -

  • For the sem_init calls I'd rather see things like that in an __attribute__((constructor)) function
  • We should avoid master/slave terminology
  • Should we have one 'slv' lock per device? ie make it part of the autoimpl structure?
  • Do these extra synchronisations remove the need for the refcount in the autoimpl?

@TuhinDasgupta-eaton
Copy link
Author

Couple of things on this so far -

  • For the sem_init calls I'd rather see things like that in an __attribute__((constructor)) function
  • We should avoid master/slave terminology
  • Should we have one 'slv' lock per device? ie make it part of the autoimpl structure?
  • Do these extra synchronisations remove the need for the refcount in the autoimpl?

@iain-anderson Thanks for the review... Below are the responses:

  • For the sem_init calls I'd rather see things like that in an __attribute__((constructor)) function
    => sem_init function is the function which belongs to the default POSIX semaphore used in C...
  • We should avoid master/slave terminology
    =>Fine... Can you please suggest some terminologies which shall be suitable to replace these...
  • Should we have one 'slv' lock per device? ie make it part of the autoimpl structure?
    =>This shall be fine but I believe that using 2 level semaphores shall give a better streamlined processing where with first level semaphore we will make sure that at a time either autoevent_start (which creates autoevent schedulers) or autoevent_stop_to_update (which stops and deletes autoevent schedulers while profile update ) function shall be called. While the second level semaphore would ensure that autoevent schedulers work one at a time in sequence instead of working all at once parallelly. This would also ensure that whenever a particular autoevent scheduler is in progress, this should not get deleted when there is a triggering of device profile update callback anytime which stops and deletes all autoevent scheduler which can further result into the fatal segmentation fault which we were facing at our end. Thus ensuring a streamlined processing...
    But yes, I think we can make this a part of the autoimpl structure...
  • Do these extra synchronisations remove the need for the refcount in the autoimpl?
    => Yes... there isn't any need for the refcount in autoimpl as this would itself ensure streamlined processing of autoevents schedulers...

@iain-anderson
Copy link
Member

iain-anderson commented Mar 1, 2023

@iain-anderson Thanks for the review... Below are the responses:

  • For the sem_init calls I'd rather see things like that in an __attribute__((constructor)) function
    => sem_init function is the function which belongs to the default POSIX semaphore used in C...

I mean, like this:

__attribute__((constructor)) static void edgex_device_autoevent_init (void);

...

static void edgex_device_autoevent_init (void)
{
  sem_init (&grp_mutex, 0, 1);
  sem_init (&elem_mutex, 0, 1);
}

and similarly a destructor funtion for the cleanup. Or, add the sem_init/sem_destroy calls to the service startup/stop sequences.

  • We should avoid master/slave terminology
    =>Fine... Can you please suggest some terminologies which shall be suitable to replace these...

eg. primary/secondary, global/local, group/element

Further thought - can this locking be achieved with plain mutexes rather than semaphores?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants