Generic exceptions within storage.write statements are not caught potentially causing inconsistent state #32

jordanly · 2018-08-16T22:03:26Z

A finding from #31.

A user created an update to remove instances from a job, which throws a NullPointerException as described in the issue above. The LoggingInterceptor swallows the exception. This happens because the initial evaluation of the update runs within the user's RPC call (follow the start(...) method if you are not convinced).

Although the start(...) call above throws a NullPointerException, the update is still added to the MemJobUpdateStore but is never persisted to the log. The start(...) code calls saveJobUpdate(...), which adds the update to the memory stores. However, because the NullPointerException is thrown before the write lock is exited, these operations are never persisted to the log. The storage system in the scheduler is transactional by design, so everything is added to the log at the end of the write. As a result, the memory store no longer matches the log store.
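To make the failure mode concrete, here is a minimal toy model of it. This is a sketch, not the scheduler's code: `DivergenceSketch`, its `write(...)` method, and the two lists are hypothetical stand-ins for the storage layer, the MemJobUpdateStore, and the log. It only assumes what is described above, namely that memory is mutated immediately while the log is written at the end of the write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model (not Aurora's code): an in-memory store plus an append-only log. */
public class DivergenceSketch {
    final List<String> memStore = new ArrayList<>(); // stands in for MemJobUpdateStore
    final List<String> log = new ArrayList<>();      // stands in for the persisted log
    private final List<String> pending = new ArrayList<>();

    /** Applied to memory immediately; queued for the log until the write ends. */
    void saveJobUpdate(String update) {
        memStore.add(update);
        pending.add(update);
    }

    /** Runs work as one write; the log is flushed only if work returns normally. */
    synchronized void write(Consumer<DivergenceSketch> work) {
        pending.clear();
        work.accept(this);
        log.addAll(pending); // never reached when work throws
    }

    public static void main(String[] args) {
        DivergenceSketch storage = new DivergenceSketch();
        try {
            storage.write(s -> {
                s.saveJobUpdate("update-1");
                throw new NullPointerException("simulated failure inside start(...)");
            });
        } catch (NullPointerException e) {
            // Swallowed by the caller, much like the interceptor in this report.
        }
        // Prints: memStore=[update-1] log=[]
        System.out.println("memStore=" + storage.memStore + " log=" + storage.log);
    }
}
```

The process keeps running with `update-1` visible in memory, yet a restart that replays the log would not see it.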

I think we should catch all unhandled exceptions within the write lock and immediately kill the scheduler. This would keep such errors from leaving the scheduler in a potentially inconsistent state and from corrupting the log, which would prevent an easy rollback.
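One possible shape for this, as a rough sketch only: wrap the body of the write in a guard that treats any unchecked exception or error as fatal. `FailFastWriteSketch`, its `write(...)` signature, and the injected `killProcess` hook are all hypothetical names for illustration and are not the scheduler's API. The hook is injected so the behaviour can be exercised without actually terminating the JVM.

```java
import java.util.function.Consumer;

/** Hypothetical fail-fast guard around a write; names are illustrative only. */
public class FailFastWriteSketch {
    private final Runnable killProcess;
    private final Object writeLock = new Object();

    FailFastWriteSketch(Runnable killProcess) {
        this.killProcess = killProcess;
    }

    /** Runs work under the write lock; any unchecked failure triggers the kill hook. */
    <S> void write(S stores, Consumer<S> work) {
        synchronized (writeLock) {
            try {
                work.accept(stores);
                // A real implementation would flush the transaction to the log here.
            } catch (RuntimeException | Error e) {
                System.err.println("Unhandled exception inside write, shutting down: " + e);
                killProcess.run();
                throw e; // only reached if the hook did not terminate the process
            }
        }
    }

    public static void main(String[] args) {
        boolean[] killed = {false};
        // A real scheduler might pass something like () -> Runtime.getRuntime().halt(1);
        // this demo only records the call so the JVM keeps running.
        FailFastWriteSketch storage = new FailFastWriteSketch(() -> killed[0] = true);
        try {
            storage.write("stores", s -> {
                throw new NullPointerException("simulated failure inside start(...)");
            });
        } catch (NullPointerException e) {
            // Reached only because the demo hook does not halt the process.
        }
        System.out.println("kill hook invoked: " + killed[0]);
    }
}
```

If the hook uses `Runtime.halt`, the JVM stops without running shutdown hooks, so nothing further gets written from the inconsistent in-memory state. Whether that or a more graceful exit is the right choice here is a design question for the maintainers.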
