Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load simulation results from DB before scheduling + performance improvements #1051

Merged
merged 10 commits into from
Aug 21, 2023

Conversation

adrienmaillard
Copy link
Contributor

@adrienmaillard adrienmaillard commented Aug 1, 2023

Description

As usual, on my way to the feature, I get distracted by other improvements and fixes. Some of those are really worth it, in commits (4) and (5) especially. The last commit shows a performance improvement in 20+ tests. About 30-50% less simulations in these test cases.

In (1), I removed the need for an ugly artificial time buffer in the scheduler simulation by fixing an issue in Windows: if the last interval of a profile was closed-open, Windows considered that there was a gap because it only used Interval.contains and that contains does not care about the open/closed property. The detected gap prevented from turning these windows into spans and so on. Hopefully, that is now squared.

In (2), I have added a small optimization that may spare us a simulation if not needed.

In (3), a small fix regarding the domain of converted simulation results.

In (4), a significant change. I removed the conflict detection after goal solving. After solving a goal, we ran conflict detection to check that we effectively solved it. This resulted in a simulation runs that could be avoided. The only goal which was taking advantage of this mechanism was the cardinality goal in the case of activities with uncontrollable duration. It was posting one conflict at a time and kept posting conflict until the duration and cardinality conditions were not satisfied. The MissingActivityTemplateConflict (and its processing by PrioritySolver) has been generalized to handle duration and cardinality thus removing the need for the second conflict detection after goal solving.

In (5), I fixed a boundary reset condition that were causing more simulations than needed. This bound had been set conservatively because at some point, the driver was dropping some tasks. But this has been fixed a few month ago.

In (6), I have modified the scheduler so that it is able to use initial simulation results coming from the DB.

  • the solver keeps the initial simulation results and returns them when needed if the associated initial plan has not been modified.
  • a field has been added to the SimulationFacade to hold an initial plan until a simulation is needed. This is to avoid simulating a plan that will not be needed. Rootfinding needs the facade to have the current plan "loaded" but if external simulation results have been loaded, there is no need for a first simulation before the first rootfinding try.

In (7) I pull the simulation results from the DB and pass them to scheduling. DB simulation results are considered useful if they correspond to the current revision of the plan and if they cover the entire plan horizon. We do not pull events and topics to save time (and they are useless to the scheduler right now).

In (8), I have fixed some tests. The results are the same but the data path as well as the in-memory state was wrong.

In (9) I have added a test to show how loading initial simulation results allows to remove the first simulation. The test is the same as its neighbor and displays a lower number of simulations (-1) for the same results.

In (10), I have updated the simulation performance of the scheduler tests. These improvements are mostly due to (4) (and a little bit (5)).

Verification

Existing tests are passing. Only one new test. Testing the graphql requests/scheduler-worker is not easy, I am open to suggestions.

Documentation

None.

Future work

None.

@adrienmaillard adrienmaillard added performance A code change that improves performance feature A new feature or feature request scheduling Anything related to the scheduling domain labels Aug 1, 2023
@adrienmaillard adrienmaillard requested a review from a team as a code owner August 1, 2023 18:33
@adrienmaillard adrienmaillard self-assigned this Aug 1, 2023
@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 1, 2023 18:33 — with GitHub Actions Inactive
@adrienmaillard adrienmaillard changed the title 745 pulling sim results Load simulation results from DB before scheduling + performance improvements Aug 1, 2023
@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 1, 2023 18:34 — with GitHub Actions Inactive
@Mythicaeda Mythicaeda linked an issue Aug 1, 2023 that may be closed by this pull request
@adrienmaillard adrienmaillard requested review from JoelCourtney and removed request for camargo and cohansen August 1, 2023 20:51
@adrienmaillard
Copy link
Contributor Author

Here are some actions I am going to take following the walkthrough this morning:

  • check that the documentation mentions that the sim configuration is loaded by the scheduler
  • when looking for a suitable simulation dataset, add a condition checking that the current sim config is equal to the sim config used for the potential dataset
  • break down the big graphql request into similar smaller requests (and maybe multi-thread them)
  • add an e2e test with stale results

@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 4, 2023 20:43 — with GitHub Actions Inactive
@Mythicaeda Mythicaeda linked an issue Aug 7, 2023 that may be closed by this pull request
Copy link
Contributor

@Mythicaeda Mythicaeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! As mentioned during the walkthrough, I'd like to see two E2E tests to prove that "stale" simulation data will not be used.

The first would test the check on Plan Revision, by having a coexistence goal between two activities (ie for each GrowBanana, PeelBanana). The stale sim run will contain fewer GrowBananas than the "current" Plan at the time of scheduling, allowing the number of PeelBananas to indicate whether the stale data was used.

The second would test that the scheduler will not load a dataset that used a different sim config than it will. This can be done by have a coexistence goal on a resource (ie for each plant count < 5, GrowBanana) and then having that resource differ between when sim was run and when scheduling is run.

@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 8, 2023 20:36 — with GitHub Actions Inactive
@adrienmaillard
Copy link
Contributor Author

@Mythicaeda I have added the 2 e2e-tests.

@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 21, 2023 16:49 — with GitHub Actions Inactive
@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 21, 2023 17:20 — with GitHub Actions Inactive
@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 21, 2023 17:26 — with GitHub Actions Inactive
Copy link
Contributor

@Mythicaeda Mythicaeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Excited to see this finally get in!

- Harmonize condition between SimulationFacade and ResumableSimulationDriver for restart
- Initialize sim time at negative value to avoid useless simulation when an activity starts at Duration.ZERO.
  Also, avoid actually launching daemon task at initialization, wait for the real start of simulation.
- Remove need for buffer in simulation results of scheduler
The need for buffer was due to a bound checking in Windows.intoSpans failing if one of the bound not included. And, the last segment of all profiles (ending at the end of the simulation horizon) are always open. I have included the special case in the checking of bounds.
Tests were running correctly but the proper call path was not taken
@adrienmaillard adrienmaillard temporarily deployed to e2e-test August 21, 2023 21:34 — with GitHub Actions Inactive
@adrienmaillard adrienmaillard merged commit faf183d into develop Aug 21, 2023
5 checks passed
@adrienmaillard adrienmaillard deleted the 745-pulling-sim-results branch August 21, 2023 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature or feature request performance A code change that improves performance scheduling Anything related to the scheduling domain
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add iterative option for scheduling goals Load simulation results from DB before scheduling
3 participants