-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Review #85
Comments
Is the faster-fifo library optimization just for mocking the IMU data? If so, then is losing windows support worth it for something that is may or may not be used for an actual launch? Also, do we have an up-to-date profile of the code running on the pi, mocks or not? It would be nice if we could find a way to algorithmically optimize the queue actions instead/in addition. |
The most performance to be gained is in fetching IMU Data packets from the
I have not specifically profiled the code on the pi's hardware, yet, no. But an easy optimization for the queue we can do regardless of This is something we already do for the apogee prediction process. We didn't update the code for actual IMU yet because we would need to test it thoroughly, which we didn't have a chance to do. |
Performance is a critical part of our system. The code needs to do things fast enough to be able to predict apogee, and thus, deploy airbrakes in time, all while running on a tiny computer (the Raspberry Pi). While Python isn't known for it's speed, we do use it because of it's ease of use and massive community and library support.
One of the only ways we can speed up our program is by using
multiprocessing
- but this does add a lot of overhead, particularly with Queues. This is most noticeable when you run the mock sim with-f
(and if you profile the code to check the bottlenecks). In the earlier days, running even a real time simulation on the Pi would result in fetching about 15 packets at a time, which was indicative of performance issues. However after many optimizations across the board (e.g. #40, #45, #47, #69, #81, #82). This is basically down to 1-3 packets. Which means we have fully solved the performance problem! Yay!Or did we?
Running the mock sim at full speed still takes about 10 seconds for every launch. An eternity. This is still unacceptable for me.
So after investigating, I found out the producer side (
MockIMU
), simply takes up 6.5 seconds, just to read the file and create the data packets (and not sending them!). That was brought down to 1.73 seconds by usingpandas
(#82). Simply using pandas and mimicking what was done before does not give you any gains, you have to use it in a thoughtful and intentional way, profiling every step of your process. So now, the theoretical max speed our program can run should be 1.73 seconds (on my computer at least). But we don't get anywhere close to that, because of...Queues
.I would like to propose using the
faster-fifo
library for this purpose. Initial benchmarks show a massive improvement, finishing the purple launch sim in a staggering time under 5 seconds.The only disadvantage of using that would be that it does not work on Windows. I'm not even sure if it works on WSL. That being said, this would be the fork in the road for having Windows support, at least in the short term. Windows support is already not great, evidenced by the tests failing due to multiprocessing semantics. Alternatively, we can also keep backward compatibility, so Windows would continue working as before, while Unix/MacOS can enjoy the benefits more.
Long term, with Python 3.13 free threaded, we can drop multiprocessing altogether, simplifying a bunch of things as well, and that should theoretically gives us the same performance as before. We are simply waiting for
msgspec
to add support for it - jcrist/msgspec#744.The text was updated successfully, but these errors were encountered: