Question: What is simulation speed bottleneck? #680
Comments
I'm also interested in the same question. Is it possible to achieve a similar level of performance to Cloudsim using docker compose, and if so, what kind of machine should we use? (I'm primarily targeting an AWS EC2 instance.) My experience has been as follows (the numbers could be slightly off, as I'm quoting them from memory). For 2x UGV (same as above) + 2x UAV (same as above) + teambase, this local performance is even without solution containers, i.e. just using ign launch from a catkin workspace without any solution nodes. I haven't tried headless mode yet, and I haven't tested docker compose on an Amazon EC2 instance either. I understand that even then it won't be an apples-to-apples comparison, since in Cloudsim the simulation container and the solution containers potentially run on different EC2 instances. In general, I'm looking for the following information, if it's possible to know.
Probably there isn't a straightforward relationship between EC2 instance type and container count, since it's a dynamically managed Kubernetes cluster spanning multiple nodes. I'm just looking for a rough figure so that I could try to recreate the setup using docker compose on AWS (probably with an even beefier EC2 instance whose power equals the combined EC2 instances required for a Cloudsim simulation, assuming such a single beefier instance type exists).
@AravindaDP Most of your questions have answers in https://github.com/osrf/subt/wiki/Cloudsim%20Architecture . This bug is, however, about finding the bottlenecks on systems that have enough resources. Your local PC tests very probably suffer from resource exhaustion...
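A quick way to check whether a local run is resource-bound is to watch CPU and GPU utilization while the simulation is running. A minimal sketch using standard Linux/NVIDIA/Docker tooling (adjust to your own setup):

```bash
# GPU utilization and frame-buffer memory, sampled once per second while the sim runs:
nvidia-smi dmon -s um

# Per-core CPU load (press '1' inside top to toggle the per-core view):
top

# If the simulation runs inside Docker, per-container CPU/RAM usage:
docker stats
```

If any core or the GPU is pinned at 100% during the run, the test is measuring your hardware rather than the simulator.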
@peci1 Thanks for pointing out the resources.
Here are the CPU specs for EC2:
Not sure if it has any effect, but it seems some 3D lidars have a much higher horizontal resolution than the real sensors they are based on. E.g., for X1 Config 8 and EXPLORER_X1 Config 2 it is 10000 horizontal points per ring, whereas a VLP-16 (which I believe these were modeled on) would only have about 1200 horizontal points per ring at 15 Hz. Probably this is not the actual bottleneck, but I guess it still needs correction. Maybe the culprit is the camera sensor? I guess I see a difference between COSTAR_HUSKY and EXPLORER_X1 (single RGBD vs 4x RGBD). Is it CPU-based rendering? Could it be made to use the GPU?
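For anyone who wants to verify the lidar resolution themselves, the horizontal sample count can be read straight out of the model SDF. A sketch (the path is an assumption; the SubT models are expected under submitted_models/ in this repository, and `<model_dir>` is a placeholder for the sensor config you want to inspect):

```bash
# Show the <samples>/<resolution>/<min_angle>/<max_angle> block of the gpu_lidar
# in a model SDF. Replace <model_dir> with the actual model directory name.
grep -A 4 '<horizontal>' submitted_models/<model_dir>/model.sdf
```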
According to my tests, the simulation does not use more than 4 CPU cores. As for the GPU, most of the usage is in the GUI - I do all local runs headless - if I don't, the GUI takes all available GPU memory (8 GB in my case) and nothing else works on the computer running the simulation. Not knowing anything about the actual implementation, I am also surprised by the drop in performance when simulating multiple robots. From my (possibly naive) point of view, considering current games, the resolution of the cameras is small and the quality requirements are not that high either. We are mostly using 640x480 cameras, which is 0.3 MP. It seems 1920x1080 (2.1 MP) is the minimum current games use, and their FPS starts at 60 Hz - so pixel-wise the ratio is 6.75x and FPS-wise 3x (at minimum). Given this comparison it should be possible to run about 20 cameras at 480p and 20 Hz in real time, while in reality we get maybe 3% of that. So yes, I'd also like to know where the bottleneck is. It is really difficult to get anything done at 3% of real time.
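The back-of-the-envelope arithmetic behind the "about 20 cameras" figure can be checked in one line (this only compares raw pixel throughput and makes no claim about how the renderer actually behaves):

```bash
# Pixel-throughput comparison: one 1080p/60Hz game frame stream vs one simulated
# 640x480 camera at 20 Hz.
awk 'BEGIN {
  game = 1920 * 1080 * 60;   # pixels/s a typical game renders at minimum
  cam  =  640 *  480 * 20;   # pixels/s of one simulated 480p camera at 20 Hz
  printf "pixel ratio %.2f, fps ratio %.0f, cameras that should fit: %.2f\n",
         (1920 * 1080) / (640 * 480), 60 / 20, game / cam
}'
# -> pixel ratio 6.75, fps ratio 3, cameras that should fit: 20.25
```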
How many robots are you talking about? I get something like 800 MB of GPU memory for the GUI with a single robot, and it seems to scale more or less linearly with more robots. We've actually found out that the Absolem robot is quite a greedy-guts regarding cameras - the main 6-lens omnicamera sums up to something like 4K... So I'm not surprised it takes some time to simulate, but I do wonder why the GPU isn't fully used. Or maybe it's just because of the way nvidia-smi computes GPU usage? I know there are many different computation/rendering pipelines in the GPU...
I have re-run the test. Currently, when running a headless simulation with a single X2 robot, there is one ruby process taking about 2 GB of memory and the GPU utilization stays around 5%. When running the same setup with the GUI, there are two ruby processes, each taking 2 GB, but the GPU utilization jumps to 100% and even mouse movement is slowed down (even when the window is not visible). So in my book the GUI is still broken for me and I'll continue running headless. I have an Ubuntu 18.04 system with NVIDIA driver 450 and a GeForce GTX 1050 with 8 GB.
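For context, a headless single-robot run like this is typically started roughly as follows (inside or outside Docker). The launch file and argument names are quoted from memory of the SubT documentation and may differ between releases, so treat this as a sketch:

```bash
# Headless single-X2 run (check the SubT wiki for the exact launch file and arguments).
ign launch -v 4 competition.ign \
  circuit:=cave \
  worldName:=simple_cave_01 \
  robotName1:=X2 \
  robotConfig1:=X2_SENSOR_CONFIG_1 \
  headless:=true

# In another terminal, watch GPU memory and utilization of the ruby processes:
watch -n 1 nvidia-smi
```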
Was this test performed via Docker or in a direct catkin install?
Docker
Could you re-do the test with a direct install? I'd like to clearly separate the performance loss introduced by Docker from the performance of the simulator itself. When I run the simulator directly and headless, there is no noticeable slowdown on my 8th-gen Core i7 ultrabook with an external GPU (as long as there is a single robot without that many cameras).
Actually, sorry, no. I am not going to risk messing up the whole computer by installing all the ROS and ign stuff directly into a system I depend on. However, rviz works just fine from inside Docker, taking a hardly noticeable hit on GPU utilization when displaying an image from the front camera and the depth cloud at the same time, and using only 18 MB of GPU memory - that aligns more with my expectations.
We might get some improvement in speed by building the plugins in this repository with optimizations enabled. See #688
The simulation speed is clearly related to the RGBD cameras. I have modified our models to run without the cameras and only LIDAR, and two robots can be run at about 60-80% realtime. If the same models are used, but with the cameras enabled, the same two robots run at about 20% realtime. Our models use a 64-beam LIDAR versus the 16-beam present on most systems, so a higher-resolution LIDAR does not seem to impact performance much.
I agree the speed goes down very much with cameras. However, I wonder why the computer doesn't utilize more resources in order to keep the simulation running as fast as possible. A GPU lidar is basically just a depth camera in Gazebo - the resolution would be something like 2048x64. So its performance impact would be hard to notice (even more so with the 16-ray ones).
My knowledge of Ignition Gazebo is very limited, so take the following observations/hunches with a pinch of salt. I believe Ignition Gazebo renders cameras serially, which might explain why we don't see an increase in resource utilization with the number of robots/cameras. I'm also curious about the use of manual scene updates in RenderingSensor (the base class of all cameras as I understand it, but also of the GPU lidar): https://github.com/ignitionrobotics/ign-sensors/blob/main/src/RenderingSensor.cc#L89 I think the best way to find the bottleneck is to run the simulation under a profiler and see what's taking the time.
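A generic way to do that on Linux without rebuilding anything is to attach a sampling profiler to the running simulation server (the Ignition libraries can also be built with their own Remotery-based profiler enabled, but that requires a custom build). A sketch using perf:

```bash
# Requires linux-tools/perf and permission to profile other processes.
# The simulation server shows up as a 'ruby' process, as noted earlier in this thread.
pidof ruby

# Record call stacks of one of those PIDs for 30 seconds, then browse the hot spots:
sudo perf record -g -p <PID> -- sleep 30
sudo perf report
```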
I can confirm that using a release build of subt_ws as suggested in #688 makes a noticeable improvement in performance (in my case approx. a 2x speed-up for a single X2C6 in GUI mode).
@AravindaDP How did you pass compiler flags and build parameters to catkin?
@pauljurczak I just used |
@pauljurczak See #688.
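For anyone landing here later, a minimal sketch of passing that build type to a catkin workspace (assuming either catkin_tools or plain catkin_make is used; #688 has the invocation that was actually recommended):

```bash
# With catkin_tools:
catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release
catkin build

# With plain catkin_make:
catkin_make -DCMAKE_BUILD_TYPE=Release
```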
Thank you. I rediscovered that |
This might help a lot: gazebosim/gz-sensors#95 . I created a PR that would add a service. Thanks for the idea @tpet!
The ign-sensors PR has been merged and a new version has been released in the binary distribution. Now #791 contains the required SubT part, through which teams will be able to control the rendering rate of sensors via ROS services. Even before #791 is merged, you can already set the rate in locally running simulations by directly calling the Ignition services (
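For reference, the general shape of such a direct service call with the ign-transport CLI is shown below. The service path and message types are placeholders, not the real names; the actual service names come from the ign-sensors change and #791, and can be discovered on a running simulation:

```bash
# List the services the running simulation exposes and look for a rate-related one:
ign service -l | grep -i rate

# Then call the relevant service (placeholder path and message types shown):
ign service -s /world/<world_name>/<set_rate_service> \
  --reqtype ignition.msgs.Double \
  --reptype ignition.msgs.Boolean \
  --timeout 1000 \
  --req 'data: 10.0'
```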
With #791, I achieve 50-70% RTF with EXPLORER_X1 (3D lidar + 4x RealSense).
Hi all, just bumping this issue as I'm using the simulator and encountering similar problems with a speed bottleneck. Is there a summary somewhere of the options for speeding up the simulation? I have access to a supercomputer, but it requires a lot of specialist software to set up, and I've learnt from @peci1 that a supercomputer may not resolve the bottlenecks. The old Ignition Dome version of Gazebo does not handle parallel processing well, so I'm interested in finding out whether Gazebo Fortress or any other fixes were used by the competing SubT teams. I'm looking for a way to use multiple cores to speed up the simulation.
We tried to run the simulator on a beefy machine (40 cores, 4 GPUs) with a full team of robots (3 UGVs, 3 UAVs, approx. 30 cameras in total). Neither the CPUs nor any of the GPUs were anywhere near full utilization, yet the real-time factor was between 1 and 2 percent. Is there any clear performance bottleneck that could be worked on? E.g., aren't the sensors rendered serially (i.e. first one camera, then a second one, and so on)? Or is there something else? I'm pretty sure the physics computations shouldn't be that costly (and the performance doesn't drop linearly with the number of robots).