Network throughput lost after each topology hop #2092
Comments
On Windows 10 with VMware Workstation 16.2.1, on my 12-year-old computer (worth $0), I ran the gns3-internal-topology iperf tests, and I see no 50% bandwidth drop after the first hop. Since there was no 50% drop, I saw no reason to proceed further. The actual bandwidth numbers can be ignored because my computer is very old. This post includes custom-made reports from Windows 10, and I will not say how I created them, for multiple reasons.

Disclaimer: the textual numbers for the tests are as follows (restated as commands in the sketch below).

Test 1:
Server listening on TCP port 5001
iperf client Ubu22-iperf-ssh-101: iperf -c 192.168.21.102 -i 60 -t 180
Client connecting to 192.168.21.102, TCP port 5001
local 192.168.21.101 port 52832 connected with 192.168.21.102 port 5001

Test 2:
Server listening on TCP port 5001
iperf client Ubu22-iperf-ssh-103: iperf -c 192.168.54.104 -i 60 -t 180
Client connecting to 192.168.54.104, TCP port 5001
local 192.168.53.103 port 40768 connected with 192.168.54.104 port 5001

Oh, the .gns3 file has to be renamed (it is attached as ubu-frr-ubu.txt). If the reader wants to load my ubu-frr-ubu project, the reader is trusted to know how to replace their Ubuntu 22 id with my Ubuntu 22 id (I think it is the template or node id in ubu-frr-ubu.txt). The project is really two projects, but I chose to make it one project because my confidence was extremely high that there is no problem with GNS3.

The Windows reports I only learned how to make yesterday. Give this 24 hours and I will add anything I may have forgotten to upload and fix any special characters that might be hidden. I will upload the raw csv files (01-ram-cpu-privilege-raw.csv), but I am not uploading the normalized files (one is an xlsx because Excel deletes formulas if you save a file as csv). No one should mess around with the raw csv files; they are very time consuming to understand and convert into ram-cpu-privileged-normalized.xlsx.

The topology diagram is here: [attached image]

During the tests, even moving the mouse might generate an interrupt, so I had to walk away and close down just about everything. I did better at closing things down on the second test. And I run Windows, and Windows is always doing crazy stuff on its own.
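In plain terms, the two tests above reduce to a standard iperf server/client pair. A minimal sketch of the commands, where the server-side invocation is assumed to be the default `iperf -s` (only its "Server listening on TCP port 5001" output is quoted above):

```
# Test 1: server on 192.168.21.102 (default TCP port 5001)
iperf -s
# Test 1: client on Ubu22-iperf-ssh-101, reporting every 60 s for 180 s
iperf -c 192.168.21.102 -i 60 -t 180

# Test 2: server on 192.168.54.104, client on Ubu22-iperf-ssh-103
iperf -s
iperf -c 192.168.54.104 -i 60 -t 180
```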
Oh, I chose FRR because micush pointed another GNS3 community member toward FRR.
Just an FYI. Running these same KVM VM images in Proxmox and not in GNS3 yields about 20Gbps to all hops in my setup.
I only saw a 23% drop after the first hop on Windows with FRR as the middle node. Traditionally what you do is assign "host1" as the client and "host3" as the server, then graph out the throughput in Wireshark to determine whether it is a "network" problem or a "server" problem. So GNS3 and your host2 would be the "network". It can get very complicated if your host2 is being paged in and out of memory. I did not include a graph of memory being paged out, or any of the other thousands of SNMP items, because it is very complicated and above the GNS3 user expert level. It also gets complicated because the uBridge process could be used as a baseline for the system... but I looked at the GNS3 uBridge documentation and
I do not know what kind of VMs your host1, host2 and host3 are. I do not know your CPU usage, your operating system, or how constrained your RAM is. My ubu-frr-ubu.txt is concrete. You originally asked "Can somebody confirm this behavior?" and I have... denied that behavior on Windows 10 Pro. I have done all that I can do.
Thanks for the input. All images are KVM/QEMU images and can be copied to and run on a Proxmox host as well. When doing so there is no such performance degradation and throughput is about 20Gbps for all attached hosts. There is obviously an issue somewhere, whether that is with ubridge or GNS3 or something else. However, if Proxmox running KVM/QEMU can forward packets at 20Gbps between images and GNS3 cannot when using the same technology, there is an issue there somewhere. My concern isn't running it on Windows with VMware or anything else. My GNS3 host is Ubuntu 22.04 running QEMU 6.2, and I see the issue there. My Proxmox host is version 7.2, also running QEMU 6.2. The same VM image was copied from my GNS3 environment into my Proxmox environment, connectivity is set up the same in both environments, and the GNS3 environment is not even close to the Proxmox environment's performance with the same VM images, also running QEMU 6.2.
I will need to investigate this more. I think the bottleneck is definitely uBridge here: this small program has no specific optimization (e.g. GNS3/ubridge#2) and runs in userland. It is basically copying packets from one socket to another within a thread representing one unidirectional connection. One major benefit of uBridge is that it doesn't need special rights to create new connections (it uses UDP tunnels) and it can run on multiple platforms. Now that we are dropping Windows support for the GNS3 server, starting with version 3.0, we plan to replace uBridge with Linux bridges, which are much faster and will allow us to implement some exciting features like advanced filtering etc. I bet this is what Proxmox uses in their backend.
Hi, thanks for the reply. Much appreciated. My testing on Proxmox was indeed done with Linux bridges. I've also tested OVS bridges, with very similar results. Both technologies allow me to forward packets between any hosts at about 20Gbps, whereas with ubridge (I assume, as I just created the topology with the GNS3 GUI on my local host) the best I got was 1Gbps between directly connected hosts, and it only goes down from there the more hops you add between hosts. Regards
@grossmj FYI - I've been playing around with FRR and VXLAN, and I think I've found that Linux bridges don't have full acceleration compared to OVS bridges. I haven't switched over to OVS bridges yet. I'll try to get something together so I can get a better idea of how much overhead a Linux bridge has versus an OVS bridge. Ean Towne also made a spreadsheet showing the effects of different NIC drivers: https://docs.google.com/spreadsheets/d/1lEY1P6xTwDePtR0eqRhtxUlrZ72qgIZ5nnrRvnjWeMQ/edit?usp=sharing
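For anyone who wants to run the same comparison, a minimal sketch of creating the two bridge types side by side; the interface and bridge names here are placeholders, not taken from any specific setup:

```
# Plain Linux bridge via iproute2
ip link add name br0 type bridge
ip link set dev eth1 master br0      # enslave a test interface
ip link set dev br0 up

# Equivalent Open vSwitch bridge
ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 eth1
ip link set dev ovsbr0 up
```

Running the same iperf pair across each bridge should give a rough idea of the relative overhead.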
Yeah, that makes better sense. Oops, I thought gns3server.exe was calling ubridge.exe. So the %processor in my graph is not extremely important; what is extremely important is the GNS3 VM's CPU usage. But... I am not touching anything in my GNS3 VM.
I tweaked my VMware Workstation settings and consequently my throughput improved by 35%, while overall CPU usage on my host went up by 300%. The graphs I created would have to be done with PowerShell, as far as I know. Furthermore, as a rule, I do not help people tweak their host machines. As for you showing a 50% drop on your Ubuntu host while I am showing a 23% drop on my 12-year-old Windows computer running the GNS3 VM on VMware Workstation, there's nothing I can do about that.
@josephmhiggins, I'm glad you are not seeing an issue on your Windows host running VMware. However, your environment is not comparable to what I have outlined above and does not fit the use case. Run the same tests on a Linux host (server) running QEMU 6.2 for a closer comparison. As @grossmj pointed out, starting with GNS3 v3 Windows will not be supported on the server side anymore and ubridge will (eventually) be replaced with Linux bridges, which will hopefully alleviate the issue I am seeing. At least, my testing on Proxmox with Linux bridges (and OVS bridges as well) points to this conclusion. I'll wait patiently for the change to be made and retest when appropriate. Thanks for the discussion all, much appreciated.
Unless I am mistaken, GNS3 v3 will not support Windows without the GNS3 VM, but it will support Windows with the GNS3 VM. My GNS3 VM is Ubuntu 20.04. Edit: Let me clarify. With all the VMs running on the GNS3 VM on Windows, the GNS3 VM's Ubuntu 20.04 is directly comparable to a plain Ubuntu 20.04. (An Ubuntu 22.04 running KVM is, strictly by the book, considered a type 2 hypervisor like the GNS3 VM, and the same goes for VMware Workstation. People get very ticklish about differentiating type 1 and type 2 hypervisors. To turn Ubuntu KVM into a true type 1 hypervisor a person would have to do something, but I forget what.) In other words, the GNS3 VM on Windows adds an unknown amount of overhead, but I do not think it is that much.
@spikefishjohn Ethernet switches are simulated by Dynamips, so they may reduce the throughput even more. Just for info, I remember making a document about network throughput in GNS3 a long time ago, probably in 2015; however, I hadn't tested multiple topology hops.
Thanks again for opening this issue. Using Linux bridges with VXLAN (for an overlay network across multiple compute hosts) is our current preferred solution to fix it. Similar to what is described in these pages: https://programmer.help/blogs/practice-vxlan-under-linux.html and https://vincent.bernat.ch/en/blog/2017-vxlan-linux
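For anyone following along, the approach in those two articles roughly comes down to attaching a VXLAN interface to a local Linux bridge on each compute host. A rough sketch, where the VNI, addresses and names are only illustrative:

```
# On each compute host: a local bridge for the emulated nodes
ip link add name br-gns3 type bridge
ip link set dev br-gns3 up

# Unicast VXLAN tunnel to the peer host (VNI 100, standard UDP port 4789)
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 10.0.0.1 remote 10.0.0.2 dev eth0
ip link set dev vxlan100 master br-gns3
ip link set dev vxlan100 up
```

Traffic entering the bridge on one host is then carried to the other host's bridge over the VXLAN tunnel, entirely inside the kernel.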
Yeah it is, to the point where I was like WTF?!?! :) Like @grossmj said, if I take my VMs, move them out of GNS3 and into Proxmox, and use Linux bridges for the connectivity between them, the issue disappears entirely. Full speed to all hosts no matter how deep the hop count is. Hopefully this gets changed in 3.x. Right now it's pretty painful when you're 5 or 6 hops deep from one host to another and the app/feature/thing you are trying to test is performing terribly for no apparent reason. Again, thanks everybody. GNS3 is a great product. I've enjoyed it a lot over the years. This fix will make it even better.
@grossmj Don't know if this helps, but I have 3 GNS3 servers where I'm already using FRR to build an EVPN L2 network. I'm using Ansible for the deployment; this way, if I add a bridge it's added to all servers. I will say Netplan is a big problem. I've had a lot of issues where I needed to create network configs using systemd instead; for example, VXLAN isn't supported in Netplan. I based my lab more or less on this: https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn Only I don't have a route reflector. I'm also peering my VXLAN interfaces from a loopback interface, using OSPF to advertise the loopback interfaces and then MP-BGP peering across the loopback IPs. VXLAN is a bit odd since it doesn't have any loop prevention aside from split horizon. On my first attempt I made a massive loop and the very first multicast packet looped forever; that's when I moved each host to use a single VXLAN peer. Note I also don't have a 100Gb switch, which is why I have a triangle between the servers. The OSPF, BGP, VXLAN and bridge configs are all pushed from Ansible.
@spikefishjohn, I've found Netplan to be a big problem as well. So I remove it from all my hosts and replace it with ifupdown-ng. ifupdown-ng makes VXLAN simple. Give it a try. It's like the original ifupdown, but with all the "advanced" networking support built right in, in a way that is compatible with the original ifupdown. One caveat though: install ifupdown-ng first, and then uninstall Netplan. Ask me how I know. :)
@grossmj This solution seems much like the one in OpenStack Neutron. But I think we can make it a little simpler: for VMs on the same physical server, the link can be achieved with VLANs on a Linux bridge, and for VMs on different physical servers, the link can be achieved with VXLAN connected to a Linux bridge.
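If I understand the proposal correctly, the same-host case could use the bridge's built-in VLAN filtering rather than a tunnel. A rough sketch, where the VLAN ID and tap names are illustrative and the taps are assumed to already be enslaved to the bridge:

```
# Enable VLAN filtering on the local bridge
ip link set dev br-gns3 type bridge vlan_filtering 1

# Put two local VM taps into the same isolated VLAN
bridge vlan add dev tap-vm1 vid 100 pvid untagged
bridge vlan add dev tap-vm2 vid 100 pvid untagged
```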
Yeah, the icon for my VMCI driver went yellow in Windows 10 Device Manager with error code 31. I looked up this thing called VMCI, and VMware reported in "VMCI Socket Performance" (15 years ago) throughput of 29Gbps from Unix VM to Unix VM and 6Gbps from Windows VM to Windows VM. But that was 15 years ago, and it depends on TCP message size, etc. uBridge is userland and there is a tremendous penalty for that, and it depends on the CPU model and processor speed, etc. And all this stuff has to work on different operating systems. Off the top of my head, with no data to back it up, inter-VM communication inside the GNS3 VM should be at least 50Gbps for Unix-to-Unix, since those tests were 15 years ago, but that does not include any hops. I only have a 1Gbps pipe, so 50Gbps does nothing for me.
And be careful, many GNS3 users run a GNS3 server on a laptop.
Depends on the quality of the laptop...
Hi,
I'm currently running GNS3 v2.2.33.1 on Ubuntu 22.04.
If I have a back-to-back directly connected topology like this:
host1 <-> host2
Using iperf3 between the two hosts I can achieve throughput close to 1Gbps, which is fine.
However, if I have a back-to-back-to-back topology like this:
host1 <-> host2 <-> host3
Using iperf3 between the two outermost hosts, host1 and host3, with host2 as a transit, I can achieve throughput of around 500Mbps, which is roughly half.
Additionally, if I have a back-to-back-to-back-to-back topology like this:
host1 <-> host2 <-> host3 <-> host4
Using iperf3 between the two outermost hosts, host1 and host4, with host2 and host3 as transits, I can achieve throughput of around 250Mbps, which is roughly half again.
If I continue in this manner I can get down into the single digits for throughput between hosts, which is obviously not good.
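For reference, each of the measurements above is just a plain iperf3 run between the two end hosts; the address below is only a placeholder for whatever the far-end host uses in the lab:

```
# On the far-end host (host2, host3 or host4 depending on the test)
iperf3 -s

# On host1, pointing at the far-end host's address
iperf3 -c 10.0.0.4
```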
This performance degradation has existed for many years across many different versions of GNS3. It doesn't matter what type of host I use: Windows, Linux, IOS, IOS XR, etc. They all experience the same throughput degradation the more hosts they traverse.
This is obviously some sort of issue with the underlying virtualization layer. Can somebody confirm this behavior and perhaps suggest a fix? I do realize this is a virtualized topology and I don't expect full throughput between devices; however, a 50% throughput hit for each traversed device is a bit excessive.
Any suggestions are welcome.
Thanks much.