yggdrasil-network / yggdrasil-go

An experiment in scalable routing as an encrypted IPv6 overlay network

Home Page: https://yggdrasil-network.github.io

It seems that the priority is not working

ufm opened this issue · comments

We have three links with different priorities (the priority is the second-to-last column in the getpeers output below: 0, 3 and 4):

root@ygg5:/# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m22s 	 5kb 	93kb 	0 	-                                                    	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m22s 	15kb 	71kb 	3 	-                                                    	
tcp://193.111.115.215:54230              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m22s 	62kb 	359kb	4 	-                                                    

Send traffic to this address:

root@ygg5:/# ping -s 10000 200:dada:feda:f443:52ec:d2e4:b853:60bc
PING 200:dada:feda:f443:52ec:d2e4:b853:60bc(200:dada:feda:f443:52ec:d2e4:b853:60bc) 10000 data bytes
10008 bytes from 200:dada:feda:f443:52ec:d2e4:b853:60bc: icmp_seq=1 ttl=64 time=7.09 ms
10008 bytes from 200:dada:feda:f443:52ec:d2e4:b853:60bc: icmp_seq=2 ttl=64 time=2.27 ms
10008 bytes from 200:dada:feda:f443:52ec:d2e4:b853:60bc: icmp_seq=3 ttl=64 time=1.65 ms
10008 bytes from 200:dada:feda:f443:52ec:d2e4:b853:60bc: icmp_seq=4 ttl=64 time=1.17 ms
10008 bytes from 200:dada:feda:f443:52ec:d2e4:b853:60bc: icmp_seq=5 ttl=64 time=1.10 ms
--- 200:dada:feda:f443:52ec:d2e4:b853:60bc ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 1.100/2.655/7.088/2.255 ms

And now it's evident that the traffic was distributed across all three links:

root@ygg5:/# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m48s 	26kb 	93kb 	0 	-                                                    	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m48s 	25kb 	71kb 	3 	-                                                    	
tcp://193.111.115.215:54230              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	3m48s 	83kb 	410kb	4 	-                                                    	

commented

Priority is for when there are multiple links to the same node, e.g. Ethernet and WiFi peerings, and you want to prefer one over the other.

It does not change routing decisions between different peers.
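
For reference, link priority is set per peer in yggdrasil.conf by appending a priority option to the peer URI. A minimal sketch, assuming the standard Peers list syntax (the addresses and values here are illustrative, not taken from this issue):

Peers: [
  # Preferred link: the lower priority value wins.
  "tls://192.0.2.1:443?priority=0"
  # Fallback link.
  "tcp://192.0.2.1:54230?priority=4"
]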

Hmm. These are three links between the same nodes.
Two are over Ethernet (physically different interfaces).
One is over TCP.

commented

Also, protocol traffic can and will be sent over all peerings; it should just be that actual application traffic is prioritised accordingly. You may need to send more than just pings to see it in action, separate from the protocol traffic background noise.

root@ygg5:~# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m20s	 2mb  	67mb 	0 	-                                                      	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m20s	 2mb  	64mb 	3 	-                                                      	
tcp://193.111.115.215:54230              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m20s	14mb  	390mb	4 	-                                                      	
root@ygg5:~# 
root@ygg5:~# iperf3 -c 200:dada:feda:f443:52ec:d2e4:b853:60bc
Connecting to host 200:dada:feda:f443:52ec:d2e4:b853:60bc, port 5201
[  5] local 22d:d3dd:3afe:9599:3da9:d89f:6ae:8401 port 41842 connected to 200:dada:feda:f443:52ec:d2e4:b853:60bc port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  58.4 MBytes   490 Mbits/sec  236    146 KBytes       
[  5]   1.00-2.00   sec  55.0 MBytes   461 Mbits/sec   81    634 KBytes       
[  5]   2.00-3.00   sec  67.5 MBytes   566 Mbits/sec   74    829 KBytes       
[  5]   3.00-4.00   sec  82.5 MBytes   692 Mbits/sec   64   1.05 MBytes       
[  5]   4.00-5.00   sec  75.0 MBytes   629 Mbits/sec   98    634 KBytes       
[  5]   5.00-6.00   sec  50.0 MBytes   419 Mbits/sec  103    488 KBytes       
[  5]   6.00-7.00   sec  47.5 MBytes   398 Mbits/sec   74    488 KBytes       
[  5]   7.00-8.00   sec  32.5 MBytes   273 Mbits/sec   41    488 KBytes       
[  5]   8.00-9.00   sec  42.5 MBytes   356 Mbits/sec   74    390 KBytes       
[  5]   9.00-10.00  sec  50.0 MBytes   419 Mbits/sec   68   97.5 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   561 MBytes   470 Mbits/sec  913             sender
[  5]   0.00-10.00  sec   560 MBytes   470 Mbits/sec                  receiver

iperf Done.
root@ygg5:~# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m44s	 3mb  	144mb	0 	-                                                      	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m44s	 3mb  	140mb	3 	-                                                      	
tcp://193.111.115.215:54230              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	8h28m44s	16mb  	844mb	4 	-                                                      	

In my opinion, there is enough traffic to see that it is distributed among all peers regardless of priority.

P.S. The fact that the reverse traffic goes through some other path is a separate conversation. I always believed that having a direct link between two nodes is a sufficient guarantee that traffic will always flow directly between them.

> P.S. The fact that the reverse traffic goes through some other path is a separate conversation. I always believed that having a direct link between two nodes is a sufficient guarantee that traffic will always flow directly between them.

That is concerning, as you're absolutely right: having a direct peer should be the one case where we can be certain that the network will avoid unnecessary intermediate hops. There was a bug in the early v0.5.X versions that could have prevented that in some cases, but I thought I had managed to fix it -- maybe not.

If possible, could you provide the output of yggdrasil --version and any important info about how it was installed (built from git, downloaded an "official" build from our GitHub, installed a distro package, etc.)? I just want to make sure it's running the right thing before I get too carried away trying to reproduce/debug it on my end.

Both nodes:

Build name: yggdrasil
Build version: 0.5.4

Installed from:

deb http://neilalexander.s3.dualstack.eu-west-2.amazonaws.com/deb/ debian yggdrasil

One node runs Ubuntu Jammy, the other Ubuntu Focal.
The local (broadcast) mesh has 15 nodes.

One of the nodes is the current root of our long-suffering network and a public peer, so it's better not to shake it too much. :)

> One of the nodes is the current root of our long-suffering network and a public peer, so it's better not to shake it too much. :)

Ah, that's useful to know! I haven't been able to reproduce it in local tests, but it's not too hard to believe that there could be an edge case where things stop working right for the root node specifically, so I guess I should test from an isolated network.

Perhaps there are issues with link priorities? Although @neilalexander claims that everything is working correctly, it's evident that it's not.
If you need my configurations, let me know and I'll provide them (I can even include the keys, but not here; we can do it on Matrix, for example).

commented

> Although @neilalexander claims that everything is working correctly, it's evident that it's not.

I'm not claiming that everything is working correctly for sure, but at the moment I can't reproduce what you are seeing. I have three peerings to the same node with different priorities, and traffic is definitely taking the path that I expect (the one with the lowest priority value and, in case two links have the same priority, the highest uptime).

Can you please confirm that the getPeers output on both sides shows the same priority values for each peering?

First side:

root@yds:/home/ufm# yggdrasilctl getpeers|grep 22d
tls://[fe80::be24:11ff:fe47:857b%25ens20]	Up   	In 	22d:d3dd:3afe:9599:3da9:d89f:6ae:8401 	154h37m57s	292mb	13mb 	0 	-         	
tls://[fe80::6487:beff:fecb:9145%25ens22]	Up   	In 	22d:d3dd:3afe:9599:3da9:d89f:6ae:8401 	154h37m57s	147mb	13mb 	3 	-         	
tcp://193.93.119.42:14244                	Up   	Out	22d:d3dd:3afe:9599:3da9:d89f:6ae:8401 	154h37m57s	883mb	71mb 	4 	-         	

Second side:

root@ygg5:/home/ufm# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	154h38m49s	13mb  	292mb	0 	-                                                      	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	Out	200:dada:feda:f443:52ec:d2e4:b853:60bc	154h38m49s	13mb  	147mb	3 	-                                                      	
tcp://193.111.115.215:54230              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	154h38m49s	71mb  	883mb	4 	-                                                      	

Interface ens20 on the first side corresponds to ens20 on the second.
Interface ens22 on the first side corresponds to ens19 on the second.
TCP is TCP.
As you can see, everything matches: the transmitted/received byte counts, the uptime and the priority.

Just in case, I'll repeat: the second side is the root of the tree. Perhaps the issue is indeed related to this?

commented

Thanks for the confirmation. I've been looking over the code and I'm not really sure how this can happen, unless it's not actually session traffic but protocol traffic instead. The order of evaluation when selecting the next hop is:

  1. Shortest tree distance
  2. If distance equal, lower key
  3. If key equal, lower priority
  4. If priority equal, higher uptime

... which makes me think that something is wrong in the first two steps. Being root might have something to do with it, so I'll keep digging.
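
As a minimal sketch of that evaluation order in Go (the peerInfo type and its fields are hypothetical illustrations, not the actual yggdrasil-go internals):

// A minimal sketch of the next-hop tie-breaking described above.
// peerInfo and its fields are hypothetical names for illustration,
// not the actual yggdrasil-go types.
package main

import "fmt"

type peerInfo struct {
	treeDistance int    // hops to the destination through this peer
	key          string // peer key, compared lexicographically
	priority     uint8  // configured link priority; lower value is preferred
	uptime       int64  // seconds this peering has been up
}

// better reports whether peer a should be chosen over peer b as the next hop.
func better(a, b peerInfo) bool {
	if a.treeDistance != b.treeDistance {
		return a.treeDistance < b.treeDistance // 1. shortest tree distance
	}
	if a.key != b.key {
		return a.key < b.key // 2. lower key
	}
	if a.priority != b.priority {
		return a.priority < b.priority // 3. lower priority value
	}
	return a.uptime > b.uptime // 4. higher uptime
}

func main() {
	eth := peerInfo{treeDistance: 1, key: "k", priority: 0, uptime: 500}
	tcp := peerInfo{treeDistance: 1, key: "k", priority: 4, uptime: 900}
	fmt.Println(better(eth, tcp)) // true: distance and key tie, so lower priority wins
}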

Could both issues be related? The fact that the peer with the lowest priority isn't selected and that the reverse traffic is taking a different path despite the existence of a direct connection.

I made some changes to the configuration and re-ran the tests. I removed the root role from my node (moved it to a separate instance connected only to my node), and I have a few updates.

  1. @Arceliar Regarding non-symmetric traffic, it seems to have been my mistake. I forgot that iperf3, by default, doesn't perform a bidirectional test (see the note after this list).
  2. The fact that a node is the root does not affect the selection of the link priority.
  3. The priority is chosen incorrectly.
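
(For reference, iperf3 can exercise both directions at once with the --bidir flag, available since iperf 3.7, or send in reverse with -R; the address below is the one from the logs above:)

iperf3 -c 200:dada:feda:f443:52ec:d2e4:b853:60bc --bidir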

Here is how the test results look now:

root@ygg5:~# yggdrasilctl getpeers|grep dada
tls://[fe80::38b7:e8ff:fe5b:3be7%25ens20]	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	27m31s	 5gb  	 5gb 	0 	-                                                   	
tls://[fe80::be24:11ff:fe78:89f5%25ens19]	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	27m31s	31gb  	32gb 	3 	-                                                   	
tcp://193.111.115.215:53354              	Up   	In 	200:dada:feda:f443:52ec:d2e4:b853:60bc	27m29s	 6kb  	 9kb 	4 	-                                         	

@neilalexander It's evident that the link with priority 3 is being selected. However, some traffic, around 20% of it, still goes through the link with priority 0. (I hope I didn't make a mistake here, and priority 0 is higher than priority 3, right?)

commented

OK, I think we've narrowed down the problem and I've pushed a new yggdrasil-develop package build to my S3 repo that contains the fix. If you don't mind testing, the fixed version is 0.5.4-5.

I think the reason I was unable to reproduce it is that it was predicated on both the priority value and the order of connections; I guess in my environment they matched up, but in yours they didn't.
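
(Assuming the develop builds live in a separate component of the same S3 repo, the sources line quoted earlier would presumably change from yggdrasil to yggdrasil-develop, along these lines, though check the repo layout:)

deb http://neilalexander.s3.dualstack.eu-west-2.amazonaws.com/deb/ debian yggdrasil-develop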

It seems that in 0.5.4-5-g5da4c11 this issue is not present. Thank you very much!

Looks like it's fixed.

commented

Thanks for testing & confirming!