zerotier / libzt

Encrypted P2P sockets over ZeroTier

Home Page:https://zerotier.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

malloc/free issues in latest release on linux

janvanbouwel opened this issue · comments

I've been trying to integrate libzt in a project, but I have issues when trying to use the latest versions.
Version 1.8.4 seems to work fine. Version 1.8.10 crashes the moment I try to do anything network related, like pinging the program or creating a socket. Very rarely I am able to ping it, in which case it works for at most a minute before still crashing. The message I get is one of the following two. (free() has also been mentioned, but not as often, and I don't remember the exact message.)

malloc(): invalid size (unsorted)
malloc(): corrupted top size

This happens in exactly the same way with both Rust and Java bindings.
My system is
Linux 5.15.0-58-generic #64-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux, running Ubuntu 22.04.1 LTS.

The breaking commit appears to be 0bf302a581f641b1ac5a1915e4cc59cde4ae8cdb, which updates the ZTO submodule to v1.8.10. Before that it works. Later commits and later versions don't fix the problem.

For now I can continue using v1.8.4, but I was considering contributing and that's difficult if I can't try the latest version.

I am having the same trouble on linux armv7l infrastructure with java code that was working previously on 1.8.4. For now, neither of 1.8.4 and 1.8.10 is working for me and the moment I send listener.connect from one of the clients the server crashes with one of the malloc() errors reported above. I am not sure if the submodules from 1.8.10 were used on 1.8.4 by mistake when I switched the branch. I will try a fresh build using 1.8.4 and report my results. Thanks for reporting this!

I did a git bisect on the ZTO submodule.
(Found the free error again, and another one:

double free or corruption (!prev)
corrupted size vs. prev_size while consolidating

)

In some revisions the errors seemed to be less consistent (say 1 in 2 instead of 5 in 6 runs), but the first breaking commit appears to be zerotier/ZeroTierOne@e9375b5.

I don't have time right now to look at it more, but it's a start.

I confirmed today that using the v1.8.4 submodule fixed the issue for me which is weird because v1.8.10 was working for me in November which seems to be after the submodule was updated to v1.8.10 last May. Hoping there are no breaking fixes to the java binding since then which could surface later on for me...

Promising lead:

The aforementioned commit zerotier/ZeroTierOne@e9375b5 sets

https://github.com/zerotier/ZeroTierOne/blob/e9375b50b0b7228f77dc8ef3d99d1021f89675ff/include/ZeroTierOne.h#L166

#define ZT_MAX_PEER_NETWORK_PATHS 64

while in libzt there is:

#define ZTS_MAX_PEER_NETWORK_PATHS 16

This seems to be a discrepancy (although they have a slightly different name). I'll test in about 2 hours whether setting the latter to 64 fixes the issue.

Good news, not tested super extensively but this seems to fix it, at least for Java!

For Rust I assume the same change has to be made, in:

#define ZTS_MAX_PEER_NETWORK_PATHS 16

For Java, rather confusingly, this is also included in

public static int ZTS_MAX_PEER_NETWORK_PATHS = 16;

but the only reference to it is
public ZeroTierSocketAddress[] paths = new ZeroTierSocketAddress[ZeroTierNative.ZTS_MAX_PEER_NETWORK_PATHS];

and this class doesn't seem to be used.


I have found three other inconsistencies in ZeroTierSockets.h

The first two:

/**
* Maximum number of pushed routes on a network
*/
#define ZTS_MAX_NETWORK_ROUTES 32
/**
* Maximum number of statically assigned IP addresses per network endpoint
* using ZT address management (not DHCP)
*/
#define ZTS_MAX_ASSIGNED_ADDRESSES 16

In zerotier/ZeroTierOne@440568a these were changed to 128 and 32 respectively.

And

#define ZTS_MAX_MULTICAST_SUBSCRIPTIONS 1024

which is set to 4096 in ZeroTierOne, although this one has been there since it was introduced, and may be intentional.

As far as I'm aware I've not run into issues with these, but they could form a problem.


I'm gonna test a bit more (and I'll try to get to the Rust version as well), and if it works I'll set up a PR.

It does seem strange to me that these constants are redefined and the ones in https://github.com/zerotier/ZeroTierOne are not used, but I'm not that skilled at C to know the reasoning behind it. Maybe a project maintainer could shine some light on this?

which is weird because v1.8.10 was working for me in November

As it is a memory issue, maybe your network topology changed?

Yeah these constants should match. The fact that they currently don't is just that I must have forgotten to update them in this project when we bumped up the number of allowed paths in ZeroTierOne. I'll see about either updating them or just using the regular ZeroTierOne constants.

The reasoning was that we wanted to use a smaller subset of the definitions from ZeroTierOne.h since may of them shouldn't be exposed alongside libzt's API

I'm going to close this issue since your PR addresses this. Feel free to open again if the issue persists.