openssl / openssl

TLS/SSL and crypto library

Home Page:https://www.openssl.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

memory leak SSL_connect 3.2.x

jay98234 opened this issue · comments

I have a client application calling SSL_get_peer_certificate() on an SSL / open connection towards a server, then X509_free() directly after without doing anything.
Valgrind says there is a 128byte memory leak in SSL_get_peer_certificate().
Running the same application with OpenSSL 3.1.x doesnt give me this error.
Is Valgrind correct ?

==954423== 208 (128 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 1,038 of 1,344
==954423== at 0x484279B: malloc (vg_replace_malloc.c:381)
==954423== by 0x4BB95C0: CRYPTO_malloc (in /usr/app/openssl/3.2.2/lib/libcrypto.so.3)
==954423== by 0x4BB96F8: CRYPTO_zalloc (in /usr/app/openssl/3.2.2/lib/libcrypto.so.3)
==954423== by 0x4A87798: BIO_new_ex (in /usr/app/openssl/3.2.2/lib/libcrypto.so.3)
==954423== by 0x4919644: ssl_set_new_record_layer (in /usr/app/openssl/3.2.2/lib/libssl.so.3)
==954423== by 0x48EAD5B: tls13_change_cipher_state (in /usr/app/openssl/3.2.2/lib/libssl.so.3)
==954423== by 0x49347D8: tls_process_server_hello (in /usr/app/openssl/3.2.2/lib/libssl.so.3)
==954423== by 0x49311C9: state_machine (in /usr/app/openssl/3.2.2/lib/libssl.so.3)
==954423== by 0x487CF5C: MyApplication::get_peer_certificate() (in myApplication)

Tested on 3.2.1 and 3.2.2

Your description doesn't seem to match the stack trace you've provided. The stack trace you've provided seems to indicate that an allocated BIO struct has leaked during the creation of a new ssl record layer. As to how that may have happened is unknown.

Questions:

  1. Is there some other data that makes you think this leak might be in SSL_get_peer_certificate?

  2. Can you condense this down into a reproducer that we can use to further analyze the problem here?

I'm not able to reproduce the leak. I use a modified sslecho client found in demos. I've added a call to SSL_get1_peer_cert() right after successful call to SSL_connect():

369         /* Now do SSL connect with server */
370         if (SSL_connect(ssl) == 1) {
371             X509 *x509;
372
373             x509 = SSL_get1_peer_certificate(ssl);
374             if (x509 == NULL)
375                 printf("could get certificate\n");
376             else
377                 X509_free(x509);

The program does nothing else then obtains a certificate and then shuts down the SSL connection and exits. no memory leak reported by valgrind for any version (master, 3.3, 3.2, 3.1) This is version 3.3:

sashan@debian-12:~/work.openssl/sslecho$ openssl.env 3.3
sashan@debian-12:~/work.openssl/sslecho$ ldd ./sslecho 
        linux-vdso.so.1 (0x00007ffc0efac000)
        libssl.so.3 => /home/sashan/work.openssl/openssl.binaries/openssl-3.3/lib64/libssl.so.3 (0x00007fb71625b000)
        libcrypto.so.3 => /home/sashan/work.openssl/openssl.binaries/openssl-3.3/lib64/libcrypto.so.3 (0x00007fb715c00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb715a1f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb716361000)
sashan@debian-12:~/work.openssl/sslecho$ valgrind --leak-check=full --show-leak-kinds=all ./sslecho c 127.0.0.1 
==111483== Memcheck, a memory error detector
==111483== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==111483== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==111483== Command: ./sslecho c 127.0.0.1
==111483== 

sslecho : Simple Echo Client/Server : Jun  6 2024 : 05:56:50

We are the client

TCP connection to server successful
SSL connection to server successful

Client exiting...
sslecho exiting
==111483== 
==111483== HEAP SUMMARY:
==111483==     in use at exit: 0 bytes in 0 blocks
==111483==   total heap usage: 11,177 allocs, 11,177 frees, 1,359,459 bytes allocated
==111483== 
==111483== All heap blocks were freed -- no leaks are possible
==111483== 
==111483== For lists of detected and suppressed errors, rerun with: -s
==111483== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Can you share more details about your application? are you using blocking or non-blocking socket? also are there parameters set on socket? also what kind of SS-server does your client attempt to connect to? Does server run OpenSSL too?

Sorry, yes there was a bit of extra work being done in between I didnt notice.
The first SSL_get_peer_certificate() call fail due to SSL_ERROR_WANT_READ. Socket is non-blocking.
SSL_connect is therefore called returning 1, then back to SSL_get_peer_certificate() + X509_free

So the valgrind error is caused by the SSL_connect..
The SSL object is freed with SSL_shutdown + SSL_free

My (linux x64) application dynamically links to openssl 3.2.x, and when I switched .so-files to 3.1.x Valgrind didnt complain about this.

So the client is non-blocking? Or you did not call SSL_connect() at all before SSL_get_peer_certificate()?

The client is non-blocking, yes.
SSL_get_peer_certificate is called first, returning NULL.
Then SSL_connect
Then SSL_get_peer_certificate returning the X509

I wrote a small program to reproduce this.
With the code below:

Valgrind output with openssl 3.1.4
<<< All heap blocks were freed -- no leaks are possible

Valgrind output with openssl 3.2.1 and 3.2.2
<<< definitely lost: 512 bytes in 4 blocks
<<< indirectly lost: 8,512 bytes in 10 blocks

c++ -o myapp myapp.cpp -lssl -lcrypto

int main(int argc, char **argv)
{
	if (argc < 3)
	{
		printf("Need host and port\n");
		return -1;
	}
	SSL_CTX *ctx = SSL_CTX_new (TLS_method());
	SSL_CTX_load_verify_locations(ctx, "/usr/app/jlinux/current/etc/cacerts.pem", NULL);
	SSL_CTX_set_verify(ctx, SSL_VERIFY_NONE, NULL);
	SSL_CTX_set_session_cache_mode(ctx, SSL_SESS_CACHE_OFF);
	SSL_CTX_set_options(ctx, SSL_OP_SINGLE_DH_USE);
	SSL_CTX_set_options(ctx, SSL_OP_ENABLE_KTLS);
	SSL_CTX_set_mode(ctx, SSL_MODE_ENABLE_PARTIAL_WRITE | SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
	SSL_CTX_set_cipher_list(ctx, "HIGH:!LOW:!MEDIUM:!aNULL:!MD5:!DSS:!SHA1:!kRSA");
	SSL_CTX_set_ecdh_auto(ctx, 1);


	addrinfo addrHints
		, *addr = NULL;

	memset(&addrHints, 0, sizeof(addrinfo));
	addrHints.ai_family = PF_UNSPEC;
	addrHints.ai_socktype = SOCK_STREAM;
	addrHints.ai_flags = AI_PASSIVE;
	int err = getaddrinfo(argv[1], argv[2], &addrHints, &addr);
	if (err)
	{
		printf(gai_strerror(err));
		return -1;
	}
	int sock = socket(addr->ai_family, addr->ai_socktype | SOCK_CLOEXEC | SOCK_NONBLOCK, addr->ai_protocol);
	if (sock == -1)
	{
		printf("Socket failed\n");
		return -1;
	}
	int yes = 1;
	setsockopt(sock, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, &yes, sizeof(yes));
	ioctl(sock, FIONBIO, &yes);

	if (connect(sock, (const sockaddr*) addr->ai_addr, addr->ai_addrlen))
	{
		if (errno == EINPROGRESS)
		{
			struct pollfd pfd;
			memset(&pfd, 0, sizeof(struct pollfd));
			pfd.fd = sock;
			pfd.events = POLLOUT;
			if (poll(&pfd, 1, 10000) == 1
				&& !(pfd.revents & POLLHUP)
				&& !(pfd.revents & POLLERR)
				)
			{
				printf("Connected!\n");
			}
			else
			{
				printf("Connect failed\n");
				return -1;
			}

		}
		else
		{
			printf("Socket connect failed\n");
			close(sock);
			return -1;
		}
	}
	else
		printf("Connected\n");
	freeaddrinfo(addr);


	SSL *ssl = SSL_new(ctx);
	SSL_set_fd(ssl, sock);
	SSL_set_connect_state(ssl);
	SSL_set_tlsext_host_name(ssl, argv[1]);

	X509 *cert = SSL_get_peer_certificate(ssl);
	if (!cert)
	{
		int err(0);
		for (int i(0); 10000 > i; ++i)
		{
			if ((err = SSL_connect(ssl)) == 1)
			{
				cert = SSL_get_peer_certificate(ssl);
				break;
			}
			else
			{
				printf("SSL_connect temporarily failed.. sleeping\n");
				sleep(1);
			}
		}
	}
	if (cert)
	{
		printf("Got peer certificate found\n");
		X509_free(cert);
	}

	SSL_shutdown(ssl);
	SSL_free(ssl); 
	close(sock);

	SSL_CTX_free(ctx);
	return 0;
}

@jay98234 thank you for reproducer. I gave it a try on openssl 3.4-dev and openssl 3.3 too. unfortunately I was not lucky. I did use valgrind on debian-12. I did use server running on 127.0.0.1:4433 and also openssl.org:443. Using openssl.org makes code to run through the retry loop. Here is an output on terminal:

sashan@debian-12:~/work.openssl/memleak$ valgrind --leak-check=full --show-leak-kinds=all ./main openssl.org 443
==112987== Memcheck, a memory error detector
==112987== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==112987== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==112987== Command: ./main openssl.org 443
==112987== 
Connected
SSL_connect temporarily failed.. sleeping
Got peer certificate found
==112987== 
==112987== HEAP SUMMARY:
==112987==     in use at exit: 0 bytes in 0 blocks
==112987==   total heap usage: 10,013 allocs, 10,013 frees, 1,317,368 bytes allocated
==112987== 
==112987== All heap blocks were freed -- no leaks are possible
==112987== 
==112987== For lists of detected and suppressed errors, rerun with: -s
==112987== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
sashan@debian-12:~/work.openssl/memleak$ valgrind --leak-check=full --show-leak-kinds=all ./main 127.0.0.1 4433 
==112988== Memcheck, a memory error detector
==112988== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==112988== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==112988== Command: ./main 127.0.0.1 4433
==112988== 
Connected!
Got peer certificate found
==112988== 
==112988== HEAP SUMMARY:
==112988==     in use at exit: 0 bytes in 0 blocks
==112988==   total heap usage: 9,780 allocs, 9,780 frees, 1,200,363 bytes allocated
==112988== 
==112988== All heap blocks were freed -- no leaks are possible
==112988== 
==112988== For lists of detected and suppressed errors, rerun with: -s
==112988== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

no leaks found in both attempts. Here is a diff I've applied to get your unit test compiled

diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..b7a4bc1
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,2 @@
+main: main.c
+	$(CC) -g -O0 -o main main.c -lssl -lcrypto
diff --git a/main.c b/main.c
index 06acdea..2cb3a3c 100644
--- a/main.c
+++ b/main.c
@@ -1,3 +1,16 @@
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netdb.h>
+#include <errno.h>
+#include <poll.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+
+#include <openssl/ssl.h>
+
 int main(int argc, char **argv)
 {
 	if (argc < 3)
@@ -16,10 +29,10 @@ int main(int argc, char **argv)
 	SSL_CTX_set_ecdh_auto(ctx, 1);
 
 
-	addrinfo addrHints
+	struct addrinfo addrHints
 		, *addr = NULL;
 
-	memset(&addrHints, 0, sizeof(addrinfo));
+	memset(&addrHints, 0, sizeof(struct addrinfo));
 	addrHints.ai_family = PF_UNSPEC;
 	addrHints.ai_socktype = SOCK_STREAM;
 	addrHints.ai_flags = AI_PASSIVE;
@@ -27,19 +40,19 @@ int main(int argc, char **argv)
 	if (err)
 	{
 		printf(gai_strerror(err));
-		return -1;
+		goto error;;
 	}
 	int sock = socket(addr->ai_family, addr->ai_socktype | SOCK_CLOEXEC | SOCK_NONBLOCK, addr->ai_protocol);
 	if (sock == -1)
 	{
 		printf("Socket failed\n");
-		return -1;
+		goto error;
 	}
 	int yes = 1;
 	setsockopt(sock, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, &yes, sizeof(yes));
 	ioctl(sock, FIONBIO, &yes);
 
-	if (connect(sock, (const sockaddr*) addr->ai_addr, addr->ai_addrlen))
+	if (connect(sock, (const struct sockaddr*) addr->ai_addr, addr->ai_addrlen))
 	{
 		if (errno == EINPROGRESS)
 		{
@@ -57,7 +70,8 @@ int main(int argc, char **argv)
 			else
 			{
 				printf("Connect failed\n");
-				return -1;
+				freeaddrinfo(addr);
+				goto error;
 			}
 
 		}
@@ -65,7 +79,8 @@ int main(int argc, char **argv)
 		{
 			printf("Socket connect failed\n");
 			close(sock);
-			return -1;
+			freeaddrinfo(addr);
+			goto error;
 		}
 	}
 	else
@@ -81,8 +96,8 @@ int main(int argc, char **argv)
 	X509 *cert = SSL_get_peer_certificate(ssl);
 	if (!cert)
 	{
-		int err(0);
-		for (int i(0); 10000 > i; ++i)
+		int err = 0;
+		for (int i = 0; 10000 > i; ++i)
 		{
 			if ((err = SSL_connect(ssl)) == 1)
 			{
@@ -104,6 +119,7 @@ int main(int argc, char **argv)
 
 	SSL_shutdown(ssl);
 	SSL_free(ssl); 
+error:
 	close(sock);
 
 	SSL_CTX_free(ctx);

the list of changes is as follows:

  • include header files
  • fix potential memory leaks in error paths
  • adjust code so it is possible to compile it with cc

Thats odd. Is your kernel KTLS-enabled ? (mine is)
Im on 6.6, but got the same error on 5.10

Not sure if it matters, but Im also having fastopen enabled

Interesting. I tried connecting to the same hostname multiple times.
I only get the leak error ~90%.
Other hostnames, like cnn.com, do not give leak errors. TLS 1.3 specific issue ?
You can try hostname 24sevenoffice.com as I get mostly leak errors from it.

I'm using kernel which comes with debian-12. the TCP_FASTOPEN_CONNECT might be a contributing factor. I'm using your code which enables fast open at client. I've briefly read through fast open RFC 7413 and fast open must be also supported at server. Can you trigger the leak without TCP_FASTOPEN_CONNECT on socket?

Yup there we go. This is tcp fastopen related :)
No errors when I comment out TCP_FASTOPEN_CONNECT

It seems this is not fastopen related afterall, but it is easier to reproduce it with fastopen.

The leak is caused by using SSL_OP_ENABLE_KTLS and a kernel supporting ktls.
When I comment out SSL_OP_ENABLE_KTLS I never get valgrind errors

I have run the tests many times now and am confident that it is caused by ktls.
I believe you can reproduce this on any kernel version with ktls support.

Thanks for clarification. I will enable ktls on my test system.