intel / ehsm

An End-to-End Distributed and Scalable Cloud KMS (Key Management System) built on top of Intel SGX enclave-based HSM (Hardware Security Module), aka eHSM.

Home Page:https://community.intel.com/t5/Blogs/Tech-Innovation/open-intel/An-Intel-SGX-based-Hardware-Security-Module-backed-Key/post/1360130?wapkw=eHSM

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bugs found in ehsm

LeoneChen opened this issue · comments

Hello~

Heap OOB

enclave_decrypt defined in EDL set cmk_size as cmk's size, but attacker can feed cmk_size smaller then sizeof(ehsm_keyblob_t), and TBridge only malloc cmk_size for it.

public sgx_status_t enclave_decrypt([in, size=cmk_size] ehsm_keyblob_t* cmk, size_t cmk_size,
                            [in, size=aad_size] ehsm_data_t *aad, size_t aad_size,
                            [in, size=ciphertext_size] ehsm_data_t *ciphertext, size_t ciphertext_size,
                            [in, out, size=plaintext_size] ehsm_data_t *plaintext, size_t plaintext_size);

Then in function enclave_decrypt, cmk->keybloblen will out-of-bound access invalid memory, e.g. cmk->keybloblen is at offset 0x30, but cmk_size is fed with 0x20.

sgx_status_t enclave_decrypt(ehsm_keyblob_t *cmk, size_t cmk_size,
                             ehsm_data_t *aad, size_t aad_size,
                             ehsm_data_t *ciphertext, size_t ciphertext_size,
                             ehsm_data_t *plaintext, size_t plaintext_size)
{
    sgx_status_t ret = SGX_ERROR_UNEXPECTED;

    if (cmk == NULL ||
        cmk_size != APPEND_SIZE_TO_KEYBLOB_T(cmk->keybloblen) ||
        cmk->keybloblen == 0 ||
        cmk->metadata.origin != EH_INTERNAL_KEY)
        return SGX_ERROR_INVALID_PARAMETER;
    ...
}

Stack OOB

static bool rsa_decryption_with_pkcs_oaep(map<string, string> test_vector)
{
    uint8_t _plaintext[VECTOR_LENGTH("plaintext")] = {0};

    RSA_private_decrypt(RSA_size(key), &*ciphertext, _plaintext, key, 4);
}

_plaintext is a stack buffer with size VECTOR_LENGTH("plaintext"), however openssl ask at least RSA_size(rsa) - 42 (RSA_PKCS1_OAEP_PADDING mode) for to buffer(i.e. _plaintext). (See doc.)

RSA_private_decrypt() decrypts the flen bytes at from using the private key rsa and stores the plaintext in to. flen should be equal to RSA_size(rsa) but may be smaller, when leading zero bytes are in the ciphertext. Those are not important and may be removed, but RSA_public_encrypt() does not do that. to must point to a memory section large enough to hold the maximal possible decrypted data (which is equal to RSA_size(rsa) for RSA_NO_PADDING, RSA_size(rsa) - 11 for the PKCS #1 v1.5 based padding modes and RSA_size(rsa) - 42 for RSA_PKCS1_OAEP_PADDING). padding is the padding mode that was used to encrypt the data. to and from may overlap.

In case 1 in aes_gcm_crypto_test_vectors, VECTOR_LENGTH("plaintext") is 28 while RSA_size(rsa) is 128, then 28 < 128 - 42

1685647456963

Thus, in RSA_private_decrypt, a stack OOB has occurred.

Tips: uint8_t _plaintext[VECTOR_LENGTH("plaintext")] = {0} is unsafe since variable-sized object may not be initialized, better to memset to 0 for _plaintext, like memset(_plaintext, 0, VECTOR_LENGTH("plaintext"));

Thanks for pointing the issues.

for the 2nd one, we will take a look and fix it soon. (also welcome for your contribution if you want to..)

for the 1st one, could you provide more details, since i'm not quite understand your point. Because it will return error directly when it hits the below condition: cmk_size != APPEND_SIZE_TO_KEYBLOB_T(cmk->keybloblen)

Above EDL rule in enclave_hsm.edl for enclave_decrypt will create funtion sgx_enclave_decrypt in enclave_hsm_t.c, sgx_enclave_decrypt is a wrapper in TBridge called by SGXSDK's trts_ecall (with a function lookup operation), and sgx_enclave_decrypt will call real enclave_decrypt.

If you specify size of cmk with [in, size=cmk_size] ehsm_keyblob_t* cmk, TBridge will malloc size of ms->ms_cmk_size for it, and followed by a memory clone operation to ensure [in] pointer is resided in Enclave. Here, _in_cmk is the cmk for real enclave_decrypt, and malloc-ed size is cmk_size. More detail you can refer to SGXSDK's manual.

static sgx_status_t SGX_CDECL sgx_enclave_decrypt(void* pms)
{
	...
	ehsm_keyblob_t* _tmp_cmk = ms->ms_cmk;
	size_t _tmp_cmk_size = ms->ms_cmk_size;
	size_t _len_cmk = _tmp_cmk_size;
	...

	if (_tmp_cmk != NULL && _len_cmk != 0) {
		_in_cmk = (ehsm_keyblob_t*)malloc(_len_cmk);
		if (_in_cmk == NULL) {
			status = SGX_ERROR_OUT_OF_MEMORY;
			goto err;
		}

		if (memcpy_s(_in_cmk, _len_cmk, _tmp_cmk, _len_cmk)) {
			status = SGX_ERROR_UNEXPECTED;
			goto err;
		}

	}
	...

	ms->ms_retval = enclave_decrypt(_in_cmk, _tmp_cmk_size, _in_aad, _tmp_aad_size, _in_ciphertext, _tmp_ciphertext_size, _in_plaintext, _tmp_plaintext_size);
	...
}

I think if you want to specify ehsm_keyblob_t *cmk only point to one ehsm_keyblob_t object, you can only set [in] EDL attibute, if you want to transfer more then one element, you can set [in, count=xxx], this will malloc count * sizeof(ehsm_keyblob_t) for cmk

for the 1st one, could you provide more details, since i'm not quite understand your point. Because it will return error directly when it hits the below condition: cmk_size != APPEND_SIZE_TO_KEYBLOB_T(cmk->keybloblen)

About this, before APPEND_SIZE_TO_KEYBLOB_T is executed, cmk will firstly index it's member variable keybloblen, and at this point, cmk may be malloc-ed with insufficient heap memory, and then cmk->keybloblen will cause Out-of-Bound access

for the 2nd one, we will take a look and fix it soon. (also welcome for your contribution if you want to..)

OK, I'll try to fix

for the 1st one, could you provide more details, since i'm not quite understand your point. Because it will return error directly when it hits the below condition: cmk_size != APPEND_SIZE_TO_KEYBLOB_T(cmk->keybloblen)

About this, before APPEND_SIZE_TO_KEYBLOB_T is executed, cmk will firstly index it's member variable keybloblen, and at this point, cmk may be malloc-ed with insufficient heap memory, and then cmk->keybloblen will cause Out-of-Bound access

Yes, Thanks.
It seems is a common issue for all of the interfaces defined in the EDL file. Currently, we transferred a data structure but the size is used another variable rather than the size of original structure.

But for your suggestion "you can set [in, count=xxx], this will malloc count * sizeof(ehsm_keyblob_t) for cmk", it may not suitable for this case, because the buffer appended to the structure is not an integer multiple of an existing structure.

How do you think if we add more checks in each interfaces to check whether the in_size is larger than the size of structure before doing the existing checks? or do you have some better idea to handle it?

Yes, I find [in, count=xxx] is not suitable for this case. Since type of cmk that ehsm_keyblob_t has a variable length array member

typedef struct {
    ehsm_keymetadata_t  metadata;
    uint32_t            keybloblen;
    uint8_t             keyblob[0];
} ehsm_keyblob_t;

Maybe we can pass another parameter to specify len of keyblob, and add check at enclave whether "keyblob_len * sizeof(uint8_t) + sizeof(ehsm_keyblob_t) == cmk_size" and then "cmk->keybloblen == keyblob_len", but it seems ugly.

I create a PR #267 for above stack OOB

Oops, lot's of interfaces in EDL like enclave_sign has same problem as enclave_decrypt, better to figure out a general way to fix them

  • enclave_create_key
  • enclave_get_public_key
  • enclave_encrypt
  • enclave_decrypt
  • enclave_asymmetric_encrypt
  • enclave_asymmetric_decrypt
  • enclave_sign
  • enclave_verify
  • enclave_generate_datakey
  • enclave_export_datakey

p_sgx_quote->report_body.mr_signer.m[i] and p_sgx_quote->report_body.mr_enclave.m[i] can overflow quote

sgx_status_t enclave_verify_quote_policy(uint8_t *quote, uint32_t quote_size,
const char *mr_signer_good, uint32_t mr_signer_good_size,
const char *mr_enclave_good, uint32_t mr_enclave_good_size)
{
if (quote == NULL || mr_signer_good == NULL || mr_enclave_good == NULL)
{
log_d("quote or mr_signer_good or mr_enclave_good is null");
return SGX_ERROR_INVALID_PARAMETER;
}
string mr_signer_str;
string mr_enclave_str;
char mr_signer_temp[3] = {0};
char mr_enclave_temp[3] = {0};
sgx_quote3_t *p_sgx_quote = (sgx_quote3_t *)quote;
for (int i = 0; i < SGX_HASH_SIZE; i++)
{
snprintf(mr_signer_temp, sizeof(mr_signer_temp), "%02x", p_sgx_quote->report_body.mr_signer.m[i]);
snprintf(mr_enclave_temp, sizeof(mr_enclave_temp), "%02x", p_sgx_quote->report_body.mr_enclave.m[i]);

Yes, it's a common issue now.

A quick fix is to add a check at the beginning of the original checks, to make sure the size is larger than the expected structure size in each interfaces.

Forget to set return value

You seems fortget to set ret to -1 at line 92, it's already failed, and fiforesponse will not set to an valid memroy

goto CLEAN;

ehsm/core/App/fifo.cpp

Lines 61 to 144 in 7a54667

int client_send_receive(FIFO_MSG *fiforequest, size_t fiforequest_size, FIFO_MSG **fiforesponse, size_t *fiforesponse_size)
{
int retry_count = 10;
int ret = 0;
long byte_num;
char recv_msg[BUFFER_SIZE + 1] = {0};
FIFO_MSG *response = NULL;
struct sockaddr_un server_addr = {0};
int server_sock_fd = socket(PF_UNIX, SOCK_STREAM, 0);
if (server_sock_fd == -1)
{
log_e("socket error");
return -1;
}
server_addr.sun_family = AF_UNIX;
strncpy(server_addr.sun_path, UNIX_DOMAIN, strlen(UNIX_DOMAIN));
do
{
if (connect(server_sock_fd, (struct sockaddr *)&server_addr, sizeof(server_addr)) >= 0)
break;
else if (retry_count > 0)
{
log_w("failed to connect, sleep 0.5s and try again...");
usleep(500000); // 0.5 s
}
else
{
log_e("connection error, %s, line %d.", strerror(errno), __LINE__);
goto CLEAN;
}
} while (retry_count-- > 0);
if ((byte_num = send(server_sock_fd, reinterpret_cast<char *>(fiforequest), static_cast<int>(fiforequest_size), 0)) == -1)
{
log_e("connection error, %s, line %d..", strerror(errno), __LINE__);
ret = -1;
goto CLEAN;
}
byte_num = recv(server_sock_fd, reinterpret_cast<char *>(recv_msg), BUFFER_SIZE, 0);
if (byte_num > 0)
{
if (byte_num > BUFFER_SIZE)
{
byte_num = BUFFER_SIZE;
}
recv_msg[byte_num] = '\0';
response = (FIFO_MSG *)malloc((size_t)byte_num);
if (!response)
{
log_e("memory allocation failure.\n");
ret = -1;
goto CLEAN;
}
memset(response, 0, (size_t)byte_num);
memcpy(response, recv_msg, (size_t)byte_num);
*fiforesponse = response;
*fiforesponse_size = (size_t)byte_num;
ret = 0;
}
else if (byte_num < 0)
{
log_e("server error, error message is %s!\n", strerror(errno));
ret = -1;
}
else
{
log_i("server exit!\n");
ret = -1;
}
CLEAN:
close(server_sock_fd);
return ret;
}

However, if not return -1, msgresp is still null, and will cause crash in memcpy at line 167

if (client_send_receive(msgreq, reqsize, &msgresp, &respsize) != 0)
{
free(msgreq);
log_e("fail to send and receive message.\n");
return INVALID_SESSION;
}
//TODO copy to output message pointer
memcpy(resp_message, msgresp->msgbuf, msgresp->header.size < resp_message_size ? msgresp->header.size : resp_message_size);

Another example is at line 72, cause free an invalid pointer.

if ((client_send_receive(&msg1_request, sizeof(FIFO_MSG), &msg1_response, &msg1_resp_size) != 0)
|| (msg1_response == NULL))
{
log_e("fail to send and receive message.\n");
return INVALID_SESSION;
}
msg1_respbody = (SESSION_MSG1_RESP *)msg1_response->msgbuf;
memcpy(dh_msg1, &msg1_respbody->dh_msg1, sizeof(sgx_dh_msg1_t));
*session_id = msg1_respbody->sessionid;
free(msg1_response);

And msg1_response is not initialized, better to initialized to null.

FIFO_MSG *msg1_response;

Thanks for pointing the issues. Could you provide a patch to fix them? Thanks

Thanks @LeoneChen. We may need a static code scan in the future.

Good! Hope EHSM become more and more popular

Thanks for your contribution.

Hi. I am trying to test this sgx-based project. May I ask how you discovered these bugs mentioned above. Did you use some fuzzing tools or just manually check the code?

Hi~

We've developed a fuzzing tool to detect this bugs.

Thanks~ Is this tool public? If so, could u share a link?

We will open source after paper is accepted

ok. Looking forward to it~