lsds / sgx-lkl

SGX-LKL Library OS for running Linux applications inside of Intel SGX enclaves

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Review use of shared memory

wintersteiger opened this issue · comments

All code that depends on the memory shared between host and enclave needs to be reviewed thoroughly. There were no checks to ensure shared memory regions reside outside of enclave memory. Since a malicious host can change the content of the shared memory at any time, we need to guard every access to shared memory regions to make sure the host doesn't trick the enclave into modifying its state with host-controlled content. This applies to src/enclave/*, src/shared/* as well as lkl/*.

Related to #732 and #743.

We should also avoid using args->shm->env and instead copy the environment variables referenced in the enclave_config explicitly.

@prp all variables are copied to make sure they are in enclave memory and the host can't change them after initialization. We can't rely on the host giving us just the variables we're interested in. We could do the filtering currently done in _prepare_elf_stack into _copy_shared_memory though.

virto_dev->virtq->virtq_desc contains the address/length of the buffer chain (ring buffers).

struct virtio_dev
{
    uint32_t device_id;
    uint32_t vendor_id;
    uint64_t device_features;
    uint32_t device_features_sel;
    uint64_t driver_features;
    uint32_t driver_features_sel;
    uint32_t queue_sel;
    struct virtq* queue;
    uint32_t queue_notify;
    uint32_t int_status;
    uint32_t status;
    uint32_t config_gen;
    struct virtio_dev_ops* ops;
    int irq;
    void* config_data;
    int config_len;
    void* base;
    uint32_t virtio_mmio_id;
};

struct virtq
{
    uint32_t num_max;
    uint32_t num;
    uint32_t ready;
    uint32_t max_merge_len;

    struct virtq_desc* desc;
    struct virtq_avail* avail;
    struct virtq_used* used;
    uint16_t last_avail_idx;
    uint16_t last_used_idx_signaled;
};

struct virtq_desc
{
    /* Address (guest-physical). */
    uint64_t addr;
    /* Length. */
    uint32_t len;
    /* The flags as indicated above. */
    uint16_t flags;
    /* We chain unused descriptors via this, too */
    uint16_t next;
};

These buffers should be outside of the enclave so the host virtio driver can access them. _copy_shared_memory() should parse all virtio devices and make sure each device's buffer chain are all outside of the enclave.

virto_dev->config_data and virto_dev->config_len specifies the virtio device MMIO registers. _copy_shared_memory() should makes sure each device's config_data buffer is outside of the enclave.

virto_dev->virtio_dev_ops seems to point to a set of function pointers.

@davidchisnall, do you know whether virtio_dev_ops are invoked inside enclave or outside of the enclave?

@davidchisnall, how is 'virto_dev->base` used? Is it also a variable length buffer?

It doesn't help to verify most of these things early because they can be modified by the time that they are used.

We can verify the config_data and config_len according to the device type and keep a copy of those inside the enclave. LKL only reads and writes these via our code, so we can do the validation there. That already looks up the device via an index, it could look up metadata if we added an indirection structure. The config data is after the other fields in the IO address space, so we should probably get rid of the config_data field, allocate this memory after the end of the structure.

The base field should go away: it is used to store the return value from register_iomem (LKL function) and then pass it into other LKL functions. This is always &base and the extra indirection is unhealthy.

There are two things I'm concerned about on the LKL side:

  • Is the descriptor region read more than once? If so, it needs validating on each access. It looks as if it may not validate these, but I'm not sure. We may need to modify Linux to defensively copy the base address and allocate from the copy.
  • The addresses in the descriptors should all be in the region that we initialise the swiotlb with. LKL ever read these values back (I'd hope not, but it's not completely clear. If it uses these values only to identify responses, that's also probably fine: the host can pretend that a different response has completed, but that's no different than any other lie the host can tell the guest).

@davidchisnall, does the host need to modify the address of the allocated buffer chains specified in virtq_desc after the virtio device is initialized? If host only needs to modify virtq->avail and virtq->used, virtq_desc content can be copied into the enclave during the initialization time.

The host doesn't need to modify the descriptors at all, but it does need to read them and SGX doesn't have a mechanism for read-only sharing and the trust model that virtio was originally designed for doesn't assume malicious devices so the kernel's drivers may trust the device to not modify them.

OK, so the enclave code sets up the descriptors. Does the enclave code need to modify the descriptors after the descriptors are initialized? If not, the solution can be copying the initialized descriptors (inside enclave) to outside the enclave.

I don't understand the question in the context of the virtio spec. Descriptors need to be written in the descriptor table and the pointer to them needs to be in the ring request and response.

@davidchisnall, assuming the enclave code initializes the descriptor table and doesn't need to change it at run time, here is the design I'm thinking about:

  • The enclave code initializes the address/length/flag of each buffer in the descriptor table, with the buffers outside of enclave, keeps a version of the descriptor table inside the enclave for its own use, and copies the content of the initialized descriptor table to outside of enclave for the host to use;
  • Host only needs to read the initialized descriptor table residing outside of the enclave when handling ring request/response. If a malicious host modifies the initialized descriptor table residing outside of the enclave, the version inside the enclave accessed by the enclave code is not affected. When the descriptor table outside enclave and the descriptor table inside the enclave mismatch, virtio function is affected, but the malicious host can not use this attack to trick the virtio code inside enclave to read/write enclave memory unexpectedly.

The descriptors point to memory in the region managed by the SWIOTLB. This is allocated in different-sized chunks depending on the request being sent. Preallocating the descriptors would require significant changes to the virtio driver and would not provide any benefit because the virtio layer does not need to read the contents of the descriptors back (though it may as an optimisation, we need to check whether the Linux drivers do this).

Is add_dev_buf_from_vring_desc(...) called inside enclave and reading buffer address/length from the share memory?

All LKL code is inside the enclave, but I don't believe we're using the virio code in lkl/tools. @prp?

I think that’s correct.

@prp , can you point me to the virtio code inside enclave?

It should be src/lkl/virtio*

@bodzhang, I already pointed you at the code above.

AFAIK, there's code from lkl/tools that is in use. In particular lkl_umount_timeout. There might be other code.

I tried to look for the register_iomem() call in src/lkl/virtio.c, and can only find it in lkl_sub_module/tools/lkl/lib/iomem.c. Is the iomem.c under tools/lkl/lib indeed the code used by the enclave? If that's the case, David's explanation about base redirection above makes sense to me now.

You're right. It seems that we're using lkl/tools/lkl/lib/iomem.c here.

One solution to mitigate the potential attack of host side manipulating base, config_data, config_len, queue (ring buffer descriptors needs further protection, not fully addressed in this proposal), without changing the upstream virtio code, is to implement a "shadow" virtio_dev structure inside the enclave and use sgx-lkl version of reads and writes to copy "benign" fields between the "shadow" virt_dev structure and the host side virt_dev structure.

The comments added to the virtio_dev definition below clarify how each field in the "shadow" virtio_dev should be handled. Pointer to the "shadow" virtio_dev structure will be provided to the rest of the virtio code inside enclave, and the read/write and other functions in virtio.c need to have a mapping between the the "shadow" virtio_dev and the host-side virtio_dev. In the read/write and other functions, access to fields marked as "pass through" will be performed on the host-side virtio_dev, and access to other fields will be performed on the the "shadow" virtio_dev.

struct virtio_dev
{
    uint32_t device_id; /*pass-through*/
    uint32_t vendor_id; /*pass-through*/
    uint64_t device_features; /*pass-through*/
    uint32_t device_features_sel; /*pass-through*/
    uint64_t driver_features; /*pass-through*/
    uint32_t driver_features_sel; /*pass-through*/
    uint32_t queue_sel; /*pass-through*/
    struct virtq* queue; /*read-only after initialization with security check*/
    uint32_t queue_notify; /*pass-through*/
    uint32_t int_status; /*pass-through*/ 
    uint32_t status; /*pass-through*/
    uint32_t config_gen; /*pass-through*/
    struct virtio_dev_ops* ops; /* host use only, set to NULL */
    int irq; /* pass-through*/
    void* config_data; /* read-only after initialization with security check*/
    int config_len;  /* read-only after initialization with security check*/
    void* base; /* guest use only, no pass-through */
    uint32_t virtio_mmio_id; /* pass-through*/  
};

This solution assumes the virtio code inside enclave does not directly access the "pass-through" fields directly without going through read/write and other functions in virtio.c. If that assumption does not hold, setting up the "shadow" virtio_dev address to #PF on enclave access (when enclave code access the memory) with secure #FP exception delivery (not supported on Coffeelake SGX CPU) can be used to trigger necessary code to "sanitize" virtio_dev structure access.

With the shadow approach, wouldn't all virtio_dev access need to go through the mapping routines?

The shadow routines are used for the memory mapped control registers. This includes the location that the driver writes to when it notifies the host that it has written entries to the ring. We could use that to copy the messages into the external ring. The only additional copying would be on the control plane, so this wouldn’t add very much overhead.

@davidchisnall , can you confirm my understanding about virtq below?

struct virtq
{
    uint32_t num_max; /* device capability, set by host, read by guest */
    uint32_t num; /* virtio driver selection, set by guest, read by host, pass_through */
    uint32_t ready; /* set by guest, read by host, pass-through */
    uint32_t max_merge_len; /* host use only? */

    struct virtq_desc* desc; /* host read-only */
    struct virtq_avail* avail; /* host read-only */
    struct virtq_used* used; /* guest read-only */
    uint16_t last_avail_idx; /* host use only? */
    uint16_t last_used_idx_signaled; /* host use only? */
};

@bodzhang, I believe this is correct. See src/lkl/virtio.c for the places where the enclave can read / write these fields. All accesse to this from LKL go via that interface. We can add any validation here that we need to (in particular, we should do a copy of the fields once and use that in the callback from LKL, not the raw non-enclave pointer).

If we are able to keep a shadow copy of the virtq_desc array and virtq_avail array inside the enclave and copy-through guest update to the shadow copy to the virtq_desc/virtq_avail arrays outside the enclave, "host read-only" policy can be enforced and the virtio code inside the enclave can't be affected by any potential malicious manipulation of virtq_desc/virtq_avail array outside the enclave. But the scheme depends on an efficient copy-through mechanism.

virtqueue_kick->virtqueue_notify->...->virtio_notify_host_device offers a copy-through opportunity in src/lkl/virtio.c, but it seems that virtqueue_kick is not invoked on every instance of update to virtq_desc/virtq_avail arrays.

In lkl/drivers/block/virtio_blk.c, it's only called on error condition. In lkl/drivers/virtio/virtio_ring.c. it's only called under a rare condition.

@davidchisnall , can you take a look at the calling code of virtio_notify_host_device and see whether it's always invoked after any update to virtq_desc/virtq_avail arrays?

It seems that the virtio_blk, virtio_console drivers do invoke virtqueue_notify (some code flow calls virtqueue_kick, some calls virtqueue_notify directly) after updating desc and avail arrays. But the virtio_net code is harder to trace to be sure about virtqueue_notify call following desc and avail update.

I realized another challenge. The virtio driver code in the Linux kernel allocate the desc/avail/used rings in the host memory and keep those pointers in kernel side data structure. How can we make sure the kernel code use the shadow copy of the desc/avail ring inside enclave instead?

It seems we have to modify the kernel code, right?

For packed ring, the code in lkl/drivers/virtio/virtio_ring.c does allocate the desc ring in the host memory and read/write the desc ring, but it seems to keep a copy of the buffer addr/len in a private data structure 'desc_extra', and only reads the buffer addr from the private data structure. The code does read flag and used buffer length from the host side desc ring. If we can confirm the above observation of virtio_ring.c packed ring implementation, and that no other virtio code inside the enclave directly access the host side ring desc, ensuring the LKL code inside the enclave only use packed ring (host side code needs to be changed to also use the packed ring) will be sufficient to protect against attack from the host side that modifies the buffer address in the packed ring desc.

Attack from the host side through the used buffer length value might still be able to trigger buffer overflow inside the enclave, but the mitigation is to add more sanity check code inside virtio code, which is likely easier to upstream to the kernel tree.