SGX startup is slow due to quadratic TOML processing
pwmarcz opened this issue · comments
Description of the problem
Graphene-SGX startup is slow for manifests that have a lot of trusted_files.
Steps to reproduce
On an Ubuntu 18.04 machine, on the current master branch (c602e56), try to run the Python example with graphene-sgx python -c "print('hello')".
Expected results
This should be relatively quick.
Actual results
The command takes 10 seconds:
real 0m10.816s
user 0m9.311s
sys 0m1.473s
It looks like python.manifest.sgx contains a lot of files (all of /usr/lib/python3, /lib/x86_64-linux-gnu, /usr/lib/x86_64-linux-gnu):
$ wc -l python.manifest.sgx
30049 python.manifest.sgx
$ grep /usr/lib/python3/ python.manifest.sgx | wc -l
10263
$ grep x86_64-linux-gnu python.manifest.sgx | wc -l
3621
Stopping in GDB shows that time is spent in this loop:
for (ssize_t i = 0; i < toml_trusted_files_cnt; i++) {
    const char* toml_trusted_file_key = toml_key_in(toml_trusted_files, i);
    assert(toml_trusted_file_key);
    toml_raw_t toml_trusted_file_raw = toml_raw_in(toml_trusted_files, toml_trusted_file_key);
    // ...
}
It looks like toml_raw_in does a linear traversal of the whole trusted_files table, which makes the loop as a whole quadratic in the number of trusted files.
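The access pattern of the loop above can be modelled in a few lines of Python. This is only a sketch of the suspected behaviour, not tomlc99's actual code: FlatTable, load_all and comparisons_for are hypothetical names, and the flat list of pairs is an assumed storage layout. It counts key comparisons to show how "fetch key i, then look the value up by key" grows with the table size.

```python
# Simplified model of a TOML table stored as a flat list of (key, value) pairs.
# NOT tomlc99's implementation; it only mimics the access pattern of the loop.

class FlatTable:
    def __init__(self, pairs):
        self.pairs = list(pairs)
        self.comparisons = 0  # key comparisons performed by raw_in()

    def key_in(self, index):
        """Return the key at position `index` (constant time)."""
        return self.pairs[index][0]

    def raw_in(self, key):
        """Find a value by key with a linear scan, as the issue suspects."""
        for k, v in self.pairs:
            self.comparisons += 1
            if k == key:
                return v
        return None


def load_all(table):
    """Mimic the startup loop: fetch key i, then look the value up by key."""
    values = []
    for i in range(len(table.pairs)):
        key = table.key_in(i)
        values.append(table.raw_in(key))
    return values


def comparisons_for(n):
    """Count key comparisons needed to load a table with n unique keys."""
    table = FlatTable((f"key{i}", f"file:{i}") for i in range(n))
    load_all(table)
    return table.comparisons
```

In this model, loading n entries costs n(n+1)/2 comparisons, so doubling the number of trusted files roughly quadruples the work. Iterating over an array by index would avoid the lookup entirely.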
Yup, because we're using the wrong TOML constructs for this: we should use arrays, not dictionaries (which are slow, and the keys make no sense here). But this will be resolved when we completely fix #2076.
Using TOML tables instead of TOML arrays also blocks my other PR: #2484
I started working on this transition. Here is the idea:
- Refactor all relevant places in code (#2607)
- First try to parse legacy TOML-table syntax sgx.allowed_files.bla = "file"; if not found, try new TOML-array syntax sgx.allowed_files = ["file1", ...]
  - We keep legacy TOML-table syntax purely for compatibility reasons; we deprecate it and at some point we may drop it
- Add a better name sgx.passthrough_files to the legacy unclear sgx.allowed_files
  - We keep legacy sgx.allowed_files name purely for compatibility reasons; we deprecate it and at some point we may drop it
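A minimal sketch of the fallback described in the list above, operating on an already-parsed manifest (a plain Python dict standing in for the sgx table). The function names are hypothetical, and the rule that sgx.passthrough_files takes precedence over sgx.allowed_files when both are present is an assumption of this sketch, not something the plan states.

```python
import warnings


def read_file_list(sgx, name):
    """Return the file URIs stored under sgx[name].

    Accepts both the legacy table syntax (name.key = "file") and the
    new array syntax (name = ["file1", "file2"]).
    """
    value = sgx.get(name)
    if value is None:
        return []
    if isinstance(value, dict):
        # Legacy TOML-table syntax: the keys carry no meaning, keep the values.
        warnings.warn(
            f"sgx.{name} as a TOML table is deprecated; use a TOML array",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(value.values())
    if isinstance(value, list):
        # New TOML-array syntax.
        return list(value)
    raise TypeError(f"sgx.{name} must be a table or an array")


def read_passthrough_files(sgx):
    """Prefer the new name, fall back to the legacy one (assumed order)."""
    for name in ("passthrough_files", "allowed_files"):
        if name in sgx:
            return read_file_list(sgx, name)
    return []
```

For example, read_file_list({"allowed_files": {"bla": "file"}}, "allowed_files") and read_file_list({"allowed_files": ["file"]}, "allowed_files") both yield ["file"], so the rest of the code only ever sees a list.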
@dimakuv What about sgx.trusted_checksum though? Will it remain a table (with the same quadratic-lookup problem), or will it be an array (and the parsing code will need to "zip" both arrays), or...?
sgx.trusted_checksum will be an array. Effectively, sgx.trusted_files[index] = "file:bla" has a corresponding item sgx.trusted_checksum[index] = "12345...".
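The "zip" of the two parallel arrays could look roughly like this. It is a sketch under the assumption that both arrays are already parsed into Python lists; pair_files_with_checksums is a hypothetical helper name. The length check is there because a mismatch would otherwise silently pair files with the wrong hashes.

```python
def pair_files_with_checksums(trusted_files, trusted_checksum):
    """Pair each trusted file URI with the checksum at the same index."""
    if len(trusted_files) != len(trusted_checksum):
        raise ValueError(
            "sgx.trusted_files and sgx.trusted_checksum differ in length: "
            f"{len(trusted_files)} vs {len(trusted_checksum)}"
        )
    return list(zip(trusted_files, trusted_checksum))
```

Each lookup is by index, so pairing n files takes linear time rather than the quadratic time of the table-based lookup.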
I have a branch in my local repo; I'll publish it after #2607 is merged.
There are those other efforts related to partial manifests and HSM signing, and I'm not sure what the manifest structure should look like. In the case of partial manifests (i.e. the situation when you don't have all the trusted/protected files on your machine and you rely on externally provided hashes), don't you want something like:
sgx.trusted_files = [
{ 'path' = '/q/werty', 'sha256' = 'deadbeef' },
]
# or maybe
[[sgx.trusted_files]]
path = '/asdf/zxcv'
sha256 = 'abcd'
?
Because managing parallel arrays, while certainly possible to get right, might be more error-prone.
sgx.trusted_files = [ { 'path' = '/q/werty', 'sha256' = 'deadbeef' }, ]
Definitely doable, though I wouldn't consider it important. sgx.trusted_checksum is a Graphene-SGX-internal feature which users never use or even know about. How exactly this is implemented in the final .manifest.sgx and in Graphene-SGX code should be irrelevant to the users/developers.
Forcing users to use an "array of two-field tables" sounds much more complicated than my current "array of file paths":
sgx.trusted_files = [
"file:{{ graphene.runtimedir() }}/",
"file:{{ entrypoint }}",
]
sgx.allowed_files = [
"file:tmp/",
"file:root", # for getdents test
"file:testfile" # for mmap_file test
]
Anyway, my points are:
- I want to have the new syntax for sgx.{allowed/trusted/protected}_files as above, just a TOML array
- The part with SHA256 hashes (historically called sgx.trusted_checksum) is Graphene-internal and it doesn't matter much how it is implemented; we can change it later without anyone noticing
In case of partial manifests (i.e. situation, when you don't have all the trusted/protected files on your machine and you rely on externally provided hashes)
I am not aware of such scenarios. Can this really happen for sgx.trusted_files? (Please note that sgx.protected_files works in a completely different way; there is no SHA256 hash associated with them.)
Yes, there are at least two scenarios for trusted_files:
- the file is confidential (e.g. ML weights) and we don't want to keep it around;
- the file is very big and we don't want to keep a copy on the build server for no other reason than to recalculate its hash.
So we need the possibility of a "partially finalised" manifest, and of merging several manifests in various stages of finalisation. From this POV it's not internal anymore, unless you want some manifests that look like manifests, still unsigned, but you'd better not touch them by hand.
If you'd like to preserve simplicity of an array of strings, trusted_files could be an array of (string or two-key hash), if that's not too much work.
If you'd like to preserve simplicity of an array of strings, trusted_files could be an array of (string or two-key hash), if that's not too much work.
I like this idea, it preserves simplicity for usual use-cases, but doesn't block more complicated ones.
And I think we already had someone asking to support providing hashes without the corresponding data to some of the trusted files.
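One way to handle an "array of (string or two-key hash)" is to normalize every entry to the same shape right after parsing. The sketch below assumes the array is already parsed into Python objects. The key names uri and hash are borrowed from the example that appears later in the thread, and normalize_trusted_files is a hypothetical helper, not Graphene's code.

```python
def normalize_trusted_files(entries):
    """Turn a mixed array of strings and tables into uniform records.

    A bare string means "hash not provided yet" (to be computed at signing
    time); a table may carry an externally provided hash.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append({"uri": entry, "hash": None})
        elif isinstance(entry, dict):
            if "uri" not in entry:
                raise ValueError(f"trusted file entry without 'uri': {entry!r}")
            normalized.append({"uri": entry["uri"], "hash": entry.get("hash")})
        else:
            raise TypeError(f"unsupported trusted file entry: {entry!r}")
    return normalized
```

With this shape, the common case stays a plain list of strings in the manifest, while entries whose file is not available locally can still supply a hash.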
Ok, let me implement Woju's approach.
So I tried this:
sgx.trusted_files = [
"file:exec_victim",
{uri = "file:trusted_testfile", hash = "deadbeef"}
]
And got a Python TOML error:
  File "/home/dimakuv/graphene/built/bin/graphene-sgx-sign", line 5, in <module>
    sys.exit(main())
  File "/home/dimakuv/graphene/built/lib/python3.6/site-packages/graphenelibos/sgx_sign.py", line 825, in main
    manifest = read_manifest(manifest_path)
  File "/home/dimakuv/graphene/built/lib/python3.6/site-packages/graphenelibos/sgx_sign.py", line 683, in read_manifest
    manifest = toml.load(path)
  File "/home/dimakuv/.local/lib/python3.6/site-packages/toml/decoder.py", line 134, in load
    return loads(ffile.read(), _dict, decoder)
  File "/home/dimakuv/.local/lib/python3.6/site-packages/toml/decoder.py", line 512, in loads
    multibackslash)
  File "/home/dimakuv/.local/lib/python3.6/site-packages/toml/decoder.py", line 778, in load_line
    value, vtype = self.load_value(pair[1], strictly_valid)
  File "/home/dimakuv/.local/lib/python3.6/site-packages/toml/decoder.py", line 880, in load_value
    return (self.load_array(v), "array")
  File "/home/dimakuv/.local/lib/python3.6/site-packages/toml/decoder.py", line 1002, in load_array
    a[b] = a[b] + ',' + a[b + 1]
IndexError: list index out of range
So yeah, Python's TOML parser doesn't support mixed arrays: uiri/toml#270. Actually, looking at this GitHub repo, the project seems to be dying? There has been no commit activity in the last couple of months (I think since January 2021).
But this workaround works:
sgx.trusted_files = [
"file:exec_victim",
]
[[sgx.trusted_files]]
uri = "file:trusted_testfile"
hash = "deadbeef"
Oh nice, our C TOML parser doesn't support mixed arrays either:
$ graphene-direct ./helloworld
error: PAL failed at parsing the manifest: line 35: array mismatch
Well, the latest version supports it: cktan/tomlc99#51
I will update our TOML C parser to this latest version then.
Ok, I implemented everything in my local branch.
My Python SGX manifest is similar to Pawel's in terms of the number of Python-internal files:
$ wc -l python.manifest.sgx
33359 python.manifest.sgx
Old times:
$ time graphene-sgx python -c "print('hello')"
hello
real 0m6.026s
user 0m4.254s
sys 0m1.737s
New times:
$ time graphene-sgx python -c "print('hello')"
hello
real 0m3.007s
user 0m0.873s
sys 0m2.098s
About a 5x improvement (looking at user time).
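For reference, the ratios behind that estimate can be recomputed from the two timing runs above. The helper name speedup is hypothetical; the numbers are copied from the measurements in this comment.

```python
def speedup(old_seconds, new_seconds):
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


# Timings (in seconds) from the "Old times" and "New times" runs above.
old = {"real": 6.026, "user": 4.254, "sys": 1.737}
new = {"real": 3.007, "user": 0.873, "sys": 2.098}

ratios = {name: round(speedup(old[name], new[name]), 2) for name in old}
```

The user-time ratio comes out at about 4.9x, while wall-clock (real) time improves by about 2x and sys time is slightly higher in the new run.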