diwata11 / BiORAM-SGX

A Practical Privacy-Preserving Data Analysis for Personal Genome by Intel SGX. The latest version is at https://github.com/cBioLab/BiORAM-SGX

BiORAM-SGX

A Practical Privacy-Preserving Data Analysis for Personal Genome by Intel SGX.

Abstract

Intel SGX is a technology that executes programs securely inside an Enclave, a protected region of DRAM created by an Intel CPU. However, programs that use Intel SGX are difficult to implement. BiORAM-SGX makes it easy and flexible to implement statistical analysis of personal genome data using Intel SGX.

In this system, a client who requests an analysis of personal genome data receives only the result. During the analysis, the data do not leak to the client or the server, and the analysis procedure does not leak to the server. BiORAM-SGX runs a JavaScript interpreter inside the Enclave so that data can be analyzed flexibly while the personal genome data stay protected. The interpreter provides statistical analysis functions for bioinformatics, which makes it easy for a client to implement various kinds of statistical programs. BiORAM-SGX stores personal genome data in encrypted form and decrypts them only inside the Enclave. It uses Path ORAM to fetch the encrypted personal genome data quickly and securely.

  • Client: people who analyze personal genome data.
  • Data Owner: people who provide SGX Server with personal genome data.
  • SGX Server: a server with an Intel SGX environment. We assume that the SGX Server is malicious.
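
To make the Path ORAM idea concrete, here is a toy sketch in Python. It is not the BiORAM-SGX implementation (which is written in C/C++ and runs inside the Enclave) and it omits encryption entirely. The class name `ToyPathORAM` and all of its methods are hypothetical. It only illustrates the access pattern: every access reads one whole root-to-leaf path, remaps the requested block to a fresh random leaf, and writes the path back, so the storage side sees a random-looking path regardless of which block was requested.

```python
import random


class ToyPathORAM:
    """Toy, unencrypted Path ORAM over a complete binary tree (illustration only)."""

    def __init__(self, height, bucket_size=6, seed=None):
        self.height = height              # tree depths run from 0 (root) to height (leaves)
        self.z = bucket_size              # blocks per bucket (the Z parameter)
        self.num_leaves = 1 << height
        # Heap layout: node 1 is the root, children of node n are 2n and 2n + 1.
        self.tree = {n: [] for n in range(1, 2 * self.num_leaves)}
        self.position = {}                # block id -> leaf currently assigned
        self.stash = {}                   # block id -> data held on the trusted side
        self.rng = random.Random(seed)

    def _node_at(self, leaf, depth):
        """Index of the node at the given depth on the path from root to leaf."""
        return (self.num_leaves + leaf) >> (self.height - depth)

    def access(self, block_id, new_data=None):
        """Read a block (new_data is None) or overwrite it; returns the old value."""
        leaf = self.position.get(block_id)
        if leaf is None:
            leaf = self.rng.randrange(self.num_leaves)   # unknown id: read a dummy path
        path = [self._node_at(leaf, d) for d in range(self.height + 1)]

        # 1. Read every bucket on the path into the stash.
        for node in path:
            for bid, data in self.tree[node]:
                self.stash[bid] = data
            self.tree[node] = []

        # 2. Serve the request and remap the block to a fresh random leaf.
        old_value = self.stash.get(block_id)
        if new_data is not None:
            self.stash[block_id] = new_data
        if block_id in self.stash:
            self.position[block_id] = self.rng.randrange(self.num_leaves)

        # 3. Write the path back, placing blocks as deep as their leaf allows.
        for depth in range(self.height, -1, -1):
            node = path[depth]
            fits = [bid for bid in self.stash
                    if self._node_at(self.position[bid], depth) == node]
            for bid in fits[:self.z]:
                self.tree[node].append((bid, self.stash.pop(bid)))
        return old_value
```

For example, `oram = ToyPathORAM(height=4)` followed by `oram.access(7, new_data="chunk")` stores a block, and `oram.access(7)` reads it back. A real deployment would also pad buckets with dummy blocks and re-encrypt everything written back.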

Demo

BiORAM-SGX ※ This demo movie is older than the latest version of BiORAM-SGX, so some implementation details shown in it differ slightly from the latest specification.

Installation Requirements

  • BiORAM-SGX needs "linux-sgx" and "linux-sgx-driver". Install them from the following site.

  • BiORAM-SGX also needs the following libraries.

apt install sqlite3
apt install libsqlite3-dev
apt-get install libcurl4-openssl-dev
  • Run the following command to get your system's OpenSSL version. It must be at least 1.1.0:
openssl version
  • If necessary, download the source for the latest release of OpenSSL 1.1.0, then build and install it into a non-system directory such as /opt (note that both --prefix and --openssldir should be set when building OpenSSL 1.1.0). For example:
wget https://www.openssl.org/source/openssl-1.1.0i.tar.gz
tar xf openssl-1.1.0i.tar.gz
cd openssl-1.1.0i
./config --prefix=/opt/openssl/1.1.0i --openssldir=/opt/openssl/1.1.0i
make
sudo make install
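
The version check above can also be scripted. The following is a minimal sketch, assuming the output of `openssl version` starts with `OpenSSL <major>.<minor>.<patch>` (which is the usual format, but other builds such as LibreSSL print something different). The function names are hypothetical and not part of BiORAM-SGX.

```python
import re
import subprocess


def parse_openssl_version(text):
    """Extract (major, minor, patch) from a line such as 'OpenSSL 1.1.0i  14 Aug 2018'."""
    match = re.search(r"OpenSSL\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=(1, 1, 0)):
    """True if the reported version is at least the required minimum (1.1.0 here)."""
    return parse_openssl_version(text) >= minimum


def installed_openssl_version():
    """Run `openssl version` and return its raw output (requires openssl on PATH)."""
    result = subprocess.run(["openssl", "version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `meets_minimum(installed_openssl_version())` then tells you whether the system OpenSSL is new enough or whether the build steps above are needed.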

Installation

cd ~
git clone git@github.com:cBioLab/BiORAM-SGX.git
cd BiORAM-SGX
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/BiORAM-SGX/sample_libcrypto
./bootstrap
./configure --with-openssldir=/opt/openssl/1.1.0i
make
mkdir SGXserver_data
cd SGXserver_data
mkdir upload_data
mkdir ORAM_table
  • You need a service provider ID (SPID) and the Attestation Report Root CA Certificate (Intel_SGX_Attestation_RootCA.pem).

    • Once you have an SPID, write it in the settings. Check HERE for details.
    • Intel_SGX_Attestation_RootCA.pem can be obtained as follows.
    cd ~/BiORAM-SGX/
    wget https://certificates.trustedservices.intel.com/Intel_SGX_Attestation_RootCA.pem
    
  • If you run into any problems, check sgx-ra-sample.

Sample Running

Create database for user verification

First, create a table in ~/BiORAM-SGX/.

cd ~/BiORAM-SGX/
sqlite3 testdb
$ SQLite version x.xx.x 20xx-xx-xx xx:xx:xx
$ Enter ".help" for usage hints.
$ sqlite> create table users(id text, pwhash text);
$ sqlite> .exit

Then, register your id and pwhash.

cd ~/BiORAM-SGX/
python3 CreateID_pass.py
$ Input userID:   DataOwner
$ Input password: DataOwner
$ Are you sure to register this userID and password[y/n]?: y
python3 CreateID_pass.py
$ Input userID:   Client
$ Input password: Client
$ Are you sure to register this userID and password[y/n]?: y
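
To confirm that both registrations landed in the database, you can list the ids stored in the `users` table. This is a small sketch using Python's standard `sqlite3` module. The helper name `list_user_ids` is hypothetical, and only the table and column names (`users`, `id`) come from the `create table` statement above.

```python
import sqlite3


def list_user_ids(conn):
    """Return every id stored in the users table, sorted alphabetically."""
    rows = conn.execute("select id from users order by id").fetchall()
    return [row[0] for row in rows]


def list_user_ids_from_file(db_path="testdb"):
    """Open the database file created above and list its registered ids."""
    conn = sqlite3.connect(db_path)
    try:
        return list_user_ids(conn)
    finally:
        conn.close()
```

After the two registrations above, `list_user_ids_from_file("testdb")` run from ~/BiORAM-SGX/ should include both `Client` and `DataOwner`.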

[Data Owner] Download genome data (1000 genome project)

cd ~/BiORAM-SGX/dataowner_data/
wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
gunzip ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

[Data Owner] Split and Encrypt genome data

cd ~/BiORAM-SGX/dataowner_data/
# Split genome data by nation. This script uses the "xlrd" library.
python SplitVCFData_nation.py 22
# Split each nation's genome data into fixed-size files (102000 bytes: about 100000 bytes of data + padding).
python3 SplitVCFData_size.py ~/BiORAM-SGX/dataowner_data/ ~/BiORAM-SGX/dataowner_data/chr22_GWD/ 22 GWD 100000 2000
python3 SplitVCFData_size.py ~/BiORAM-SGX/dataowner_data/ ~/BiORAM-SGX/dataowner_data/chr22_JPT/ 22 JPT 100000 2000
# Encrypt the split genome data. We use Intel SGX for encryption here, but the Data Owner does not need Intel SGX as long as the data are encrypted with AES-GCM.
cd EncryptAES_SGX
make
# GWD: Gambian in Western Division, The Gambia
# JPT: Japanese in Tokyo, Japan
./app ~/BiORAM-SGX/dataowner_data/chr22_GWD/ 22 GWD 102000
./app ~/BiORAM-SGX/dataowner_data/chr22_JPT/ 22 JPT 102000
cd ../
cp -r chr22_GWD chr22_JPT ../SGXserver_data/upload_data/
rm ../SGXserver_data/upload_data/chr22_GWD/AES_SK.key
rm ../SGXserver_data/upload_data/chr22_JPT/AES_SK.key
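
The arguments 100000 and 2000 in the split step add up to the 102000-byte size passed to the encryption step: every file ends up with the same length, which matters because ORAM blocks must be indistinguishable by size. The sketch below illustrates that idea only. It is not the logic of SplitVCFData_size.py. The function name, the choice to split on line boundaries, and the zero-byte padding are all assumptions made for illustration.

```python
def split_padded(lines, target=100_000, padding=2_000, pad_byte=b"\0"):
    """Group byte lines into blocks of about `target` bytes, each padded to target + padding."""
    block_size = target + padding
    blocks, current = [], b""
    for line in lines:
        if len(line) > block_size:
            raise ValueError("a single line does not fit into one block")
        # Start a new block when adding this line would exceed the target size.
        if current and len(current) + len(line) > target:
            blocks.append(current.ljust(block_size, pad_byte))
            current = b""
        current += line
    if current:
        blocks.append(current.ljust(block_size, pad_byte))
    return blocks
```

With 250 lines of 1000 bytes each, this yields three blocks of exactly 102000 bytes, and stripping the padding and concatenating the blocks gives back the original data.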

※ Shortcut for downloading, splitting, and encrypting genome data.

The commands above take about 10 minutes because the genome data of chromosome 22 are huge. The following commands reduce the time.

cd ~/BiORAM-SGX/dataowner_data/
# Smaller genome data.
gunzip *.gz
python3 SplitVCFData_size.py ~/BiORAM-SGX/dataowner_data/ ~/BiORAM-SGX/dataowner_data/chr22_GWD/ 22 GWD 100000 2000
python3 SplitVCFData_size.py ~/BiORAM-SGX/dataowner_data/ ~/BiORAM-SGX/dataowner_data/chr22_JPT/ 22 JPT 100000 2000
cd EncryptAES_SGX
make
./app ~/BiORAM-SGX/dataowner_data/chr22_GWD/ 22 GWD 102000
./app ~/BiORAM-SGX/dataowner_data/chr22_JPT/ 22 JPT 102000
cd ../
cp -r chr22_GWD chr22_JPT ../SGXserver_data/upload_data/
rm ../SGXserver_data/upload_data/chr22_GWD/AES_SK.key
rm ../SGXserver_data/upload_data/chr22_JPT/AES_SK.key

[Data Owner] Create ORAM structure

  • SGX Server side
./run-SGXserver
  • Data Owner side
./run-client
$ Input your user ID: DataOwner
$ Input your ID's password: DataOwner
$ (If you do not have key, push ENTER only.)
$ Input your SK filename: ./dataowner_data/chr22_GWD/AES_SK.key
$ Input your JavaScript code: ./dataowner_data/ORAMinit_GWD.js
---
./run-client
$ Input your user ID: DataOwner
$ Input your ID's password: DataOwner
$ (If you do not have key, push ENTER only.)
$ Input your SK filename: ./dataowner_data/chr22_JPT/AES_SK.key
$ Input your JavaScript code: ./dataowner_data/ORAMinit_JPT.js

[Client] Analyze genome data

  • SGX Server side
./run-SGXserver
  • Client side
./run-client
$ Input your user ID: Client
$ Input your ID's password: Client
$ (If you do not have key, push ENTER only.)
$ Input your SK filename: [ENTER]
$ Input your JavaScript code: ./client_data/fisher.js

Sample client .js codes are as follows.

  • fisher.js: sample code that runs Fisher's exact test.
  • LR.js: sample code that runs logistic regression (100 positions).
  • PCA.js: sample code that runs PCA (100 positions -> 2 dimensions).
  • LR_PCA.js: runs LR (10 positions) -> selects the 5 positions most strongly related to the difference between GWD and JPT -> runs PCA (5 positions -> 2 dimensions) -> saves the result to a file.
    The result can be visualized as follows. Because only a few positions are sampled, the classification is not accurate. (To see a proper classification, see the demo.)
    cd ~/BiORAM-SGX/client_data/
    python Visualize_data.py
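
For readers unfamiliar with the statistic that fisher.js computes, here is a minimal, self-contained sketch of a two-sided Fisher's exact test on a 2x2 contingency table. It is written in Python for illustration and is not the code that runs inside the Enclave. The function name and the argument layout (a, b in the first row and c, d in the second) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p_value = sum(prob(k) for k in range(lowest, highest + 1)
                  if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

In a genome setting, the rows could be the two groups (for example GWD and JPT) and the columns the counts of two alleles at one position. That mapping is an example, not a description of fisher.js.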
    

Benchmark (2020/02/20)

Machine Spec

Parameters

  • Z (see the Path ORAM paper for details): 6
  • StackMaxSize: 4 MB (4,000,000 bytes)
  • HeapMaxSize: 96 MB (96,000,000 bytes)
  • Data: 1000 Genome Project data, specifically 2 nations.
    • GWD: Gambian in Western Division, The Gambia
    • JPT: Japanese in Tokyo, Japan

Genome data sizes are as follows.

                      AllGenome(JPT)  AllGenome(GWD)  chr1(JPT)  chr1(GWD)  chr22(JPT)  chr22(GWD)
Data size [GB]        35.8            38.6            2.76       2.97       0.471       0.508
Number of split data  384758          415536          29658      32006      5062        5463

Case1: AllGenome

We create ORAM trees from all human chromosomes for each nation (JPT, GWD).

  • Fisher
             process time [sec]
File Search  4.372849
Analyze      0.0273248
Total        4.401838
  • LR
    Using gradient descent with regularization.
                   number of positions
                   10         50        100
File Search [sec]  47.97443   216.4722  406.3569
Analyze [sec]      0.0052505  0.022678  0.04415015
Total [sec]        47.98099   216.4971  406.40365
  • PCA
    In PCA, we use only JPT data and the power method.
                   number of positions
                   10         50         100
File Search [sec]  19.74556   101.20553  237.0048
Analyze [sec]      0.0002727  0.0028131  0.0117333
Total [sec]        19.74735   101.21001  237.0183
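
The PCA benchmarks mention the power method. As a reminder of what that method does, here is a minimal pure-Python sketch of power iteration that finds the dominant eigenvalue and eigenvector of a symmetric matrix such as a covariance matrix. It is not the Enclave implementation. The function name, the starting vector, and the stopping rule are assumptions, and extracting a second component (as in "2 dimensions") would need an extra deflation step that is not shown.

```python
import math


def power_iteration(matrix, max_iterations=1000, tolerance=1e-12):
    """Return (eigenvalue, eigenvector) for the dominant eigenpair of a symmetric matrix."""
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size      # arbitrary unit-length starting vector
    eigenvalue = 0.0
    for _ in range(max_iterations):
        # Multiply by the matrix, then renormalize to unit length.
        product = [sum(row[j] * vector[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break                                 # vector lies in the null space
        vector = [x / norm for x in product]
        # The Rayleigh quotient estimates the eigenvalue for the current vector.
        image = [sum(row[j] * vector[j] for j in range(size)) for row in matrix]
        estimate = sum(v * w for v, w in zip(vector, image))
        if abs(estimate - eigenvalue) < tolerance:
            eigenvalue = estimate
            break
        eigenvalue = estimate
    return eigenvalue, vector
```

Each iteration costs one matrix-vector product, which is consistent with the "Analyze" rows being small compared with the "File Search" rows, where the ORAM accesses dominate.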

Case2: chromosome 1

We create ORAM trees from chromosome 1 for each nation (JPT, GWD).

  • Fisher
             process time [sec]
File Search  1.4665754
Analyze      0.0001375
Total        1.4682056
  • LR
    Using gradient descent with regularization.
                   number of positions
                   10         50        100
File Search [sec]  5.742125   28.30003  64.83146
Analyze [sec]      0.0055113  0.022171  0.0434385
Total [sec]        5.748933   28.32372  64.87664
  • PCA
    In PCA, we use only JPT data and the power method.
                   number of positions
                   10        50         100
File Search [sec]  2.47331   13.19456   27.24546
Analyze [sec]      0.006414  0.0059026  0.0153582
Total [sec]        2.475577  13.20257   27.26291
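
The LR benchmarks use gradient descent with regularization. The sketch below shows one common form of that, batch gradient descent on the logistic loss with an L2 penalty on the weights. The README does not say which regularizer, learning rate, or iteration count BiORAM-SGX uses, so the L2 penalty and all default values here are assumptions, as are the function names.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted probability of class 1 for one sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def fit_logistic(samples, labels, learning_rate=0.1, l2=0.01, epochs=2000):
    """Fit weights and bias by batch gradient descent with an L2 penalty on the weights."""
    count, dims = len(samples), len(samples[0])
    weights, bias = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            for j in range(dims):
                grad_w[j] += error * features[j]
            grad_b += error
        # Average the gradient and add the derivative of the L2 term (bias is not penalized).
        weights = [w - learning_rate * (g / count + l2 * w)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias
```

In the genome setting each feature would correspond to one position and each label to a group such as GWD or JPT. That mapping is illustrative only.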

Case3: chromosome 22

We create ORAM trees from chromosome 22 for each nation (JPT, GWD).

  • Fisher
             process time [sec]
File Search  0.2158026
Analyze      0.0274049
Total        0.244528
  • LR
    Using gradient descent with regularization.
                   number of positions
                   10         50         100
File Search [sec]  3.184544   22.78428   39.85593
Analyze [sec]      0.0060702  0.0235689  0.0479591
Total [sec]        3.191978   22.80935   39.90606
  • PCA
    In PCA, we use only JPT data and the power method.
                   number of positions
                   10         50         100
File Search [sec]  1.470165   9.026763   15.40194
Analyze [sec]      0.0006192  0.0039763  0.0133208
Total [sec]        1.472607   9.032648   15.41728

License

BiORAM-SGX is released under the MIT License. See LICENSE for details.

Licenses of external libraries are listed as follows.

Acknowledgement

We thank Mr. Ao Sakurai for fruitful discussions.

Contact

Daiki Iwata (d_iwata@ruri.waseda.jp)
