Heterologous expression of recombinant proteins requires host cells such as Escherichia coli, and protein solubility greatly affects the yield. A highly accurate predictor that forecasts protein solubility in an E. coli expression system before any experimental work begins is therefore highly sought, since it can improve production yield and reduce production cost.
- Install Anaconda (https://www.anaconda.com/download)
  - Create EPSOL environment (run `conda env create -f tf.yml`)
- SCRATCH-1D release 1.2 (http://download.igb.uci.edu/SCRATCH-1D_1.2.tar.gz)
- R requirements (https://www.r-project.org)
  - R libraries
    - bio3d
    - stringr
    - Interpol
    - zoo

You can also create the R environment with conda (run `conda env create -f R.yml`).
Use `conda activate tf` or `conda activate R` to activate the environment.
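
Before creating the environments, it may be worth confirming that the required command-line tools are reachable. The following is a minimal sketch and not part of EPSOL: the helper name `missing_tools` is hypothetical, and it assumes the executables are named `conda` and `R` on your system.

```python
import shutil

# Assumption: these are the executable names on your PATH.
REQUIRED_TOOLS = ("conda", "R")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All required tools were found.")
```

An empty list means both tools were located. The check says nothing about whether the R libraries listed above are installed.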
To predict on a new test file (e.g. `new_test.fasta`), perform the following three steps.
- Run SCRATCH with the new test file.
  - Execute in the command line: `your_SCRATCH_installation_path/bin/run_SCRATCH-1D_predictors.sh new_test.fasta new_test 20`
    (`20` is the number of processors, `new_test` is the output files' prefix.)
  - It will return four files in the current folder:
    - new_test.ss
    - new_test.ss8
    - new_test.acc
    - new_test.acc20
- Calculate features for test sequences.
  - Execute in the command line: `R --vanilla < PaRSnIP.R new_test.fasta new_test.ss new_test.ss8 new_test.acc20 new_test`
  - Following this step, one file is created in the current folder:
    - new_test_src_bio: contains biological features corresponding to the raw protein sequences
- Run EPSOL prediction file.
  - Execute in the command line: `python new_test.fasta new_test.ss new_test.ss8 new_test.acc new_test.acc20 new_test`
  - The final prediction result will be saved in `./result/predict_file/`, and the filename is `new_test_prediction.txt`.
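
Since each step consumes files written by the previous one, it can help to check that the intermediate files exist before running the prediction. The sketch below is not part of EPSOL. The function names `scratch_command` and `missing_inputs` are hypothetical, and the file names are taken from the steps above (four SCRATCH outputs plus `new_test_src_bio`).

```python
from pathlib import Path

# Suffixes of the four files SCRATCH-1D writes, as listed in step 1.
SCRATCH_SUFFIXES = (".ss", ".ss8", ".acc", ".acc20")


def scratch_command(scratch_dir, fasta, prefix, n_procs=20):
    """Build the step-1 command as an argument list (e.g. for subprocess.run)."""
    script = Path(scratch_dir) / "bin" / "run_SCRATCH-1D_predictors.sh"
    return [str(script), str(fasta), str(prefix), str(n_procs)]


def missing_inputs(prefix, folder="."):
    """Return the expected intermediate files that are absent from `folder`."""
    expected = [f"{prefix}{suffix}" for suffix in SCRATCH_SUFFIXES]
    expected.append(f"{prefix}_src_bio")  # feature file written by step 2
    return [name for name in expected if not (Path(folder) / name).is_file()]


if __name__ == "__main__":
    print(scratch_command("your_SCRATCH_installation_path", "new_test.fasta", "new_test"))
    absent = missing_inputs("new_test")
    if absent:
        print("Missing before prediction:", ", ".join(absent))
```

If `missing_inputs` returns an empty list, the files named in steps 1 and 2 are all present in the chosen folder. It does not validate their contents.
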
Liang Yu: lyu@xidian.edu.cn