FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data

unpaired reads, no mismatches, ambiguous

QianRongAn opened this issue.

OptiType runs to completion, but the result does not look right.
This is result.tsv; I don't know whether it is correct.

	A1	A2	B1	B2	C1	C2	Reads	Objective
0	A*01:01	A*01:01					0	0.0
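A quick way to see what is wrong with this table is to read it back and check each locus and the Reads column. The sketch below assumes the layout shown above: tab-separated, with an unnamed index column first. The posted row is embedded as a string, and the helper name `summarize_call` is hypothetical.

```python
import csv
import io

# The result.tsv posted above, as text (tab-separated; first column is an unnamed row index).
RESULT_TSV = "\tA1\tA2\tB1\tB2\tC1\tC2\tReads\tObjective\n0\tA*01:01\tA*01:01\t\t\t\t\t0\t0.0\n"

LOCI = ["A1", "A2", "B1", "B2", "C1", "C2"]

def summarize_call(tsv_text):
    """Return (alleles, reads, objective, problems) for the first row of the result table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    row = next(reader)
    alleles = {locus: row[locus] for locus in LOCI}
    reads = int(row["Reads"])
    objective = float(row["Objective"])
    problems = []
    # An empty cell means no allele was called at that position.
    missing = [locus for locus in LOCI if not alleles[locus]]
    if missing:
        problems.append("no call for " + ", ".join(missing))
    # Zero supporting reads means the call is not backed by any data.
    if reads == 0:
        problems.append("no reads support the call")
    return alleles, reads, objective, problems

alleles, reads, objective, problems = summarize_call(RESULT_TSV)
print(problems)
```

For this row the check reports missing B and C calls and zero supporting reads, so the A*01:01 call should not be trusted.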

And this is plot.pdf:

This is scRNA-seq data.
First, I ran razers3 on each read file separately:
razers3 -i 95 -m 1 -dr 0 -o fished_1.bam /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/data/hla_reference_rna.fasta /data1/s/liver-cancer-GSA-HCC/HRR572980_S1_L001_R1_001.fastq.gz

razers3 -i 95 -m 1 -dr 0 -o fished_2.bam /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/data/hla_reference_rna.fasta /data1/s/liver-cancer-GSA-HCC/HRR572980_S1_L001_R2_001.fastq.gz

Then I converted the fished BAM back to FASTQ with samtools:
samtools bam2fq fished_1.bam > sample_1_fished.fastq
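Because razers3 was run on each FASTQ file independently, a read can be fished from one file while its mate is not. Assuming OptiType pairs the two inputs by read name (an assumption, not confirmed here), a low overlap between the two fished files would lead to few or no pairs. A minimal sketch of that check, with hypothetical helper names and toy records standing in for `sample_1_fished.fastq` and `sample_2_fished.fastq`:

```python
import re

def read_names(fastq_lines):
    """Yield normalized read names from FASTQ records (4 lines per record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            name = line[1:].split()[0]          # drop '@' and any comment after whitespace
            yield re.sub(r"/[12]$", "", name)   # strip a trailing /1 or /2 mate suffix

def pairing_overlap(fastq_1_lines, fastq_2_lines):
    """Return (reads in file 1, reads in file 2, names present in both)."""
    names_1 = set(read_names(fastq_1_lines))
    names_2 = set(read_names(fastq_2_lines))
    return len(names_1), len(names_2), len(names_1 & names_2)

# Toy records: only read2 was fished from both files.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "ACGT", "+", "IIII"]
r2 = ["@read2/2", "TTGA", "+", "IIII", "@read3/2", "TTGA", "+", "IIII"]
print(pairing_overlap(r1, r2))  # (2, 2, 1)

# On the real files (paths assumed to be the ones passed to OptiType):
# with open("sample_1_fished.fastq") as f1, open("sample_2_fished.fastq") as f2:
#     print(pairing_overlap(f1, f2))
```

A shared count near zero would match the "0 paired" line in the log.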

Finally, I ran OptiType:
python OptiTypePipeline.py -i /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/sample_1_fished.fastq /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/sample_2_fished.fastq --rna -v -o /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/

mapping with 16 threads...

0:00:01.17 Mapping sample_1_fished.fastq to NUC reference...

0:00:26.79 Mapping sample_2_fished.fastq to NUC reference...

0:01:19.13 Generating binary hit matrix.
[E::idx_find_and_load] Could not retrieve index file for '/mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_1.bam'
0:01:19.19 Loading /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...
0:01:33.37 4445 reads loaded. Creating dataframe...
0:05:49.75 Dataframes created. Shape: 4445 x 7339, hits: 6690942 (6710769), sparsity: 1 in 4.86
[E::idx_find_and_load] Could not retrieve index file for '/mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_2.bam'
0:05:49.85 Loading /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_2.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...5K...6K...7K...8K...9K...10K...11K...12K...13K...14K...15K...16K...17K...18K...19K...20K...21K...22K...23K...24K...25K...26K...27K...28K...29K...30K...31K...32K...33K...34K...35K...36K...37K...38K...
0:06:01.87 38034 reads loaded. Creating dataframe...
0:40:44.93 Dataframes created. Shape: 38034 x 7339, hits: 7407290 (7407290), sparsity: 1 in 37.68
0:40:48.28 Alignment pairing completed. 0 paired, 42479 unpaired, 0 discordant

WARNING: Less than 10% of reads could be paired. Consider an appropriate unpaired_weight setting in your config file (currently 0.000), because you may need to resort to using unpaired reads.

0:40:49.04 temporary pruning of identical rows and columns

0:40:49.05 Size of mtx with unique rows and columns: (0, 1)
0:40:49.05 determining minimal set of non-overshadowed alleles

0:40:49.06 Keeping only the minimal number of required alleles (1,)

0:40:49.06 Creating compact model...

starting ilp solver with 1 threads...

0:40:49.07 Initializing OptiType model...
WARNING: Initializing ordered Set R with a fundamentally unordered data source
(type: set). This WILL potentially lead to nondeterministic behavior in Pyomo
WARNING: DEPRECATED: The Model.preprocess() method is deprecated and no longer
performs any actions (deprecated in 6.0) (called from
/mnt/bwa-0.7.17/optitype/OptiType-1.3.5/model.py:147)
GLPSOL--GLPK LP/MIP Solver 5.0
Parameter(s) specified in the command line:
--write /tmp/tmptj7ujh5j.glpk.raw --wglp /tmp/tmprelcfi26.glpk.glp --cpxlp
/tmp/tmpunkeoa1h.pyomo.lp
Reading problem data from '/tmp/tmpunkeoa1h.pyomo.lp'...
/tmp/tmpunkeoa1h.pyomo.lp:27: warning: lower bound of variable 'x4' redefined
/tmp/tmpunkeoa1h.pyomo.lp:27: warning: upper bound of variable 'x4' redefined
3 rows, 3 columns, 4 non-zeros
One variable is binary
28 lines were read
Writing problem data to '/tmp/tmprelcfi26.glpk.glp'...
18 lines were written
GLPK Integer Optimizer 5.0
3 rows, 3 columns, 4 non-zeros
1 integer variable, which is binary
Preprocessing...
Objective value = 0.000000000e+00
INTEGER OPTIMAL SOLUTION FOUND BY MIP PREPROCESSOR
Time used: 0.0 secs
Memory used: 0.0 Mb (39693 bytes)
Writing MIP solution to '/tmp/tmptj7ujh5j.glpk.raw'...
15 lines were written

0:40:54.96 Result dataframe has been constructed...
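The key lines in this log are "0 paired, 42479 unpaired" together with "unpaired_weight ... currently 0.000": with no pairs and unpaired reads weighted at zero, the matrix is pruned to (0, 1) and the objective is 0, which appears consistent with the empty result above. The sketch below reproduces the warning's 10% check from the pairing line and raises `unpaired_weight` in a config. Assumptions: the fraction is taken over all three categories, and the `[behavior]` section name and the other keys mirror OptiType's example config as we understand it, so compare against your own `config.ini`. The weight of 1.0 is only illustrative.

```python
import configparser
import re

PAIRING_LINE = "0:40:48.28 Alignment pairing completed. 0 paired, 42479 unpaired, 0 discordant"

def paired_fraction(log_line):
    """Fraction of alignments that were paired, from the pairing summary line."""
    m = re.search(r"(\d+) paired, (\d+) unpaired, (\d+) discordant", log_line)
    if m is None:
        raise ValueError("not a pairing summary line")
    paired, unpaired, discordant = map(int, m.groups())
    total = paired + unpaired + discordant
    return paired / total if total else 0.0

# Hypothetical minimal config; check section and key names against your config.ini.
CONFIG_TEXT = """
[behavior]
deletebam=true
unpaired_weight=0
use_discordant=false
"""

def with_unpaired_weight(config_text, weight):
    """Parse the config text and return it with unpaired_weight replaced."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    parser["behavior"]["unpaired_weight"] = str(weight)
    return parser

fraction = paired_fraction(PAIRING_LINE)
if fraction < 0.10:  # threshold quoted in the WARNING above
    cfg = with_unpaired_weight(CONFIG_TEXT, 1.0)
    print(fraction, cfg["behavior"]["unpaired_weight"])  # 0.0 1.0
```

Raising the weight only lets unpaired reads count; if the two fished files barely overlap, checking why the mates were not fished together is probably the more important fix.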