An error in Basenji's example: ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'
Licko0909 opened this issue · comments
I tried to use Basenji in `examples/1-predict` to make predictions and got:
ValueError: Cannot feed value of shape (32, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'
So I changed it to batch_size=2, and then there are other problems. Can I only use a batch size of 2?
I want to use Basenji for some other predictions. Thank you!
Yes, you can only use a batch size of 2 with Basenji since that's how the model was serialized.
It seems that it can't store all the predictions in the HDF5 file due to the large chunk_size. I suggest you implement the prediction loop in Python yourself for Basenji (see https://github.com/kipoi/kipoi/blob/master/kipoi/cli/main.py#L239-L281) and specify a lower chunk_size for the HDF5Writer explicitly (https://github.com/kipoi/kipoi/blob/master/kipoi/writers.py#L286). You could also try using the AsyncBatchWriter or the ZarrBatchWriter to get faster write speeds.
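The suggested prediction loop can be sketched roughly as follows. This is a minimal, self-contained illustration: the dummy `predict_batch` stands in for the serialized Basenji graph (which only accepts batches of exactly 2); real code would call `kipoi.get_model(...)` and hand each batch to a writer with a small chunk_size, per the links above.

```python
import numpy as np

BATCH = 2                  # the serialized Basenji graph only accepts batches of exactly 2
SEQ_LEN, ALPHABET = 131072, 4

def predict_batch(x):
    # stand-in for sess.run(outputs, {'inputs:0': x}); x must have shape (2, 131072, 4)
    assert x.shape[0] == BATCH
    return x.mean(axis=(1, 2))          # dummy prediction: one scalar per sequence

def predict_all(seqs):
    """Feed fixed-size batches of 2, padding the final partial batch if needed."""
    preds = []
    for i in range(0, len(seqs), BATCH):
        batch = seqs[i:i + BATCH]
        n = len(batch)
        if n < BATCH:                   # pad, predict, then drop the padded rows
            pad = np.zeros((BATCH - n,) + batch.shape[1:], dtype=batch.dtype)
            batch = np.concatenate([batch, pad])
        preds.append(predict_batch(batch)[:n])
        # in the real loop, each chunk would go to the HDF5 writer here
        # (with a small chunk_size) instead of being accumulated in memory
    return np.concatenate(preds)

seqs = np.random.rand(5, SEQ_LEN, ALPHABET).astype("float32")
print(predict_all(seqs).shape)          # (5,)
```

Padding the last partial batch is what lets a graph frozen at batch size 2 handle an odd number of sequences.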
For `1-predict`, I also tried CpGenie (Keras 1.2), DeepCpG (Keras 1.2), and DeepSEA (Keras 2) as mentioned in the Snakemake file, but all have problems.
models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']
models = ['DeepSEA/predict']
-------------------------------------------------------------------------------------------------------------------
"""Run `kipoi predict` for multiple models
"""
import kipoi
# --------------------------------------------
# Config
fasta_file = 'input/hg19.chr22.fa'
# which bed files to run
# intervals = ['random', 'enhancer-regions']
intervals = ['random', 'enhancer-regions']
# get all DeepBind models in trained on human ChIP-seq
df = kipoi.list_models()
deepbind_models = df.model[df.model.str.match("DeepBind/Homo_sapiens/TF/.*_ChIP-seq.*")].tolist()
assert len(deepbind_models) == 137
# which models to use
#models = ['Basenji'] + ['Basset'] + deepbind_models[:5] # + ['DeepSEA/predict']
models = deepbind_models[:5] + ['DeepSEA/predict']
# You can also use the following two, but you have to install the environment
# `kipoi env create shared/envs/kipoi-py3-keras1.2`
# ['CpGenie/merged', 'DeepCpg_DNA/Hou2016_mESC_dna']
#models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']
# output file formats
file_formats = ['tsv', 'h5']
# --------------------------------------------
rule all:
input:
expand('output/{model}/{interval}.{ext}',
model=models,
interval=intervals,
ext=file_formats)
# Main rule
rule predict:
"""Generic rule for running model prediction for Kipoi models
that take as input `intervals_file` and `fasta_file`
"""
input:
intervals_file = "input/{interval}.hg19.chr22.bed.gz",
fasta_file = fasta_file
output:
predictions = expand("output/{{model}}/{{interval}}.{ext}", ext=file_formats)
params:
workers = 20, # number of workers,
batch_size = 12
shell:
"""
source activate $(kipoi env get {wildcards.model})
kipoi predict \
{wildcards.model} \
--dataloader_args='{{"intervals_file": "{input.intervals_file}",
"fasta_file": "{input.fasta_file}"}}' \
-n {params.workers} \
--batch_size={params.batch_size} \
-o {output.predictions}
"""
rule unzip:
input:
fa_gz = fasta_file + ".gz"
output:
fa = fasta_file
shell:
"zcat {input.fa_gz} > {output.fa}"
Hm, strange. Make sure the intervals are not from the edge of the chromosome, which would yield shorter sequences.
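One way to guard against that is to drop intervals that are not exactly the model's input length or that run past the chromosome end before prediction. A small sketch (the hg19 chr22 length is hard-coded here; in practice you would read it from a `.fai` or chrom.sizes file):

```python
SEQ_LEN = 131072  # Basenji's fixed input length

def valid_interval(chrom, start, end, chrom_sizes):
    """Keep only intervals of exactly SEQ_LEN that lie fully inside the chromosome."""
    return (end - start == SEQ_LEN
            and start >= 0
            and end <= chrom_sizes.get(chrom, 0))

chrom_sizes = {"chr22": 51_304_566}                # hg19 chr22 length
intervals = [("chr22", 0, 131072),                 # fine
             ("chr22", 51_200_000, 51_331_072),    # runs past the end -> dropped
             ("chr22", 1000, 2000)]                # wrong length -> dropped
kept = [iv for iv in intervals if valid_interval(*iv, chrom_sizes)]
print(kept)  # [('chr22', 0, 131072)]
```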
Thank you! It's working, but it is too slow!
vim ~/miniconda3/envs/kipoi-gpu-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi/writers.py
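Regarding the speed: the AsyncBatchWriter suggested above essentially moves the slow disk writes off the prediction thread. A rough, self-contained sketch of that pattern (not kipoi's actual implementation) using a queue and a background worker thread:

```python
import queue
import threading

class AsyncWriter:
    """Hand batches to a background thread so prediction is not blocked by I/O."""
    def __init__(self, write_fn, maxsize=10):
        self.q = queue.Queue(maxsize=maxsize)
        self.write_fn = write_fn
        self.thread = threading.Thread(target=self._worker, daemon=True)
        self.thread.start()

    def _worker(self):
        while True:
            batch = self.q.get()
            if batch is None:          # sentinel: no more batches
                break
            self.write_fn(batch)       # the slow disk write happens here

    def batch_write(self, batch):
        self.q.put(batch)              # returns immediately unless the queue is full

    def close(self):
        self.q.put(None)
        self.thread.join()             # wait until everything is flushed

written = []
w = AsyncWriter(written.append)        # write_fn would be e.g. an HDF5 writer's batch_write
for b in range(5):
    w.batch_write({"preds": b})
w.close()
print(written)  # [{'preds': 0}, {'preds': 1}, {'preds': 2}, {'preds': 3}, {'preds': 4}]
```

The bounded queue provides back-pressure: if the disk cannot keep up, prediction eventually blocks instead of exhausting memory.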
I tried to use other BED files, and it's working!
But CpGenie and DeepCpG still can't run.
CpGenie and DeepCpG expect fixed sequence lengths. If this is still a problem, please re-open the issue.
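Since those two models expect a fixed input width, one common workaround is to re-center every BED interval to the required length before prediction. A sketch with a hypothetical `resize_interval` helper (the exact width to use depends on each model's dataloader):

```python
def resize_interval(start, end, target_len):
    """Return an interval of exactly target_len centered on the original midpoint."""
    center = (start + end) // 2
    new_start = center - target_len // 2
    return new_start, new_start + target_len

# e.g. re-center a 400 bp interval to a (hypothetical) 1001 bp model input
print(resize_interval(1000, 1400, 1001))  # (700, 1701)
```

Intervals near chromosome ends would still need the bounds check shown earlier, since re-centering can push them past the edge.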