An error in the Basenji example: ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

Licko0909 opened this issue

kipoi test Basenji --source=kipoi

ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'


I tried to use Basenji in `examples/1-predict` to make predictions.

ValueError: Cannot feed value of shape (32, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

So I changed the batch size to 2, and then ran into other problems:

Can I only use a batch size of 2?


I want to use basenji for some other predictions. Thank you!

For 1-predict, I also tried CpGenie (keras1.2), DeepCpG (keras1.2), and DeepSEA (keras2), which are mentioned in the Snakefile, but all of them have problems.

models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']


models = ['DeepSEA/predict']


"""Run `kipoi predict` for multiple models"""
import kipoi

# --------------------------------------------
# Config
fasta_file = 'input/hg19.chr22.fa'

# which bed files to run
# intervals = ['random', 'enhancer-regions']
intervals = ['random', 'enhancer-regions']

# get all DeepBind models trained on human ChIP-seq
df = kipoi.list_models()
deepbind_models = df.model[df.model.str.match("DeepBind/Homo_sapiens/TF/.*_ChIP-seq.*")].tolist()
assert len(deepbind_models) == 137

# which models to use
#models = ['Basenji'] + ['Basset'] + deepbind_models[:5] # + ['DeepSEA/predict']
models = deepbind_models[:5]  + ['DeepSEA/predict']

# You can also use the following two, but you have to install the environment
# `kipoi env create shared/envs/kipoi-py3-keras1.2`
# ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']
#models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']

# output file formats
file_formats = ['tsv', 'h5']
# --------------------------------------------

rule all:

# Main rule
rule predict:
    """Generic rule for running model prediction for Kipoi models
    that take as input `intervals_file` and `fasta_file`
    """
    input:
        intervals_file = "input/{interval}.hg19.chr22.bed.gz",
        fasta_file = fasta_file
    output:
        predictions = expand("output/{{model}}/{{interval}}.{ext}", ext=file_formats)
    params:
        workers = 20,  # number of workers
        batch_size = 12
    shell:
        """
        source activate $(kipoi env get {wildcards.model})
        kipoi predict \
          {wildcards.model} \
          --dataloader_args='{{"intervals_file": "{input.intervals_file}",
                              "fasta_file": "{input.fasta_file}"}}' \
          -n {params.workers} \
          --batch_size={params.batch_size} \
          -o {output.predictions}
        """

rule unzip:
    input:
        fa_gz = fasta_file + ".gz"
    output:
        fa = fasta_file
    shell:
        "zcat {input.fa_gz} > {output.fa}"

Yes, you can only use a batch size of 2 with Basenji, since that's how the model was serialized: the input tensor has a fixed shape of (2, 131072, 4).
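
Because the batch dimension is fixed, every batch fed to the model has to contain exactly 2 sequences, including the last one. A minimal sketch of one way to handle that with NumPy follows; `iter_fixed_batches` is a hypothetical helper, not part of Kipoi, and the toy array uses a short sequence length instead of 131072 to stay light.

```python
import numpy as np


def iter_fixed_batches(x, batch_size=2):
    """Yield (batch, n_valid) pairs where every batch has exactly `batch_size` rows.

    The last batch is zero-padded if needed; `n_valid` is the number of
    leading rows that are real, so predictions for padded rows can be dropped.
    """
    for start in range(0, len(x), batch_size):
        chunk = x[start:start + batch_size]
        n_valid = len(chunk)
        if n_valid < batch_size:
            pad_shape = (batch_size - n_valid,) + chunk.shape[1:]
            pad = np.zeros(pad_shape, dtype=chunk.dtype)
            chunk = np.concatenate([chunk, pad], axis=0)
        yield chunk, n_valid


# Toy input: 5 sequences of length 8 with 4 channels (Basenji expects 131072).
x = np.ones((5, 8, 4), dtype=np.float32)
batches = list(iter_fixed_batches(x, batch_size=2))
```

With 5 input sequences this produces three batches of shape (2, 8, 4); in the last one only the first row is real, and the caller would keep `preds[:n_valid]` after prediction.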

It seems that it can't store all the predictions in the HDF5 file because of the large chunk_size. I suggest you implement the prediction loop in Python yourself for Basenji and explicitly specify a lower chunk_size for the HDF5Writer. You could also try using the AsyncBatchWriter or the ZarrBatchWriter to get faster write speeds.
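
A rough sketch of such a loop, under some assumptions: it uses `kipoi.get_model`, the model's `default_dataloader`, `batch_iter`, `predict_on_batch`, and `kipoi.writers.HDF5BatchWriter` with an explicit `chunk_size`. The function name `predict_to_hdf5`, the `"preds"` key, and the assumption that `batch["inputs"]` is a single array are illustrative choices, not something confirmed in this thread.

```python
def predict_to_hdf5(model_name, dataloader_kwargs, output_file,
                    batch_size=2, chunk_size=200):
    """Run a Kipoi model batch by batch and stream predictions to HDF5."""
    # Imported lazily so the function can be defined without Kipoi installed.
    import kipoi
    from kipoi.writers import HDF5BatchWriter

    model = kipoi.get_model(model_name)
    dataloader = model.default_dataloader(**dataloader_kwargs)
    # A small chunk_size keeps each HDF5 chunk manageable for large outputs.
    writer = HDF5BatchWriter(file_path=output_file, chunk_size=chunk_size)
    try:
        for batch in dataloader.batch_iter(batch_size=batch_size):
            inputs = batch["inputs"]  # assumed to be one array per batch
            if inputs.shape[0] != batch_size:
                # The model accepts only full batches; an incomplete final
                # batch is skipped here (padding it is an alternative).
                continue
            preds = model.predict_on_batch(inputs)
            writer.batch_write({"metadata": batch["metadata"], "preds": preds})
    finally:
        writer.close()
```

For the example in this issue, a call might look like `predict_to_hdf5("Basenji", {"intervals_file": "input/random.hg19.chr22.bed.gz", "fasta_file": "input/hg19.chr22.fa"}, "output/Basenji/random.h5")`, run inside the environment that Kipoi created for the model.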

Thank you! It's working, but it is too slow!

vim ~/miniconda3/envs/kipoi-gpu-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi/

and change chunk_size=10000 to chunk_size=200

I tried using other BED files, and it works!

But CpGenie and DeepCpG still can't run.

CpGenie and DeepCpG expect fixed sequence lengths. If this is still a problem, please re-open the issue.
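
One way to meet a fixed-length requirement is to resize every BED interval to the same width around its center before running the model. The sketch below is a hypothetical helper in plain Python; the width of 1001 is only an illustrative value, so check each model's documentation for the length it actually expects. It clips at position 0 but does not check chromosome ends.

```python
def resize_interval(start, end, width):
    """Return (new_start, new_end) of length `width`, centered on the interval."""
    center = (start + end) // 2
    new_start = max(center - width // 2, 0)  # do not go below position 0
    return new_start, new_start + width


def resize_bed_line(line, width):
    """Resize one tab-separated BED line, keeping any extra columns."""
    fields = line.rstrip("\n").split("\t")
    new_start, new_end = resize_interval(int(fields[1]), int(fields[2]), width)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)


# width=1001 is an assumed example value, not taken from this thread.
resized = resize_bed_line("chr22\t20000100\t20000400\tpeak1", width=1001)
```

Applying `resize_bed_line` to every line of an intervals file gives a BED file in which all regions have the same length.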