kipoi / models

Model zoo for genomics

Home Page: http://kipoi.org

An error in Basenji's example: ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

Licko0909 opened this issue

kipoi test Basenji --source=kipoi

ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

I tried to use Basenji in `examples/1-predict` to make predictions.

ValueError: Cannot feed value of shape (32, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

So I changed it to batch_size=2, and then there were other problems:

Can I only use a batch size of 2?

[screenshots: error tracebacks after switching to batch_size=2]

I want to use Basenji for some other predictions. Thank you!

Yes, you can only use a batch size of 2 with Basenji since that's how the model was serialized.

It seems that it can't store all the predictions in the HDF5 file due to the large chunk_size. I suggest you implement the prediction loop in Python yourself for Basenji (see https://github.com/kipoi/kipoi/blob/master/kipoi/cli/main.py#L239-L281) and specify a lower chunk_size for the HDF5Writer explicitly (https://github.com/kipoi/kipoi/blob/master/kipoi/writers.py#L286). You could also try using the AsyncBatchWriter or the ZarrBatchWriter to get faster write speeds.
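
A minimal sketch of such a loop (assuming the writer class is HDF5BatchWriter from kipoi.writers with a chunk_size argument, as in the linked writers.py; file paths are placeholders):

"""Manual prediction loop for Basenji with a smaller HDF5 chunk_size.
A sketch based on the kipoi CLI code linked above; paths are placeholders.
"""
import kipoi
from kipoi.writers import HDF5BatchWriter

model = kipoi.get_model("Basenji")
dl = model.default_dataloader(
    intervals_file="input/random.hg19.chr22.bed.gz",  # placeholder path
    fasta_file="input/hg19.chr22.fa",                 # placeholder path
)

writer = HDF5BatchWriter(file_path="output/Basenji/preds.h5", chunk_size=200)
# Basenji was serialized with a fixed batch size of 2, so every batch
# (including the last one) has to contain exactly 2 examples.
for batch in dl.batch_iter(batch_size=2, num_workers=0):
    batch["preds"] = model.predict_on_batch(batch["inputs"])
    writer.batch_write(batch)
writer.close()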

For 1-predict, I also tried CpGenie (Keras 1.2), DeepCpG (Keras 1.2), and DeepSEA (Keras 2) from the Snakefile below, but all of them have problems.

models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']

[screenshot: error traceback]

models = ['DeepSEA/predict']

[screenshot: error traceback]

-------------------------------------------------------------------------------------------------------------------
"""Run `kipoi predict` for multiple models
"""
import kipoi

# --------------------------------------------
# Config
fasta_file = 'input/hg19.chr22.fa'

# which bed files to run
# intervals = ['random', 'enhancer-regions']
intervals = ['random', 'enhancer-regions']

# get all DeepBind models trained on human ChIP-seq
df = kipoi.list_models()
deepbind_models = df.model[df.model.str.match("DeepBind/Homo_sapiens/TF/.*_ChIP-seq.*")].tolist()
assert len(deepbind_models) == 137

# which models to use
#models = ['Basenji'] + ['Basset'] + deepbind_models[:5] # + ['DeepSEA/predict']
models = deepbind_models[:5]  + ['DeepSEA/predict']

# You can also use the following two, but you have to install the environment
# `kipoi env create shared/envs/kipoi-py3-keras1.2`
# ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']
#models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']

# output file formats
file_formats = ['tsv', 'h5']
# --------------------------------------------

rule all:
    input:
        expand('output/{model}/{interval}.{ext}',
               model=models,
               interval=intervals,
               ext=file_formats)

# Main rule
rule predict:
    """Generic rule for running model prediction for Kipoi models
    that take as input `intervals_file` and `fasta_file`
    """
    input:
        intervals_file = "input/{interval}.hg19.chr22.bed.gz",
        fasta_file = fasta_file
    output:
        predictions = expand("output/{{model}}/{{interval}}.{ext}", ext=file_formats)
    params:
        workers = 20,  # number of dataloader workers
        batch_size = 12
    shell:
        """
        source activate $(kipoi env get {wildcards.model})
        kipoi predict \
          {wildcards.model} \
          --dataloader_args='{{"intervals_file": "{input.intervals_file}",
                              "fasta_file": "{input.fasta_file}"}}' \
          -n {params.workers} \
          --batch_size={params.batch_size} \
          -o {output.predictions}
        """

rule unzip:
    input:
        fa_gz = fasta_file + ".gz"
    output:
        fa = fasta_file
    shell:
        "zcat {input.fa_gz} > {output.fa}"
        

Hm, strange. Make sure the intervals are not from the edge of the chromosome, which would yield shorter sequences.
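
If you want to check for that, here is a rough sketch that drops intervals whose resized window would run off a chromosome edge, reading chromosome sizes from a samtools .fai index (the helper and file names are hypothetical, not part of kipoi):

"""Drop BED intervals too close to a chromosome edge for a model with a
fixed input length (e.g. 131072 bp for Basenji). Hypothetical helper;
assumes the FASTA has been indexed with `samtools faidx`.
"""
import gzip

def chrom_sizes(fai_file):
    # .fai columns: name, length, offset, linebases, linewidth
    sizes = {}
    with open(fai_file) as f:
        for line in f:
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)
    return sizes

def filter_bed(bed_gz, out_bed, fai_file, seq_len=131072):
    sizes = chrom_sizes(fai_file)
    with gzip.open(bed_gz, "rt") as f, open(out_bed, "w") as out:
        for line in f:
            chrom, start, end = line.split("\t")[:3]
            center = (int(start) + int(end)) // 2
            # keep the interval only if the seq_len window around its
            # center stays fully inside the chromosome
            if center - seq_len // 2 >= 0 and center + seq_len // 2 <= sizes.get(chrom, 0):
                out.write(line)

filter_bed("input/random.hg19.chr22.bed.gz", "input/random.filtered.bed",
           "input/hg19.chr22.fa.fai")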

Thank you! It's working, but it is too slow!

vim ~/miniconda3/envs/kipoi-gpu-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi/writers.py

and changed chunk_size=10000 to chunk_size=200.
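
Note that patching the installed writers.py will be overwritten on the next upgrade; if you run the prediction loop yourself as sketched above, you can pass the lower chunk_size directly instead (again assuming the HDF5BatchWriter constructor accepts it):

# equivalent to the site-packages edit, without touching the install
from kipoi.writers import HDF5BatchWriter
writer = HDF5BatchWriter(file_path="output/Basenji/preds.h5", chunk_size=200)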

I tried using other BED files, and it's working!

But CpGenie and DeepCpG still can't run.

CpGenie and DeepCpG expect fixed sequence lengths. If this is still a problem, please re-open the issue.
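
If the input intervals have varying lengths, one workaround is to resize each interval to a fixed length around its center before running the dataloader. A minimal sketch (the required length differs per model and the value below is an assumption; check the model's dataloader.yaml):

"""Resize BED intervals to a fixed length centered on the original interval.
Hypothetical helper; the length value below is an example, not a confirmed
requirement of CpGenie or DeepCpG.
"""
def resize_bed(in_bed, out_bed, length):
    with open(in_bed) as f, open(out_bed, "w") as out:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            center = (start + end) // 2
            new_start = max(0, center - length // 2)
            fields[1], fields[2] = str(new_start), str(new_start + length)
            out.write("\t".join(fields) + "\n")

resize_bed("input/random.bed", "input/random.fixed.bed", length=1001)  # example length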