kipoi / models

Model zoo for genomics

Home Page: http://kipoi.org

An error in Basenji's example: ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

Licko0909 opened this issue

kipoi test Basenji --source=kipoi

ValueError: Cannot feed value of shape (10, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

I tried to use Basenji in `examples/1-predict` to make predictions.

ValueError: Cannot feed value of shape (32, 131072, 4) for Tensor 'inputs:0', which has shape '(2, 131072, 4)'

So I changed it to batch_size=2, and then there were other problems:

Can I only use a batch size of 2?

[screenshots: error tracebacks after switching to batch_size=2]

I want to use Basenji for some other predictions. Thank you!

Yes, you can only use a batch size of 2 with Basenji since that's how the model was serialized.

It seems that it can't store all the predictions in the HDF5 file due to the large chunk_size. I suggest you implement the prediction loop in Python yourself for Basenji (see https://github.com/kipoi/kipoi/blob/master/kipoi/cli/main.py#L239-L281) and specify a lower chunk_size for the HDF5Writer explicitly (https://github.com/kipoi/kipoi/blob/master/kipoi/writers.py#L286). You could also try using the AsyncBatchWriter or the ZarrBatchWriter to get faster write speeds.
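
A minimal sketch of such a loop (assuming the writer class is HDF5BatchWriter from kipoi.writers with a chunk_size argument, as in the linked writers.py; file paths are placeholders):

"""Manual prediction loop for Basenji with a smaller HDF5 chunk_size.
A sketch based on the kipoi CLI code linked above; paths are placeholders.
"""
import kipoi
from kipoi.writers import HDF5BatchWriter

model = kipoi.get_model("Basenji")
dl = model.default_dataloader(
    intervals_file="input/random.hg19.chr22.bed.gz",  # placeholder path
    fasta_file="input/hg19.chr22.fa",                 # placeholder path
)

writer = HDF5BatchWriter(file_path="output/Basenji/preds.h5", chunk_size=200)
# Basenji was serialized with a fixed batch size of 2, so every batch
# (including the last one) has to contain exactly 2 examples.
for batch in dl.batch_iter(batch_size=2, num_workers=0):
    batch["preds"] = model.predict_on_batch(batch["inputs"])
    writer.batch_write(batch)
writer.close()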

For 1-predict, I also tried CpGenie (Keras 1.2), DeepCpG (Keras 1.2), and DeepSEA (Keras 2) from the Snakefile below, but all of them have problems.

models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']

[screenshot: error traceback]

models = ['DeepSEA/predict']

[screenshot: error traceback]

-------------------------------------------------------------------------------------------------------------------
"""Run `kipoi predict` for multiple models
"""
import kipoi

# --------------------------------------------
# Config
fasta_file = 'input/hg19.chr22.fa'

# which bed files to run
# intervals = ['random', 'enhancer-regions']
intervals = ['random', 'enhancer-regions']

# get all DeepBind models trained on human ChIP-seq
df = kipoi.list_models()
deepbind_models = df.model[df.model.str.match("DeepBind/Homo_sapiens/TF/.*_ChIP-seq.*")].tolist()
assert len(deepbind_models) == 137

# which models to use
#models = ['Basenji'] + ['Basset'] + deepbind_models[:5] # + ['DeepSEA/predict']
models = deepbind_models[:5]  + ['DeepSEA/predict']

# You can also use the following two, but you have to install the environment
# `kipoi env create shared/envs/kipoi-py3-keras1.2`
# ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']
#models = ['CpGenie/merged', 'DeepCpG_DNA/Hou2016_mESC_dna']

# output file formats
file_formats = ['tsv', 'h5']
# --------------------------------------------

rule all:
    input:
        expand('output/{model}/{interval}.{ext}',
               model=models,
               interval=intervals,
               ext=file_formats)

# Main rule
rule predict:
    """Generic rule for running model prediction for Kipoi models
    that take as input `intervals_file` and `fasta_file`
    """
    input:
        intervals_file = "input/{interval}.hg19.chr22.bed.gz",
        fasta_file = fasta_file
    output:
        predictions = expand("output/{{model}}/{{interval}}.{ext}", ext=file_formats)
    params:
        workers = 20,  # number of dataloader workers
        batch_size = 12
    shell:
        """
        source activate $(kipoi env get {wildcards.model})
        kipoi predict \
          {wildcards.model} \
          --dataloader_args='{{"intervals_file": "{input.intervals_file}",
                              "fasta_file": "{input.fasta_file}"}}' \
          -n {params.workers} \
          --batch_size={params.batch_size} \
          -o {output.predictions}
        """

rule unzip:
    input:
        fa_gz = fasta_file + ".gz"
    output:
        fa = fasta_file
    shell:
        "zcat {input.fa_gz} > {output.fa}"
        

Hm, strange. Make sure the intervals are not from the edge of the chromosome, which would yield shorter sequences.
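
If you want to check for that, here is a rough sketch that drops intervals whose resized window would run off a chromosome edge, reading chromosome sizes from a samtools .fai index (the helper and file names are hypothetical, not part of kipoi):

"""Drop BED intervals too close to a chromosome edge for a model with a
fixed input length (e.g. 131072 bp for Basenji). Hypothetical helper;
assumes the FASTA has been indexed with `samtools faidx`.
"""
import gzip

def chrom_sizes(fai_file):
    # .fai columns: name, length, offset, linebases, linewidth
    sizes = {}
    with open(fai_file) as f:
        for line in f:
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)
    return sizes

def filter_bed(bed_gz, out_bed, fai_file, seq_len=131072):
    sizes = chrom_sizes(fai_file)
    with gzip.open(bed_gz, "rt") as f, open(out_bed, "w") as out:
        for line in f:
            chrom, start, end = line.split("\t")[:3]
            center = (int(start) + int(end)) // 2
            # keep the interval only if the seq_len window around its
            # center stays fully inside the chromosome
            if center - seq_len // 2 >= 0 and center + seq_len // 2 <= sizes.get(chrom, 0):
                out.write(line)

filter_bed("input/random.hg19.chr22.bed.gz", "input/random.filtered.bed",
           "input/hg19.chr22.fa.fai")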

Thank you! It's working, but it is too slow!

vim ~/miniconda3/envs/kipoi-gpu-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi/writers.py

and changed chunk_size=10000 to chunk_size=200.
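
Note that patching the installed writers.py will be overwritten on the next upgrade; if you run the prediction loop yourself as sketched above, you can pass the lower chunk_size directly instead (again assuming the HDF5BatchWriter constructor accepts it):

# equivalent to the site-packages edit, without touching the install
from kipoi.writers import HDF5BatchWriter
writer = HDF5BatchWriter(file_path="output/Basenji/preds.h5", chunk_size=200)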

I tried using other BED files, and it's working!

But CpGenie and DeepCpG still can't run.

CpGenie and DeepCpG expect fixed sequence lengths. If this is still a problem, please re-open the issue.
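
If the input intervals have varying lengths, one workaround is to resize each interval to a fixed length around its center before running the dataloader. A minimal sketch (the required length differs per model and the value below is an assumption; check the model's dataloader.yaml):

"""Resize BED intervals to a fixed length centered on the original interval.
Hypothetical helper; the length value below is an example, not a confirmed
requirement of CpGenie or DeepCpG.
"""
def resize_bed(in_bed, out_bed, length):
    with open(in_bed) as f, open(out_bed, "w") as out:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            center = (start + end) // 2
            new_start = max(0, center - length // 2)
            fields[1], fields[2] = str(new_start), str(new_start + length)
            out.write("\t".join(fields) + "\n")

resize_bed("input/random.bed", "input/random.fixed.bed", length=1001)  # example length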