poseidonchan / TAPE

Deep learning-based tissue compositions and cell-type-specific gene expression analysis with tissue-adaptive autoencoder (TAPE)

Home Page: https://sctape.readthedocs.io/

About program operation

solitudetuzi opened this issue

Hello, I was glad to read this excellent paper. I have run into a problem running the code and hope you can spare some time to help. I look forward to your reply.
I read the scTAPE usage guide at https://sctape.readthedocs.io/usage/.
The documentation describes two ways to run it. The first is to run example.ipynb in the test folder of the scTAPE GitHub repository.
For the second way, I could not find a similar example file in the scTAPE repository, so I wrote my own test code following the documentation's description of the two methods.
The code in the document is as follows:
"
Typically, users need to generate simulated data first. This could be achieved through the following code:

# generate simulated data
from TAPE.simulation import generate_simulated_data
simulated_data = generate_simulated_data(sc_ref,
                                         outname=None, prop=None,
                                         n=500, samplenum=5000)
Then, users could use the following to make predictions:

# simulated data -> results
SignatureMatrix, CellFractionPrediction = \
    Deconvolution(simulated_data, bulkdata, sep='\t',
                  datatype='counts', genelenfile='./GeneLength.txt',
                  mode='overall', adaptive=True,
                  save_model_name=None,
                  batch_size=128, epochs=128)
"
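The simulation step quoted above builds pseudo-bulk samples from single-cell data. TAPE's own simulation code is not shown in this thread (the traceback only reveals that it adds `sc_data[select_index].sum(axis=0)` into `sample[i]`), so the following is a minimal NumPy sketch of the general idea, not TAPE's implementation. It assumes a dense cells × genes array and one cell-type label per cell; the function `simulate_pseudobulk` and its parameters are hypothetical.

```python
import numpy as np


def simulate_pseudobulk(expr, labels, n_cells=500, n_samples=10, seed=0):
    """Hypothetical sketch: mix single cells into pseudo-bulk samples.

    expr:   (cells, genes) expression matrix
    labels: length-`cells` array of cell-type labels
    Returns (samples, proportions, cell_types).
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=np.float32)
    labels = np.asarray(labels)
    cell_types = np.unique(labels)
    # Row indices of the cells belonging to each cell type.
    groups = {ct: np.flatnonzero(labels == ct) for ct in cell_types}
    # Random target proportions per sample (each row sums to 1).
    target = rng.dirichlet(np.ones(len(cell_types)), size=n_samples)
    # float32 halves memory compared with NumPy's default float64.
    samples = np.zeros((n_samples, expr.shape[1]), dtype=np.float32)
    props = np.zeros_like(target)
    for i in range(n_samples):
        # Number of cells drawn from each type for this sample.
        counts = np.floor(target[i] * n_cells).astype(int)
        for ct, k in zip(cell_types, counts):
            chosen = rng.choice(groups[ct], size=k, replace=True)
            # Summing the chosen cells mimics the traceback's accumulation step.
            samples[i] += expr[chosen].sum(axis=0)
        # Actual proportions after rounding down the cell counts.
        props[i] = counts / counts.sum()
    return samples, props, cell_types
```

Drawing Dirichlet proportions and summing randomly chosen cells is one common way to simulate mixtures with known composition; the returned proportions then serve as training labels for a deconvolution model.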

Following the code in the documentation, I wrote this test and passed in my own input file:

"
from TAPE import Deconvolution
from TAPE.simulation import generate_simulated_data

simulated_data = generate_simulated_data('Kidney_ref.txt', outname=None, n=500, samplenum=5000)
# bulkdata: the bulk RNA-seq data to deconvolve (not defined in this snippet)
SignatureMatrix, CellFractionPrediction = \
    Deconvolution(simulated_data, bulkdata, sep='\t',
                  datatype='counts', genelenfile='./GeneLength.txt',
                  mode='overall', adaptive=True,
                  save_model_name=None,
                  batch_size=128, epochs=128)

"
However, I have not figured out what format the input file ('Kidney_ref.txt' here) should have, and running the code raises errors such as:
"
133 sample[i] += sc_data[select_index].sum(axis=0)
134
135 prop = pd.DataFrame(prop, columns=celltype_groups.keys())

MemoryError: Unable to allocate 20.6 MiB for an array with shape (231, 23433) and data type float32
"
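The numbers in this error message can be checked directly: a float32 value takes 4 bytes, so the array that could not be allocated is small. The sketch below reproduces the 20.6 MiB figure. The second estimate is a hypothetical one that assumes the simulated data is held as one float32 row per sample (samplenum=5000) over the same 23,433 genes, which may not match TAPE's actual data layout; `array_nbytes` is an illustrative helper, not part of TAPE.

```python
import numpy as np

MIB = 1024 ** 2


def array_nbytes(shape, dtype):
    """Bytes needed for a dense NumPy array with this shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


# The allocation that failed: 231 rows x 23,433 genes of float32 (4 bytes each).
failed = array_nbytes((231, 23433), np.float32)
print(f"failed allocation: {failed / MIB:.1f} MiB")  # prints: failed allocation: 20.6 MiB

# Hypothetical estimate: one float32 row per simulated sample
# (samplenum=5000) across the same 23,433 genes.
simulated = array_nbytes((5000, 23433), np.float32)
print(f"5000 x 23433 float32 matrix: {simulated / MIB:.1f} MiB")
```

If even a request of about 20 MiB fails, the process had presumably already used up nearly all of the memory available to it before reaching this line.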

I would like to ask:

  1. What should the input file for the second way be?
  2. How can I fix the error above? I have not been able to solve it after thinking about it for a long time.
  3. Could you please provide a test case similar to example.ipynb (used in the first way) for the second way (from simulated data to results), so that I can learn from it?

We look forward to your reply. Thank you very much.
My email is 1510265602@163.com; I would appreciate it if you could contact me there.

Hi,

I think you have understood the second way correctly; the code you tested is right. The problem is the MemoryError: Unable to allocate 20.6 MiB for an array with shape (231, 23433) and data type float32. I ran your code as well, and no error was raised. So I suggest changing the test environment, for example by using a server with more memory. The sampling process does not consume much memory (about 3 GB in this case), so please check the memory usage on your machine.
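One way to follow the suggestion to check memory usage is to query free physical memory before starting the simulation. The sketch below uses only the standard library and assumes a Linux-like system where `os.sysconf` exposes `SC_AVPHYS_PAGES`; on other platforms it returns `None`. The helper names are hypothetical. Free pages can understate memory the OS could reclaim from caches; the third-party `psutil.virtual_memory().available` is a cross-platform alternative.

```python
import os


def available_memory_bytes():
    """Free physical memory in bytes, or None if sysconf does not report it."""
    try:
        # Number of currently free physical pages (exposed on Linux/glibc).
        free_pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (e.g. Windows);
        # ValueError: the name is not supported on this platform.
        return None
    if free_pages < 0 or page_size <= 0:
        # sysconf returns -1 when the value is indeterminate.
        return None
    return free_pages * page_size


def has_enough_memory(required_bytes):
    """True/False if free memory is known, None otherwise."""
    available = available_memory_bytes()
    if available is None:
        return None
    return available >= required_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    avail = available_memory_bytes()
    if avail is None:
        print("Free memory could not be determined on this platform.")
    else:
        print(f"Free memory: {avail / GIB:.1f} GiB")
    # Roughly the sampling footprint quoted in the reply (about 3 GB).
    print("Enough for ~3 GiB:", has_enough_memory(3 * GIB))
```

Running this just before `generate_simulated_data` shows whether the machine has the headroom the reply describes.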