xiaoyeye / CNNC

convolutional neural network based coexpression analysis

Can’t generate NEPDF data for the example sc-RNA seq data

JFF1594032292 opened this issue · comments

Hello!
I have a new problem: get_xy_label_data_cnn_combine_from_database.py reports an error when generating NEPDF data for the example sc-RNA seq h5 file:
python CNNC/get_xy_label_data_cnn_combine_from_database.py \
    None \
    CNNC/data/sc_gene_list.txt \
    CNNC/data/dendritic_gene_pairs_200.txt \
    CNNC/data/dendritic_gene_pairs_200_num.txt \
    None \
    CNNC/Data_sources/dendritic_cell.h5 \
    1
The dendritic_cell.h5 file was downloaded from
image
Then it reported errors:
image
So I opened the h5 file with h5py and changed the key from "RPKMs" to "rpkm". However, it still reported errors:
image
The same command works well for the bulk RNA-seq data "mouse_bulk.h5", so I wonder whether the sc-RNA seq data needs to be structured differently from the bulk data, or whether there is some other reason.
Thanks a lot!

Hi,
It seems that the problem lies in the store function. What happens if you change the line to rpkm = store['RPKMs']?
Best

I edited "get_xy_label_data_cnn_combine_from_database.py", and it now runs well on the example sc-RNA seq data.
However, when I run it on my own sc-RNA seq data, it reports the same error:
image
The sc-RNA seq .h5 file was generated from my own expression matrix with the h5py module, and it has the same structure as the example .h5 file.
image

Well, it is hard for me to debug this on other datasets. Generally speaking, the "store" function is used to read the expression data as a pd.DataFrame, so I believe any function that achieves this can be used. Please also pay attention to the column names, which must be gene symbols. Hope this helps.
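The requirement described above (an expression table that pandas can read back as a pd.DataFrame, with gene symbols as column names) can be sketched as follows. This is only a minimal illustration, not the repository's own code: the helper names, the toy matrix, the cell IDs and the gene symbols are all hypothetical, the default key "RPKMs" simply follows the key mentioned earlier in this thread, and pd.HDFStore needs the PyTables package to be installed.

```python
import numpy as np
import pandas as pd


def make_expression_frame(matrix, cell_ids, gene_symbols):
    """Build a samples-by-genes DataFrame (hypothetical helper).

    Rows are cells/samples and columns are gene symbols, matching the
    advice in this thread that the column names must be gene symbols.
    """
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape != (len(cell_ids), len(gene_symbols)):
        raise ValueError("matrix must have shape (n_cells, n_genes)")
    return pd.DataFrame(matrix, index=list(cell_ids), columns=list(gene_symbols))


def write_expression_h5(frame, path, key="RPKMs"):
    """Write the frame with pd.HDFStore so that store[key] returns a DataFrame.

    The key name is an assumption taken from this thread; use whatever key
    your copy of the script actually reads.
    """
    with pd.HDFStore(path, mode="w") as store:
        store[key] = frame


def read_expression_h5(path, key="RPKMs"):
    """Read the frame back the same way a store[key] lookup would."""
    with pd.HDFStore(path, mode="r") as store:
        return store[key]


# Toy data for illustration only: 2 cells x 3 genes.
expr = make_expression_frame(
    [[0.0, 12.5, 3.1], [4.2, 0.0, 7.7]],
    cell_ids=["cell_1", "cell_2"],
    gene_symbols=["Actb", "Gapdh", "Cd74"],
)
```

Calling write_expression_h5(expr, "my_cells.h5") and then read_expression_h5("my_cells.h5") should give back the same table. A file written as a raw h5py dataset is generally not laid out the way pd.HDFStore expects, which may be why writing the file with pandas itself avoids the error.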

Thanks!
I generated the h5 file with pd.HDFStore and it runs well.
Another question is about the get_xy_label_data_cnn_combine_from_database.py script:
image
I saw the comment and removed "[0:43261]" as described, and I understand that 43261 is the number of samples in the sc data. However, I also noticed the "43261" marked with green boxes (lines 130 and 136). Should I replace this number with the number of samples in my own sc-RNA seq data?

Great!

Yes, you can replace this number with the sample count of your own data.
Best
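One way to avoid editing a hardcoded sample count such as 43261 for every new dataset is to derive it from the data itself. The sketch below is a hypothetical illustration, not the script's actual lines (those are only visible in the screenshot above): the function name, the 32 bins, the small offset added before the log transform, and the synthetic expression values are all assumptions.

```python
import numpy as np


def pair_histogram(x_gene, y_gene, bins=32):
    """Normalized 2D histogram of two genes' log expression (hypothetical helper).

    The sample count is taken from the input length instead of being
    hardcoded, so the same code works for any number of cells.
    """
    x_gene = np.asarray(x_gene, dtype=float)
    y_gene = np.asarray(y_gene, dtype=float)
    if x_gene.shape != y_gene.shape:
        raise ValueError("both genes must be measured in the same cells")
    n_samples = x_gene.shape[0]  # replaces a fixed number such as 43261
    # Small offset (an assumption) so that zero expression has a finite log.
    x_log = np.log10(x_gene + 1e-2)
    y_log = np.log10(y_gene + 1e-2)
    counts, _, _ = np.histogram2d(x_log, y_log, bins=bins)
    # Divide by the number of samples so the histogram sums to 1.
    return counts / n_samples, n_samples


# Synthetic expression values for 500 cells, for illustration only.
rng = np.random.default_rng(0)
n_cells = 500
gene_a = rng.gamma(shape=2.0, scale=5.0, size=n_cells)
gene_b = rng.gamma(shape=2.0, scale=5.0, size=n_cells)
hist, n_used = pair_histogram(gene_a, gene_b)
```

With a DataFrame laid out as samples by genes, the same count would be available as the number of rows, so the constant never needs to be changed by hand.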