gdewael / cpg-transformer

CpG Transformer for imputation of single-cell methylomes

Use of Bismark output for imputation

AST87 opened this issue

Hi Gaetan,
CpG Transformer is a great tool. I have been analyzing single-cell BSC data (from NMT-seq) and would like to impute some of the missing CpGs. However, I cannot figure out how to run CpG Transformer on Bismark output files. I tried the FastatoTable function on the raw data to prepare input files, but that gets complicated for paired-end sequencing.

Could you please suggest a way to use Bismark output as input to CpG Transformer? If not, could you suggest how to bring the output imputed from the raw data back into Bismark?

Hi, thanks for your interest in using CpG Transformer!

I am not a specialist in using Bismark, but I think you should be able to get a single CpG methylation file for every cell. (Look for the "Bismark methylation extractor" steps here).
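A minimal sketch of reading one cell's Bismark output in Python, assuming the per-CpG coverage file (`.bismark.cov`, e.g. produced by `bismark_methylation_extractor --bedGraph`) with six tab-separated columns: chromosome, start, end, methylation percentage, methylated count, unmethylated count. The majority-vote binarization and the dropping of ties are assumptions for sparse single-cell data, and `parse_bismark_cov` / `read_bismark_cov` are hypothetical helpers, not part of CpG Transformer. Note that the two strands of a CpG are reported as separate positions unless they are merged first (for example with Bismark's `coverage2cytosine --merge_CpG`).

```python
import gzip
from typing import Dict, Iterable, Tuple


def parse_bismark_cov(lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Turn Bismark coverage lines into binary calls keyed by (chrom, pos)."""
    calls: Dict[Tuple[str, int], int] = {}
    for line in lines:
        # Assumed columns: chrom, start, end, meth %, n_methylated, n_unmethylated
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], int(fields[1])
        n_meth, n_unmeth = int(fields[4]), int(fields[5])
        if n_meth > n_unmeth:
            calls[(chrom, pos)] = 1
        elif n_unmeth > n_meth:
            calls[(chrom, pos)] = 0
        # Ties are ambiguous and left out, so they end up as missing (assumption).
    return calls


def read_bismark_cov(path: str) -> Dict[Tuple[str, int], int]:
    """Read one cell's coverage file; .gz files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_bismark_cov(handle)
```

Calling `read_bismark_cov` once per cell (file names are up to your pipeline) gives one dictionary of calls per cell, which can then be merged into the table described next.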

Then, from these Bismark output files, you should construct a tab-separated file that looks like this, where every row is a CpG site and every column of -1s, 0s, and 1s is a cell. This will require some manual programming: I am not sure an automated Bismark-to-TSV script is a good choice, as users may have wildly different experimental settings and needs.
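The merging step could look roughly like the sketch below, assuming each cell's calls are already loaded into a dict mapping `(chrom, pos)` to 1 (methylated) or 0 (unmethylated), and that -1 marks a site not observed in a cell, as the -1/0/1 description suggests. The `chrom`/`pos` header names and column order are assumptions; compare them with the example file and with what EncodeFromTsv.py expects before use. `methylation_rows` and `write_tsv` are hypothetical helpers.

```python
import csv
import sys
from typing import Dict, List, TextIO, Tuple

# Per-cell calls: (chrom, pos) -> 1 (methylated) or 0 (unmethylated).
Calls = Dict[Tuple[str, int], int]


def methylation_rows(cells: Dict[str, Calls]) -> List[List[str]]:
    """Build a header plus one row per CpG site, with one column per cell.

    Sites absent from a cell get -1 (assumed to mean "missing").
    """
    names = sorted(cells)
    # Union of all observed sites, sorted by chromosome name, then position.
    sites = sorted(set().union(*(c.keys() for c in cells.values())))
    rows = [["chrom", "pos"] + names]  # header names are an assumption
    for chrom, pos in sites:
        calls = [str(cells[name].get((chrom, pos), -1)) for name in names]
        rows.append([chrom, str(pos)] + calls)
    return rows


def write_tsv(rows: List[List[str]], out: TextIO) -> None:
    """Write rows as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    cells = {
        "cell_A": {("chr1", 100): 1, ("chr1", 250): 0},
        "cell_B": {("chr1", 250): 1},
    }
    write_tsv(methylation_rows(cells), sys.stdout)
```

With the toy input above, this prints a header line followed by `chr1 100 1 -1` and `chr1 250 0 1` (tab-separated). Chromosomes are sorted as strings, so `chr10` comes before `chr2`; the chromosome names also need to match those in the genome you encode.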

Once you have this file, you can use our provided scripts EncodeFromTsv.py to encode the methylation calls and EncodeGenome.py to encode the genome into NumPy format for input to the model.

Let me know if this answers your questions!

Thank you Gaetan for the solution.