This code analyzes targeted bisulfite sequencing data for colorectal cancer (CRC) diagnosis. It consists of the following parts:
- CGMapMerge.R and CpGFilter.R: Combine the CGmap files generated by BS-Seeker2.
- CpGsite2Gene_byMean.R: Compute the mean methylation ratio of multiple CpG sites in a gene to represent the methylation level of the gene.
- MethylationStatus.R: Fit a logistic regression for each gene and report the odds ratio (OR), confidence interval, p-value, sensitivity, specificity, and AUC.
- MethylationBoxplot.R: Plot the methylation boxplot.
- MethylCurve.R: Plot the methylation curve.
- ROCcurve.R: Plot the ROC curve.
- MLresult.R: Perform five-fold cross-validation of the methylation data using ten machine learning methods.
- ML_classfiers_Table.R: Collect the results from MLresult.R and compute the mean sensitivity, specificity, and accuracy for the training and test data.
- Subgroup_Compare.R: Compare the diagnostic abilities of the biomarkers in different subgroups.
- Preliminary_Analysis.Rmd: The main procedures to identify the potential ZFGs for CRC diagnosis.
- DiscoveryDataAnalysis.Rmd: The main procedures to show the methylation and expression changes of these ZFGs in the discovery dataset.
- ValidationAnalysis.Rmd: The main procedures for analyzing the validation datasets.
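
As a rough illustration of the gene-level steps described for CpGsite2Gene_byMean.R and MethylationStatus.R, the sketch below averages CpG sites per gene and then fits one logistic regression per gene. It is not taken from the scripts themselves. The toy data, the gene names, the `fit_gene` helper, and the use of a Wald confidence interval are all assumptions made for the example, and the actual scripts may differ.

```r
# Hypothetical toy data: 5 CpG sites (rows) in 2 genes, 40 samples (columns).
set.seed(1)
gene   <- c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B")
status <- rep(c(0, 1), each = 20)            # 0 = control, 1 = case (assumed coding)
sites  <- matrix(runif(5 * 40, 0.1, 0.5), nrow = 5,
                 dimnames = list(NULL, paste0("S", 1:40)))
# Add a modest methylation shift to GENE_A in cases so the example has some signal.
sites[gene == "GENE_A", status == 1] <- sites[gene == "GENE_A", status == 1] + 0.1

# Step 1: mean methylation ratio of all CpG sites in a gene, per sample.
gene_mean <- t(sapply(split(seq_len(nrow(sites)), gene), function(idx) {
  colMeans(sites[idx, , drop = FALSE], na.rm = TRUE)
}))

# Step 2: one logistic regression per gene (status ~ gene-level methylation).
# The OR is per unit change in the methylation ratio; the CI is a Wald interval
# (an assumption, a profile-likelihood interval is another common choice).
fit_gene <- function(x, y) {
  fit <- glm(y ~ x, family = binomial)
  est <- coef(summary(fit))["x", ]
  ci  <- est[["Estimate"]] + c(-1, 1) * qnorm(0.975) * est[["Std. Error"]]
  c(OR      = exp(est[["Estimate"]]),
    CI_low  = exp(ci[1]),
    CI_high = exp(ci[2]),
    P       = est[["Pr(>|z|)"]])
}
result <- t(apply(gene_mean, 1, fit_gene, y = status))
print(result)
```

Each row of `result` corresponds to one gene. Sensitivity, specificity, and AUC would need an additional step (for example, thresholding the fitted probabilities), which is omitted here.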