ZacharyWang-007 / TCGA_xml2csv

parseTCGAClinicalXML.py

A small Python program that parses TCGA clinical XML files into a CSV table.

Install

  • Just download parseTCGAClinicalXML.py and put it in the directory where you want to run it.

Run

  • Run from the command line

Generate a text file listing the XML files, then pass it to the script:

find . -name "*.xml" -type f > input_xml_file_list.txt
python2 parseTCGAClinicalXML.py 'output.csv' 'input_xml_file_list.txt' 'patient'

Replace 'output.csv' with the name you want for the output CSV file.

'input_xml_file_list.txt' is a text file containing the paths to the TCGA clinical XML files, one per line.

Pass exactly one of 'patient', 'followup' or 'survival' as the third argument to select which clinical data is written.

The sample/ directory contains sample inputs and outputs.
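
The file-list step can also be done from Python when `find` is not available. The sketch below is not part of parseTCGAClinicalXML.py: the helper names `write_xml_file_list` and `build_command` are hypothetical, and the argument order simply mirrors the command line described above.

```python
import os

# The three modes named in this README for the third argument.
VALID_MODES = ("patient", "followup", "survival")


def write_xml_file_list(root_dir, list_path):
    """Write the path of every *.xml file under root_dir to list_path, one per line.

    Roughly equivalent to: find root_dir -name "*.xml" -type f > list_path
    (hypothetical helper, not provided by parseTCGAClinicalXML.py).
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(".xml"):
                paths.append(os.path.join(dirpath, name))
    paths.sort()  # stable order makes reruns reproducible
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths


def build_command(output_csv, list_path, mode):
    """Return the argument list for the documented command line (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (", ".join(VALID_MODES), mode))
    return ["python2", "parseTCGAClinicalXML.py", output_csv, list_path, mode]
```

The list returned by `build_command` can be passed to `subprocess.call` to run the script; rejecting an unknown mode up front avoids starting a run with a mistyped third argument.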

  • Import it as a Python module and call the functions defined in parseTCGAClinicalXML.py.
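
This README does not name the functions the module exposes, so one way to find them is to inspect the module after importing it. The helper below is a generic sketch using the standard `inspect` module; `list_public_functions` is a hypothetical name, not something parseTCGAClinicalXML.py provides. Since the script is run with python2 above, the import presumably needs the same interpreter, and it may execute top-level code if the file has no `__main__` guard.

```python
import inspect


def list_public_functions(module):
    """Return the sorted names of plain functions in module without a leading underscore.

    Functions the module itself imported from elsewhere are included as well.
    """
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


# Intended use, assuming parseTCGAClinicalXML.py is in the current directory:
#   import parseTCGAClinicalXML
#   print(list_public_functions(parseTCGAClinicalXML))
```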
