vanallenlab / EnsemblToHGNC

This script takes in a tab-separated file containing at least one column of Ensembl IDs and a string indicating the header for this column, and outputs a tab-separated file identical to the input file except that it has an additional column containing mapped HGNC gene symbols for each row.

EnsemblToHGNC

This script takes in a tab-separated file containing at least one column of Ensembl gene or transcript IDs (ENSG/ENST) and a string indicating the header for this column, and outputs a tab-separated file identical to the input file except that it has an additional column containing mapped HGNC gene symbols for each row.

The ENSG-HGNC symbol mapping is derived from downloads at this site: https://www.genenames.org/cgi-bin/download. Data is based on Ensembl release 92, from April 2018.

The ENSG-ENST mapping is derived from Ensembl's BioMart: http://useast.ensembl.org/biomart/. Data is based on Ensembl release 92, from April 2018.
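The two mappings can be chained, so that a transcript ID is first resolved to its gene ID and then to a symbol. The following is a minimal sketch of that lookup and of appending the new column, not the script's actual code. It assumes small in-memory dictionaries in place of the real mapping files; the function names, the `HGNC_symbol` column name, and the choice to leave unmapped rows empty are all illustrative assumptions.

```python
import csv
import io

# Illustrative stand-ins for the real mapping tables (assumed, tiny subsets).
ENSG_TO_HGNC = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
ENST_TO_ENSG = {"ENST00000269305": "ENSG00000141510"}


def lookup_symbol(ensembl_id, is_transcript=False):
    """Return the HGNC symbol for an ENSG/ENST ID, or '' if unmapped."""
    # Transcript IDs are resolved to a gene ID first, then to a symbol.
    gene_id = ENST_TO_ENSG.get(ensembl_id, "") if is_transcript else ensembl_id
    return ENSG_TO_HGNC.get(gene_id, "")


def add_symbol_column(tsv_text, header, is_transcript=False,
                      new_column="HGNC_symbol"):
    """Copy a tab-separated table and append a column of mapped symbols."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames + [new_column],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Every original field is kept; only the new column is added.
        row[new_column] = lookup_symbol(row[header], is_transcript)
        writer.writerow(row)
    return out.getvalue()
```

Calling `add_symbol_column(text, "gene_id")` on a table whose `gene_id` column holds ENSG IDs returns the same table with one extra column; passing `is_transcript=True` routes each ID through the ENST-to-ENSG table first.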


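Example usages (pass either --ensg_header or --enst_header, depending on whether the column holds gene or transcript IDs):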

  1. Assuming output path is location of input file

python EnsemblToHGNC.py /path/to/file_containing_ensembl_column --ensg_header/--enst_header <ensembl_column_header>

  2. Specifying output path

python EnsemblToHGNC.py /path/to/file_containing_ensembl_column --ensg_header/--enst_header <ensembl_column_header> --output_path /path/to/output/location
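The command-line interface shown in these two usages could be parsed along the following lines. This is a hedged sketch using the standard `argparse` module, not the script's actual parser: the flag names come from the examples above, while `build_parser`, `resolve_output_dir`, and the assumption that the two header flags are mutually exclusive are hypothetical.

```python
import argparse
import os


def build_parser():
    """Build a parser matching the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Add HGNC symbols to a tab-separated file"
    )
    parser.add_argument("input_file", help="tab-separated input file")
    # Assumption: exactly one of the two header flags is given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--ensg_header", help="header of the ENSG column")
    group.add_argument("--enst_header", help="header of the ENST column")
    parser.add_argument("--output_path", default=None,
                        help="output location (defaults to the input's folder)")
    return parser


def resolve_output_dir(args):
    """Use --output_path if given, else the directory of the input file."""
    if args.output_path is not None:
        return args.output_path
    return os.path.dirname(os.path.abspath(args.input_file))
```

With this sketch, omitting `--output_path` falls back to the directory containing the input file, which mirrors the first usage above.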

License:GNU General Public License v2.0


Languages

Language:Python 100.0%