KewalinSamart / CPBS7711_module3

From a population of solutions for a given set of loci, score the genes on the loci using the method in Tasan et al.



Gene Scoring

Given a population of Prix Fixe (PF) solutions derived from a set of loci, this program scores the genes on the loci using the method in Tasan et al. and visualizes the final solution.

Command

python3 final_scored_solution.py -solutions [solutions file name] -num_loci [number of loci] -network [network file name] -output_dir [path to output directory] -score_cutoff [score cutoff]    

Arguments

  • args[0] -solutions (string): solutions file name; set to toy_loci_set.txt by default
  • args[1] -num_loci (int): number of loci; set to 12 by default
  • args[2] -network (string): network file name; a tab-separated .txt file containing an undirected gene-gene interaction network (weighted or unweighted; weights are not used in gene scoring); set to STRING_network.txt by default
  • args[3] -output_dir (string): path to store the final output; set to 'example_result/final_solution.txt' by default
  • args[4] -score_cutoff (float): score cutoff for the circular-layout (via NetworkX) solution visualization; set to 0.25 by default
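The argument list above could be wired up with argparse roughly as follows. This is a hypothetical sketch, not the script's actual parser: it assumes the single-dash, flag-style names shown under Command (the example command passes the same values positionally, so the real script may parse them differently), and the help strings are paraphrased from the list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Score genes on a set of loci from a population of PF solutions."
    )
    # argparse accepts single-dash long options; the dest is the name
    # without the leading dash (e.g. -num_loci -> args.num_loci).
    parser.add_argument("-solutions", type=str, default="toy_loci_set.txt",
                        help="tab-separated file of PF solutions")
    parser.add_argument("-num_loci", type=int, default=12,
                        help="number of loci")
    parser.add_argument("-network", type=str, default="STRING_network.txt",
                        help="tab-separated gene-gene interaction network")
    parser.add_argument("-output_dir", type=str,
                        default="example_result/final_solution.txt",
                        help="path to store the final output")
    parser.add_argument("-score_cutoff", type=float, default=0.25,
                        help="score cutoff for the circular-layout plot")
    return parser


# Parse an explicit list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["-num_loci", "12", "-score_cutoff", "0.3"])
print(args.solutions, args.num_loci, args.score_cutoff)  # toy_loci_set.txt 12 0.3
```

Any flag left out falls back to the default listed above.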

Example command

python3 final_scored_solution.py toy_loci_set.txt 12 STRING_network.txt 'example_result/final_solution.txt' 0.25

Details of Inputs/Outputs

Inputs

  1. An undirected, weighted or unweighted network file in .txt format (tab-separated) containing gene-gene interactions.
  • columns 1 & 2: gene names
  • column 3: interaction strength, ranging from 0 to 1
  • Example input STRING_network.txt (weighted network):
BLOC1S6  BLOC1S3  0.24
RAB3D    CHML     0.847
MYL7     MYO15A   0.842
...

NOTE: For an unweighted network, the network file has exactly the same format as above but without the weight column.
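A minimal way to load such a file into NetworkX, assuming the file has no header row (the example shows none): only the first two columns are turned into an edge, so the same code handles weighted and unweighted files and ignores the weights, consistent with the fact that weights are not used in gene scoring. The function name `load_network` is hypothetical.

```python
import io
from typing import IO, Union

import networkx as nx
import pandas as pd


def load_network(source: Union[str, IO[str]]) -> nx.Graph:
    """Read a tab-separated gene-gene network into an undirected graph.

    Assumes no header row. Columns 1 and 2 hold gene names; an optional
    third (weight) column is read but not used.
    """
    # keep_default_na=False stops pandas from turning a gene symbol such
    # as "NA" into a missing value.
    df = pd.read_csv(source, sep="\t", header=None, dtype=str,
                     keep_default_na=False)
    graph = nx.Graph()  # undirected, as the input format specifies
    graph.add_edges_from(zip(df[0].str.strip(), df[1].str.strip()))
    return graph


# Works for both layouts: three columns (weighted) or two (unweighted).
weighted = load_network(io.StringIO("BLOC1S6\tBLOC1S3\t0.24\nRAB3D\tCHML\t0.847\n"))
unweighted = load_network(io.StringIO("BLOC1S6\tBLOC1S3\nRAB3D\tCHML\n"))
print(weighted.number_of_edges(), unweighted.number_of_edges())  # 2 2
```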

  2. A population of PF solutions: a .txt (tab-separated) file with n+1 columns named 0, 1, ..., n, one per locus index. Each row is a single PF solution listing the one gene chosen at each locus, so the number of rows equals the number of PF solutions.
  • Example input toy_loci_set.txt
0 1 2 3 4 5 6 7 8 9 10 11
XPO6 FBXO3 C17orf64 CORO2A DEF8 CNTN2 RPL32 FANCE FREM2 PARN ANPEP CDIP1
KDM8 CAT DDX5 NR4A3 GAS8 PPFIA4 BRK1 TULP1 SERTM1 RPS15A MESP1 RP11-127I20.4
PLK1 SVIP GDPD1 NANS AC133919.6 SOX13 FANCD2 SLC26A8 SERTM1 NPIPP1 SEMA4B HMOX2
...
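The solutions file can be read with pandas in a similar way. The sketch below (hypothetical helper names `load_solutions` and `genes_per_locus`) assumes the first row is the 0..n header shown above, checks that the number of columns matches -num_loci, and collects the distinct candidate genes seen at each locus. The demo uses the first three columns of the example rows.

```python
import io
from typing import Dict, IO, Set, Union

import pandas as pd


def load_solutions(source: Union[str, IO[str]], num_loci: int) -> pd.DataFrame:
    """Read PF solutions: header row 0..n, one row per solution."""
    solutions = pd.read_csv(source, sep="\t", dtype=str, keep_default_na=False)
    # pandas reads the header as strings ("0", "1", ...); use int locus indices.
    solutions.columns = [int(c) for c in solutions.columns]
    if list(solutions.columns) != list(range(num_loci)):
        raise ValueError(
            f"expected loci 0..{num_loci - 1}, got {list(solutions.columns)}"
        )
    return solutions


def genes_per_locus(solutions: pd.DataFrame) -> Dict[int, Set[str]]:
    """Distinct genes chosen at each locus across all solutions."""
    return {locus: set(solutions[locus]) for locus in solutions.columns}


text = ("0\t1\t2\n"
        "XPO6\tFBXO3\tC17orf64\n"
        "KDM8\tCAT\tDDX5\n"
        "PLK1\tSVIP\tGDPD1\n")
solutions = load_solutions(io.StringIO(text), num_loci=3)
print(len(solutions))  # 3
print(sorted(genes_per_locus(solutions)[0]))  # ['KDM8', 'PLK1', 'XPO6']
```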

Outputs

  1. A .txt (tab-separated) file containing genes, their final scores, and the locus that each gene belongs to
  • column 1: gene names

  • column 2: final gene score; a higher score indicates a greater contribution

  • column 3: associated locus (locus index)

  • Example output final_solution.txt

gene     score                 locus
PALB2    0.13280434328669868   0
NUPR1    0.02125051082958725   0
SLC5A11  0.016724454415663878  0
CCDC73   0.0                   1
CAPRIN1  0.02042483660130719   1
RCN1     0.05896805896805897   1
...
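Writing a table in this layout takes a few lines of pandas. In the sketch below, the helper `write_scores` and its inputs (a gene-to-score mapping and a gene-to-locus mapping) are hypothetical. Rows are grouped by locus with a stable sort, which keeps genes in insertion order within each locus; no ordering by score is assumed. The demo reuses three rows from the example output.

```python
import io
from typing import IO, Mapping, Union

import pandas as pd


def write_scores(scores: Mapping[str, float], gene_locus: Mapping[str, int],
                 dest: Union[str, IO[str]]) -> pd.DataFrame:
    """Write gene / score / locus as a tab-separated table."""
    genes = list(scores)
    table = pd.DataFrame({
        "gene": genes,
        "score": [scores[g] for g in genes],
        "locus": [gene_locus[g] for g in genes],
    })
    # Group rows by locus; mergesort is stable, so genes keep their
    # insertion order within each locus.
    table = table.sort_values("locus", kind="mergesort").reset_index(drop=True)
    table.to_csv(dest, sep="\t", index=False)
    return table


scores = {"PALB2": 0.13280434328669868, "CCDC73": 0.0, "NUPR1": 0.02125051082958725}
gene_locus = {"PALB2": 0, "CCDC73": 1, "NUPR1": 0}
buffer = io.StringIO()
table = write_scores(scores, gene_locus, buffer)
print(list(table["gene"]))  # ['PALB2', 'NUPR1', 'CCDC73']
```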
  2. Solution visualizations in PNG format

Visualization details:

  • Each circle represents a gene

  • Different colors indicate different loci that the genes belong to

  • Circle size represents the gene's final score: the higher the score, the bigger the circle

  • Edges are gene-gene interactions from the input network

    2.1 Full final solution with no score cutoff applied: Kamada-Kawai layout

    • Example output example_result/example_finalsol_kkviz.png

    2.2 Highest-scoring solution with a score cutoff (0.25 by default): circular layout

    • Example output example_result/example_finalsol_ccviz.png
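Both views can be sketched with NetworkX and Matplotlib. The choices here are assumptions, not the script's: node size is a linear function of the score plus a small floor so zero-score genes stay visible, loci are colored through the "tab20" colormap with a fixed range so a locus keeps its color in both plots, genes scoring at or above the cutoff are kept for the circular view, and `top_subgraph` / `draw_solution` are hypothetical names. `nx.kamada_kawai_layout` requires SciPy.

```python
import os
import tempfile
from typing import Mapping

import matplotlib
matplotlib.use("Agg")  # render straight to file; no display needed
import matplotlib.pyplot as plt
import networkx as nx


def top_subgraph(graph: nx.Graph, scores: Mapping[str, float],
                 cutoff: float) -> nx.Graph:
    """Keep only genes whose score is at or above the cutoff."""
    keep = [g for g in graph.nodes if scores.get(g, 0.0) >= cutoff]
    return graph.subgraph(keep).copy()


def draw_solution(graph: nx.Graph, scores: Mapping[str, float],
                  gene_locus: Mapping[str, int], path: str,
                  layout: str = "kamada_kawai", size_scale: float = 3000.0) -> None:
    """Save a PNG: one circle per gene, sized by score, colored by locus."""
    if layout == "kamada_kawai":
        pos = nx.kamada_kawai_layout(graph)  # needs SciPy
    else:
        pos = nx.circular_layout(graph)
    nodes = list(graph.nodes)
    sizes = [size_scale * scores.get(n, 0.0) + 50 for n in nodes]  # bigger score, bigger circle
    colors = [gene_locus[n] for n in nodes]  # locus index mapped through the colormap
    fig, ax = plt.subplots(figsize=(8, 8))
    nx.draw_networkx(graph, pos=pos, ax=ax, nodelist=nodes, node_size=sizes,
                     node_color=colors, cmap="tab20",
                     vmin=0, vmax=max(gene_locus.values()), font_size=8)
    ax.set_axis_off()
    fig.savefig(path, format="png", dpi=150, bbox_inches="tight")
    plt.close(fig)


# Toy graph (edges invented for illustration); scores/loci from the examples above.
graph = nx.Graph([("PALB2", "NUPR1"), ("NUPR1", "SLX4"), ("SLX4", "AKAP11")])
scores = {"PALB2": 0.13280434328669868, "NUPR1": 0.02125051082958725,
          "SLX4": 0.1770076779414816, "AKAP11": 0.0358671285918076}
gene_locus = {"PALB2": 0, "NUPR1": 0, "SLX4": 11, "AKAP11": 8}
top = top_subgraph(graph, scores, cutoff=0.03)
out_path = os.path.join(tempfile.mkdtemp(), "finalsol_ccviz.png")
draw_solution(top, scores, gene_locus, out_path, layout="circular")
print(sorted(top.nodes))  # ['AKAP11', 'PALB2', 'SLX4']
```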
  3. Final solution subnetwork in .json format, which users can load into the interactive tool Cytoscape to customize the visualization
  • Example output example_finalsol_json.js:
{"data": [], "directed": false, "multigraph": false, "elements": {"nodes": [{"data": {"locus": 8, "score": 0.0358671285918076, "color": "#64678B", 
"id": "AKAP11", "value": "AKAP11", "name": "AKAP11"}}, {"data": {"locus": 11, "score": 0.1770076779414816, "color": "#D5A612", "id": "SLX4", 
"value": "SLX4", "name": "SLX4"}}
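The example's structure (top-level `data`, `directed`, `multigraph` and `elements` keys, with `id`, `value` and `name` appended to each node's attributes) matches what NetworkX's `nx.cytoscape_data` produces, so an export along these lines is plausible. This is an assumption: `export_cytoscape` and the locus-to-color mapping are hypothetical. The casts to `int`/`float` guard against NumPy integers, which `json` cannot serialize.

```python
import io
import json
from typing import IO, Mapping

import networkx as nx


def export_cytoscape(graph: nx.Graph, scores: Mapping[str, float],
                     gene_locus: Mapping[str, int],
                     locus_colors: Mapping[int, str], dest: IO[str]) -> dict:
    """Annotate genes with locus/score/color and dump Cytoscape JSON."""
    annotated = nx.Graph()
    for gene in graph.nodes:
        locus = int(gene_locus[gene])  # plain int: NumPy ints are not JSON-serializable
        annotated.add_node(gene, locus=locus, score=float(scores[gene]),
                           color=locus_colors[locus])
    annotated.add_edges_from(graph.edges)
    data = nx.cytoscape_data(annotated)  # adds id/value/name to each node
    json.dump(data, dest)
    return data


graph = nx.Graph([("AKAP11", "SLX4")])
buffer = io.StringIO()
data = export_cytoscape(
    graph,
    scores={"AKAP11": 0.0358671285918076, "SLX4": 0.1770076779414816},
    gene_locus={"AKAP11": 8, "SLX4": 11},
    locus_colors={8: "#64678B", 11: "#D5A612"},
    dest=buffer,
)
print(data["elements"]["nodes"][0]["data"]["name"])  # AKAP11
```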

Installation and Dependencies

  • Python 3.8.3
  • pandas 1.5.0
  • numpy 1.23.3
  • argparse 1.4.0
  • matplotlib 3.5.1
  • networkx 2.6
  • itertools 8.4.0

About


License: MIT License

