loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal

Help with CreateNetwork edges.txt interpretation

c2b2pss opened this issue

I ran CreateNetwork on a batch of "*_bound.bed" files and got:

  1. adjacency.txt
  2. edges.txt
  3. a paths file and an edges file for each TF for which I provided a bed file.

The edges.txt file output is attached.

  1. What are the column names?
  2. Is this the file to use for looking at the whole network?
  3. Which column is source and which is target?
  4. The last column contains TF names matched to my input mapping files. However, column 4 ("Sites_3") still retains the original HOCOMOCO names.

If you could clarify which file to use to build the network and how the connections go, it would be very helpful!
edges.txt

Any comments?

Hi @c2b2pss,

Yes, I realize that the CreateNetwork tool can be confusing, so here is a bit more detail based on the example run given here. Please also see the image in this link, as it summarizes the intention of this tool very well.

The CreateNetwork tool builds a TF-TF network based on given TF binding sites. For this to work, it needs two additional pieces of information: the origin gene (the gene that encodes the TF) and the target gene (the gene in whose promoter the TF is bound). The origin gene is provided through the --origin parameter. This is a two-column mapping file with the TF name (left) and the origin gene (right; see motif2gene_mapping.txt). The target genes are provided through a column within the *_bound.bed files. With this, you can run TOBIAS CreateNetwork, which creates four file types:

1. adjacency.txt

This file contains all direct connections between a source TF and its target TFs. Each row can be read as "Source TF binds in the promoter of Target TF" (see the Supplementary Methods of the TOBIAS paper). This is the file recommended for visualizing the whole network.

Source	Targets
AR	
ARNT	LIN54, ELF2, IRF2
...
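
To make the Source/Targets layout concrete, here is a minimal sketch that turns an adjacency.txt-style table into a plain edge list. It assumes the file is tab-separated and that targets are comma-separated, as in the excerpt above; the function name `read_adjacency` is made up for illustration and is not part of TOBIAS.

```python
import csv
import io

# Sample rows copied from the adjacency.txt excerpt above (tab-separated).
# In practice you would open the real file instead of using io.StringIO.
ADJACENCY_TEXT = "Source\tTargets\nAR\t\nARNT\tLIN54, ELF2, IRF2\n"


def read_adjacency(handle):
    """Return {source TF: [target TFs]} from an adjacency.txt-style table.

    Assumption: targets are separated by commas; a source without
    targets (like AR in the excerpt) maps to an empty list.
    """
    network = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        raw_targets = row["Targets"] or ""
        network[row["Source"]] = [t.strip() for t in raw_targets.split(",") if t.strip()]
    return network


network = read_adjacency(io.StringIO(ADJACENCY_TEXT))

# Flatten into (source, target) pairs, e.g. for import into a graph tool.
edge_list = [(src, tgt) for src, targets in network.items() for tgt in targets]
print(edge_list)
```

Each pair reads as "source TF binds in the promoter of target TF", so the first element is the source and the second is the target.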

2. edges.txt

This file contains the TF binding locations used to create the network. All *_bound.bed files are combined, then filtered for sites that target genes with a known TF motif. Columns named Sites_x come from the .bed files, and columns named Origin_x come from the TF-to-gene mapping file (--origin).

Sites_0	Sites_1	Sites_2	Sites_3	Sites_4	Sites_5	Sites_6	Sites_7	Sites_8	Sites_9	Sites_10	Sites_11	Sites_12	Sites_13	Sites_14	Origin_0	Origin_1
CHR4	83013103	83013109	ARNT	8.10161	-	CHR4	83012435	83013425	BCELL,TCELL	.	.	ENSG00000189308	LIN54	25.46566	LIN54	ENSG00000189308
CHR4	139177963	139177969	ARNT	8.10161	-	CHR4	139176415	139178557	BCELL,TCELL	.	.	ENSG00000109381	ELF2	27.12301	ELF2	ENSG00000109381
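
As a rough illustration of how source and target can be pulled out of edges.txt, the sketch below counts binding sites per TF pair. Comparing the two example rows with adjacency.txt, Sites_3 appears to hold the source TF (the name from the .bed file) and Origin_0 the target TF (the name from the mapping file). That column assignment is inferred from this example only: the number of Sites_x columns depends on the input .bed files, so the column names are passed as parameters. The function name `count_edges` is hypothetical.

```python
import csv
import io
from collections import Counter

# Header and rows copied from the edges.txt excerpt above. They are written
# space-separated here for readability and re-joined with tabs below.
HEADER = [f"Sites_{i}" for i in range(15)] + ["Origin_0", "Origin_1"]
ROWS = [
    "CHR4 83013103 83013109 ARNT 8.10161 - CHR4 83012435 83013425 "
    "BCELL,TCELL . . ENSG00000189308 LIN54 25.46566 LIN54 ENSG00000189308",
    "CHR4 139177963 139177969 ARNT 8.10161 - CHR4 139176415 139178557 "
    "BCELL,TCELL . . ENSG00000109381 ELF2 27.12301 ELF2 ENSG00000109381",
]
EDGES_TEXT = "\n".join(["\t".join(HEADER)] + ["\t".join(r.split()) for r in ROWS]) + "\n"


def count_edges(handle, source_col="Sites_3", target_col="Origin_0"):
    """Count binding sites per (source TF, target TF) pair.

    Assumption: source_col and target_col match the layout of the example;
    adjust them if your .bed files have a different number of columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter((row[source_col], row[target_col]) for row in reader)


site_counts = count_edges(io.StringIO(EDGES_TEXT))
for (source, target), n_sites in site_counts.items():
    print(f"{source} -> {target}: {n_sites} site(s)")
```

Note that Sites_3 keeps whatever TF name is in the .bed file, while Origin_0 uses the name from the mapping file, which would explain why the two columns can follow different naming schemes.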

3. *_path_edges.txt

Similar to adjacency.txt, this file contains connections between TFs; however, it is limited to a single source TF. The Level column indicates whether the connection between two TFs is direct or indirect (see graph theory level).

Source	Target	Level
ARNT	LIN54	1
ARNT	IRF2	1
ARNT	ELF2	1
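
A small sketch of how the Level column could be used to separate direct from indirect connections. It assumes that Level 1 marks a direct connection and that higher values mark indirect ones; the excerpt above only shows Level 1, so treat that reading as an assumption. The function name `edges_by_level` is made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the *_path_edges.txt excerpt above (tab-separated).
PATH_EDGES_TEXT = (
    "Source\tTarget\tLevel\n"
    "ARNT\tLIN54\t1\n"
    "ARNT\tIRF2\t1\n"
    "ARNT\tELF2\t1\n"
)


def edges_by_level(handle):
    """Group (source, target) pairs by their integer Level value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[int(row["Level"])].append((row["Source"], row["Target"]))
    return dict(grouped)


levels = edges_by_level(io.StringIO(PATH_EDGES_TEXT))

# Assumption: Level 1 = direct connection from the source TF of this file.
direct_edges = levels.get(1, [])
print(direct_edges)
```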

4. *_paths.txt

This file contains all paths involving the respective TF. The n_nodes column gives the number of nodes (TFs) in each path.

Regulatory_path	n_nodes
ARNT --> LIN54	2
ARNT --> ELF2	2
ARNT --> IRF2	2
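
For completeness, a minimal sketch that splits each Regulatory_path into its TFs and checks the result against n_nodes. It assumes the " --> " separator shown in the excerpt above; the function name `read_paths` is hypothetical.

```python
import csv
import io

# Sample rows copied from the *_paths.txt excerpt above (tab-separated).
PATHS_TEXT = (
    "Regulatory_path\tn_nodes\n"
    "ARNT --> LIN54\t2\n"
    "ARNT --> ELF2\t2\n"
    "ARNT --> IRF2\t2\n"
)


def read_paths(handle):
    """Return a list of (list of TFs in path order, n_nodes) tuples."""
    paths = []
    for row in csv.DictReader(handle, delimiter="\t"):
        nodes = [node.strip() for node in row["Regulatory_path"].split("-->")]
        paths.append((nodes, int(row["n_nodes"])))
    return paths


paths = read_paths(io.StringIO(PATHS_TEXT))

# Sanity check: the number of TFs in each path should equal n_nodes.
consistent = all(len(nodes) == n_nodes for nodes, n_nodes in paths)
print(paths[0], consistent)
```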

I hope this clears things up!

Best wishes,
Hendrik

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.