gitter-lab / min-cost-flow

Pathway reconstruction with a minimum cost flow algorithm

Add Support for Directionality to MinCostFlow

ntalluri opened this issue · comments

I'll be keeping my notes and ideas here.

Idea 1

  1. Pass in the direction column from the universal input. This means we don't need to change anything in SPRAS and can send the original input into the mincostflow code.
  2. In construct_digraph:
  • Construct a dictionary that stores [edge: (rank, directionality)] (I might not need the rank).
  • Read the directionality and, based on that, use either one directed edge or a pair of directed edges.
  • When there is an undirected edge, I think the dictionary will need to have both node1 -> node2 and node2 -> node1.
  3. After the mincostflow function is done running, return the chosen edges with their directionality. This will not have to be done in SPRAS. I think this will be done in the write_output_to_sif function.
  • Using the dict, we can compare each edge chosen in the graph to the dict to see its flow direction, and return that edge if it has flow.
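A minimal sketch of Idea 1, to make the dictionary shape concrete. The function names build_edge_lookup and edges_with_flow are hypothetical, the input rows are assumed to be (node1, node2, rank, direction) tuples, and the solver output is assumed to be a list of (tail, head) arcs that carry flow. None of this is the existing mincostflow code.

```python
def build_edge_lookup(rows):
    """Map each arc (tail, head) to (rank, directionality).

    rows: iterable of (node1, node2, rank, direction), where direction
    is 'U' (undirected) or 'D' (directed). This layout is an assumption.
    """
    lookup = {}
    for node1, node2, rank, direction in rows:
        lookup[(node1, node2)] = (rank, direction)
        if direction == "U":
            # An undirected edge is stored as a pair of opposite arcs.
            lookup[(node2, node1)] = (rank, direction)
    return lookup


def edges_with_flow(flow_arcs, lookup):
    """Return (tail, head, directionality) for each arc that carried flow."""
    return [
        (tail, head, lookup[(tail, head)][1])
        for tail, head in flow_arcs
        if (tail, head) in lookup
    ]


rows = [("A", "B", 1, "U"), ("B", "C", 1, "D")]
lookup = build_edge_lookup(rows)
# Pretend the solver pushed flow along B -> A and B -> C.
chosen = edges_with_flow([("B", "A"), ("B", "C")], lookup)
```

Here the arc B -> A is found because the undirected edge A B was stored in both orientations, which is the duplication that Idea 2 tries to avoid.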

Idea 2: Adjacency Matrix rather than a Dictionary

The dictionary's current challenge is that when there is an undirected edge, we have to record both pairs of directed edges in the dictionary. This is due to the fact that if one of the edges is picked but not the other, we cannot sort and then look up a single key to determine if it is directed or not, since we would lose information. There may also be strange edge cases where a user overwrites one of the edges in this pair of edges indicating an undirected edge and ends up making one of them directed.

An adjacency matrix simplifies this because an undirected edge is inherently symmetric in the matrix: both matrix[A][B] and matrix[B][A] are marked. That makes undirected edges easier to describe and removes the need to store the edge twice under separate dictionary keys. It should also be easier to manage and maintain, and hopefully makes dealing with edge cases easier.
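A rough sketch of the adjacency matrix idea, using plain nested lists. The names build_direction_matrix and is_undirected are made up for illustration, and rows are assumed to be (node1, node2, direction) tuples.

```python
def build_direction_matrix(nodes, rows):
    """Return (index, matrix) where matrix[i][j] is True if arc i -> j exists."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[False] * size for _ in range(size)]
    for node1, node2, direction in rows:
        i, j = index[node1], index[node2]
        matrix[i][j] = True
        if direction == "U":
            # Undirected edges are symmetric: mark both cells.
            matrix[j][i] = True
    return index, matrix


def is_undirected(index, matrix, node1, node2):
    """An edge reads as undirected when both orientations are marked."""
    i, j = index[node1], index[node2]
    return matrix[i][j] and matrix[j][i]


index, matrix = build_direction_matrix(
    ["A", "B", "C"], [("A", "B", "U"), ("B", "C", "D")]
)
```

One caveat with this sketch: two separate directed edges A -> B and B -> A would mark the same cells as one undirected edge, so the matrix alone cannot tell those inputs apart.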

edge cases:

  1. A user overwriting an edge with a different directionality

Can you describe the edge case? When would this happen?

I don't know that we want to follow the design too closely, but Omics Integrator 1 also uses a similar idea to add back edge directions after the msgsteiner solver returns a network: https://github.com/fraenkel-lab/OmicsIntegrator/blob/0a57ede6beeef6e63b86d19898e560d62015e85d/scripts/forest.py#L670-L706 Even if you don't copy that design, it may give some ideas.

I think what I mean by this would be if a user had

A B 1 U
B A 1 D

as the input, the second line would overwrite the dictionary entry for (B,A).

edge 1: A B 1 U
dict: (A,B): U, (B,A): U

edge 2: B A 1 D
dict: (A,B): U, (B,A): D

I don't know if this would actually happen, but would we want some check that reports when it does?

Actually, the Omics Integrator code might fix the problem I was having, so I will just need to make a version of it that works well for the min cost flow code.

If we have that scenario and the edges are unweighted, then we could make an assumption that the directed edge takes precedence. That is, we would check a directed edge dictionary first for (B,A) and find it, so we return that directed edge. That may vary depending on what underlying data structures you use. We would need to document this assumption.
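To illustrate the precedence assumption, here is a small sketch with two separate collections, one for directed edges and one for undirected edges. The function name resolve_direction is hypothetical, and sets are used instead of dictionaries because weights are ignored in the unweighted case.

```python
def resolve_direction(tail, head, directed_edges, undirected_edges):
    """Return 'D', 'U', or None for the arc tail -> head.

    The directed collection is checked first, so a directed edge takes
    precedence over an undirected edge between the same nodes.
    """
    if (tail, head) in directed_edges:
        return "D"
    if (tail, head) in undirected_edges or (head, tail) in undirected_edges:
        return "U"
    return None


# The scenario from this thread: "A B 1 U" followed by "B A 1 D".
directed_edges = {("B", "A")}
undirected_edges = {("A", "B")}

direction_b_to_a = resolve_direction("B", "A", directed_edges, undirected_edges)
direction_a_to_b = resolve_direction("A", "B", directed_edges, undirected_edges)
```

In this sketch B -> A resolves to the directed edge, while A -> B still resolves to the undirected edge because it stays in undirected_edges. Whether the undirected copy should be dropped entirely is a separate decision that would also need documenting.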

The trickier thing is when edges have different properties like weights, which are supported here. We need to decide whether to give precedence based on weight or directionality.

Looking at the readme, we also should document that weighted edges are supported.

global variables:

// key: (node1, node2), value: (directionality, weight)
directed_edges = dict()
undirected_edges = dict()

construct_digraph:

if d = D:
    check for edge in directed_edges:
        if the stored weight is lower than the newer edge's weight, replace it with the newer edge's weight
    check if edge in undirected_edges:
        remove from undirected_edges, and place in directed_edges

    otherwise add edge to directed_edges dictionary as (node1, node2) = (D, weight)

if d = U:
    check for edge in directed_edges:
        skip
    check if edge in undirected_edges:
        if the stored weight is lower than the newer edge's weight, replace it with the newer edge's weight

    otherwise add edge to undirected_edges dictionary as (node1, node2) = (U, weight)

for edge in each dictionary:
    add edges based on directionality to G
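The construct_digraph logic above could look roughly like this in Python. This is only a sketch: add_edge and arcs_for_graph are hypothetical helpers, the dictionaries store just the weight (the dictionary an edge lives in already says whether it is D or U), and when a directed edge displaces an undirected one I am assuming the directed row's weight is kept.

```python
def add_edge(directed_edges, undirected_edges, node1, node2, weight, direction):
    """Record one input row in the directed or undirected dictionary."""
    key, reverse = (node1, node2), (node2, node1)
    if direction == "D":
        # A directed edge displaces an undirected copy in either orientation.
        undirected_edges.pop(key, None)
        undirected_edges.pop(reverse, None)
        # Among duplicate directed edges, keep the higher weight.
        directed_edges[key] = max(weight, directed_edges.get(key, weight))
    elif direction == "U":
        if key in directed_edges or reverse in directed_edges:
            return  # the directed edge takes precedence, so skip this row
        # Reuse the orientation that is already stored, if any.
        stored = reverse if reverse in undirected_edges else key
        undirected_edges[stored] = max(weight, undirected_edges.get(stored, weight))
    else:
        raise ValueError(f"unknown directionality: {direction!r}")


def arcs_for_graph(directed_edges, undirected_edges):
    """Expand both dictionaries into the arcs that would be added to G."""
    arcs = [(tail, head, w) for (tail, head), w in directed_edges.items()]
    for (a, b), w in undirected_edges.items():
        arcs.append((a, b, w))
        arcs.append((b, a, w))  # undirected edge becomes a pair of arcs
    return arcs


directed_edges, undirected_edges = {}, {}
rows = [
    ("A", "B", 0.5, "U"), ("B", "A", 0.9, "D"), ("C", "D", 0.2, "U"),
    ("D", "C", 0.7, "U"), ("B", "A", 0.4, "D"), ("A", "B", 0.3, "U"),
]
for row in rows:
    add_edge(directed_edges, undirected_edges, *row)
arcs = arcs_for_graph(directed_edges, undirected_edges)
```

With these rows, the directed B -> A edge ends up with weight 0.9 and removes the undirected A B edge, and the undirected C D edge keeps the higher weight 0.7.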

write_output_to_sif:

for the edge chosen,
    // node1 is the tail
    // node2 is the head
    check if (node1, node2) is in directed_edges dict:
        write output with a directed edge
    otherwise check if (node1, node2) or (node2, node1) is in the undirected_edges dict:
        write output with an undirected edge
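A sketch of that lookup when writing the output. The function name write_edges is hypothetical, and the tab-separated "tail, head, directionality" line layout is an assumption for illustration, not the format the current write_output_to_sif produces.

```python
import io


def write_edges(chosen_arcs, directed_edges, undirected_edges, out):
    """Write one line per chosen arc, tagged D or U."""
    for tail, head in chosen_arcs:
        if (tail, head) in directed_edges:
            out.write(f"{tail}\t{head}\tD\n")
        elif (tail, head) in undirected_edges or (head, tail) in undirected_edges:
            out.write(f"{tail}\t{head}\tU\n")
        # Arcs found in neither dictionary are silently ignored here.


directed_edges = {("B", "C"): 1.0}
undirected_edges = {("A", "B"): 1.0}
buffer = io.StringIO()
# Pretend the solver chose arcs B -> A and B -> C.
write_edges([("B", "A"), ("B", "C")], directed_edges, undirected_edges, buffer)
output_text = buffer.getvalue()
```

Note that if both arcs of an undirected edge carry flow, this sketch writes the edge twice, once per orientation, so the real function may want to deduplicate.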

This logic looks good to me

test cases for the new code:

  1. all directed edges that are added to the directed_dict are unique
  2. for duplicate directed edges, the higher edge weight is chosen
  3. if the same edge is both directed and undirected in the input, the code prioritizes the directed edge
  4. all undirected edges that are added to the undirected_dict are unique
  5. an undirected edge is not added if a directed edge between the same nodes, in either direction, already exists in directed_dict
  6. for duplicate undirected edges, the higher edge weight is chosen
  7. empty input