Add Support for Directionality to MinCostFlow

ntalluri opened this issue 7 months ago

Neha Talluri commented 7 months ago

I'll be keeping my notes and ideas here.

Neha Talluri · Answer 1 · Thu Dec 28 2023 03:07:11 GMT+0800 (China Standard Time)

Idea 1

Pass in the direction column from the universal input. This means we don't need to change anything in SPRAS and can send the original input directly to the MinCostFlow code.
In construct_digraph:

Construct a dictionary that maps each edge to (rank, directionality) (I might not need the rank).
Read the directionality and, based on it, add either one directed edge or a pair of directed edges.
When there is an undirected edge, I think the dictionary will need to store both node1 -> node2 and node2 -> node1.

After the min cost flow function finishes running, we will return the chosen edges along with their directionality. This will not have to be done in SPRAS; I think it belongs in the write_output_to_sif function.

Using the dictionary, we can look up each edge chosen in the graph to find its flow direction, and return that edge if it carries flow.
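A minimal Python sketch of Idea 1, assuming each input row is a (node1, node2, weight, direction) tuple with direction 'U' or 'D', and that the solver's result can be read as a dict from arc to flow. The names build_edge_dict and edges_with_flow are hypothetical, and the rank is left out since it may not be needed.

```python
def build_edge_dict(rows):
    """Map every directed arc the solver will see to its original directionality.

    A 'D' row becomes one arc; a 'U' row becomes the pair of arcs
    node1 -> node2 and node2 -> node1, both tagged 'U'.
    """
    edge_dict = {}
    for node1, node2, _weight, direction in rows:
        edge_dict[(node1, node2)] = direction
        if direction == "U":
            edge_dict[(node2, node1)] = direction
    return edge_dict


def edges_with_flow(edge_dict, flows):
    """Return (tail, head, direction) for every arc with positive flow."""
    return [(u, v, edge_dict[(u, v)]) for (u, v), f in flows.items() if f > 0]


edge_dict = build_edge_dict([("A", "B", 1.0, "U"), ("B", "C", 1.0, "D")])
flows = {("B", "A"): 1, ("A", "B"): 0, ("B", "C"): 1}
print(edges_with_flow(edge_dict, flows))  # [('B', 'A', 'U'), ('B', 'C', 'D')]
```

Here the solver happened to send flow along B -> A, and the lookup recovers that this arc came from the undirected input edge A B.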

Neha Talluri · Answer 2 · Fri Dec 29 2023 02:52:58 GMT+0800 (China Standard Time)

Idea 2: Adjacency Matrix rather than a Dictionary

The current challenge with the dictionary is that, for an undirected edge, we have to record both directed edges of the pair. This is because if one of the arcs is picked but not the other, we cannot simply sort the node pair and look up a single key to determine whether the edge is directed, since sorting would lose the orientation information. There may also be strange edge cases where a user overwrites one of the two arcs that represent an undirected edge and ends up making one of them directed.

An adjacency matrix simplifies this because an undirected edge is inherently symmetric in the matrix: we mark both matrix[A][B] and matrix[B][A]. This makes undirected edges easier to describe and eliminates the need to store the edge twice in a dictionary. It should make the data easier to manage and maintain, and hopefully make edge cases easier to handle.
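A minimal Python sketch of Idea 2, assuming the same (node1, node2, weight, direction) rows as the input. build_matrix and edge_direction are hypothetical names. Each cell stores the direction label rather than a plain 0/1 mark, because with plain marks two opposite directed edges (A -> B and B -> A) would look identical to one undirected edge.

```python
def build_matrix(rows):
    """Build (index, matrix) from (node1, node2, weight, direction) rows.

    matrix[i][j] is 'D' for a directed edge i -> j, 'U' for an undirected
    edge (stored symmetrically in both cells), or None when there is no edge.
    """
    nodes = sorted({n for row in rows for n in row[:2]})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[None] * len(nodes) for _ in nodes]
    for node1, node2, _weight, direction in rows:
        i, j = index[node1], index[node2]
        matrix[i][j] = direction
        if direction == "U":
            matrix[j][i] = direction  # undirected edges are symmetric
    return index, matrix


def edge_direction(index, matrix, tail, head):
    """Look up the directionality of an arc (tail -> head) chosen by the solver."""
    return matrix[index[tail]][index[head]]


index, matrix = build_matrix([("A", "B", 1.0, "U"), ("B", "C", 1.0, "D")])
print(edge_direction(index, matrix, "B", "A"))  # U
print(edge_direction(index, matrix, "C", "B"))  # None
```

As written, a later row still overwrites an earlier cell, so the matrix does not by itself resolve conflicting rows such as an undirected A B followed by a directed B A; that still needs an explicit precedence rule. A dense matrix also uses memory quadratic in the number of nodes, which may matter for large interactomes.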

Neha Talluri · Answer 3 · Fri Dec 29 2023 03:08:19 GMT+0800 (China Standard Time)

Edge cases:

A user overwriting an edge with a different directionality

Anthony Gitter · Answer 4 · Sat Dec 30 2023 22:49:30 GMT+0800 (China Standard Time)

Can you describe the edge case? When would this happen?

I don't know that we want to follow the design too closely, but Omics Integrator 1 also uses a similar idea to add back edge directions after the msgsteiner solver returns a network (see https://github.com/fraenkel-lab/OmicsIntegrator/blob/0a57ede6beeef6e63b86d19898e560d62015e85d/scripts/forest.py#L670-L706). Even if you don't copy that design, it may give you some ideas.

Neha Talluri · Answer 5 · Tue Jan 02 2024 07:39:15 GMT+0800 (China Standard Time)

What I mean is a case where a user has

A B 1 U
B A 1 D

as the input: the second line would overwrite an existing entry in the dictionary.

edge 1: A B 1 U
dict: (A,B): U, (B,A): U

edge 2: B A 1 D
dict: (A,B): U, (B,A): D

I don't know whether this would actually happen, but would we want some check that tells the user when it does?
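The trace above can be reproduced with a small Python sketch of the Idea 1 dictionary. The add_edge_to_dict name and the warning are hypothetical: the warning is one possible form of the check asked about here, not existing behavior.

```python
import warnings


def add_edge_to_dict(edge_dict, node1, node2, direction):
    """Idea 1 insertion: one arc for 'D', both arcs for 'U'.

    Warns (a hypothetical check) whenever an arc that is already stored
    with a different directionality gets overwritten.
    """
    arcs = [(node1, node2)]
    if direction == "U":
        arcs.append((node2, node1))
    for arc in arcs:
        old = edge_dict.get(arc)
        if old is not None and old != direction:
            warnings.warn(f"edge {arc} was {old}, now overwritten as {direction}")
        edge_dict[arc] = direction


edge_dict = {}
add_edge_to_dict(edge_dict, "A", "B", "U")  # {('A', 'B'): 'U', ('B', 'A'): 'U'}
add_edge_to_dict(edge_dict, "B", "A", "D")  # warns; ('B', 'A') becomes 'D'
print(edge_dict)  # {('A', 'B'): 'U', ('B', 'A'): 'D'}
```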

Neha Talluri · Answer 6 · Tue Jan 02 2024 09:36:35 GMT+0800 (China Standard Time)

Actually, the Omics Integrator code might fix the problem I was having, so I will just need to adapt a version of it that works well for the min cost flow code.

Anthony Gitter · Answer 7 · Wed Jan 03 2024 06:56:05 GMT+0800 (China Standard Time)

If we have that scenario and the edges are unweighted, then we could assume that the directed edge takes precedence. That is, we would check a directed edge dictionary first for (B,A) and find it, so we return that directed edge. That may vary depending on what underlying data structures you use. We would need to document this assumption.

The trickier thing is when edges have different properties like weights, which are supported here. We need to decide whether to give precedence based on weight or directionality.
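To make the two options concrete, here is a hypothetical resolve helper (not part of the MinCostFlow code) that keeps one of two conflicting entries for the same node pair under either policy. Entries are assumed to be (weight, direction) tuples, and "higher weight wins" is an assumption that matches the duplicate-edge handling proposed later in this thread.

```python
def resolve(existing, new, policy="direction_first"):
    """Return whichever of two conflicting entries for one node pair to keep.

    Each entry is a (weight, direction) tuple, direction being 'D' or 'U'.
    - 'direction_first': directed beats undirected; within the same
      directionality the higher weight wins.
    - 'weight_first': the higher weight wins; on equal weights, directed wins.
    On an exact tie the existing entry is kept.
    """
    def direction_first(entry):
        weight, direction = entry
        return (direction == "D", weight)

    def weight_first(entry):
        weight, direction = entry
        return (weight, direction == "D")

    keys = {"direction_first": direction_first, "weight_first": weight_first}
    if policy not in keys:
        raise ValueError(f"unknown policy: {policy!r}")
    # max() returns the first maximal argument, so exact ties keep `existing`
    return max(existing, new, key=keys[policy])


print(resolve((5.0, "U"), (1.0, "D")))                         # (1.0, 'D')
print(resolve((5.0, "U"), (1.0, "D"), policy="weight_first"))  # (5.0, 'U')
```

For unweighted input both policies agree that the directed edge wins, which is the assumption described above; they only diverge when weights differ.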

Looking at the readme, we should also document that weighted edges are supported.

Neha Talluri · Answer 8 · Thu Jan 11 2024 05:18:36 GMT+0800 (China Standard Time)

global variables:

// key: (node1, node2), value: (directionality, weight)
directed_edges = dict()
undirected_edges = dict()

construct_digraph:

for each input edge (node1, node2, weight, d):

    if d == D:
        if (node1, node2) is in directed_edges:
            if its stored weight is lower than the new edge's weight, replace it with the new edge's weight
        otherwise:
            if (node1, node2) or (node2, node1) is in undirected_edges, remove it from undirected_edges
            add the edge to directed_edges as (node1, node2) = (D, weight)

    if d == U:
        if (node1, node2) or (node2, node1) is in directed_edges:
            skip
        otherwise, if (node1, node2) or (node2, node1) is in undirected_edges:
            if its stored weight is lower than the new edge's weight, replace it with the new edge's weight
        otherwise:
            add the edge to undirected_edges as (node1, node2) = (U, weight)

for each edge in each dictionary:
    add it to G based on its directionality (one arc for D, a pair of opposite arcs for U)
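A runnable Python sketch of the construct_digraph pseudocode above, assuming rows of (node1, node2, weight, direction). The signature and return value are assumptions: a plain dict from arc to weight stands in for the graph G, since the actual graph type used by the MinCostFlow code is not shown here.

```python
def construct_digraph(rows):
    """Sketch of construct_digraph.

    rows: iterable of (node1, node2, weight, direction), direction in {'D', 'U'}.
    Returns (directed_edges, undirected_edges, arcs); the two dicts map
    (node1, node2) -> (direction, weight), and arcs maps every directed arc
    that would be added to G to its weight.
    """
    directed_edges, undirected_edges = {}, {}
    for node1, node2, weight, d in rows:
        fwd, rev = (node1, node2), (node2, node1)
        if d == "D":
            if fwd in directed_edges:
                # duplicate directed edge: keep the higher weight
                if directed_edges[fwd][1] < weight:
                    directed_edges[fwd] = ("D", weight)
            else:
                # a directed edge replaces an undirected one in either orientation
                undirected_edges.pop(fwd, None)
                undirected_edges.pop(rev, None)
                directed_edges[fwd] = ("D", weight)
        elif d == "U":
            if fwd in directed_edges or rev in directed_edges:
                continue  # the directed edge takes precedence
            key = rev if rev in undirected_edges else fwd
            # new undirected edge, or a duplicate with a higher weight
            if key not in undirected_edges or undirected_edges[key][1] < weight:
                undirected_edges[key] = ("U", weight)
        else:
            raise ValueError(f"unknown direction {d!r} for edge {fwd}")

    arcs = {}
    for (u, v), (_, w) in directed_edges.items():
        arcs[(u, v)] = w  # one arc for a directed edge
    for (u, v), (_, w) in undirected_edges.items():
        arcs[(u, v)] = w  # a pair of opposite arcs for an undirected edge
        arcs[(v, u)] = w
    return directed_edges, undirected_edges, arcs
```

Because a directed edge removes any undirected entry between the same two nodes, and an undirected edge is skipped when a directed one exists, the two dictionaries never describe the same node pair, so no arc is written twice.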

write_output_to_sif:

for each edge chosen:
    // node1 is the tail
    // node2 is the head
    if (node1, node2) is in directed_edges:
        write output with a directed edge
    otherwise, if (node1, node2) or (node2, node1) is in undirected_edges:
        write output with an undirected edge
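The lookup in write_output_to_sif can be sketched the same way. label_chosen_edges is a hypothetical name; it returns (node1, node2, direction) tuples instead of writing a file, since the exact SIF line layout is not specified here. One assumption beyond the pseudocode: if both arcs of an undirected edge carry flow, the edge is reported only once.

```python
def label_chosen_edges(chosen_arcs, directed_edges, undirected_edges):
    """Sketch of the write_output_to_sif lookup (file writing omitted).

    chosen_arcs: iterable of (tail, head) arcs that carry flow.
    directed_edges / undirected_edges: (node1, node2) -> (direction, weight).
    Arcs found in neither dictionary are skipped, as in the pseudocode.
    """
    out, seen = [], set()
    for tail, head in chosen_arcs:
        if (tail, head) in directed_edges:
            out.append((tail, head, "D"))
            continue
        # the undirected edge may be stored under either orientation
        key = (tail, head) if (tail, head) in undirected_edges else (head, tail)
        if key in undirected_edges and key not in seen:
            seen.add(key)
            out.append((key[0], key[1], "U"))
    return out


directed = {("B", "A"): ("D", 2.0)}
undirected = {("E", "F"): ("U", 4.0)}
print(label_chosen_edges([("B", "A"), ("F", "E"), ("E", "F")], directed, undirected))
# [('B', 'A', 'D'), ('E', 'F', 'U')]
```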

Anthony Gitter · Answer 9 · Thu Jan 11 2024 06:23:36 GMT+0800 (China Standard Time)

This logic looks good to me

Neha Talluri · Answer 10 · Thu Jan 18 2024 04:56:52 GMT+0800 (China Standard Time)

Test cases for the new code:

all directed edges added to directed_edges are unique
for duplicate directed edges, the higher edge weight is chosen
if the same edge appears as both directed and undirected in the input, the code prioritizes the directed edge
all undirected edges added to undirected_edges are unique
an undirected edge is not added if a directed edge between the same nodes, in either direction, already exists in directed_edges
for duplicate undirected edges, the higher edge weight is chosen
empty input
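Several of these cases are invariants of the two dictionaries rather than properties of one specific input, so a single hypothetical helper, check_edge_dicts, could test them against whatever construct_digraph returns. Uniqueness of directed edges is already guaranteed by the dict keys, so the helper focuses on undirected uniqueness, directed precedence, and labels; the duplicate-weight cases still need input-specific tests.

```python
def check_edge_dicts(directed_edges, undirected_edges):
    """Return a list of invariant violations; an empty list means all checks pass.

    Both dicts map (node1, node2) -> (direction, weight), as in the globals above.
    """
    problems = []
    for (u, v), (d, _weight) in directed_edges.items():
        if d != "D":
            problems.append(f"directed_edges[{(u, v)}] is labeled {d!r}")
    for (u, v), (d, _weight) in undirected_edges.items():
        if d != "U":
            problems.append(f"undirected_edges[{(u, v)}] is labeled {d!r}")
        # uniqueness: an undirected edge must not be stored in both orientations
        # (reported once, from the lexicographically smaller key)
        if u != v and (v, u) in undirected_edges and (u, v) < (v, u):
            problems.append(f"undirected edge {u}-{v} stored in both orientations")
        # directed precedence: no directed edge between the same nodes, either way
        for arc in {(u, v), (v, u)}:
            if arc in directed_edges:
                problems.append(f"undirected edge {u}-{v} conflicts with directed {arc}")
    return problems


print(check_edge_dicts({("A", "B"): ("D", 1.0)}, {("B", "A"): ("U", 1.0)}))
```

Calling it on two empty dictionaries, the expected result for empty input, returns an empty list.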