beyondbeneath / bezier-curved-edges-networkx

Function to produce Bezier curves for the edges in a NetworkX graph

Color edges by group

nbucklin opened this issue

Hello,

First, I want to thank you for creating this and sharing it. I've been working on adding color to the graph based on spectral clustering. I mostly got the nodes to be colored by group and have been working on the edges. My plan was to create separate LineCollections per group and plot them individually with different colors. My approach was:

  1. Classify nodes into n groups
  2. Generate positions dictionary from ForceAtlas
  3. Make a new "sub network" out of each n group of nodes
  4. Make new "sub positions" dictionary from the positions dictionary for each n group
  5. Generate curves array based on the "sub network" and "sub positions"

I've been able to do this, but I'm having problems with the curved_edges function. It's this step that is causing problems:
coords = np.array([pos[x] for x in u]).reshape([edges.shape[0], 2, edges.shape[1]])
The resulting coords array just has the same x, y combination repeating over and over. Could you tell me what this step is doing? In particular, I can't figure out the pos[x] part.
Below is my full code, which should reproduce the issue. Thanks for your help!

import sys
import curved_edges
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from fa2 import ForceAtlas2
from sklearn.cluster import SpectralClustering
import pandas as pd
import numpy as np

# Importing network as Pandas dataframe
network = pd.read_csv(r'/path/',sep=" ",header=None, names=['from','to'])
#network = network.rename(columns={"0":"from","1":"to"})

# Creating initial graph
G = nx.from_pandas_edgelist(network,'from','to',create_using=nx.Graph())

# Performing spectral clustering
matrix = nx.to_numpy_matrix(G)
sc = SpectralClustering(7, affinity='precomputed', n_init=100,assign_labels='discretize')
sc.fit(matrix)
clusters = pd.DataFrame({'group':sc.labels_})

# Rearranging clusters to follow the graph's node order for coloring
clusters = clusters.reindex(G.nodes())

# Running Force Atlas
forceatlas2 = ForceAtlas2()
positions = forceatlas2.forceatlas2_networkx_layout(G,pos=None,iterations=50)

# Making a new network based on selected cluster
network_sub = pd.merge(network,clusters[clusters['group'] ==3],left_on='to',right_index=True,how='inner').drop(['group'],axis=1).reset_index(drop=True)

# Making a list that has the nodes we want to keep
node_list = list(dict.fromkeys(network_sub['to'].tolist() + network_sub['from'].tolist()))

# Getting the subset of the ForceAtlas positions dict
positions_sub = {k:v for k, v in positions.items() for k in node_list}

# Loading a new network based on the subset
G_sub = nx.from_pandas_edgelist(network_sub,'from','to',create_using=nx.Graph())
print(G_sub.number_of_nodes())

# curved_edges was imported as a module above, so call the function inside it
curves = curved_edges.curved_edges(G_sub,positions_sub)

I figured it out! The issue was in how I selected the nodes I need from the positions dictionary. The full code, along with a picture of the generated graph, is below. There are still a few bugs to work out. For example, I don't actually want to draw the nodes, just the line collections. As you can see, I've made the nodes transparent, but I can't figure out how to leave them out entirely. Also, if you run the plot function alone, you get an error message. The whole program needs to be run at once for the function to work.

import sys
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from fa2 import ForceAtlas2
from sklearn.cluster import SpectralClustering
from curved_edges import curved_edges
import pandas as pd


# Number of clusters
n = 7
loop_list = list(range(0,n))

colors = ['#FF3333','#FF9333','#33FF36','#33FFC7','#33F3FF','#33A8FF','#F333FF','#FF3361']
# colors = ['b','g','r','c','m','y','k']
colors = colors[:n]

# Importing network as Pandas dataframe
network = pd.read_csv(r'/path/',sep=" ",header=None, names=['from','to'])
#network = network.rename(columns={"0":"from","1":"to"})

# Creating initial graph
G = nx.from_pandas_edgelist(network,'from','to',create_using=nx.Graph())

# Performing spectral clustering
matrix = nx.to_numpy_matrix(G)
sc = SpectralClustering(7, affinity='precomputed', n_init=100,assign_labels='discretize')
sc.fit(matrix)
clusters = pd.DataFrame({'group':sc.labels_})

# Rearranging clusters to follow the graph's node order for coloring
clusters = clusters.reindex(G.nodes())

# Running Force Atlas
forceatlas2 = ForceAtlas2()
positions = forceatlas2.forceatlas2_networkx_layout(G,pos=None,iterations=50)

lc_dict = {}
for n, c in zip(loop_list,colors):
    # Making a new network based on selected cluster
    network_sub = pd.merge(network,clusters[clusters['group'] == n],left_on='to',right_index=True,how='inner').drop(['group'],axis=1).reset_index(drop=True)
    
    # Making a list that has the nodes we want to keep
    node_list = list(dict.fromkeys(network_sub['to'].tolist() + network_sub['from'].tolist()))
    
    # Getting the subset of the ForceAtlas positions dict
    positions_sub = {k:v for k, v in positions.items() if k in node_list}
    
    # Loading a new network based on the subset
    G_sub = nx.from_pandas_edgelist(network_sub,'from','to',create_using=nx.Graph())
    print(G_sub.number_of_nodes())
    
    curves = curved_edges(G_sub,positions_sub)
    lc = LineCollection(curves,color=c,alpha=.05)
    
    lc_dict[n] = lc
        
# Plot
def plot():
    plt.figure(figsize=(10,10))
    plt.gca().set_facecolor('k')
    nx.draw_networkx_nodes(G, positions, node_size=5, node_color='w', alpha=0)
    for key,value in lc_dict.items():
        plt.gca().add_collection(lc_dict[key])
    plt.tick_params(axis='both',which='both',bottom=False,left=False,labelbottom=False,labelleft=False)
    plt.show()

plot()

[Image: screenshot of the generated graph, 2019-05-31]
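
For reference, the difference between the two positions_sub comprehensions in the snippets above can be reproduced with a toy dictionary. In the first version the inner `for k in node_list` rebinds k, so every kept node ends up with the last position seen, which matches the "same x, y repeating" symptom. The second version filters with `if`. The node names and coordinates below are made up for illustration.

```python
# Toy layout: node id -> (x, y). Values are invented for this sketch.
positions = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (3.0, 1.0)}
node_list = ["a", "b"]

# Nested `for`: for every (k, v) pair, k is rebound to each node in node_list,
# so each kept node is overwritten with v again and again.
broken = {k: v for k, v in positions.items() for k in node_list}

# Filter with `if`: keep a pair only when its own key is in node_list.
fixed = {k: v for k, v in positions.items() if k in node_list}

print(broken)  # every node carries the position of the last item in `positions`
print(fixed)   # each node keeps its own position
```

Since `if k in node_list` on a list is a linear scan, turning node_list into a set first may help on large graphs.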

Hi @nbucklin, thanks for testing this out, I am glad you're finding it useful! Sorry for the delay, I missed the notifications for this and only just saw it.

A few things to unpack here, if I'm not too late:

  1. To not plot the nodes, just remove that line! The reason removing it causes a problem is that adding a LineCollection to a plot unfortunately doesn't trigger automatic x and y axis scaling, and because the axis display is turned off, you can't see this happening (if it were on, you would notice that the displayed range is 0 to 1 in both x and y, while the actual lines are plotted between something like -5000 and +5000). To fix it, just run something like plt.axis('tight') before plt.show().

  2. You get an error with the plot() function only (I believe) if you re-run it. This is because you can't add a LineCollection to more than one figure. This is annoying, I know - maybe a way around is to make the LineCollection each time you plot. So in other words, store the list returned from curved_edges in your loop (as well as creating a new place to store the colours):

curves = curved_edges(G_sub,positions_sub)  # matches `from curved_edges import curved_edges` in the full code
lc_dict[n] = [curves, c]

then dynamically generate your LineCollection in the plot() function:

for key,value in lc_dict.items():
    plt.gca().add_collection(LineCollection(lc_dict[key][0],color=lc_dict[key][1],alpha=.05))

  3. Your initial question about the coords line... essentially what that does is convert the edge list (which is a list like [node1_id, node2_id]) into the actual coordinates of each node, so something like [(node1_x, node1_y), (node2_x, node2_y)]. It is pretty obscure, but it is done that way for speed to avoid having to look up each node more than once (it would be much clearer if I had put the code in a loop). pos[x] is the position of node x (pos is the output of the ForceAtlas call). Hope that makes sense.
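
The lookup described in point 3 can be written as the plain loop mentioned there. This is only a sketch of the idea, not the code inside curved_edges: the toy pos dictionary, the edges list, and the helper name edge_coords are all made up for illustration.

```python
# Toy layout: node id -> (x, y), the same shape a layout call returns.
pos = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (3.0, 1.0)}

# Toy edge list: each edge is a pair of node ids.
edges = [("a", "b"), ("b", "c"), ("a", "c")]


def edge_coords(edges, pos):
    """Hypothetical helper: return [[(x1, y1), (x2, y2)], ...], one entry per edge."""
    coords = []
    for u, v in edges:
        # pos[u] and pos[v] are the (x, y) positions of the edge's two endpoints.
        coords.append([pos[u], pos[v]])
    return coords


coords = edge_coords(edges, pos)
print(coords[0])  # endpoints of the first edge, "a" -> "b"
```

The loop looks up each endpoint once per edge, which is easier to read but repeats lookups for nodes shared by many edges. That repetition is what the vectorised one-liner is trying to avoid.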

Thanks for getting back to me! I tried your recommendations for plotting the LineCollections only and removing the plot() error - both worked! Also thanks for describing the pos[x] command... I wasn't understanding that pos came from the ForceAtlas function.
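
The two plotting fixes discussed above can be combined into one small sketch: build a fresh LineCollection on every call, and rescale the axes explicitly after adding the collections. The curves_by_group dictionary is a hypothetical stand-in for the per-group output of curved_edges, with invented coordinates, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Hypothetical stand-in for what the loop stores: group -> (curves, color).
curves_by_group = {
    0: ([[(-5000, -5000), (0, 2500), (5000, 5000)]], "#FF3333"),
    1: ([[(-5000, 5000), (0, -2500), (5000, -5000)]], "#33A8FF"),
}


def plot(curves_by_group):
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.set_facecolor("k")
    for curves, color in curves_by_group.values():
        # New LineCollection each call: an artist cannot be reused in a second figure.
        ax.add_collection(LineCollection(curves, color=color, alpha=0.5))
    # Rescale the view to the data, as suggested in the thread.
    ax.axis("tight")
    ax.tick_params(axis="both", which="both", bottom=False, left=False,
                   labelbottom=False, labelleft=False)
    return fig, ax


# Calling plot() twice works because nothing is shared between the figures.
fig1, ax1 = plot(curves_by_group)
fig2, ax2 = plot(curves_by_group)
```

In an interactive session, plt.show() would follow each call. Storing only the raw curves and colors keeps the loop output reusable across figures.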