Color edges by group
nbucklin opened this issue · comments
Hello,
First, I want to thank you for creating this and sharing it. I've been working on adding color to the graph based on spectral clustering. I mostly got the nodes to be colored by group and have been working on the edges. My plan was to create separate LineCollections per group and plot them individually with different colors. My approach was:
- Classify nodes into n groups
- Generate a positions dictionary from ForceAtlas
- Make a new "sub network" out of each of the n groups of nodes
- Make a new "sub positions" dictionary from the positions dictionary for each of the n groups
- Generate a curves array based on the "sub network" and "sub positions"
I've been able to do this, but I'm having problems with the curved_edges function. It's this step that is causing problems:
coords = np.array([pos[x] for x in u]).reshape([edges.shape[0], 2, edges.shape[1]])
The resulting coords array just has the same x, y combo repeating over and over again. Could you tell me what this step is doing? Particularly the pos[x] part... I can't figure that out.
Below is my full code, which should reproduce the issue. Thanks for your help!
import sys
import curved_edges
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from fa2 import ForceAtlas2
from sklearn.cluster import SpectralClustering
import pandas as pd
import numpy as np
# Importing network as Pandas dataframe
network = pd.read_csv(r'/path/',sep=" ",header=None, names=['from','to'])
#network = network.rename(columns={"0":"from","1":"to"})
# Creating initial graph
G = nx.from_pandas_edgelist(network,'from','to',create_using=nx.Graph())
# Performing spectral clustering
matrix = nx.to_numpy_matrix(G)
sc = SpectralClustering(7, affinity='precomputed', n_init=100,assign_labels='discretize')
sc.fit(matrix)
clusters = pd.DataFrame({'group':sc.labels_})
# Rearranging graph for coloring
clusters = clusters.reindex(G.nodes())
# Running Force Atlas
forceatlas2 = ForceAtlas2()
positions = forceatlas2.forceatlas2_networkx_layout(G,pos=None,iterations=50)
# Making a new network based on selected cluster
network_sub = pd.merge(network,clusters[clusters['group'] ==3],left_on='to',right_index=True,how='inner').drop(['group'],axis=1).reset_index(drop=True)
# Making a list that has the nodes we want to keep
node_list = list(dict.fromkeys(network_sub['to'].tolist() + network_sub['from'].tolist()))
# Getting the subset of the ForceAtlas positions dict
positions_sub = {k:v for k, v in positions.items() for k in node_list}
# Loading a new network based on the subset
G_sub = nx.from_pandas_edgelist(network_sub,'from','to',create_using=nx.Graph())
print(G_sub.number_of_nodes())
curves = curved_edges(G_sub,positions_sub)
I figured it out! The issue I was having was with selecting the nodes I need from the positions dictionary. The full code, along with a picture of the generated graph, is below. There are still a few bugs to work out. For example, I don't actually want to draw the nodes, just the line collections. As you can see, I've made the nodes transparent, but I can't figure out how to leave them out entirely. Also, if you run the plot function alone, you get an error message. The whole program needs to be run at once for the function to work.
import sys
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from fa2 import ForceAtlas2
from sklearn.cluster import SpectralClustering
from curved_edges import curved_edges
import pandas as pd
# Number of clusters
n = 7
loop_list = list(range(0,n))
colors = ['#FF3333','#FF9333','#33FF36','#33FFC7','#33F3FF','#33A8FF','#F333FF','#FF3361']
# colors = ['b','g','r','c','m','y','k']
colors = colors[:n]
# Importing network as Pandas dataframe
network = pd.read_csv(r'/path/',sep=" ",header=None, names=['from','to'])
#network = network.rename(columns={"0":"from","1":"to"})
# Creating initial graph
G = nx.from_pandas_edgelist(network,'from','to',create_using=nx.Graph())
# Performing spectral clustering
matrix = nx.to_numpy_matrix(G)
sc = SpectralClustering(7, affinity='precomputed', n_init=100,assign_labels='discretize')
sc.fit(matrix)
clusters = pd.DataFrame({'group':sc.labels_})
# Rearranging graph for coloring
clusters = clusters.reindex(G.nodes())
# Running Force Atlas
forceatlas2 = ForceAtlas2()
positions = forceatlas2.forceatlas2_networkx_layout(G,pos=None,iterations=50)
lc_dict = {}
for n, c in zip(loop_list,colors):
    # Making a new network based on selected cluster
    network_sub = pd.merge(network,clusters[clusters['group'] == n],left_on='to',right_index=True,how='inner').drop(['group'],axis=1).reset_index(drop=True)
    # Making a list that has the nodes we want to keep
    node_list = list(dict.fromkeys(network_sub['to'].tolist() + network_sub['from'].tolist()))
    # Getting the subset of the ForceAtlas positions dict
    positions_sub = {k:v for k, v in positions.items() if k in node_list}
    # Loading a new network based on the subset
    G_sub = nx.from_pandas_edgelist(network_sub,'from','to',create_using=nx.Graph())
    print(G_sub.number_of_nodes())
    curves = curved_edges(G_sub,positions_sub)
    lc = LineCollection(curves,color=c,alpha=.05)
    lc_dict[n] = lc
# Plot
def plot():
    plt.figure(figsize=(10,10))
    plt.gca().set_facecolor('k')
    nx.draw_networkx_nodes(G, positions, node_size=5, node_color='w', alpha=0)
    for key,value in lc_dict.items():
        plt.gca().add_collection(lc_dict[key])
    plt.tick_params(axis='both',which='both',bottom=False,left=False,labelbottom=False,labelleft=False)
    plt.show()
plot()
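For anyone hitting the same thing, here is a minimal sketch of the dictionary-selection problem mentioned above, using made-up toy data rather than real ForceAtlas output (the node names and coordinates are purely illustrative). In the first version the second `for` rebinds `k`, so every node in `node_list` ends up with the last position in the dictionary, which would explain the same x, y pair repeating.

```python
# Hypothetical toy data standing in for the ForceAtlas2 positions dict
positions = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (3.0, 1.0)}
node_list = ["a", "c"]

# First version: the second `for` rebinds k, so on every pass each node
# in node_list is overwritten with the current v; the last v wins
broken = {k: v for k, v in positions.items() for k in node_list}

# Corrected version: `if` filters the existing keys instead of looping again
fixed = {k: v for k, v in positions.items() if k in node_list}

print(broken)  # {'a': (3.0, 1.0), 'c': (3.0, 1.0)}
print(fixed)   # {'a': (0.0, 0.0), 'c': (3.0, 1.0)}
```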
Hi @nbucklin, thanks for testing this out, I am glad you're finding it useful! Sorry for the delay, I missed the notifications from this and only just saw it.
A few things to unpack here, if I'm not too late:
- To not plot the nodes, just remove that line! The reason you're having a problem is that adding a LineCollection to a plot unfortunately doesn't trigger the auto x and y axis scales, and because the axis display is being removed, you can't see this is happening (if it was on, you would notice that what is being displayed is between 0 and 1 in both x and y, when the actual lines are being plotted between something like -5000 and +5000). To fix it, just run something like plt.axis('tight') before plt.show().
- You get an error with the plot() function only (I believe) if you re-run it. This is because you can't add a LineCollection to more than one figure. This is annoying, I know - maybe a way around it is to make the LineCollection each time you plot. In other words, store the list returned from curved_edges in your loop (as well as creating a new place to store the colours):
curves = curved_edges.curved_edges(G_sub,positions_sub)
lc_dict[n] = [curves, c]
then dynamically generate your LineCollection in the plot() function:
for key,value in lc_dict.items():
    plt.gca().add_collection(LineCollection(lc_dict[key][0],color=lc_dict[key][1],alpha=.05))
- Your initial question about the coords line... essentially what that does is convert the edge list (which is a list like [node1_id, node2_id]) into the actual coordinates of each node, so something like [(node1_x, node1_y), (node2_x, node2_y)]. It is pretty obscure, but it is done that way for speed, to avoid having to look up each node more than once (it would be much clearer if I had put the code in a loop). pos[x] is the position of node x (pos is the output of the ForceAtlas call). Hope that makes sense.
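To make the coords explanation a bit more concrete, here is a rough sketch of the idea on toy data. It is not the library's actual code: the node names, positions and the use of `np.unique(..., return_inverse=True)` are assumptions chosen to illustrate "look up each node once, then expand back to one pair of coordinates per edge", next to the plain loop it replaces.

```python
import numpy as np

# Hypothetical toy data: three nodes and two edges
pos = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (3.0, 1.0)}
edges = np.array([["a", "b"], ["b", "c"]])

# Loop version: one dictionary lookup per edge endpoint
coords_loop = np.array([[pos[u], pos[v]] for u, v in edges])

# Vectorised version: look up each distinct node only once...
unique_nodes, inverse = np.unique(edges, return_inverse=True)
unique_pos = np.array([pos[x] for x in unique_nodes])

# ...then expand back to (n_edges, 2 endpoints, 2 coordinates).
# ravel() keeps this working whether `inverse` comes back flat or edge-shaped.
coords_vec = unique_pos[inverse.ravel()].reshape(edges.shape[0], 2, 2)

print(coords_vec.shape)  # (2, 2, 2)
```

Both versions give the same array; the second just avoids repeated lookups for nodes that appear in many edges.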
Thanks for getting back to me! I tried your recommendations for plotting the LineCollections only and removing the plot() error - both worked! Also thanks for describing the pos[x] command... I wasn't understanding that pos came from the ForceAtlas function.
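In case it helps anyone else, here is a minimal sketch of how the two recommendations might fit together. The curves are made-up stand-ins for the curved_edges output (the `curve_store` name and its contents are hypothetical), and it uses `ax.autoscale()` as a close alternative to the suggested `plt.axis('tight')`; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Hypothetical stand-in for the per-cluster curves and colours:
# cluster id -> (list of (N, 2) point arrays, colour)
curve_store = {
    0: ([np.array([[-5000.0, -5000.0], [0.0, 2500.0], [5000.0, 5000.0]])], "#FF3333"),
    1: ([np.array([[-4000.0, 3000.0], [4000.0, -3000.0]])], "#33A8FF"),
}

def plot(curve_store):
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.set_facecolor("k")
    for curves, colour in curve_store.values():
        # Build a fresh LineCollection on every call, since one
        # collection cannot be added to more than one figure
        ax.add_collection(LineCollection(curves, color=colour, alpha=0.5))
    # Rescale the axes to the data; adding a collection alone leaves
    # the default 0 to 1 view in place
    ax.autoscale()
    ax.tick_params(axis="both", which="both", bottom=False, left=False,
                   labelbottom=False, labelleft=False)
    return fig, ax

# Calling plot() twice works because nothing is reused between figures
fig1, ax1 = plot(curve_store)
fig2, ax2 = plot(curve_store)
```

Storing only the raw curves and colours keeps the loop output reusable, and no node-drawing call is needed at all.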