In this lab, you'll get a chance to practice implementing and interpreting the centrality metrics from the previous section. You'll do this by investigating the social network from Game of Thrones!
You will be able to:
- Understand and explain network centrality and its importance in graph analysis
- Understand and calculate Degree, Closeness, Betweenness and Eigenvector centrality measures
- Describe the use case for several centrality measures
A. J. Beveridge and J. Shan created a network from George R. R. Martin's "A Song of Ice and Fire" by extracting relationships between characters in the story. The dataset is available on GitHub. A relationship between two characters was recorded every time one character's name appeared within 15 words of the other's. This was designed as an approximate measure of the characters' interactions with each other. The results of this simple analysis are quite profound and produce interesting visuals, such as graphs of the full character network.
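The co-occurrence rule described above can be sketched as follows. This is a minimal illustration, not Beveridge and Shan's actual pipeline. It assumes the text is already split into words, that each character name is a single token, and that "within 15 words" means at most 15 positions apart. `cooccurrence_weights` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(tokens, characters, window=15):
    """Hypothetical helper: count how often two different characters are
    mentioned within `window` words of each other (one token per name)."""
    # Record (position, name) for every token that is a known character
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    weights = Counter()
    # combinations() preserves list order, so j > i for every pair
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= window:
            # Sort the pair so (A, B) and (B, A) count as one undirected edge
            weights[tuple(sorted((a, b)))] += 1
    return weights


tokens = "Jon spoke to Arya while Sansa watched".split()
weights = cooccurrence_weights(tokens, {"Jon", "Arya", "Sansa"}, window=3)
print(weights)
```

With `window=3`, Jon and Arya (3 words apart) and Arya and Sansa (2 apart) are linked. Jon and Sansa (5 apart) are not.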
With that, it's your turn to start investigating the most central characters!
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
Start by loading the dataset as a pandas DataFrame. From this, you'll then create a network representation of the dataset using NetworkX.
The dataset is stored in the file asoiaf-all-edges.csv.
# Load the edge list into a DataFrame
df = pd.read_csv('asoiaf-all-edges.csv')
df.head()
| | Source | Target | Type | id | weight |
|---|---|---|---|---|---|
| 0 | Addam-Marbrand | Brynden-Tully | Undirected | 0 | 3 |
| 1 | Addam-Marbrand | Cersei-Lannister | Undirected | 1 | 3 |
| 2 | Addam-Marbrand | Gyles-Rosby | Undirected | 2 | 3 |
| 3 | Addam-Marbrand | Jaime-Lannister | Undirected | 3 | 14 |
| 4 | Addam-Marbrand | Jalabhar-Xho | Undirected | 4 | 3 |
Now that you have the data loaded as a pandas DataFrame, create an empty graph. Then iterate through the rows and add an edge between each Source and Target character. Be sure to include the weight on each edge.
# Create an empty graph instance
# Add an edge for each row of the DataFrame, including its weight
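One way to build the graph is sketched below. It assumes the column names shown in the table above (`Source`, `Target`, `weight`). The helper name `build_graph` is hypothetical, and the demo uses only the first rows of that table.

```python
import networkx as nx
import pandas as pd


def build_graph(df):
    """Build an undirected, weighted graph from an edge-list DataFrame
    with 'Source', 'Target' and 'weight' columns."""
    G = nx.Graph()  # empty graph instance
    for _, row in df.iterrows():
        # Each row becomes one edge; the co-occurrence count is its weight
        G.add_edge(row["Source"], row["Target"], weight=row["weight"])
    return G


edges = pd.DataFrame({
    "Source": ["Addam-Marbrand", "Addam-Marbrand", "Addam-Marbrand"],
    "Target": ["Brynden-Tully", "Cersei-Lannister", "Jaime-Lannister"],
    "weight": [3, 3, 14],
})
G = build_graph(edges)
print(G.number_of_nodes(), G.number_of_edges())
print(G["Addam-Marbrand"]["Jaime-Lannister"]["weight"])
```

NetworkX also offers a one-line alternative: `nx.from_pandas_edgelist(df, source="Source", target="Target", edge_attr="weight")`. The explicit loop simply makes each step visible.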
To start the investigation of the most central characters in the books, calculate the degree centrality for each character. Then create a bar graph of the top 10 characters according to degree centrality.
#Your code here
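A minimal sketch of the degree-centrality step follows. It uses NetworkX's built-in Zachary karate club graph as a stand-in, so it runs without the CSV; in the lab you would pass the ASOIAF graph instead. `top_n` is a hypothetical helper.

```python
import networkx as nx
import pandas as pd


def top_n(centrality, n=10):
    """Turn a {node: score} dict from NetworkX into a Series of the n highest scores."""
    return pd.Series(centrality).sort_values(ascending=False).head(n)


# Stand-in graph for illustration; swap in the ASOIAF graph in the lab
G = nx.karate_club_graph()
degree = nx.degree_centrality(G)  # node degree / (number of nodes - 1)
top10_degree = top_n(degree)
print(top10_degree)
```

Calling `top10_degree.plot(kind="barh")` produces the bar graph. Degree centrality counts neighbors only and ignores edge weights.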
Repeat the above exercise for the top 10 characters according to closeness centrality.
#Your code here
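The sketch below computes closeness centrality with and without weights. NetworkX reads the `distance` attribute as a path length, but in this dataset a larger weight means more interaction, which should translate to a *shorter* distance. It therefore assumes `distance = 1 / weight`, one common choice among several. The toy path graph A–B–C–D–E is hypothetical.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph: A and B interact often, every other pair rarely
G = nx.Graph()
G.add_edge("A", "B", weight=10)
for u, v in [("B", "C"), ("C", "D"), ("D", "E")]:
    G.add_edge(u, v, weight=1)

# Assumption: frequent interaction = short distance, so distance = 1 / weight
for u, v, data in G.edges(data=True):
    data["distance"] = 1 / data["weight"]

closeness_unweighted = nx.closeness_centrality(G)
closeness_weighted = nx.closeness_centrality(G, distance="distance")

top10_closeness = pd.Series(closeness_weighted).sort_values(ascending=False).head(10)
print(top10_closeness)
```

Without weights, the endpoints A and E tie. With weights, A's strong tie to B pulls it closer to everyone else.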
Repeat the process one more time for betweenness centrality.
#Your code here
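A sketch of the betweenness step follows. It uses NetworkX's barbell graph (two 5-node cliques joined through one node) as a stand-in, because the winner there is easy to check by hand. Node 5 lies on all 25 shortest paths between the two cliques.

```python
import networkx as nx
import pandas as pd

# Stand-in graph: cliques {0..4} and {6..10} joined through node 5
G = nx.barbell_graph(5, 1)

# Normalized, unweighted betweenness (NetworkX defaults)
betweenness = nx.betweenness_centrality(G)
top10_betweenness = pd.Series(betweenness).sort_values(ascending=False).head(10)
print(top10_betweenness)
```

For a weighted version, `nx.betweenness_centrality(G, weight="distance")` treats the named attribute as a path length. The `1 / weight` conversion assumed for closeness applies here too.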
Great! Now put all of these metrics together along with eigenvector centrality: combine all four into a single DataFrame with one row per character.
#Your code here
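A minimal sketch of the combined table follows, again on the karate club stand-in graph. Building a DataFrame from a dict of `{node: score}` dicts gives one column per metric and one row per node. The `max_iter=1000` value is an assumption that gives power iteration extra room on larger graphs.

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # stand-in for the ASOIAF graph

centrality = pd.DataFrame({
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    # Power iteration may need more than the default 100 steps on big graphs
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
})
print(centrality.sort_values("eigenvector", ascending=False).head(10))

# Pairwise correlation between the four measures
print(centrality.corr())
```

Be careful if you pass weights. `nx.eigenvector_centrality(G, weight="weight")` treats the weight as connection *strength*, whereas closeness and betweenness treat their weight/distance attribute as path *length*.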
While centrality can tell us a lot, you've also begun to see how certain individuals may not be the most central characters but can still be pivotal in the flow of information from one community to another. In the previous lesson, such nodes were labeled 'bridges': intermediaries between two clusters. Try to identify such characters in this dataset.
#Your code here
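One possible heuristic, which is an assumption and not a definition from the lesson, flags nodes whose betweenness rank is much better than their degree rank. Such nodes sit between groups without having many direct ties. `bridge_candidates` is a hypothetical helper, demonstrated on the barbell stand-in graph.

```python
import networkx as nx
import pandas as pd


def bridge_candidates(G, n=5):
    """Hypothetical heuristic: rank nodes by how much better they score on
    betweenness than on degree (a large positive gap suggests a bridge)."""
    scores = pd.DataFrame({
        "degree": nx.degree_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
    })
    # Rank 1 = highest score; tied nodes share the average rank
    scores["degree_rank"] = scores["degree"].rank(ascending=False)
    scores["betweenness_rank"] = scores["betweenness"].rank(ascending=False)
    scores["rank_gap"] = scores["degree_rank"] - scores["betweenness_rank"]
    return scores.sort_values("rank_gap", ascending=False).head(n)


# Two 5-node cliques joined through node 5, which has only two neighbors
G = nx.barbell_graph(5, 1)
print(bridge_candidates(G))

# Structural cross-check: nodes whose removal disconnects the graph
print(sorted(nx.articulation_points(G)))
```

Node 5 ranks last on degree but first on betweenness, so it tops the list. `nx.articulation_points` offers a stricter, purely structural check.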
To visualize all of these relationships, draw a graph of the network.
#Your code here
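A sketch of drawing the full network follows, with node size scaled by degree. The sizing formula, figure size and layout seed are arbitrary choices. The `Agg` backend line is only for running as a script without a display; drop it in a notebook that uses `%matplotlib inline`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx


def draw_network(G, ax=None):
    """Draw every node and edge; larger nodes have more neighbors."""
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 12))
    pos = nx.spring_layout(G, seed=42)  # fixed seed for a reproducible layout
    sizes = [20 + 10 * d for _, d in G.degree()]
    nx.draw(G, pos=pos, ax=ax, node_size=sizes, width=0.3, alpha=0.7,
            with_labels=False)
    return ax


ax = draw_network(nx.karate_club_graph())  # stand-in; use the ASOIAF graph in the lab
```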
The graph above is very noisy, making it difficult to discern any useful patterns. To reduce the clutter, reset the graph and add only the edges whose weight is 75 or greater, then redraw it. To further help with the display, try using nx.spring_layout(G) for the node positions. To jazz it up, try recoloring the nodes you identified as bridges or communication bottlenecks.
#Your code here
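A sketch of the filtered, recolored drawing follows. It assumes the `Source`/`Target`/`weight` columns from the table above. The helper names and the toy DataFrame are hypothetical. In the lab, `highlight` would be the set of characters you flagged as bridges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd


def heavy_edge_graph(df, min_weight=75):
    """Rebuild the graph from scratch with only edges of weight >= min_weight."""
    G = nx.Graph()
    heavy = df[df["weight"] >= min_weight]
    for source, target, weight in zip(heavy["Source"], heavy["Target"], heavy["weight"]):
        G.add_edge(source, target, weight=weight)
    return G


def draw_highlighted(G, highlight, ax=None):
    """Draw G with a spring layout, coloring nodes in `highlight` red."""
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 12))
    pos = nx.spring_layout(G, seed=42)
    colors = ["tab:red" if node in highlight else "tab:blue" for node in G.nodes]
    nx.draw(G, pos=pos, ax=ax, node_color=colors, with_labels=True, font_size=8)
    return ax


# Hypothetical toy edge list; only A-C falls below the threshold
edges = pd.DataFrame({
    "Source": ["A", "A", "B", "C"],
    "Target": ["B", "C", "D", "D"],
    "weight": [120, 10, 80, 75],
})
G_heavy = heavy_edge_graph(edges)
ax = draw_highlighted(G_heavy, highlight={"B"})
```

Because the graph is rebuilt from the filtered rows, characters whose edges are all lighter than the threshold disappear entirely rather than appearing as isolated nodes.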
In this lab, we looked at different centrality measures of the graph data for the ASOIAF dataset. We also compared these measures to see how they correlate with each other, and saw in practice how using weighted rather than unweighted centrality measures can affect the results.