thomasp85 / ggraph

Grammar of Graph Graphics

Home Page: https://ggraph.data-imaginist.com

Documentation on geom_edge_link incomplete

julianbarg opened this issue · comments

On the data argument, the documentation for geom_edge_link states:

The return of a call to ‘get_edges()’ or a data.frame giving
edges in correct format (see details for for guidance on the
format). See ‘get_edges()’ for more details on edge
extraction.

However, neither place gives enough information to make the data argument of geom_edge_link work. A reproducible example would be great. Below is a hypothetical example I have been trying to work with. My overall goal is to apply geom_edge_link, or a related function, to a subset of edges. You will notice that every error message says at its core that "object 'edge.id' not found," which does not help much.

Here is the setup:

library(tidyverse)
library(tidygraph)
library(ggraph)
graph <- tribble(
  ~from, ~to,
  "a", "b", 
  "a", "c",
  "b", "d",
  "a", "d", 
  "b", "e",
  "d", "f",
  "b", "f"
  ) %>%
  as_tbl_graph()
plot <- ggraph(graph, layout = "kk")
plot +
  geom_node_point()

Here is what I've tried.

  1. Based on ggplot2, the intuitive approach is to provide just "from" and "to" data.
test1 <- tribble(
  ~from, ~to,
  "a", "b"
)

plot +
  geom_node_point() +
  geom_edge_link(data = test1)
  2. Worry not, just subset the original tbl_graph. Doesn't work though.
test2 <- graph %>%
  activate("edges") %>%
  filter(from == "a" & to == "b")

plot +
  geom_node_point() +
  geom_edge_link(data = test2)
  3. Reading the documentation for get_edges, we learn that we should provide both x and y coordinates. However, all we know is the identity of the nodes we want to connect. Maybe we can use get_edges to extract the corresponding coordinates?
get_edges(plot)

Well, I don't know what I expected, but not for this to return a function.

  4. So we learn that get_edges returns a function. Let's apply this function to our graph or our plot to get what we need. Neither works.
get_edges()(graph)
get_edges()(plot)
  5. I eventually stumbled upon the right answer by opening random documentation entries I found on Google, plus some trial and error:
layout <- create_layout(graph, layout = "kk")
edge_one <- get_edges("short")(layout)[1,]
ggraph(layout) +
  geom_node_point() +
  geom_edge_link(data = edge_one)
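One reason the filter in attempt 2 cannot match is that, in a tbl_graph, the edge columns `from` and `to` hold integer node indices rather than node names. The following is a minimal sketch of subsetting by name with tidygraph, assuming the same toy graph as above; `.N()` is tidygraph's accessor for the node table while edges are active. Note that plotting `ab_only` directly would compute a new layout, so this is not a drop-in replacement for the `create_layout()` approach in step 5.

```r
library(dplyr)
library(tidygraph)

# Same toy graph as in the setup above
graph <- as_tbl_graph(data.frame(
  from = c("a", "a", "b", "a", "b", "d", "b"),
  to   = c("b", "c", "d", "d", "e", "f", "f")
))

# `from` and `to` are integer indices into the node table,
# so look up the node names through .N() before comparing
ab_only <- graph %>%
  activate(edges) %>%
  filter(.N()$name[from] == "a" & .N()$name[to] == "b")

# Inspect the remaining edge (all nodes are kept)
ab_only %>% as_tibble()
```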

ggraph works under the assumption that edge data always comes from a call to get_edges(), so I don't want to document how this data should look. It might change at some point because new edge geoms require it (e.g. the new edge bundle geoms require knowledge of the graph topology).

So the omission is fully intended.

Then I would pose the more general question of how ggraph could become more closely aligned with the grammar of graphics in the long run. get_edges() isn't fully working yet (see issue #362), and even if it were, it does not support subsetting, for example. In any case, ggraph should not try to replicate all the subsetting functions of, say, dplyr. The typical workflow for ggplot2, and for creating network plots, involves quite a lot of manual subsetting and data preparation to set the correct aesthetics for each part of the data. The ggraph documentation currently assumes a more linear workflow: one dataset is used and all observations are treated equally, with get_edges() as the bottleneck. Compare that to the grammar of graphics approach of ggplot2, where you can add any data with any aesthetic to an existing plot.

Don't get me wrong, ggraph supports almost everything I need. But as it stands, it requires me to manually prepare a tibble with x and y coordinates from the original layout. That's not as much of an abstraction from using ggplot2 with x and y coordinates as ggraph was going for.

Here is a mock workflow from my current work. It follows the typical ggplot2 workflow, which the ggraph documentation doesn't cover yet.

graph <- as_tbl_graph(df)
layout <- create_layout(graph, layout = "fr")

subset_a <- get_nodes()(layout) %>%
  filter(name %in% group_a) %>%
  left_join(attribute_df, by = c("name" = "from"))
subset_b <- get_nodes()(layout) %>%
  filter(name %in% group_b)
edges <- get_edges("short")(layout) %>%
  group_by(from, to) %>%
  mutate(connections = n()) %>%
  slice_head(n = 1)

ggraph(layout) +
  geom_node_point(data = subset_a, aes(size = attribute), shape = 23) +
  geom_node_point(data = subset_b) +
  geom_edge_arc(data = edges, aes(width = connections))

Let me know if you would like me to open a new issue for this discussion.

There is no plan to get more "closely aligned with the grammar of graphics".

In the example above, the operations should be handled by tidygraph and are not described in the ggraph docs, for the same reason that dplyr isn't covered in the ggplot2 docs. In general, you prepare your graph with tidygraph and plot it with ggraph. Since network plots are fully reliant on the underlying network, it does not make much sense to mix and match small datasets like you are trying to.
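As a rough illustration of "prepare with tidygraph, plot with ggraph", here is a sketch of how the mock workflow above might be restructured. The data frames `edges_df` and `attribute_df` are hypothetical stand-ins for the poster's `df` and `attribute_df`, and the edge counting is done with dplyr's `count()` before the graph is built; other arrangements are possible.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

# Hypothetical input data (not from the original post)
edges_df <- data.frame(
  from = c("a", "a", "a", "b", "b", "c"),
  to   = c("b", "b", "c", "c", "d", "d")
)
attribute_df <- data.frame(name = c("a", "b"), attribute = c(3, 1))

graph <- edges_df %>%
  count(from, to, name = "connections") %>%  # one row per node pair
  as_tbl_graph() %>%
  activate(nodes) %>%
  left_join(attribute_df, by = "name")       # attach node attributes

# Nodes with and without an attribute are drawn by separate layers,
# selected with the filter aesthetic instead of separate datasets
p <- ggraph(graph, layout = "fr") +
  geom_edge_arc(aes(edge_width = connections)) +
  geom_node_point(aes(size = attribute, filter = !is.na(attribute)), shape = 23) +
  geom_node_point(aes(filter = is.na(attribute)))
p
```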

If you only want to plot a subset of edges, use the filter aesthetic, which is available for all ggraph geoms.

I will close the issue then. But I will say that the filter aesthetic is a very dissatisfying solution, especially for more complex conditions. Instead of being able to quickly grab some observations and drop them in, I have to go back and create an additional column in the original data just to turn them on and off. I never needed to do this in ggplot2, and I don't see why you are trying to reinvent the wheel here.

An aesthetic can be an expression and has access to tidygraph algorithms, so it is actually quite flexible and powerful, e.g. geom_edge_link(aes(filter = !edge_is_loop())).
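A small sketch of a filter expression that needs no extra column, using the toy graph from the first post. It assumes that ggraph's edge data (in the default "short" format) carries the node columns of each endpoint with the prefixes `node1.` and `node2.`, so `node1.name` and `node2.name` are the names of the start and end nodes.

```r
library(tidygraph)
library(ggraph)

graph <- as_tbl_graph(data.frame(
  from = c("a", "a", "b", "a", "b", "d", "b"),
  to   = c("b", "c", "d", "d", "e", "f", "f")
))

# Draw all nodes, but only the edge from "a" to "b".
# The layout is still computed on the full graph.
p <- ggraph(graph, layout = "kk") +
  geom_edge_link(aes(filter = node1.name == "a" & node2.name == "b")) +
  geom_node_point()
p
```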

ggraph is a standard ggplot2 extension, so you can use whatever other geoms you want if you don't want to buy into the network idea for all your layers. There is nothing stopping you from adding a geom_segment() to your plot.
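For instance, the single edge extracted in the first post could be drawn with a plain ggplot2 layer. This is only a sketch; it assumes the "short" edge format exposes the endpoint coordinates as `x`, `y`, `xend` and `yend`, which, per the comment above, is not a documented guarantee.

```r
library(ggplot2)
library(tidygraph)
library(ggraph)

graph <- as_tbl_graph(data.frame(
  from = c("a", "a", "b", "a", "b", "d", "b"),
  to   = c("b", "c", "d", "d", "e", "f", "f")
))

layout <- create_layout(graph, layout = "kk")
edge_one <- get_edges("short")(layout)[1, ]

# A regular ggplot2 geom on top of a ggraph plot
p <- ggraph(layout) +
  geom_node_point() +
  geom_segment(
    data = edge_one,
    aes(x = x, y = y, xend = xend, yend = yend)
  )
p
```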

I see what you mean with that example. In line with that, the filter aesthetic should probably take a more prominent position in the documentation. As it stands, it is just a bullet point in the manual page for geom_edge_link. It is also covered by only one example in the documentation, and that one uses a built-in attribute, not a custom one. I would also be apprehensive about writing complex filtering logic into a function argument. I look forward to eventually seeing more examples of that in the documentation; I know you were busy with the major version release recently. Ideally these would include longer examples that start with some of the well-known included datasets. I suppose the intended workflow would be:

  1. Import dataset
  2. Preprocess
  3. Initiate layout
  4. Modify nodes/edges as necessary
  5. Initiate graph
  6. Add nodes/edges with filter aesthetic

ggplot2 functions are a decent fallback, but using them really negates any advantages that the ggraph package would have. Users need to understand the underlying data structure to use ggplot2 functions with ggraph, so all the ease-of-use features of ggraph are then out the window and it's back to square one. Of course, such is the nature of networks, where there are always two separate dataframes/tibbles that make up the data.

It might change at some point because new edge geoms require it (e.g. the new edge bundle geoms require knowledge of the graph topology).

I'm sure there are more pressing issues right now, but one option might be to create a vignette on advanced ggraph at some point, similar to the "Programming with dplyr" vignette?