aphalo / ggpp

Grammar of graphics extensions to 'ggplot2'

Nudging `x` when `y` is a grouping factor using `position_stacknudge()`

ujtwr opened this issue · comments

commented

I want to offset the bar graph so that the fill boundary is zero using position_stacknudge and position_fillnudge as shown below.

library(tidyverse)

D <-
  tibble(
    team = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
    value = c(0.1, 0.2, 0.7, 0.3, 0.2, 0.5, 0.4, 0.6, 0.6, 0.3, 0.1),
    category = c("i", "j", "k", "i", "j", "k", "i", "k", "i", "j", "k")
  )

# Calculate the offset for each row from the data frame.
# Each bar is shifted to the left by the "value" of its category "i",
# so we create an "h_move" column in which every row of a "team" holds the "i" "value" of that "team".
#
# Sort by "team" and "h_move", then extract the column as a vector.
h_move <-
  D %>% 
  group_by(team) %>% 
  mutate(
    h_move = -sum(if_else(category == "i", value, 0))
  ) %>% 
  arrange(team, h_move) %>% 
  pull(h_move)

D %>% 
  ggplot(mapping = aes(x = value, y = team, fill = category)) +
  geom_col(
    color = "black",
    position = ggpp::position_stacknudge(reverse = TRUE, x = h_move)
  )

Rplot

Next, to rearrange the order of the bars, convert the "team" column to factor and set the order with the "levels" option.

When the same code as above is run with this data, the order of the elements of the vector (h_move) that sets the offsets no longer matches the order of the bars being offset. As a result, the boundary between "i" and "j" in "category" is not at zero, as shown below.

library(tidyverse)

D <-
  tibble(
    team = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
    value = c(0.1, 0.2, 0.7, 0.3, 0.2, 0.5, 0.4, 0.6, 0.6, 0.3, 0.1),
    category = c("i", "j", "k", "i", "j", "k", "i", "k", "i", "j", "k")
  ) %>% 
  mutate(
    team = factor(team, levels = c("B", "D", "A", "C"))
  )

h_move <-
  D %>% 
  group_by(team) %>% 
  mutate(
    h_move = -sum(if_else(category == "i", value, 0))
  ) %>% 
  # The order of the "TEAM" column depends on the LEVELS of the FACTOR.
  arrange(team, h_move) %>% 
  pull(h_move)

D %>% 
  ggplot(mapping = aes(x = value, y = team, fill = category)) +
  geom_col(
    color = "black",
    position = ggpp::position_stacknudge(reverse = TRUE, x = h_move)
  )

Rplot01

When giving the vector to the "x" option, shouldn't it also take into account FACTOR in the order in which the amount of movement is applied? Currently it seems to depend on the order of the rows of the data frame given to ggplot. For example, if you randomize the data before passing it to ggplot as shown below, you will get a funny result.

library(tidyverse)

D <-
  tibble(
    team = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
    value = c(0.1, 0.2, 0.7, 0.3, 0.2, 0.5, 0.4, 0.6, 0.6, 0.3, 0.1),
    category = c("i", "j", "k", "i", "j", "k", "i", "k", "i", "j", "k")
  )

h_move <-
  D %>% 
  group_by(team) %>% 
  mutate(
    h_move = -sum(if_else(category == "i", value, 0))
  ) %>% 
  arrange(team, h_move) %>% 
  pull(h_move)

D %>% 
  # randomize the order of rows
  slice_sample(prop = 1) %>% 
  ggplot(mapping = aes(x = value, y = team, fill = category)) +
  geom_col(
    color = "black",
    position = ggpp::position_stacknudge(reverse = TRUE, x = h_move)
  )

Rplot02

This package is very good for creating offset bars as shown above. It would be even better if it were easy to create offset bars without having to create a vector to determine the amount of offset.

thank you.

@ujtwr Thanks for the code examples!
In your first example, arrange() is not needed: mutate() after group_by() keeps the rows in their original order, so h_move already lines up with the rows of D.

In your second example, arrange() causes the problem, as it sorts the rows by the order of the factor levels (B, D, A, C) while the rows of D passed to ggplot() stay in their original order (A, B, C, D). Removing this line fixes the second figure, as the nudge vector then keeps the same row order as the data.

The third example demonstrates that the nudge is applied based on the order of the data rows. This is currently the expected behaviour: I will make this clearer in the documentation. I cannot think of a way of making this work differently. I am not even sure how ggplot2::position_nudge() handles this case. Does it rearrange the vector passed as argument to x? If not, I would rather keep the behaviour of the positions from 'ggpp' consistent with 'ggplot2'.

a) I will update the documentation a.s.a.p.
b) I will in the future investigate if it would be possible to implement an alternative to parameter x based on shifts by groups created by factors mapped to x or y aesthetics.

As the example below shows, this is also how ggplot2::position_nudge() behaves. So, to implement the behaviour you suggest, a new parameter could be added instead of changing the behaviour of x. I will put this issue "on hold".

library(tidyverse)

D <-
  tibble(
    team = c("A", "B", "C", "D"),
    value = c(0.1, 0.2, 0.7, 0.3)
  )

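# Rows in their original order: each nudge lines up with its own bar.
ggplot(data = D, mapping = aes(value, team)) +
  geom_col(position = position_nudge(x = -D$value))

# Rows shuffled: the nudges are still applied in row order, so they no longer match.
ggplot(data = slice_sample(D, prop = 1), mapping = aes(value, team)) +
  geom_col(position = position_nudge(x = -D$value))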

Created on 2023-07-27 with reprex v2.0.2

Documentation is now updated. Enhancement remains on hold.

commented

Thank you for your quick response.

If we are careful about the order of the rows, it works as expected, so there would be no problem for now.

I don't know about internals, but if we give a vector to the x parameter when moving bar positions, we would want to determine the amount of movement for each bar. If the correspondence depends on the order of the rows in the data frame, it is difficult to specify the relationship between the elements of the vector and the elements of the graph to be moved. I would be grateful if you could think of a better way to do this.

Thank you

commented

Additional Information:

I've used position_stacknudge many times and I'm trying to figure out why I never faced this problem before.

I had not noticed the problem with position_stacknudge and position_fillnudge until now because I was using "geom_bar" instead of "geom_col" to draw the bar graphs.

The data were probably sorted internally there, because "geom_bar" aggregates them with stat = "count".

This time I used "geom_col", or "geom_bar(stat = "identity")", which uses the data frame values as they are, so the problem seems to have become apparent.

I do understand your concerns. However, the internals of 'ggplot2' are such that position functions see only the data after being mapped to aesthetics, so the original factor levels are not easily available, only a group code. Possibly, a nudge by group could be implemented rather easily, but the user would still need to manually match internally used group codes to factor levels. I am rather busy at the moment, but I will see if I can implement this in the future, at least to test how much it helps.
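To make the point about internals concrete, here is a minimal sketch, assuming 'ggplot2' is installed, that uses ggplot2::layer_data() to inspect the data of a layer after the aesthetics have been mapped. layer_data() returns the data as prepared for drawing, so it only approximates what a position function receives. The data frame D0 and the object names are made up for illustration.

```r
library(ggplot2)

# Illustrative data (not from the examples above): one bar per team,
# with factor levels in a non-alphabetical order.
D0 <- data.frame(
  team = factor(c("A", "B", "C", "D"), levels = c("B", "D", "A", "C")),
  value = c(0.1, 0.2, 0.7, 0.3)
)

p0 <- ggplot(D0, aes(x = value, y = team)) +
  geom_col()

# Data of the first layer after mapping, stat and position have been applied.
ld <- layer_data(p0)

# "team" is no longer a factor here: y holds numeric positions, and the
# level labels live in the scale. A "group" column is also present.
names(ld)
as.numeric(ld$y)
ld$group
```

Any nudge by group would have to be expressed in terms of these numeric codes rather than the original labels.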

The position functions in 'ggpp' currently reuse/call and combine functions defined in 'ggplot2' with the idea of making sure the new functions remain compatible and consistent with the related functions defined in 'ggplot2'. The easiest solution with the current position functions seems to me to be to define a function that takes the data frame and computes the shifts, with the user making sure to pass to it as argument the same data frame passed as argument to ggplot. This can be most elegantly coded as a function in base R, rather than the tidyverse, I think. When I find time, I will write a page with some examples at my web site "R gallery".
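A minimal base R sketch of such a function could look as follows. The name shift_by_category() and its arguments are hypothetical, chosen only for illustration; nothing like it is provided by 'ggpp' or 'ggplot2'. It relies on ave(), which returns per-group results aligned with the rows of its input.

```r
# Hypothetical helper, not part of 'ggpp' or 'ggplot2': returns one nudge
# per row of `data`, in the current row order of `data`.
shift_by_category <- function(data, group, category, value, target) {
  # value of the target category in each row, zero in all other rows
  contribution <- ifelse(data[[category]] == target, data[[value]], 0)
  # ave() computes the sum per group and returns it aligned with the rows
  -ave(contribution, data[[group]], FUN = sum)
}

D <- data.frame(
  team = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
  value = c(0.1, 0.2, 0.7, 0.3, 0.2, 0.5, 0.4, 0.6, 0.6, 0.3, 0.1),
  category = c("i", "j", "k", "i", "j", "k", "i", "k", "i", "j", "k")
)

h_move <- shift_by_category(D, "team", "category", "value", target = "i")

# Shuffling the rows and recomputing keeps nudges and rows aligned.
D_shuffled <- D[sample(nrow(D)), ]
h_move_shuffled <-
  shift_by_category(D_shuffled, "team", "category", "value", target = "i")
```

The result would be passed as x to ggpp::position_stacknudge(), computed from exactly the same data frame that is given to ggplot().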

Without defining a function, code like that shown below could be a bit safer.

# we add the target shift to the tibble
D1 <-
  D %>% 
  group_by(team) %>% 
  mutate(
    h_move = -sum(if_else(category == "i", value, 0))
  ) %>% 
  # ungroup so that slice_sample() below shuffles all rows, not rows within each team
  ungroup()

# and extract the column with nudge shifts within the plotting statement
D1 %>% 
  ggplot(mapping = aes(x = value, y = team, fill = category)) +
  geom_col(
    color = "black",
    position = ggpp::position_stacknudge(reverse = TRUE, x = D1$h_move)
  )

# reordering the tibble does not affect the plot, as h_move is reordered together with the rows
D2 <- slice_sample(D1, prop = 1)

D2 %>% 
  ggplot(mapping = aes(x = value, y = team, fill = category)) +
  geom_col(
    color = "black",
    position = ggpp::position_stacknudge(reverse = TRUE, x = D2$h_move)
  )

Obviously this code does not solve the problem, but perhaps makes it a little more tolerable.

commented

Thank you for sharing your awareness of the issue. Also, thank you for the suggestion for a more refined code.

We look forward to seeing this package evolve better.

@ujtwr I have given some thought to this issue and decided not to implement the suggested change. Nudge is mainly intended for data labels, not to nudge columns or points representing data, because this in principle invalidates the mapping to the x and/or y aesthetics. So, this use case is not within the expected ones based on the Grammar of Graphics. The idea is that the variables representing values to be "read" from the plot using the scales of the aesthetics contain those values, or the transformation is applied through the scale. Unless the current behaviour changes in 'ggplot2', I will not change it in 'ggpp'. In other words, the best way to handle this problem is by shifting the data before plotting, and then passing the data ready to be plotted to ggplot() using position_stack() from 'ggplot2'. If a separate layer is added, for example, with geom_text() to label the stacked bars, then only in this layer would it be useful to use position_stacknudge().
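One possible reading of "shifting the data before plotting" is sketched below, assuming 'ggplot2' is installed. Flipping the sign of the "i" values is an assumption made here for illustration, not code from the maintainer. It relies on the documented behaviour of ggplot2::position_stack(), which stacks negative and positive values separately, so the boundary between "i" and the other categories falls at zero without any nudge vector. The column name value_signed is made up.

```r
library(ggplot2)

D <- data.frame(
  team = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
  value = c(0.1, 0.2, 0.7, 0.3, 0.2, 0.5, 0.4, 0.6, 0.6, 0.3, 0.1),
  category = c("i", "j", "k", "i", "j", "k", "i", "k", "i", "j", "k")
)

# Assumption for this sketch: category "i" is plotted to the left of zero
# by giving its values a negative sign in the data themselves.
D$value_signed <- ifelse(D$category == "i", -D$value, D$value)

p_shift <- ggplot(D, aes(x = value_signed, y = team, fill = category)) +
  geom_col(
    color = "black",
    # negative and positive values are stacked separately, each from zero
    position = position_stack(reverse = TRUE)
  )

p_shift
```

Because the shift lives in the data, the result does not depend on the order of the rows, and the x axis shows the values that are actually mapped.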