steveharoz / Tidyverse_Tips

Tips and useful functions

uncount()

Duplicates each row according to the count in a column (here n), then drops that column.

library(tidyverse)
df <- tibble(x = c("a", "b"), n = c(1, 2))
uncount(df, n)
#> # A tibble: 3 x 1
#>   x    
#> 1 a    
#> 2 b    
#> 3 b  

relocate()

Move a column to a different position (moves to first position if new position is unspecified)

df <- tibble(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)
df %>% relocate(f)
#> # A tibble: 1 x 6
#>       f     a     b     c     d     e
#> 1     6     1     2     3     4     5
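
To move a column somewhere other than the front, relocate() also takes .before and .after. A small sketch using the same df as above (the names after_a and before_e are only for illustration):

```r
library(dplyr)

df <- tibble(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)

# Place f immediately after a
after_a <- df %>% relocate(f, .after = a)
names(after_a)
#> [1] "a" "f" "b" "c" "d" "e"

# Place a immediately before e
before_e <- df %>% relocate(a, .before = e)
names(before_e)
#> [1] "b" "c" "d" "a" "e" "f"
```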

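simplify()

Flattens a list into an atomic vector when possible (purrr), for example the list that strsplit() returns.

df %>% summarise(x = simplify(strsplit(x, ",")))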

view()

view() (lowercase!) opens the tibble in the viewer and invisibly returns its input, so you can inspect intermediate results while debugging long pipe chains.

df %>%
   view("before filter") %>%
   filter(a > 0) %>%
   view("after filter")

%T>%

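The tee pipe passes its left-hand value on unchanged instead of the result of the right-hand function. It's useful for print() or View() calls in the middle of a pipe chain. It needs library(magrittr), since the tidyverse packages only re-export %>%.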

df %T>%
   View("before filter") %>%
   filter(a > 0)
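
A version that runs without a viewer, as a rough sketch: str() returns NULL, so with a plain %>% the next step would receive NULL, while %T>% hands the data frame itself to the next step. The data frame and the name kept are made up for illustration.

```r
library(magrittr)

df <- data.frame(a = c(-1, 0, 2))

# str() prints a summary and returns NULL; %T>% discards that NULL
# and passes df itself on to subset()
kept <- df %T>%
  str() %>%
  subset(a > 0)

kept$a
#> [1] 2
```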

Custom linetype

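To make a custom dash style, pass a string of hex digits (1-F) that alternate between the length of a dash and the length of the gap after it. The string needs an even number of digits, up to eight; "21" means a dash of 2 followed by a gap of 1.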

geom_line(linetype = "21")
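
A complete plot with a longer pattern, as a sketch. The economics dataset ships with ggplot2; the pattern "4212" is an arbitrary choice for illustration.

```r
library(ggplot2)

# "4212": dash of 4, gap of 2, dash of 1, gap of 2, then repeat
p <- ggplot(economics, aes(date, unemploy)) +
  geom_line(linetype = "4212")

# print(p) draws the plot
```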

ggsignif

Adds significance brackets to a ggplot (from the ggsignif package).
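
The original example did not survive as text, so the following is only a minimal sketch of typical usage. The dataset (iris) and the pair of groups compared are assumptions chosen for illustration.

```r
library(ggplot2)
library(ggsignif)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # draw a bracket between the two named groups;
  # geom_signif() uses a Wilcoxon test unless told otherwise
  geom_signif(
    comparisons = list(c("versicolor", "virginica")),
    map_signif_level = TRUE  # show stars instead of the raw p-value
  )

# print(p) draws the plot
```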

get data from ggplot object

gg1 = ggplot(mtcars) + 
  aes(x=mpg, y=wt) +
  geom_point() +
  geom_smooth(se = FALSE)
gg1

# build the plot
gg_object = ggplot_build(gg1)

# get the data for 2nd geom
smooth_line_data = gg_object$data[[2]]

gg1 + annotate(
  geom = "text",
  x = 25,
  y = mean(smooth_line_data$y[round(smooth_line_data$x) == 25]),
  hjust = -.1,
  label = "<- x = 25 here"
)

skimr::skim()

Prints a compact overview of a data frame, with summary statistics grouped by column type.

skim(iris)

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   factor                   1     
##   numeric                  4     
## ________________________         
## Group variables            None  
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts               
## 1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
## 1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
## 2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
## 3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
## 4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃

optimize()

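Finds the x within a given interval that minimizes f(x), or maximizes it with maximum = TRUE. The example below searches [0, 10] for the x where exp(x) is closest to 23.14069.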

optimize(function(x) abs(exp(x)-23.14069), c(0, 10))
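
optimize() returns a list, so it helps to see how to read the result. A short sketch; the second function and the names fit_min and fit_max are made up for illustration.

```r
# Minimize: exp(x) is closest to 23.14069 near x = pi
fit_min <- optimize(function(x) abs(exp(x) - 23.14069), interval = c(0, 10))
fit_min$minimum    # location of the minimum (close to pi)
fit_min$objective  # value of f at that location

# Maximize: pass maximum = TRUE; the location is then returned as $maximum
fit_max <- optimize(function(x) -(x - 2)^2, interval = c(0, 5), maximum = TRUE)
fit_max$maximum    # close to 2
```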

Custom point shape

https://coolbutuseless.github.io/2021/11/04/custom-ggplot2-point-shapes-with-gggrid/
https://twitter.com/yutannihilat_en/status/1493237440043126785

datasets

  • tidycensus

ggplot data

ggplot(mtcars) +
    aes(wt, mpg) +
    geom_point(data = . %>% filter(mpg > 20))
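
This works because a layer's data argument can be a function, which ggplot2 applies to the plot's data; `. %>% filter(...)` builds such a function. A sketch of the same idea with a plain function, used to highlight a subset on top of all points (the colours and the name highlighted are arbitrary):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  # all cars in grey
  geom_point(colour = "grey60") +
  # only cars with mpg > 20 in red; the function receives the plot data
  geom_point(data = function(d) d[d$mpg > 20, ], colour = "red")

# data actually used by the second layer
highlighted <- layer_data(p, 2)
nrow(highlighted)
```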

multiple color scales
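
The original example is not available as text. One way to get two colour scales in one plot is the ggnewscale package; that package choice and the mappings below are assumptions, not necessarily what the author used.

```r
library(ggplot2)
library(ggnewscale)

p <- ggplot(mtcars, aes(wt, mpg)) +
  # first colour scale: continuous, mapped to horsepower
  geom_point(aes(colour = hp), size = 3) +
  scale_colour_viridis_c() +
  # start a fresh colour scale for the layers that follow
  new_scale_colour() +
  # second colour scale: discrete, mapped to transmission type
  geom_smooth(aes(colour = factor(am)), method = "lm", formula = y ~ x, se = FALSE) +
  scale_colour_manual(values = c("0" = "black", "1" = "firebrick"))

# print(p) draws the plot
```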

ggview()

Previews a plot at a specific size (from the ggview package).
