`purler` contains tools for run-length encoding vector data.
- `NA` values are considered identical (unlike `base::rle()`)
- Results returned as a `data.frame` (rather than a list), but still compatible with `base::inverse.rle()`
- Faster! Includes a C implementation for regular atomic vectors, and an R version compatible with every input `base::rle()` accepts.
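To make the idea concrete, here is a minimal base-R sketch of `NA`-aware run-length encoding. It returns a `data.frame` with `lengths`, `values` and `start` columns. The helper name `rle_na()` is hypothetical. This is not purler's C or R implementation, only an illustration of the behaviour listed above. Note that `is.na(NaN)` is `TRUE`, so this sketch would also group `NaN` with `NA`.

```r
# rle_na(): hypothetical helper, a base-R sketch (not purler's implementation)
rle_na <- function(x) {
  n <- length(x)
  if (n == 0L) {
    return(data.frame(lengths = integer(0), values = x, start = integer(0),
                      stringsAsFactors = FALSE))
  }
  prev <- x[-n]
  curr <- x[-1L]
  # Neighbours are "the same" if they are equal, or if both are NA
  same <- (prev == curr) | (is.na(prev) & is.na(curr))
  # One NA next to a non-NA gives NA above; count that as a run break
  same[is.na(same)] <- FALSE
  # Index of the last element of each run
  ends <- c(which(!same), n)
  lengths <- diff(c(0L, ends))
  data.frame(
    lengths = lengths,
    values  = x[ends],
    start   = ends - lengths + 1L,
    stringsAsFactors = FALSE
  )
}

enc <- rle_na(c(1, 1, 1, 2, 2, 8, 8, 8, 8, 8, NA, NA, NA, NA))
enc$lengths
#> [1] 3 2 5 4
enc$start
#> [1]  1  4  6 11
inverse.rle(enc)
#> [1]  1  1  1  2  2  8  8  8  8  8 NA NA NA NA
```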
- `rlenc()` is C code for run-length encoding of raw, logical, integer, numeric and character vectors.
    - Groups `NA` values into a run (unlike `base::rle()`)
    - Returns a `data.frame` rather than a list
    - Returned object is compatible with `base::inverse.rle()`
    - Can be 10x faster than `base::rle()`
- `rlenc_compat()`
    - A pure R version of `rlenc()` which is compatible with all inputs that `base::rle()` accepts
- `rlenc_id()` returns an integer vector numbering the runs of identical values within a vector of numeric or character data. This is very similar to `data.table::rleid()`, except that the `data.table` version is much more configurable and flexible. This version is probably only useful if you want to avoid pulling in `data.table` as a dependency.
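A run id is just the run index repeated once per element. That means ids can also be derived from the `lengths` column of any run-length encoding. The short sketch below uses `base::rle()`, so, unlike `rlenc_id()`, neighbouring `NA`s would each get their own id. The helper name `run_ids_from_lengths()` is hypothetical.

```r
# run_ids_from_lengths(): hypothetical helper, not part of purler
run_ids_from_lengths <- function(lengths) {
  # Repeat run number k exactly lengths[k] times
  rep.int(seq_along(lengths), lengths)
}

enc <- rle(c("a", "a", "b", "c", "c", "c"))
run_ids_from_lengths(enc$lengths)
#> [1] 1 1 2 3 3 3
```

The same idea applies to the `lengths` column of the `data.frame` returned by `rlenc()`, which keeps `NA` runs together.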
You can install from GitHub with:
```r
# install.packages('remotes')
remotes::install_github('coolbutuseless/purler')
```
- Long vector support in `rlenc()`
```r
input <- c(1, 1, 1, 2, 2, 8, 8, 8, 8, 8, NA, NA, NA, NA)
(result <- purler::rlenc(input))
#>   lengths values start
#> 1       3      1     1
#> 2       2      2     4
#> 3       5      8     6
#> 4       4     NA    11
inverse.rle(result)
#> [1]  1  1  1  2  2  8  8  8  8  8 NA NA NA NA
```
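The `start` column in the output above is determined by `lengths`. Each run begins one element after the previous run ends, so `start = cumsum(lengths) - lengths + 1`. Also, `inverse.rle()` only reads the `lengths` and `values` components. Any `data.frame` with those two columns can therefore be decoded, and an extra `start` column is simply ignored. Here is a base-R check using the values from the example (no purler needed):

```r
# Hand-built encoding with the same lengths/values as the rlenc() output above
enc <- data.frame(lengths = c(3L, 2L, 5L, 4L), values = c(1, 2, 8, NA))

# Each run starts one element after the end of the previous run
enc$start <- cumsum(enc$lengths) - enc$lengths + 1L
enc$start
#> [1]  1  4  6 11

# inverse.rle() only uses `lengths` and `values`
inverse.rle(enc)
#> [1]  1  1  1  2  2  8  8  8  8  8 NA NA NA NA
```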
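```r
library(purler)
library(tidyr)
library(bench)
library(dplyr)
library(ggplot2)

# 1000 integers drawn from 1:10, so most runs are short
N <- 1000
M <- 10
zz <- sample(seq_len(M), N, replace = TRUE)

# check = FALSE because rle() returns an "rle" list, while
# rlenc() and rlenc_compat() return a data.frame
res <- bench::mark(
  rle(zz),
  rlenc(zz),
  rlenc_compat(zz),
  check = FALSE
)

plot(res) + theme_bw()
```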
In `base::rle()`, runs of `NA` values are not treated as a group. All functions in `purler` do treat `NA`s as identical for the purpose of creating groups.
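The reason lies in how `base::rle()` finds run boundaries. It compares each element with its neighbour using `!=` and treats any `NA` result as a break, and any comparison involving `NA` is itself `NA`. The sketch below reproduces both behaviours on the vector `c(1, 1, 2, NA, NA, NA, NA, 4, 4, 4)`. The `NA`-aware variant illustrates the idea and is not purler's code.

```r
x <- c(1, 1, 2, NA, NA, NA, NA, 4, 4, 4)
n <- length(x)

# Any comparison involving NA returns NA, so two neighbouring NAs never compare equal
NA == NA
#> [1] NA

# base::rle() compares neighbours with `!=` and treats an NA result as a run break
breaks <- x[-1L] != x[-n]
base_breaks <- breaks | is.na(breaks)
diff(c(0L, which(base_breaks), n))
#> [1] 2 1 1 1 1 1 3

# NA-aware variant (a sketch): a pair of neighbouring NAs is not a break
both_na <- is.na(x[-1L]) & is.na(x[-n])
na_breaks <- base_breaks & !both_na
diff(c(0L, which(na_breaks), n))
#> [1] 2 1 4 3
```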
```r
input <- c(1, 1, 2, NA, NA, NA, NA, 4, 4, 4)
base::rle(input)
#> Run Length Encoding
#>   lengths: int [1:7] 2 1 1 1 1 1 3
#>   values : num [1:7] 1 2 NA NA NA NA 4
purler::rlenc_compat(input)
#>   lengths values start
#> 1       2      1     1
#> 2       1      2     3
#> 3       4     NA     4
#> 4       3      4     8
purler::rlenc(input)
#>   lengths values start
#> 1       2      1     1
#> 2       1      2     3
#> 3       4     NA     4
#> 4       3      4     8
purler::rlenc_id(input)
#> [1] 1 1 2 3 3 3 3 4 4 4
```
`rlenc_id()` numbers the runs of identical values in a numeric or character vector. For a more complete approach to this problem, see `data.table::rleid()`.
```r
input <- c(11, 11, 12, 12, 12, NA, NA, NA, NA)
purler::rlenc_id(input)
#> [1] 1 1 2 2 2 3 3 3 3
```
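One way to compute such ids is to flag every position where a new run starts and take a cumulative sum. The base-R sketch below treats two neighbouring `NA`s as equal, which matches the output above. The helper name `run_id()` is hypothetical, and this is not necessarily how `rlenc_id()` is implemented.

```r
# run_id(): hypothetical helper, a base-R sketch (not purler's code)
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  a <- x[-n]
  b <- x[-1L]
  # Neighbours are "the same" if equal, or if both are NA
  same <- (a == b) | (is.na(a) & is.na(b))
  # One NA next to a non-NA gives NA above; treat it as a new run
  same[is.na(same)] <- FALSE
  # TRUE marks the first element of each run; cumsum() turns that into run numbers
  cumsum(c(TRUE, !same))
}

run_id(c(11, 11, 12, 12, 12, NA, NA, NA, NA))
#> [1] 1 1 2 2 2 3 3 3 3
```

`cumsum()` on a logical vector returns an integer vector, so the ids come back as integers, like the output shown above.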
- `base::rle()`
- `data.table::rleid()`
- R Core for developing and maintaining the language.
- CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository.