a-maldet / labelmachine

R package for labeling factor variables in data frames. Manage your labels in yaml files and stay on top of things.

labelmachine

labelmachine is an R package that helps you assign meaningful labels to data sets. You can also manage your labels in so-called lama-dictionary files, which are yaml files. This makes it easy to reuse the same label translations across multiple projects that share a similar data structure.

Labeling your data can be easy!

Installation

# Install release version from CRAN
install.packages("labelmachine")

# Install development version from GitHub
devtools::install_github('a-maldet/labelmachine', build_vignettes = TRUE)

Concept

The label assignments are given in so-called translations (named character vectors), which work like recipes: each one tells which original value is mapped onto which new label. The translations are collected in so-called lama_dictionary objects. These lama_dictionary objects are then used to translate your data frame variables.
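
To make the idea of a translation concrete, here is a minimal base R sketch of how a named character vector can act as a lookup table. The helper `apply_translation` is hypothetical and only for illustration; it is not part of labelmachine and does not claim to mirror how the package works internally.

```r
# A translation is a named character vector:
# the names are the original values, the elements are the new labels.
translation <- c(en = "English", ma = "Mathematics", NA_ = "other subjects")

# Hypothetical helper (NOT part of labelmachine): apply a translation by hand.
apply_translation <- function(x, translation) {
  keys <- as.character(x)
  keys[is.na(keys)] <- "NA_"            # missing values use the "NA_" entry
  labels <- unname(translation[keys])   # look up each original value by name
  # Level order follows the order of labels in the translation;
  # unique() means several original values may share one label.
  factor(labels, levels = unique(translation))
}

labeled <- apply_translation(c("en", "ma", NA), translation)
labeled
```

Because the levels are built with `unique()`, a translation such as `c(a = "X", b = "X")` would merge the two original values into a single label in this sketch.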

Usage

Let df be a data frame with marks and subjects that should be translated:

df <- data.frame(
  pupil_id = c(1, 1, 2, 2, 3),
  subject = c("en", "ma", "ma", "en", "en"),
  result = c(2, 1, 3, 2, NA),
  stringsAsFactors = FALSE
)
df
##   pupil_id subject result
## 1        1      en      2
## 2        1      ma      1
## 3        2      ma      3
## 4        2      en      2
## 5        3      en     NA

Create a lama_dictionary object holding the translations:

library(labelmachine)
dict <- new_lama_dictionary(
  subjects = c(en = "English", ma = "Mathematics", NA_ = "other subjects"),
  results = c("1" = "Excellent", "2" = "Satisfying", "3" = "Failed", NA_ = "Missed")
)
dict
## 
## --- lama_dictionary ---
## Variable 'subjects':
##               en               ma              NA_ 
##        "English"    "Mathematics" "other subjects" 
## 
## Variable 'results':
##            1            2            3          NA_ 
##  "Excellent" "Satisfying"     "Failed"     "Missed"

Translate the data frame variables:

df_new <- lama_translate(
  df,
  dict,
  subject_new = subjects(subject),
  result_new = results(result)
)
str(df_new)
## 'data.frame':    5 obs. of  5 variables:
##  $ pupil_id   : num  1 1 2 2 3
##  $ subject    : chr  "en" "ma" "ma" "en" ...
##  $ result     : num  2 1 3 2 NA
##  $ subject_new: Factor w/ 3 levels "English","Mathematics",..: 1 2 2 1 1
##  $ result_new : Factor w/ 4 levels "Excellent","Satisfying",..: 2 1 3 2 4

Highlights

labelmachine offers the following features:

  • All types of variables can be translated: Logical, Numeric, Character, Factor
  • When translating your variables, you may choose between keeping the current ordering or applying a new factor ordering to your variable.
  • Assigning meaningful labels to missing values (NA) is no problem.
  • Assigning NA to existing values is no problem.
  • Merging two values into a single label is no problem.
  • Transforming a data frame holding label assignment lists into a lama_dictionary is no problem.
  • Manage your translations in yaml files in order to use the same translations in different projects sharing similar data.
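
As a sketch of the yaml workflow from the last item, the following assumes that the package's `lama_write` and `lama_read` functions take a dictionary and a file path as shown; check the package documentation for the exact arguments. The temporary file path is purely illustrative.

```r
library(labelmachine)

dict <- new_lama_dictionary(
  subjects = c(en = "English", ma = "Mathematics", NA_ = "other subjects")
)

# Write the dictionary to a yaml file (illustrative temporary path)
path <- tempfile(fileext = ".yaml")
lama_write(dict, path)

# Read it back, e.g. from another project that shares the same data structure
dict_from_file <- lama_read(path)
dict_from_file
```

Keeping the yaml file under version control lets several projects share one set of translations.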

Further reading

A short introduction can be found in the package's "Get started" guide.
