orgclockr

Installation

library(devtools)
devtools::install_github("mutbuerger/orgclockr")

install_github() does not build vignettes by default. Depending on which packages you’re missing, building the vignettes may be time-consuming, but they provide a nice introduction to the library:

devtools::install_github("mutbuerger/orgclockr", build_vignettes = TRUE)

Introduction to orgclockr

What’s Org Mode and why would I want to parse it?

According to its own description, Org mode, an Emacs major mode, is an organizational tool that lives in the plain text world. It is also a fairly complex markup language. The complexity is not imposed on you, though, so simply jumping in is advised. What sets Org mode apart from other organizational tools is the seemingly flawless integration of a full-fledged task management solution into a flexible outliner.

Org mode offers several ways to filter by org elements. Filtering an org file by todo keywords, for example, is done with org-sparse-tree, which uses an overlay. This may be sufficient for getting an overview inside Org mode, but it is not an option for presentational purposes. Another way to filter by elements is to use the org-agenda with org-agenda-filter-by-tag and similar filter functions. A filtered org-agenda can be exported to various formats via org-agenda-write, which provides a simple presentation of your current agenda. In my humble opinion, though, there is still a need to repeatedly filter large org files by various org elements outside Emacs. This pays off especially when Org mode is used for its clocking capabilities. I will elaborate on this in the next section.

There are various parsers for elements in org files; one written in R (orgR) is even available on CRAN. I recommend orgR for quickly extracting the raw headings and timestamps of an org file, but I did not find its results for todo keywords and tags satisfactory. orgclockr strives to provide more flexible extraction functions that capture the elements of a heading as well as the clocking information related to it.

Clocking Work Time in Org Mode

Keeping track of the time you spend, not only on work but also on the activities you dedicate yourself to in your free time, is something the Quantified Self movement has taken to a whole new level. What I really like about it is the inspiring collection of visualizations that its members derive from their continuous data collection. Clocking work time is also useful in task and project management for getting a glimpse of what you spent your time on and how the time spent compares to the estimated effort. For me, becoming aware of my weaknesses (spending way too much time on some tasks, totally avoiding others) offers the greatest opportunity to improve my work efficiency. While Org mode is a fantastic tool for doing the actual clocking and for building simple clock tables, I struggled to do the weekly reviews properly. Most of the time I noted where my priorities were and how much time I had spent in total, and then moved on to archiving completed tasks or something else. To actually improve my time management, I planned to focus on my work efficiency rate, which compares the effort to the time spent, and to visualize the results as time series to become aware of trends and changes.

Org mode lets you clock the time spent on a task. The relevant information is stored in a drawer using timestamps in the format predetermined in org-time-stamp-formats. Furthermore, to fix a time limit, Org mode uses both an effort and a deadline property. orgclockr parses these elements and returns, per task, the time spent, the average length of a clock interval, the number of clock intervals on a given day, the period of time spent on a task and the effort set. This information allows for a more detailed clock report than the built-in org-clock-report in Org mode. Having the clocking data available in a dplyr::data_frame object is of great benefit not only for filtering headings by elements, but also for calculating measures of work efficiency. Examples are provided in the next chapter.
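To make the clocking data more concrete, here is a minimal base R sketch of how the minutes of a clock interval could be derived from CLOCK lines. It is not orgclockr's implementation: the sample lines are made up, the helper to_time() is hypothetical, and the pattern assumes Org mode's default timestamp layout.

```r
# Made-up CLOCK lines as they would appear in a LOGBOOK drawer.
clock_lines <- c(
  "CLOCK: [2015-01-19 Mon 10:00]--[2015-01-19 Mon 10:25] =>  0:25",
  "CLOCK: [2015-01-19 Mon 14:10]--[2015-01-19 Mon 15:40] =>  1:30"
)

# Capture date and time of both timestamps; the day name is skipped
# because parsing it would depend on the locale.
pattern <- paste0(
  "\\[([0-9-]{10}) [^ ]+ ([0-9:]{5})\\]--",
  "\\[([0-9-]{10}) [^ ]+ ([0-9:]{5})\\]"
)
parts <- regmatches(clock_lines, regexec(pattern, clock_lines))

# Hypothetical helper: combine a date and a time string into a POSIXct value.
to_time <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M", tz = "UTC")
}

start <- to_time(sapply(parts, `[`, 2), sapply(parts, `[`, 3))
end   <- to_time(sapply(parts, `[`, 4), sapply(parts, `[`, 5))

# Length of every clock interval in minutes.
minutes <- as.numeric(difftime(end, start, units = "mins"))

# Per-day measures similar in spirit to the ones described above.
time_spent   <- sum(minutes)     # total minutes on that day
n_intervals  <- length(minutes)  # number of clock intervals
avg_interval <- mean(minutes)    # average length of an interval
```

Working with differences between parsed timestamps, rather than trusting the "=>" summary at the end of each line, keeps the result correct even if the summary was not updated after a manual edit.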

Learning to set efforts properly can only be done from experience, so making clocking a habit is indispensable. Setting the right efforts is especially useful when breaking down large projects into manageable parts. While Org mode is helpful in pointing out that you are exceeding the effort set on the currently clocked task, I’d like to have a more general view of the efforts set for a whole project. Calculating the sum of the estimated time needed for a whole project, or for parts of it, is more convenient in R than in Org tables. Arguably the greatest benefit of orgclockr, though, lies in creating time series and visualizing the results with the various plotting libraries in R.
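As an illustration of summing efforts in R, the following sketch converts effort strings in the H:MM notation to minutes and adds them up. The helper effort_to_minutes() and the sample values are hypothetical and only assume that efforts are written as hours and minutes separated by a colon.

```r
# Hypothetical helper: convert "H:MM" effort strings to minutes.
# Entries that are missing or not in H:MM form become NA.
effort_to_minutes <- function(effort) {
  parts <- strsplit(effort, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2 || anyNA(p)) return(NA_real_)
    as.numeric(p[1]) * 60 + as.numeric(p[2])
  }, numeric(1))
}

# Made-up efforts of the tasks in one project; one task has no effort set.
efforts <- c("0:25", "1:00", "4:00", NA)

effort_minutes <- effort_to_minutes(efforts)

# Estimated time for the whole project, ignoring tasks without an effort.
project_effort <- sum(effort_minutes, na.rm = TRUE)
```

Keeping tasks without an effort as NA, instead of silently treating them as zero, makes it easy to count how many tasks still lack an estimate.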

Exploring an Org File

Data: orgfile

The orgclockr package comes with the built-in dataset orgfile. This dataset, a character vector, illustrates a typical org file. For presentational purposes the file consists of only 100 lines, but it is enriched with various org elements. The object is the result of reading in an org file. Typically this is done with a combination of file() and readLines() in R:

library(magrittr)  # provides the pipe operator %>%

file("/path/to/file.org") %>%
    readLines()

The package also ships sample.org, the raw file that orgfile stems from. Reading sample.org is simply done using system.file():

library(orgclockr)

system.file("extdata", "sample.org", package = "orgclockr") %>%
    readLines()
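To get a feel for what such a character vector contains, here is a small base R sketch that finds the heading lines and their outline levels. The sample lines are made up, and the regular expressions only illustrate the idea; they are not the patterns orgclockr uses.

```r
# Made-up lines standing in for the result of readLines() on an org file.
org_lines <- c(
  "* HeadingOne :TagOne:",
  "** TODO TaskOne :TagTwo:",
  "   Some body text.",
  "** DONE TaskTwo",
  "*bold text, not a heading*"
)

# A heading starts with one or more stars followed by a space.
is_heading <- grepl("^\\*+ ", org_lines)

# The outline level is the number of leading stars.
stars <- sub("^(\\*+) .*$", "\\1", org_lines)
level <- ifelse(is_heading, nchar(stars), NA_integer_)

headings <- data.frame(
  Line  = org_lines[is_heading],
  Level = level[is_heading]
)
```

Requiring a space after the stars is what keeps a line of bold text from being mistaken for a heading.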

Extracting the Org Elements

orgclockr provides several extraction functions if you are only interested in a specific element of an org file. These start with extract_. Most commonly you’d want to extract several elements and store them in a dplyr::data_frame for further manipulation, which is done using org_elements_df(). The code given below keeps the headings of the built-in dataset orgfile that carry at least one tag but are not tagged with TagThree. If you are not familiar with the manipulation functions of the dplyr library yet, you may start with the Data Wrangling Cheat Sheet provided by RStudio.

library(orgclockr)
library(dplyr)  # filter() and the other manipulation functions

f <- org_elements_df(orgfile)
filter(f, !grepl("TagThree", Tag), !is.na(Tag))

Headline	Category	Tag	Level	State	Deadline	Effort
HeadingOne	CategoryOne	TagOne	1	nil	nil	nil
TaskOne	nil	TagOne TagTwo	2	TODO	nil	nil
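The second condition in the filter above matters because grepl() returns FALSE for missing values, so negating it alone would keep untagged headings. A short base R sketch with a made-up tag vector shows the difference:

```r
# Made-up tag column; NA stands for a heading without any tag.
tags <- c("TagOne", "TagOne TagTwo", "TagThree", NA)

# grepl() reports FALSE for NA, so the negation is TRUE for untagged headings.
keep_naive <- !grepl("TagThree", tags)

# Adding !is.na() drops the untagged headings as well.
keep <- !grepl("TagThree", tags) & !is.na(tags)

tags[keep_naive]  # still contains the NA entry
tags[keep]        # only tagged headings without TagThree
```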

Extracting Clocking Information

While org_elements_df() extracts various elements from org headings, I decided to separate the clocking information from it. This is therefore returned from org_clock_df(), which will also result in a dplyr::data_frame object. As will be shown below, the local data frames returned from both functions can easily be joined using Headline as the index column. The following code returns the number of days a task has been clocked into. Do not confuse this with the sum of TimeSpent in days:

org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(DaysOnTask = n())

Headline	DaysOnTask
TaskEight	2
TaskFive	2
TaskNine	1
TaskSeven	1
TaskSix	5
TaskTen	1
TaskTwo	2
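The difference between the number of days clocked and the total time expressed in days can be sketched in base R. The data frame below is made up and only assumes one row per task and day, as in the result of org_clock_df():

```r
# Made-up clocking data: one row per task and day, TimeSpent in minutes.
clock <- data.frame(
  Headline  = c("TaskA", "TaskA", "TaskB"),
  Date      = as.Date(c("2015-01-19", "2015-01-20", "2015-01-19")),
  TimeSpent = c(30, 1500, 45)
)

# Number of distinct days a task has been clocked into (row count per task).
days_on_task <- table(clock$Headline)

# Total time spent per task, converted from minutes to days.
time_in_days <- tapply(clock$TimeSpent, clock$Headline, sum) / (60 * 24)
```

Here TaskA was clocked on two days, while its total time amounts to slightly more than one day.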

The local data frame below sorts the tasks and days by the amount of time invested:

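org_clock_df(orgfile) %>%
    filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
    group_by(Date, Headline) %>%
    # Name the summary column explicitly; with one row per task and day
    # the sum equals the original TimeSpent value.
    summarise(TimeSpent = sum(TimeSpent)) %>%
    ungroup() %>%
    arrange(desc(TimeSpent))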

Date	Headline	TimeSpent
2015-01-19	TaskTen	334
2015-01-20	TaskEight	129
2015-01-05	TaskSeven	122
2015-02-28	TaskFive	51
2015-01-01	TaskSix	34
2015-02-05	TaskEight	23
2015-03-01	TaskFive	6
2015-01-19	TaskNine	2

AvgClockInterval holds the mean or median length of a clock interval for a task on a given day. You may also be interested in the average length of a clock interval per task over the whole period:

org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(AvgTimeOnTask = round(sum(TimeSpent)/sum(NIntervals), 2)) %>%
    arrange(desc(AvgTimeOnTask))

Headline	AvgTimeOnTask
TaskSeven	122
TaskTen	55.67
TaskEight	50.67
TaskSix	46.4
TaskTwo	10.5
TaskFive	9.5
TaskNine	2
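Dividing the summed time by the summed number of intervals is not the same as averaging the daily averages. The following base R sketch uses made-up numbers, and computes AvgClockInterval as a mean purely for illustration:

```r
# Made-up clocking data for one task on two days.
clock <- data.frame(
  Headline   = c("TaskA", "TaskA"),
  Date       = as.Date(c("2015-01-19", "2015-01-20")),
  TimeSpent  = c(60, 40),  # minutes per day
  NIntervals = c(3, 1)     # clock intervals per day
)

# Daily average interval length (assumed here to be the mean).
clock$AvgClockInterval <- clock$TimeSpent / clock$NIntervals

# Average over all intervals: every interval counts equally.
per_interval <- sum(clock$TimeSpent) / sum(clock$NIntervals)

# Mean of the daily averages: every day counts equally.
mean_of_daily <- mean(clock$AvgClockInterval)
```

The first measure weights days with many intervals more heavily, which is usually what you want when asking how long a typical clock interval on a task lasts.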

After doing simple calculations on the clocking data you may want to visualize your time spent as a time series. autoplot() takes a zoo object, a class particularly aimed at irregular time series:

library(zoo)
library(ggplot2)  # autoplot() and the plot layers below come from ggplot2

org_clock_df(orgfile) %>%
    select(Date, TimeSpent) %>%
    filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
    as.data.frame() %>%
    read.zoo(index.column = "Date") %>%
    autoplot.zoo(stat = "identity",
                 geom = "bar") +
    scale_fill_gradient2(trans = "sqrt") +
    aes(fill = Value) +
    guides(fill = "none") +
    theme_classic() +
    ylab("Time Spent (min)") +
    xlab("Date")

The plot below shows a very simple retrospective Gantt chart that uses the first and the last day clocked into a task as values:

org_clock_df(orgfile) %>%
    select(Date, Headline) %>%
    filter(between(Date, as.Date("2014-11-01"), Sys.Date())) %>%
    as.data.frame() %>%
    read.zoo(index.column = "Date") %>%
    autoplot.zoo(stat = "identity",
                 geom = "line") +
    scale_color_brewer(type = "qual",
                       palette = 2) +
    aes(size = 1,
        colour = Value) +
    guides(size = "none",
           colour = "none") +
    theme_classic() +
    ylab("Task") + xlab("Date")

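The bar chart below shows the total time spent per task, ordered from the most to the least time-consuming task:

Palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2",
             "#D55E00")
org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent)) %>%
    # Reorder the tasks by total time; sorting TimeSpent on its own
    # would detach the values from their tasks.
    ggplot(aes(reorder(Headline, -TimeSpent), TimeSpent,
               fill = Headline)) +
    geom_bar(stat = "identity",
             width = .5) +
    scale_fill_manual(values = Palette) +
    theme_classic() +
    guides(fill = "none") +
    labs(x = "Task", y = "Time Spent (min)")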

Joining the results of org_clock_df() and org_elements_df() is achieved with the join functions provided by the dplyr library. The following example uses a left_join() with the clocking data on the left, because we want to omit headings without any clocking information:

a_df <- org_clock_df(orgfile)
b_df <- org_elements_df(orgfile)
left_join(a_df, b_df, by = "Headline") %>%
    group_by(Date, Headline) %>%
    # Effort is constant per heading, so take its first value.
    summarise(TimeSpentTotal = sum(TimeSpent), Effort = first(Effort)) %>%
    filter(Effort < TimeSpentTotal) %>%
    mutate(Overdue = TimeSpentTotal - Effort) %>%
    ungroup() %>%
    arrange(desc(Overdue))

Date	Headline	TimeSpentTotal	Effort	Overdue
2015-01-19	TaskTen	334	25	309
2015-01-20	TaskEight	129	25	104
2015-01-05	TaskSeven	122	30	92
2014-12-21	TaskSix	90	60	30

The plot below is what I had in mind before writing orgclockr:

library(tidyr)

left_join(a_df, b_df, by = "Headline") %>%
    select(Date, Headline, TimeSpent, Effort) %>%
    filter(!is.na(Effort)) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent),
              Effort = unique(Effort)) %>%
    tidyr::gather(Variable, Value, TimeSpent:Effort) %>%
    as.data.frame() %>%
    ggplot() +
    aes(Headline, Value, fill = Variable) +
    scale_fill_brewer(type = "qual", palette = 7) +
    geom_bar(stat = "identity", position = "dodge") +
    theme_classic() +
    theme(legend.title = element_blank(),
          legend.position = "bottom") +
    labs(x = "Task", y = "Time (min)")

This is a striking example of mostly underestimated efforts and one overestimate, both of which should be avoided. The preceding plot suggests poor work efficiency rates for the tasks depicted, with the sole exception of TaskTwo, whose rate is near the desired value of one:

left_join(a_df, b_df, by = "Headline") %>%
    select(Date, Headline, TimeSpent, Effort) %>%
    filter(!is.na(Effort)) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent),
              Effort = unique(Effort)) %>%
    mutate(EfficiencyRate = round(Effort/TimeSpent, 2))

Headline	TimeSpent	Effort	EfficiencyRate
TaskEight	152	25	0.16
TaskNine	2	240	120
TaskSeven	122	30	0.25
TaskSix	232	60	0.26
TaskTen	334	25	0.07
TaskTwo	21	20	0.95
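The efficiency rates can be reproduced in base R from the TimeSpent and Effort columns of the table above. The classification into estimate categories is an addition for illustration only, and the 10 % tolerance around one is an arbitrary assumption:

```r
# Values copied from the table above (minutes).
tasks <- data.frame(
  Headline  = c("TaskEight", "TaskNine", "TaskSeven",
                "TaskSix", "TaskTen", "TaskTwo"),
  TimeSpent = c(152, 2, 122, 232, 334, 21),
  Effort    = c(25, 240, 30, 60, 25, 20)
)

# Efficiency rate: estimated effort divided by the time actually spent.
tasks$EfficiencyRate <- round(tasks$Effort / tasks$TimeSpent, 2)

# Hypothetical classification: rates within 10 % of one count as on target.
tasks$Estimate <- cut(
  tasks$EfficiencyRate,
  breaks = c(0, 0.9, 1.1, Inf),
  labels = c("underestimated", "on target", "overestimated")
)
```

A rate below one means the task took longer than estimated, a rate above one means the effort was set too generously.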

Limitations

This section may and hopefully will undergo changes in the future, so the list below is also a development roadmap:

[ ] Currently the tag inheritance provided by the inherit_tags parameter in org_elements_df() and the inherit parameter in extract_tags() only works for level one tags.
[ ] For the sake of simplicity, clock intervals are not split at midnight. Keep this in mind when clocking for long periods of time spanning from one day to the next, as it may distort the TimeSpent per day in org_clock_df().
[ ] Currently orgclockr doesn’t parse the ARCHIVE_ITAGS and ARCHIVE_CATEGORY in archived org files.
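Regarding the midnight limitation in the list above, one possible way to split a clock interval at day boundaries is sketched below in base R. The function split_at_midnight() is hypothetical and not part of orgclockr, and it assumes that both timestamps are given in UTC.

```r
# Hypothetical helper: split one clock interval into per-day pieces.
# Assumes start and end are POSIXct values in UTC with end >= start.
split_at_midnight <- function(start, end) {
  stopifnot(end >= start)
  days <- seq(as.Date(start), as.Date(end), by = "day")

  # Midnight at the beginning and at the end of every day touched.
  day_start <- as.numeric(as.POSIXct(format(days), tz = "UTC"))
  day_end   <- day_start + 24 * 60 * 60

  # Clip the interval to each day (all values in seconds since the epoch).
  seg_start <- pmax(day_start, as.numeric(start))
  seg_end   <- pmin(day_end, as.numeric(end))

  pieces <- data.frame(Date = days, Minutes = (seg_end - seg_start) / 60)
  # Drop empty pieces, e.g. when the interval ends exactly at midnight.
  pieces[pieces$Minutes > 0, ]
}

# Made-up interval that spans midnight.
start <- as.POSIXct("2015-01-19 23:30", tz = "UTC")
end   <- as.POSIXct("2015-01-20 01:15", tz = "UTC")
pieces <- split_at_midnight(start, end)
```

In this example the interval of 105 minutes is attributed as 30 minutes to the first day and 75 minutes to the second.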
