orgclockr
Installation
library(devtools)
devtools::install_github("mutbuerger/orgclockr")
install_github() does not build vignettes by default. Depending on which packages you’re missing, building the vignettes may be time-consuming, but it provides a nice introduction to the library:
devtools::install_github("mutbuerger/orgclockr", build_vignettes = TRUE)
Introduction to orgclockr
What’s Org Mode and why would I want to parse it?
As its description says, Org mode, an Emacs mode, is an organizational tool that lives in the plain text world. It is also a fairly complex markup language. The complexity is not imposed, though, and simply jumping in is advised. What sets Org mode apart from other organizational tools is the seemingly flawless integration of a full-fledged task management solution into a flexible outliner.
Org mode offers multiple options to filter by org elements. Filtering an org file by todo keywords, for example, is achieved with org-sparse-tree, which uses an overlay. This may be sufficient to get an overview in Org mode, but it is not an option for presentational purposes. Another way to filter by elements is to use the org-agenda with org-agenda-filter-by-tag and similar filter functions. A filtered org-agenda can be exported to various formats via org-agenda-write, which provides a simple presentation of your current agenda. In my humble opinion, though, there is still a need to repeatedly filter large org files by various org elements outside Emacs. This especially pays off when Org mode is used for its clocking capabilities. I will elaborate on this in the next chapter.
There are various parsers for the elements in org files; one for R, orgR, is even available on CRAN. I recommend orgR for quickly extracting the raw headings and timestamps of an org file, but its results for todo keywords and tags were not satisfactory. orgclockr strives to provide more flexible extraction functions that capture the elements in a heading as well as the clocking information related to it.
Clocking Work Time in Org Mode
Keeping track of the time you spend, not only on work but also on the activities you dedicate yourself to in your free time, is something the Quantified Self movement took to a whole new level. What I really like about it is the inspiring collection of visualizations derived from its members’ continuous data collection. Clocking work time is also useful in task and project management for getting a glimpse of what you spent your time on and how the time spent compares with the estimated effort. For me, becoming aware of my weaknesses (spending way too much time on some tasks, totally avoiding others) offers the greatest opportunity to improve my work efficiency. While Org mode is a fantastic tool for the actual clocking and for building simple clock tables, I struggled to do the weekly reviews properly. Most of the time I noticed where my priorities were and how much time I had spent in total, and then proceeded with archiving completed tasks or something else. To actually improve my time management, I planned to focus on my work efficiency rate, which compares the effort to the time spent, and to visualize the results as time series to become aware of trends and changes.
Org mode lets you clock the time spent on a task. The relevant information is stored in a drawer using timestamps in the format predetermined in org-time-stamp-formats. Furthermore, to fix a time limit, Org mode uses both an effort and a deadline property. orgclockr parses these elements and returns, per task, the time spent, the average length of a clock interval, the number of clock intervals on a given day, the period of time spent on a task, and the effort set. This information allows for a more detailed clock report than the built-in org-clock-report in Org mode. Having the clocking data available in a dplyr::data_frame object is of great benefit not only for filtering headings by elements, but also for calculating measures of work efficiency. Examples are provided in the next chapter.
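To make the clocking format more concrete, here is a minimal base-R sketch of how closed CLOCK lines could be turned into minutes per day. It is not how orgclockr is implemented: the sample lines, the regular expression and the helper name clock_minutes() are assumptions for illustration, and the timestamps are assumed to follow Org mode's default format.

```r
# Hypothetical LOGBOOK drawer; these lines are not taken from sample.org.
org_lines <- c(
  "** TODO TaskOne",
  ":LOGBOOK:",
  "CLOCK: [2015-01-19 Mon 10:00]--[2015-01-19 Mon 10:30] =>  0:30",
  "CLOCK: [2015-01-19 Mon 14:00]--[2015-01-19 Mon 15:15] =>  1:15",
  "CLOCK: [2015-01-20 Tue 09:00]--[2015-01-20 Tue 09:45] =>  0:45",
  ":END:"
)

# clock_minutes() is an illustrative helper, not part of orgclockr.
clock_minutes <- function(lines) {
  # Capture date and time of the start and end timestamps of closed clocks.
  pattern <- paste0(
    "^\\s*CLOCK: \\[(\\d{4}-\\d{2}-\\d{2}) [^] ]+ (\\d{2}:\\d{2})\\]--",
    "\\[(\\d{4}-\\d{2}-\\d{2}) [^] ]+ (\\d{2}:\\d{2})\\]"
  )
  parts <- regmatches(lines, regexec(pattern, lines))
  parts <- parts[lengths(parts) > 0]  # drop lines that are not clock lines
  start_chr <- vapply(parts, function(p) paste(p[2], p[3]), character(1))
  end_chr <- vapply(parts, function(p) paste(p[4], p[5]), character(1))
  start <- as.POSIXct(start_chr, format = "%Y-%m-%d %H:%M", tz = "UTC")
  end <- as.POSIXct(end_chr, format = "%Y-%m-%d %H:%M", tz = "UTC")
  data.frame(
    Date = as.Date(start),
    Minutes = as.numeric(difftime(end, start, units = "mins"))
  )
}

clocks <- clock_minutes(org_lines)
# Sum the intervals per day, comparable to a daily time-spent value.
per_day <- aggregate(Minutes ~ Date, data = clocks, FUN = sum)
```

The precomputed duration after `=>` is ignored on purpose, so the sketch recomputes every interval from its two timestamps.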
Learning to set efforts properly can only come from experience, so making clocking a habit is indispensable. Setting the right efforts is especially useful when breaking down large projects into manageable parts. While Org mode is useful for pointing out that you are exceeding the effort set on the currently clocked task, I’d like to have a more general view of the efforts set for a whole project. Calculating the sum of the estimated time needed for the whole project or parts of it is more convenient in R than in Org tables. Arguably the greatest benefit of orgclockr, though, lies in creating time series and visualizing the results with the various plotting libraries in R.
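Summing effort estimates is straightforward once the values are numeric. The sketch below assumes efforts written in Org mode's H:MM notation and uses a hypothetical helper, effort_to_minutes(), that is not part of orgclockr; the task names and values are made up.

```r
# Made-up effort estimates in H:MM notation.
efforts <- c(TaskA = "0:25", TaskB = "1:30", TaskC = "4:00")

# effort_to_minutes() is an illustrative helper, not an orgclockr function.
effort_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

minutes <- effort_to_minutes(efforts)
names(minutes) <- names(efforts)

# Estimated time for the whole (hypothetical) project, in minutes.
total_minutes <- sum(minutes)
```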
Exploring an Org File
Data: orgfile
The orgclockr package comes with the built-in dataset orgfile. This dataset, a character vector, illustrates a typical org file. For presentational purposes the file consists of only 100 lines but is enriched with various org elements. The object is the result of reading in an org file. Typically this is done with a combination of file() and readLines() in R:
library(dplyr)  # provides the %>% pipe used throughout
file("/path/to/file.org") %>%
  readLines()
The package also provides the raw data behind orgfile, the sample.org file the object stems from. Reading sample.org is simply done using system.file():
library(orgclockr)
system.file("extdata", "sample.org", package = "orgclockr") %>%
readLines()
Extracting the Org Elements
orgclockr provides several extraction functions if you are only interested in a specific element of an org file. These start with extract_. Most commonly you’d want to extract several elements and store them in a dplyr::data_frame for further manipulation, which is done using org_elements_df(). The code given below keeps the tagged headings of the built-in dataset orgfile that are not tagged with TagThree. If you are not familiar with the manipulation functions of the dplyr library yet, you may start with the Data Wrangling Cheat Sheet provided by RStudio.
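library(orgclockr)
library(dplyr)  # filter() and the other verbs used below
f <- org_elements_df(orgfile)
# keep headings that carry a tag, excluding those tagged TagThree
filter(f, !grepl("TagThree", Tag), !is.na(Tag))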
Headline | Category | Tag | Level | State | Deadline | Effort |
---|---|---|---|---|---|---|
HeadingOne | CategoryOne | TagOne | 1 | nil | nil | nil |
TaskOne | nil | TagOne TagTwo | 2 | TODO | nil | nil |
Extracting Clocking Information
While org_elements_df() extracts various elements from org headings, I decided to keep the clocking information separate. It is returned by org_clock_df(), which also yields a dplyr::data_frame object. As will be shown below, the local data frames returned by both functions can easily be joined using Headline as the index column. The following code returns the number of days on which a task has been clocked. Do not confuse this with the sum of TimeSpent in days:
org_clock_df(orgfile) %>%
group_by(Headline) %>%
summarise(DaysOnTask = n())
Headline | DaysOnTask |
---|---|
TaskEight | 2 |
TaskFive | 2 |
TaskNine | 1 |
TaskSeven | 1 |
TaskSix | 5 |
TaskTen | 1 |
TaskTwo | 2 |
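The difference between the two quantities can be seen on a toy example. The following base-R sketch uses made-up data and assumes one row per task and day, as the n() count above implies. It compares the number of days clocked with the total time spent expressed in days.

```r
# Made-up clock data with one row per task and day (an assumption).
clock <- data.frame(
  Headline = c("TaskA", "TaskA", "TaskB"),
  Date = as.Date(c("2015-01-19", "2015-01-20", "2015-01-19")),
  TimeSpent = c(600, 900, 30)  # minutes
)

# Number of distinct days with clock entries per task.
days_on_task <- tapply(clock$Date, clock$Headline,
                       function(d) length(unique(d)))

# Total time spent per task, converted from minutes to days.
time_spent_days <- tapply(clock$TimeSpent, clock$Headline, sum) / (60 * 24)
```

Here TaskA was clocked on two days, while its total time spent amounts to only slightly more than one day.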
The local data frame below sorts the tasks and days by the amount of time invested:
org_clock_df(orgfile) %>%
  filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
  group_by(Date, Headline) %>%
  # name the aggregate explicitly so there is one row per day and task
  summarise(TimeSpent = sum(TimeSpent)) %>%
  ungroup() %>%
  arrange(desc(TimeSpent))
Date | Headline | TimeSpent |
---|---|---|
2015-01-19 | TaskTen | 334 |
2015-01-20 | TaskEight | 129 |
2015-01-05 | TaskSeven | 122 |
2015-02-28 | TaskFive | 51 |
2015-01-01 | TaskSix | 34 |
2015-02-05 | TaskEight | 23 |
2015-03-01 | TaskFive | 6 |
2015-01-19 | TaskNine | 2 |
AvgClockInterval gives the mean or median clock interval for a task per day. You may also be interested in the average length of a clock interval per task across all days:
org_clock_df(orgfile) %>%
group_by(Headline) %>%
summarise(AvgTimeOnTask = round(sum(TimeSpent)/sum(NIntervals), 2)) %>%
arrange(desc(AvgTimeOnTask))
Headline | AvgTimeOnTask |
---|---|
TaskSeven | 122 |
TaskTen | 55.67 |
TaskEight | 50.67 |
TaskSix | 46.4 |
TaskTwo | 10.5 |
TaskFive | 9.5 |
TaskNine | 2 |
After doing simple calculations on the clocking data you may want to visualize your time spent as a time series. autoplot() takes a zoo object, a class aimed particularly at irregular time series:
library(zoo)
library(ggplot2)  # needed for autoplot() and the scale, theme and label helpers
org_clock_df(orgfile) %>%
select(Date, TimeSpent) %>%
filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
as.data.frame() %>%
read.zoo(index.column = "Date") %>%
autoplot.zoo(stat = "identity",
geom = "bar") +
scale_fill_gradient2(trans = "sqrt") +
aes(fill = Value) +
guides(fill = FALSE) +
theme_classic() +
ylab("Time Spent (min)") +
xlab("Date")
The plot below shows a very simple retrospective Gantt chart that uses the first and the last day clocked into a task as values:
org_clock_df(orgfile) %>%
select(Date, Headline) %>%
filter(between(Date, as.Date("2014-11-01"), Sys.Date())) %>%
as.data.frame() %>%
read.zoo(index.column = "Date") %>%
autoplot.zoo(stat = "identity",
geom = "line") +
scale_color_brewer(type = "qual",
palette = 2) +
aes(size = 1,
colour = Value) +
guides(size = FALSE,
colour = FALSE) +
theme_classic() +
ylab("Task") + xlab("Date")
The next plot shows the total time spent per task:
Palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2",
             "#D55E00")
org_clock_df(orgfile) %>%
  group_by(Headline) %>%
  summarise(TimeSpent = sum(TimeSpent)) %>%
  # reorder() sorts the bars by time spent and keeps each value with its task
  ggplot(aes(reorder(Headline, -TimeSpent), TimeSpent,
             fill = Headline)) +
  geom_bar(stat = "identity",
           width = .5) +
  # apply the palette colours, one per task
  scale_fill_manual(values = Palette) +
  theme_classic() +
  guides(fill = FALSE) +
  labs(x = "Task", y = "Time Spent (min)")
Joining the results of org_clock_df() and org_elements_df() is achieved with the dedicated join functions provided by the dplyr library. The following example uses a left_join(), because we want to omit the information on headings without any clocking information:
a_df <- org_clock_df(orgfile)
b_df <- org_elements_df(orgfile)
left_join(a_df, b_df) %>%
  group_by(Date, Headline) %>%
  # take one Effort value per group so summarise() returns a single row
  summarise(TimeSpentTotal = sum(TimeSpent), Effort = first(Effort)) %>%
  filter(Effort < TimeSpentTotal) %>%
  mutate(Overdue = TimeSpentTotal - Effort) %>%
  ungroup() %>%
  arrange(desc(Overdue))
Date | Headline | TimeSpentTotal | Effort | Overdue |
---|---|---|---|---|
2015-01-19 | TaskTen | 334 | 25 | 309 |
2015-01-20 | TaskEight | 129 | 25 | 104 |
2015-01-05 | TaskSeven | 122 | 30 | 92 |
2014-12-21 | TaskSix | 90 | 60 | 30 |
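Why a left join drops headings that were never clocked can be checked on toy data. The sketch below uses base R's merge() with all.x = TRUE as a stand-in for left_join(); the two small data frames are made up and only mimic the shape of the results of org_clock_df() and org_elements_df().

```r
# Toy stand-ins for the clock and element data frames; values are made up.
clock_df <- data.frame(Headline = c("TaskA", "TaskA", "TaskB"),
                       TimeSpent = c(30, 45, 10))
elements_df <- data.frame(Headline = c("TaskA", "TaskB", "TaskC"),
                          Effort = c(60, 20, 15))

# all.x = TRUE keeps every row of clock_df, like left_join(clock_df, elements_df).
joined <- merge(clock_df, elements_df, by = "Headline", all.x = TRUE)

# TaskC has no clock entries, so it does not appear in the result.
unclocked_kept <- "TaskC" %in% joined$Headline
```

Swapping the two arguments would instead keep TaskC with a missing TimeSpent, which is not what is wanted here.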
The plot below is what I had in mind before writing orgclockr:
library(tidyr)
left_join(a_df, b_df) %>%
select(Date, Headline, TimeSpent, Effort) %>%
filter(!is.na(Effort)) %>%
group_by(Headline) %>%
summarise(TimeSpent = sum(TimeSpent),
Effort = unique(Effort)) %>%
tidyr::gather(Variable, Value, TimeSpent:Effort) %>%
as.data.frame() %>%
ggplot() +
aes(Headline, Value,
fill = Variable) +
scale_fill_brewer(type = "qual",
palette = 7) +
geom_bar(stat = "identity",
position = "dodge") +
theme_classic() +
theme(legend.title = element_blank(),
legend.position = "bottom") +
labs(x = "Task", y = "Time (min)")
Here we have a striking example of mostly underestimates and one overestimate, which obviously should be avoided. The preceding plot clearly suggests horrible work efficiency rates for the tasks depicted, with the sole exception of TaskTwo, which is near the desired value of one:
left_join(a_df, b_df) %>%
select(Date, Headline, TimeSpent, Effort) %>%
filter(!is.na(Effort)) %>%
group_by(Headline) %>%
summarise(TimeSpent = sum(TimeSpent),
Effort = unique(Effort)) %>%
mutate(EfficiencyRate = round(Effort/TimeSpent, 2))
Headline | TimeSpent | Effort | EfficiencyRate |
---|---|---|---|
TaskEight | 152 | 25 | 0.16 |
TaskNine | 2 | 240 | 120 |
TaskSeven | 122 | 30 | 0.25 |
TaskSix | 232 | 60 | 0.26 |
TaskTen | 334 | 25 | 0.07 |
TaskTwo | 21 | 20 | 0.95 |
Limitations
This section may and hopefully will undergo changes in the future, so the list below is also a development roadmap:
- [ ] Currently the tag inheritance provided by the inherit_tags parameter in org_elements_df() and the inherit parameter in extract_tags() only works for level one tags.
- [ ] For simplicity reasons, clock intervals are not split at midnight. Keep this in mind when clocking for long periods of time spanning from one day to the next. This may impair the meaningfulness of the TimeSpent in org_clock_df().
- [ ] Currently orgclockr doesn’t parse the ARCHIVE_ITAGS and ARCHIVE_CATEGORY in archived org files.
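The midnight limitation in the list above could be worked around before aggregating. The following base-R sketch splits a single clock interval at every midnight it crosses. split_at_midnight() is a hypothetical helper, not part of orgclockr, and it assumes timestamps in UTC so that every day is exactly 24 hours long.

```r
# split_at_midnight() is an illustrative helper, not an orgclockr function.
split_at_midnight <- function(start, end) {
  stopifnot(end > start)
  day_secs <- 24 * 60 * 60
  s <- as.numeric(start)
  e <- as.numeric(end)
  # First midnight (UTC) after the start, in seconds since the epoch.
  first <- (floor(s / day_secs) + 1) * day_secs
  # Candidate midnights; keep only those strictly before the end.
  n_days <- ceiling((e - s) / day_secs)
  candidates <- first + day_secs * (seq_len(n_days + 1) - 1)
  cuts <- candidates[candidates < e]
  bounds <- c(s, cuts, e)
  from <- as.POSIXct(head(bounds, -1), origin = "1970-01-01", tz = "UTC")
  to <- as.POSIXct(tail(bounds, -1), origin = "1970-01-01", tz = "UTC")
  data.frame(Date = as.Date(from),
             Minutes = as.numeric(difftime(to, from, units = "mins")))
}

# A made-up interval that starts before and ends after midnight.
start <- as.POSIXct("2015-01-19 23:30", tz = "UTC")
end <- as.POSIXct("2015-01-20 01:15", tz = "UTC")
pieces <- split_at_midnight(start, end)
```

Each resulting row can then be attributed to its own day, so a long late-night session no longer inflates the time spent on the day it started.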