stenw / orgclockr

orgclockr

Installation

library(devtools)
devtools::install_github("mutbuerger/orgclockr")

install_github() does not build vignettes by default. Depending on which packages you are missing, building them may be time-consuming, but the vignettes provide a nice introduction to the library:

devtools::install_github("mutbuerger/orgclockr", build_vignettes = TRUE)

Introduction to orgclockr

What’s Org Mode and why would I want to parse it?

According to its own description, Org mode is an Emacs mode and an organizational tool that lives in the plain text world. It is also a fairly complex markup language. The complexity is not imposed on you, though, so simply jumping into it is advised. What sets Org mode apart from other organizational tools is the seemingly flawless integration of a full-fledged task management solution into a flexible outliner.

Org mode offers multiple options to filter by org elements: Filtering in an org file by todo keywords for example is achieved by org-sparse-tree which uses an overlay. This may be sufficient to get an overview in Org mode, but for presentational purposes this is not an option. Another way to filter by elements is using the org-agenda with org-agenda-filter-by-tag and similar filter functions. A filtered org-agenda can be exported to various formats via org-agenda-write, which provides a simple presentation of your current agenda. In my humble opinion though, there is still a need to repeatedly filter large org files by various org elements outside Emacs. This will especially pay off when Org mode is used for its clocking capabilities. I will elaborate on this in the next chapter.

There are various parsers for elements in org files, even one in R (orgR) is available on CRAN. I recommend using orgR for quickly extracting the raw headings and timestamps of an org file, but the results on todo keywords and tags were certainly not satisfactory. orgclockr strives to provide more flexible extraction functions to capture the elements in a heading as well as the clocking information related to it.

Clocking Work Time in Org Mode

Keeping track of the time you spend, not only on work but also on activities you dedicate yourself to in your free time, is something that the Quantified Self movement brought to a whole new level. What I really like about it is the inspiring collection of visualizations that its members derive from their continuous data collection. Clocking work time is also useful in task and project management: it gives you a glimpse of what you spent your time on and how the time spent compares to the estimated effort. For me, becoming aware of my weaknesses (spending way too much time on some tasks, totally avoiding others) offers the greatest opportunity to improve my work efficiency. While Org mode is a fantastic tool for doing the actual clocking and for building simple clock tables, I struggled to do the weekly reviews properly. Most of the time I noticed where my priorities were and how much time I had spent in total, and then proceeded with archiving completed tasks or something else. To actually improve my time management, I planned to focus on my work efficiency rate, which compares the effort to the time spent, and to visualize the results as time series to become aware of trends and changes.

Org mode lets you clock the time spent on a task. The relevant information is stored in a drawer, using timestamps in the format predetermined in org-time-stamp-formats. Furthermore, to set a time limit, Org mode uses both an effort and a deadline property. orgclockr parses these elements and returns, per task, the time spent, the average length of a clock interval, the number of clock intervals on a given day, the period of time spent on a task and the effort set. This information allows for a more detailed clock report than the built-in org-clock-report in Org mode. Having the clocking data available in a dplyr::data_frame object is of great benefit not only for filtering headings by elements, but also for calculating measures of work efficiency. Examples are provided in the next chapter.
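
As a rough illustration of the raw material, the sketch below parses a single CLOCK line with base R and computes the clocked minutes. The sample line and the helper name clock_minutes() are made up for this example, the timestamps are assumed to follow Org mode's default format, and this is not how orgclockr itself is implemented:

```r
# Hypothetical helper: minutes between the two timestamps of one CLOCK line.
clock_minutes <- function(clock_line) {
  # Pull out both "YYYY-MM-DD Day HH:MM" timestamps.
  stamp_pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2} [^] ]+ [0-9]{2}:[0-9]{2}"
  stamps <- regmatches(clock_line, gregexpr(stamp_pattern, clock_line))[[1]]
  # Drop the weekday name so that parsing does not depend on the locale.
  stamps <- sub(" [^ ]+ ", " ", stamps)
  times <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M", tz = "UTC")
  as.numeric(difftime(times[2], times[1], units = "mins"))
}

clock_minutes("CLOCK: [2015-01-19 Mon 09:00]--[2015-01-19 Mon 10:30] =>  1:30")
# 90
```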

Learning to set efforts properly can only be done from experience. Therefore making clocking a habit is indispensable. Setting the right efforts is especially useful when breaking down large projects into manageable parts. While Org mode comes in useful when pointing out that you are exceeding the effort set on the currently clocked task, I’d like to have a more general view on my efforts set for a whole project. Calculating the sum of the estimated time needed on the whole project or parts of it is more convenient in R than in Org Tables. Arguably the greatest benefit of orgclockr, though, is in creating time series and visualizing the results with various plotting libraries in R.
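
Summing efforts over a project boils down to converting the effort strings into minutes first. A minimal base R sketch, assuming efforts are written as H:MM; the helper effort_to_minutes() and the three values are hypothetical and not part of orgclockr:

```r
# Hypothetical helper: convert "H:MM" effort strings to minutes.
effort_to_minutes <- function(effort) {
  parts <- strsplit(effort, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Made-up efforts for three subtasks of one project.
efforts <- c("0:25", "1:00", "4:00")

effort_to_minutes(efforts)       # 25 60 240
sum(effort_to_minutes(efforts))  # 325
```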

Exploring an Org File

Data: orgfile

The orgclockr package comes with the built-in dataset orgfile. This dataset in the form of a character vector illustrates the typical org file. For presentational purposes the file consists of only 100 lines but is enriched with various org elements. The object is the result from reading in an org file. Typically this is done with a combination of file() and readLines() in R:

library(magrittr)  # provides the %>% pipe

file("/path/to/file.org") %>%
    readLines()
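
To try this without an org file at hand, the following sketch writes a few made-up org lines to a temporary file and reads them back. The content is invented for illustration and is not the package's sample.org:

```r
# Made-up sample content, not the package's sample.org.
org_lines <- c(
  "* HeadingOne :TagOne:",
  "** TODO TaskOne :TagOne:TagTwo:",
  "   CLOCK: [2015-01-19 Mon 09:00]--[2015-01-19 Mon 10:30] =>  1:30"
)
path <- tempfile(fileext = ".org")
writeLines(org_lines, path)

# One element per line of the file, just like the orgfile dataset.
orgfile_like <- readLines(path)
class(orgfile_like)   # "character"
length(orgfile_like)  # 3
```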

This package provides the raw data of orgfile, the sample.org file the object stems from, as well. Reading sample.org is simply done using system.file():

library(orgclockr)

system.file("extdata", "sample.org", package = "orgclockr") %>%
    readLines()

Extracting the Org Elements

orgclockr provides several extraction functions if you are only interested in a specific element of an org file. These start with extract_. Most commonly you’d want to extract several elements and store them in a dplyr::data_frame for further manipulation, which is done using org_elements_df(). The code given below filters the headings of the built-in dataset orgfile that are not tagged with TagThree. If you are not familiar with the manipulation functions of the dplyr library yet, you may start with the Data Wrangling Cheat Sheet provided by RStudio.

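library(orgclockr)
library(dplyr)  # filter() and the other data manipulation verbs

f <- org_elements_df(orgfile)
# Keep tagged headings that do not carry TagThree
filter(f, !grepl("TagThree", Tag), !is.na(Tag))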
Headline    Category     Tag            Level  State  Deadline  Effort
HeadingOne  CategoryOne  TagOne         1      nil    nil       nil
TaskOne     nil          TagOne TagTwo  2      TODO   nil       nil
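
To give a feel for what extracting elements from a heading involves, here is a small base R sketch that splits one heading line into level, todo keyword, headline and tags. The function parse_heading() and its keywords argument are hypothetical; this is not the code behind the extract_ functions and it only handles the simple cases shown:

```r
# Hypothetical helper: split one org heading into its elements.
parse_heading <- function(heading, keywords = c("TODO", "DONE")) {
  # The level is the number of leading asterisks.
  stars <- sub("^(\\*+) .*$", "\\1", heading)
  rest <- trimws(sub("^\\*+ +", "", heading))

  # A trailing block such as ":TagOne:TagTwo:" holds the tags.
  tag_pattern <- ":[[:alnum:]_@#%:]+:$"
  tag_match <- regmatches(rest, regexpr(tag_pattern, rest))
  tags <- character(0)
  if (length(tag_match) == 1) {
    tags <- strsplit(gsub("^:|:$", "", tag_match), ":", fixed = TRUE)[[1]]
    rest <- trimws(sub(tag_pattern, "", rest))
  }

  # A leading todo keyword becomes the state.
  first_word <- sub(" .*$", "", rest)
  state <- NA_character_
  if (first_word %in% keywords) {
    state <- first_word
    rest <- trimws(sub("^[^ ]+ *", "", rest))
  }

  list(level = nchar(stars), state = state, headline = rest, tags = tags)
}

str(parse_heading("** TODO TaskOne                 :TagOne:TagTwo:"))
```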

Extracting Clocking Information

While org_elements_df() extracts various elements from org headings, I decided to separate the clocking information from it. This is therefore returned from org_clock_df(), which will also result in a dplyr::data_frame object. As will be shown below, the local data frames returned from both functions can easily be joined using Headline as the index column. The following code returns the number of days a task has been clocked into. Do not confuse this with the sum of TimeSpent in days:

org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(DaysOnTask = n())
Headline   DaysOnTask
TaskEight  2
TaskFive   2
TaskNine   1
TaskSeven  1
TaskSix    5
TaskTen    1
TaskTwo    2
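
To make the distinction between days on a task and time spent in days concrete, the base R sketch below computes both for a hand-built data frame. The data frame is only a stand-in that mimics three columns of org_clock_df() output, with four rows borrowed from this document's sample results:

```r
# Stand-in for org_clock_df() output: one row per task and day.
clock <- data.frame(
  Headline  = c("TaskEight", "TaskEight", "TaskFive", "TaskFive"),
  Date      = as.Date(c("2015-01-20", "2015-02-05", "2015-02-28", "2015-03-01")),
  TimeSpent = c(129, 23, 51, 6),  # minutes
  stringsAsFactors = FALSE
)

# Number of days with at least one clock entry per task.
days_on_task <- tapply(clock$TimeSpent, clock$Headline, length)

# Total time spent per task, converted from minutes to days.
time_in_days <- tapply(clock$TimeSpent, clock$Headline, sum) / (60 * 24)

days_on_task            # both tasks: 2
round(time_in_days, 3)  # TaskEight 0.106, TaskFive 0.040
```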

The local data frame below sorts the tasks and days by the amount of time invested:

org_clock_df(orgfile) %>%
    filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
    group_by(Date, Headline) %>%
    summarise(TimeSpent = sum(TimeSpent)) %>%
    ungroup() %>%
    arrange(desc(TimeSpent))
Date        Headline   TimeSpent
2015-01-19  TaskTen    334
2015-01-20  TaskEight  129
2015-01-05  TaskSeven  122
2015-02-28  TaskFive   51
2015-01-01  TaskSix    34
2015-02-05  TaskEight  23
2015-03-01  TaskFive   6
2015-01-19  TaskNine   2

The AvgClockInterval column holds the mean or median length of a clock interval for a task per day. You may also be interested in the average length of a clock interval per task over the whole period:

org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(AvgTimeOnTask = round(sum(TimeSpent)/sum(NIntervals), 2)) %>%
    arrange(desc(AvgTimeOnTask))
Headline   AvgTimeOnTask
TaskSeven  122
TaskTen    55.67
TaskEight  50.67
TaskSix    46.4
TaskTwo    10.5
TaskFive   9.5
TaskNine   2
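
The summarise() call above divides the summed minutes by the summed number of intervals instead of averaging the daily averages. A small base R sketch with made-up numbers shows why the two differ as soon as days contain different numbers of intervals:

```r
# Made-up clocking data for one task on two days.
time_spent  <- c(90, 10)  # minutes per day
n_intervals <- c(1, 4)    # clock intervals per day

# Average interval length over all intervals (the approach used above).
pooled <- sum(time_spent) / sum(n_intervals)

# Mean of the per-day averages: both days count equally,
# regardless of how many intervals they contain.
mean_of_daily <- mean(time_spent / n_intervals)

pooled         # 20
mean_of_daily  # 46.25
```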

After doing simple calculations on the clocking data, you may want to visualize your time spent as a time series. autoplot() accepts a zoo object, a class particularly aimed at irregular time series:

library(zoo)
library(dplyr)    # select(), filter(), between()
library(ggplot2)  # plotting layers used with autoplot.zoo()

org_clock_df(orgfile) %>%
    select(Date, TimeSpent) %>%
    filter(between(Date, as.Date("2015-01-01"), Sys.Date())) %>%
    as.data.frame() %>%
    read.zoo(index.column = "Date") %>%
    autoplot.zoo(stat = "identity",
                 geom = "bar") +
    scale_fill_gradient2(trans = "sqrt") +
    aes(fill = Value) +
    guides(fill = "none") +
    theme_classic() +
    ylab("Time Spent (min)") +
    xlab("Date")

http://mutbuerger.github.io/images/orgclockr1.png

The plot below shows a very simple retrospective Gantt chart, which uses the first and the last day clocked into a task as its end points:

org_clock_df(orgfile) %>%
    select(Date, Headline) %>%
    filter(between(Date, as.Date("2014-11-01"), Sys.Date())) %>%
    as.data.frame() %>%
    read.zoo(index.column = "Date") %>%
    autoplot.zoo(stat = "identity",
                 geom = "line") +
    scale_color_brewer(type = "qual",
                       palette = 2) +
    aes(size = 1,
        colour = Value) +
    guides(size = "none",
           colour = "none") +
    theme_classic() +
    ylab("Task") + xlab("Date")

http://mutbuerger.github.io/images/orgclockr2.png

The bar chart below shows the total time spent on each task:

Palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2",
             "#D55E00")
org_clock_df(orgfile) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent)) %>%
    # reorder() sorts the bars by time spent while keeping each value
    # attached to its own task
    ggplot(aes(reorder(Headline, -TimeSpent), TimeSpent,
               fill = Headline)) +
    geom_bar(stat = "identity",
             width = .5) +
    scale_fill_manual(values = Palette) +
    theme_classic() +
    guides(fill = "none") +
    labs(x = "Task", y = "Time Spent (min)")

http://mutbuerger.github.io/images/orgclockr3.png

Joining the results of org_clock_df() and org_elements_df() is achieved with the various dedicated functions provided by the dplyr library. The following example uses a left_join(), because we want to omit the information on headings without any clocking information:

a_df <- org_clock_df(orgfile)
b_df <- org_elements_df(orgfile)
# Headline is the index column shared by both data frames
left_join(a_df, b_df, by = "Headline") %>%
    group_by(Date, Headline) %>%
    summarise(TimeSpentTotal = sum(TimeSpent),
              Effort = unique(Effort)) %>%
    filter(Effort < TimeSpentTotal) %>%
    mutate(Overdue = TimeSpentTotal - Effort) %>%
    ungroup() %>%
    arrange(desc(Overdue))
Date        Headline   TimeSpentTotal  Effort  Overdue
2015-01-19  TaskTen    334             25      309
2015-01-20  TaskEight  129             25      104
2015-01-05  TaskSeven  122             30      92
2014-12-21  TaskSix    90              60      30
The plot below is what I had in mind before writing orgclockr:

library(tidyr)

left_join(a_df, b_df) %>%
    select(Date, Headline, TimeSpent, Effort) %>%
    filter(!is.na(Effort)) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent),
              Effort = unique(Effort)) %>%
    tidyr::gather(Variable, Value, TimeSpent:Effort) %>%
    as.data.frame() %>%
    ggplot() +
    aes(Headline, Value,
        fill = Variable) +
    scale_fill_brewer(type = "qual",
                      palette = 7) +
    geom_bar(stat = "identity",
             position = "dodge") +
    theme_classic() +
    theme(legend.title = element_blank(),
          legend.position = "bottom") +
    labs(x = "Task", y = "Time (min)")

http://mutbuerger.github.io/images/orgclockr4.png

This is a striking example of mostly underestimated efforts and one heavily overestimated effort (TaskNine). Both should obviously be avoided. The preceding plot clearly suggests horrible work efficiency rates for the tasks depicted, with the sole exception of TaskTwo, which is near the desired value of one:

left_join(a_df, b_df) %>%
    select(Date, Headline, TimeSpent, Effort) %>%
    filter(!is.na(Effort)) %>%
    group_by(Headline) %>%
    summarise(TimeSpent = sum(TimeSpent),
              Effort = unique(Effort)) %>%
    mutate(EfficiencyRate = round(Effort/TimeSpent, 2))
Headline   TimeSpent  Effort  EfficiencyRate
TaskEight  152        25      0.16
TaskNine   2          240     120
TaskSeven  122        30      0.25
TaskSix    232        60      0.26
TaskTen    334        25      0.07
TaskTwo    21         20      0.95
Limitations

This section may and hopefully will undergo changes in the future, so the list below is also a development roadmap:

  • [ ] Currently the tag inheritance provided by the inherit_tags parameter in org_elements_df() and the inherit parameter in extract_tags() only works for level one tags.
  • [ ] For simplicity, clock intervals are not split at midnight. Keep this in mind when clocking long periods that span from one day to the next, because the whole interval is then attributed to a single day. This may distort the TimeSpent per day in org_clock_df().
  • [ ] Currently orgclockr doesn’t parse the ARCHIVE_ITAGS and ARCHIVE_CATEGORY in archived org files.
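
To illustrate the midnight limitation from the list above, the following base R sketch compares the unsplit duration of a clock interval with the per-day shares a split at midnight would yield. The interval is made up, it is assumed to cross midnight exactly once, and the split shown here is not implemented in orgclockr:

```r
# Hypothetical clock interval crossing midnight once (UTC to sidestep DST).
start <- as.POSIXct("2015-01-19 23:00", tz = "UTC")
end   <- as.POSIXct("2015-01-20 01:30", tz = "UTC")

# Unsplit: all minutes are booked on a single day.
unsplit <- as.numeric(difftime(end, start, units = "mins"))

# Split at midnight: one share per calendar day.
midnight <- as.POSIXct(trunc(end, units = "days"))
per_day <- c(
  as.numeric(difftime(midnight, start, units = "mins")),
  as.numeric(difftime(end, midnight, units = "mins"))
)
names(per_day) <- c(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))

unsplit  # 150
per_day  # 2015-01-19: 60, 2015-01-20: 90
```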
