Base is a daemon for linux that continuously monitors and stores what you are doing on your computer. This way, pull statistics, visualize how you spend your time, attach a model to be a semantic memory store, and more. It is inspired by the Quantified Self-movement and Stephen Wolfram's personal key logging asa well as SelfSpy.
See Example Statistics, below, for some of the fabulous things you can do with this data.
To install Base, follow these steps:
-
Clone the Base repository:
git clone https://github.com/Alignment-Lab-AI/Base.git
-
Navigate to the Base directory:
cd Base
-
Run the installation command with sudo:
sudo make install
This command will create the necessary directories and symlinks:
- It creates a
/var/lib/Base
directory and installs the Python files from theBase/
directory into it. - It creates a
/usr/bin
directory if it doesn't already exist. - It creates symlinks in
/usr/bin
for the__init__.py
andstats.py
files, making them accessible asBase
andBaseview
commands respectively.
Note: The installation destination can be customized by setting the
DESTDIR
variable when runningmake install
. For example:sudo make install DESTDIR=/path/to/custom/directory
- It creates a
-
After installation, you can run Base using the
Base
command and view statistics using theBaseview
command.
That's it! Base should now be installed on your system and ready to use.
You run Base with Base
. You should probably start with Base --help
to get to know the command line arguments. As of this writing, it should look like this:
optional arguments:
-h, --help show this help message and exit
-c FILE, --config FILE
Config file with defaults. Command line parameters
will override those given in the config file. The
config file must start with a "[Defaults]" section,
followed by [argument]=[value] on each line.
-d DATA_DIR, --data-dir DATA_DIR
Data directory for Base, where the database is
stored. Remember that Base must have read/write
access. Default is ~/.Base
-n, --no-text Do not store what you type. This will make your
database smaller and less sensitive to security
breaches. Process name, window titles, window
geometry, mouse clicks, number of keys pressed and key
timings will still be stored, but not the actual
letters. Key timings are stored to enable activity
calculation in selfstats. If this switch is used,
you will never be asked for password.
-r, --no-repeat Do not store special characters as repeated
characters.
Everything you do is stored in a Sqlite database in your DATA_DIR. Each day theyre saved to parquet files which are unilaterally queried along with the sqlite database via duckdb they contain a variety of useful columns, like process names and window titles
Unless you use the --no-text flag, Base will store everything you type in two columns in the database.
"OK, so now all this data will be stored, but what can I use it for?"
While you can access the Sqlite/duckdb tables directly or, if you like Python, import models.py
from the Base directory and use those duckdb tables, the standard way to query your data is intended to be via a language model.
Here are some model-agnostic use cases:
"Damn! The browser just threw away everything I wrote, because I was not logged in." "Show me everything I have written the last 30 minutes."
selfstats --back 30 m --showtext
"Hmm.. what is my password for Hoolaboola.com?" "show me everything I have ever written in Chrome, where the window title contained something with "Hoolaboola"."
selfstats -T "Hoolaboola" -P Google-chrome --showtext
"I need to remember what I worked on a few days ago, for my time report." "What buffers did I have open in Emacs on the tenth of this month and one day forward? Sort by how many keystrokes I wrote in each. This only works if I have set Emacs to display the current buffer in the window title. In general, try to set your programs (editors, terminals, web apps, ...) to include information on what you are doing in the window title. This will make it easier to search for later. On a related but opposite note: if you have the option, remove information like "mails unread" or "unread count" (for example in Gmail and Google Reader) from the window titles, to make it easier to group them in --tactive and --tkeys."
selfstats --date 10 --limit 1 d -P emacs --tkeys
"Also, when and how much have I used my computer this last week?" "display my active time periods for the last week. consider it inactive when I have not clicked or used the keyboard in 180 seconds."
selfstats -b 1 w --periods 180
"How effective have I been this week?" "show ratios informing me about how much I have written per active second and how much I have clicked vs used the keyboard. cause = a lot of clicking means too much browsing or inefficient use of my tools."
selfstats -b 1 w --ratios
"I remember that I wrote something to her about the IP address of our printer a few months ago. I can't quite remember if it was a chat, a tweet, a mail, a facebook post, or what.. Should I search them separately? No."
"Find all texts where I mentioned the word 'printer' in the last 10 weeks"
selfstats --body printer -s --back 40 w
"What programs do I use the most?"
"List all programs I have ever used, ordered by time spent active in them"
selfstats --pactive
"Which questions on the website Stack Overflow did I visit yesterday?"
"Show me all window titles containing 'Stack Overflow' from the last 32 hours, sorted by active time"
./selfstats -T "Stack Overflow" -P Google-chrome --back 32 h --tactive
"How much have I browsed today?"
"List all the different pages I visited in Chrome today, ordered by active time"
selfstats -P Google-chrome --clock 00:00 --tactive
"Who needs Qwerty? I am going to make an alternative super-programmer-keymap. I wonder what keys I use the most when I code C++?"
"Show me the most frequently pressed keys in Emacs while editing files with 'cpp' in the name"
selfstats --key-freq -P Emacs -T cpp
"While we are at it, which cpp files have I edited the most this month?"
"List the cpp files I edited in Emacs this month, sorted by amount typed"
selfstats -P Emacs -T cpp --tkeys --date 1
Selfstats is a swiss army knife of self knowledge. Experiment with it when you have acquired a few days of data. Remember that if you know SQL or SqlAlchemy, it is easy to construct your own queries against the database to get exactly the information you want, make pretty graphs, etc. There are a few stored properties, like coordinates of a mouse click and window geometry, that you can currently only reach through the database.
The --help is a beast that right now looks something like this:
usage: selfstats [-h] [-c FILE] [-d DATA_DIR] [-s]
[-D DATE [DATE ...]] [-C CLOCK] [-i ID]
[-b BACK [BACK ...]] [-l LIMIT [LIMIT ...]] [-m nr]
[-T regexp] [-P regexp] [-B regexp] [--ratios] [--clicks]
[--key-freqs] [--human-readable] [--active [seconds]] [--periods [seconds]]
[--pactive [seconds]] [--tactive [seconds]] [--pkeys]
[--tkeys]
Calculate statistics on Base data. Per default it will show non-text
information that matches the filter. Adding '-s' means also show text. Adding
any of the summary options will show those summaries over the given filter
instead of the listing. Multiple summary options can be given to print several
summaries over the same filter. If you give arguments that need to access text
/ keystrokes, you will be asked for the decryption password.
optional arguments:
-h, --help show this help message and exit
-c FILE, --config FILE
Config file with defaults. Command line parameters
will override those given in the config file. Options
to Base goes in the "[Defaults]" section, followed
by [argument]=[value] on each line. Options specific
to selfstats should be in the "[Selfstats]" section,
though "data-dir" is still read from "[Defaults]".
-d DATA_DIR, --data-dir DATA_DIR
Data directory for Base, where the database is
stored. Remember that Base must have read/write
access. Default is ~/.Base
-s, --showtext Also show the text column. This switch is ignored if
at least one of the summary options are used. Requires
password.
-D DATE [DATE ...], --date DATE [DATE ...]
Which date to start the listing or summarizing from.
If only one argument is given (--date 13) it is
interpreted as the closest date in the past on that
day. If two arguments are given (--date 03 13) it is
interpreted as the closest date in the past on that
month and that day, in that order. If three arguments
are given (--date 2012 03 13) it is interpreted as
YYYY MM DD
-C CLOCK, --clock CLOCK
Time to start the listing or summarizing from. Given
in 24 hour format as --clock 13:25. If no --date is
given, interpret the time as today if that results in
sometimes in the past, otherwise as yesterday.
-i ID, --id ID Which row ID to start the listing or summarizing from.
If --date and/or --clock is given, this option is
ignored.
-b BACK [BACK ...], --back BACK [BACK ...]
--back <period> [<unit>] Start the listing or summary
this much back in time. Use this as an alternative to
--date, --clock and --id. If any of those are given,
this option is ignored. <unit> is either "s"
(seconds), "m" (minutes), "h" (hours), "d" (days) or
"w" (weeks). If no unit is given, it is assumed to be
hours.
-l LIMIT [LIMIT ...], --limit LIMIT [LIMIT ...]
--limit <period> [<unit>]. If the start is given in
--date/--clock, the limit is a time period given by
<unit>. <unit> is either "s" (seconds), "m" (minutes),
"h" (hours), "d" (days) or "w" (weeks). If no unit is
given, it is assumed to be hours. If the start is
given with --id, limit has no unit and means that the
maximum row ID is --id + --limit.
-m nr, --min-keys nr Only allow entries with at least <nr> keystrokes
-T regexp, --title regexp
Only allow entries where a search for this <regexp> in
the window title matches something. All regular expressions
are case insensitive.
-P regexp, --process regexp
Only allow entries where a search for this <regexp> in
the process matches something.
-B regexp, --body regexp
Only allow entries where a search for this <regexp> in
the body matches something. Do not use this filter
when summarizing ratios or activity, as it has no
effect on mouse clicks. Requires password.
--clicks Summarize number of mouse button clicks for all
buttons.
--key-freqs Summarize a table of absolute and relative number of
keystrokes for each used key during the time period.
Requires password.
--human-readable This modifies the --body entry and honors backspace.
--active [seconds] Summarize total time spent active during the period.
The optional argument gives how many seconds after
each mouse click (including scroll up or down) or
keystroke that you are considered active. Default is
180.
--ratios [seconds] Summarize the ratio between different metrics in the
given period. "Clicks" will not include up or down
scrolling. The optional argument is the "seconds"
cutoff for calculating active use, like --active.
--periods [seconds] List active time periods. Optional argument works same
as for --active.
--pactive [seconds] List processes, sorted by time spent active in them.
Optional argument works same as for --active.
--tactive [seconds] List window titles, sorted by time spent active in
them. Optional argument works same as for --active.
--pkeys List processes sorted by number of keystrokes.
--tkeys List window titles sorted by number of keystrokes.
To monitor that Base works as it should and to continuously get feedback on yourself, it is good to regularly mail yourself some statistics. I think the easiest way to automate this is using sendEmail, which can do neat stuff like send through your Gmail account.
For example, put something like this in your weekly cron jobs:
/(PATH_TO_FILE)/selfstats --back 1 w --ratios 900 --periods 900 | /usr/bin/sendEmail -q -u "Weekly selfstats" <etc..>
This will give you some interesting feedback on how much and when you have been active this last week and how much you have written vs moused, etc.
- integrate a model api, and ideally a model trained to make duckdb queries intellgently
- integrate a part of speech tagging model to categorize on an additional axis based on key terms
- develop a ui to chat with the model comfortably and display the activity the user was participating in at any given moment
- integrate tooling for a local model to manage and schedule a task list based on the context of the current conversation
- integrate an efficient stt model to transcribe all audio and store it as well
- integrate an OCR model to capture any text which may get missed, and to parse over the already collected screenshots in current databases that lack the OCR functionality