NightMachinery / r_rational

r/rational archived in plain-text org-mode (good for, e.g., doing offline full-text searches)

Home Page:https://sourcegraph.com/search?q=context%3Aglobal+file%3Aindices%2F.*.org+repo%3A%5Egithub%5C.com%2FNightMachinery%2Fr_rational%24+&patternType=regexp&groupBy=path

r/rational in org-mode

This is an archive of all the posts in the r/rational subreddit in plain-text org-mode.

I personally use it to do fast, offline full-text searches on the whole subreddit.

Reddit does not map cleanly to org-mode, so I am open to ideas on changing the template used to create the org-mode files.

GitHub renders org headings as HTML headers, which makes these files hard to read there. Use an org-mode viewer, or simply open the files as plain text.

Full-text search guides

Online full-text search via Sourcegraph

This search engine is optimized for searching code, so it is not ideal for our purposes, but it’s still much better than Reddit’s own search.

Here is Sourcegraph’s query syntax. The important points are that it supports regular expressions and that it matches the words in the order given, unless you combine them with boolean operators, as in japanese AND horror.

Note that the link above searches in the indices directory, where each file contains only a single comment. This is usually what you want. (Its only drawback is that it’s tedious to find the comments surrounding a result.) To search per submission (instead of per comment), use this link, which searches the posts directory instead.

Searching via ugrep

Install ugrep (Genivia/ugrep) by, e.g.,

brew install ugrep

Now paste this function into your shell:

ugc () {
    ugrep --heading --color=always --pretty --context=3 --recursive --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden '--binary-files=without-match' "$@" | less -n
}

Now you can do:

git clone --recursive https://github.com/NightMachinery/r_rational
cd r_rational/posts
ugc 'japanese horror'
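With --bool, ugrep treats the space in 'japanese horror' as AND, so a line must contain both words to match. If ugrep is not available, the same idea can be roughly approximated with plain grep. The sketch below is not part of this repo: grep_both is a hypothetical helper name, it assumes a grep that supports -r and -C (GNU and BSD grep both do), and it assumes the two words contain no regex metacharacters. It offers no result ranking or smart-case.

```shell
# Hypothetical helper (not part of the repo): print lines that contain
# both words, in either order, case-insensitively, with 3 lines of context.
# Usage: grep_both WORD1 WORD2 [DIRECTORY]
grep_both() {
    # "A.*B|B.*A" emulates a per-line AND of two plain words.
    grep -rniE -C 3 -e "$1.*$2|$2.*$1" -- "${3:-.}"
}
```

For example, grep_both japanese horror . searches the current directory. Pipe the output to less -n for paging, as ugc does.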

ugrep also supports an interactive, incremental search mode:

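# zsh only: this relies on zsh array subscripts such as ${@[-1]}
function ugci {
    # The last argument is the search pattern; everything before it is passed to ugrep as options
    local r="${@[-1]}" opts=("${@[1,-2]}")

    ugrep --heading --color=always --pretty --context=3 --recursive --bool --smart-case '--sort=best' --no-confirm --perl-regexp --hidden '--binary-files=without-match' "$opts[@]" --query=1 --regexp="$r"
}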
ugci 'japanese horror'
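The ugci function above uses zsh array subscripts, so it will not work unchanged in bash or a plain POSIX sh. A minimal sketch of a portable variant follows. The name ugci_sh is hypothetical, and the ugrep flags are copied unchanged from the function above. The body runs in a subshell so that its helper variables do not leak into the calling shell.

```shell
# Hypothetical portable variant of ugci (works in POSIX sh, bash, and zsh).
# Usage: ugci_sh [UGREP_OPTIONS...] PATTERN
ugci_sh() (
    if [ "$#" -lt 1 ]; then
        echo "usage: ugci_sh [ugrep options...] PATTERN" >&2
        exit 2
    fi

    # Split the arguments: the last one is the pattern, the rest are options.
    # The list of "for" is expanded once, so appending to "$@" inside is safe.
    n=$#
    i=1
    for arg in "$@"; do
        if [ "$i" -lt "$n" ]; then
            set -- "$@" "$arg"   # re-append every option after the originals
        else
            pattern=$arg         # the final argument is the pattern
        fi
        i=$((i + 1))
    done
    shift "$n"                   # drop the originals, keeping only the options

    ugrep --heading --color=always --pretty --context=3 --recursive --bool \
        --smart-case --sort=best --no-confirm --perl-regexp --hidden \
        --binary-files=without-match "$@" --query=1 --regexp="$pattern"
)
```

It is called the same way as the zsh version, e.g. ugci_sh 'japanese horror'.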

FAQ

Reduce storage costs by deleting indices

The indices directory stores each comment in its own file, which is very space-inefficient on filesystems with a 4KB block size, since even a tiny file typically occupies at least one full block. If you don’t use these files, deleting them will shrink the repo considerably (as of this writing, the posts directory is only 163MB). You can also delete the .git directory, but you would then lose git features such as pulling new updates.
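To get a rough feel for how much space the one-file-per-comment layout costs, you can compare the bytes of actual content with the bytes allocated when every file is rounded up to whole blocks. The following is a sketch under simplifying assumptions: estimate_block_usage is a hypothetical helper, the 4096-byte default block size is an assumption about your filesystem, and inode overhead, compression, and filesystems that store tiny files inline are all ignored.

```shell
# Hypothetical helper: estimate on-disk usage of a directory tree when each
# file is rounded up to a whole number of blocks.
# Usage: estimate_block_usage DIRECTORY [BLOCK_SIZE_IN_BYTES]
estimate_block_usage() {
    # Print one file size per line, then let awk do the rounding and summing.
    find "$1" -type f -exec sh -c 'for f; do wc -c < "$f"; done' sh {} + |
        awk -v b="${2:-4096}" '
            {
                n++
                bytes += $1
                alloc += int(($1 + b - 1) / b) * b   # round up to full blocks
            }
            END {
                printf "%d files, %.0f bytes of content, ~%.0f bytes allocated\n",
                    n, bytes, alloc
            }'
}
```

Running estimate_block_usage indices from the repo root prints a single summary line. This runs wc once per file, so expect it to be slow on a directory with very many files.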

Search excluding the authors’ names

The easiest way to achieve this is to delete the authors’ names from the data using a search-and-replace tool such as ms-jpq/sad:

fd . | sad '\s*:author:.*' ''

fd . | sad 'u/\S+' 'u/redacted'
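To preview what these two substitutions do before rewriting any files, the same patterns can be applied to a text stream with sed. This is only a sketch: redact_authors is a hypothetical helper, the patterns are POSIX-class translations of the two regexes above, and it works line by line on standard input instead of editing files in place the way sad does.

```shell
# Hypothetical helper: apply the two redactions above to standard input.
# [[:space:]] and [^[:space:]] stand in for \s and \S, which not every sed supports.
redact_authors() {
    sed -E \
        -e 's/[[:space:]]*:author:.*//' \
        -e 's#u/[^[:space:]]+#u/redacted#g'
}
```

For example, redact_authors < some_post.org | less shows the redacted text of one file (some_post.org is a placeholder name) without modifying it.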

How was this repo made?

This repo was created using this script, which needs some refactoring to be decoupled from my environment.

I plan to keep the repo up-to-date as new posts are added to the subreddit.