Releases or build support

Question

joshfinley opened this issue a year ago

First of all, this is an amazing project you have put together. You’ve taken a simple static site generator and transformed it into a powerful tool for organizing complex written content and rendering it elegantly.

Because of this, I would love to use this as a platform for my own writing. Judging by the number of forks of this project, I believe others would as well.

That being said, would it be possible for a release version or some sort of build support and usage documentation to be provided?

I understand that gwern.net is a bespoke tool and was never intended for this purpose, but it would be amazing if other people could use it.

Gwern Branwen · Answer 1 · Wed Sep 27 2023 23:20:34 GMT+0800 (China Standard Time)

This is a WONTFIX because we don't see a release, or widespread use, as either particularly feasible or desirable at this point. The site+codebase is best seen as a prototype and demo to inspire other cleaner implementations.

Gwern.net is a testbed, showcase, and highly-opinionated personal wiki—but it is not finished and the backend is now an atrocious pile of hacks pushing the static site paradigm & Pandoc far beyond where they should be. I am somewhat optimistic that the overall design is stabilizing now that we have good client-side transclusions, but the backend needs a complete rewrite to a real database + dynamic site (ie. a regular wiki or CMS) and a non-Pandoc (and perhaps non-Markdown) language. No one should be trying to use the current backend (including me). The final output is pretty good, but the sausage factory would not pass an FDA inspection.

The JS frontend is reasonably high-quality due to Said Achmiz continually refactoring it (although we continue to work our way through feedback to finetune the experience and deal with endless occasional bugs), and parts of it can be profitably deployed elsewhere, but only parts, because it needs an extensive backend.
And the backend has many major issues.
Problems range from persistent issues with HTML↔Markdown↔AST due to Pandoc limitations to unscalable approaches like editing a large YAML file to create annotations to site compilation now requiring 10+ hours on a beefy workstation due to all the slowdowns (especially from the insanely slow vector search of rp-tree) to a link metadata schema that desperately needs a rewrite to drop the DOI field & add a creation date & add a more flexible [(String,String)] association list (to store miscellaneous metadata like the DOI & affiliation) to rewriting everything from String to Text (constant overhead in source code, development, and runtime)...
All that aside, while I've made some progress on factoring out 'configuration' data, there are still a lot of Gwern.net-hardwired assumptions that any would-be user would keep running into.
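
The link metadata schema change mentioned above (drop the dedicated DOI field, add a creation date, add a free-form association list) might look roughly like the following. This is a hypothetical sketch in Python rather than the site's Haskell; every field name and both helper functions (`migrate`, `lookup`) are assumptions for illustration, not the actual Gwern.net types.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class OldLinkMetadata:
    """Hypothetical 'before' record with a dedicated DOI field."""
    url: str
    title: str
    author: str
    published: str
    doi: str
    abstract: str


@dataclass
class LinkMetadata:
    """Hypothetical 'after' record: no DOI field, plus a creation date
    and an association list for miscellaneous key/value metadata."""
    url: str
    title: str
    author: str
    published: str
    created: Optional[date]
    abstract: str
    # Plays the role of the [(String, String)] association list.
    misc: List[Tuple[str, str]] = field(default_factory=list)


def migrate(old: OldLinkMetadata, created: Optional[date] = None) -> LinkMetadata:
    """Move the DOI (if present) out of its own field and into `misc`."""
    misc = [("doi", old.doi)] if old.doi else []
    return LinkMetadata(old.url, old.title, old.author, old.published,
                        created, old.abstract, misc)


def lookup(meta: LinkMetadata, key: str) -> Optional[str]:
    """Return the first value stored under `key`, or None if absent."""
    return next((value for k, value in meta.misc if k == key), None)
```

The appeal of the association list is that new kinds of metadata, such as an affiliation, need no further schema change: they become another `(key, value)` pair.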

Indeed, I'm not sure the current backend even can be shipped. Haskell versioning aside (I'm on some old GHC, I think), there are at least two private forks I've had to make that I can recall: removing line IDs from Pandoc skylighting in code blocks, because it has no available override or way to configure it, and adding symlink support to Hakyll for copying files, because with 100GB+ of files, it would take a long time to unnecessarily copy everything into the 'compiled' site and I don't have the disk space now to do that even if I wanted to wait.
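
To illustrate why symlinking matters at that scale, here is a minimal sketch of the general idea: placing a file into the output directory as a symlink instead of a full copy. It is not the Hakyll patch itself; the function name `link_into_site` and its parameters are hypothetical, and it assumes a POSIX filesystem where unprivileged symlinks are allowed.

```python
import os
import shutil
from pathlib import Path


def link_into_site(source: Path, dest: Path, use_symlinks: bool = True) -> None:
    """Place the file `source` at `dest`, as a symlink or as a full copy."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.is_symlink() or dest.exists():
        dest.unlink()  # replace a stale entry left by a previous build
    if use_symlinks:
        # An absolute target keeps the link valid whatever the working directory.
        os.symlink(source.resolve(), dest)
    else:
        shutil.copy2(source, dest)
```

A symlink costs a few bytes and no copy time, whereas a copy duplicates the file's full size on disk, which is the trade-off described above.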

Had I known it would come to this, I probably would've never tried to do it with Pandoc+Hakyll, but of course, there was no way to know any of that without trying and engaging in constant iteration, so, here we are.

The current 'plan', such as it is, is to just keep going and incrementally patch issues until everything stabilizes and hopefully we get clarity on what the right architecture will be for a proper rewrite. I've had some thoughts about whether it could be done in org-mode, and I've been outlining some ideas about how to redesign personal wikis + text editors from the ground up for the new neural net age which might supersede Gwern.net entirely at some point.

Josh Finley · Answer 2 · Fri Sep 29 2023 06:05:04 GMT+0800 (China Standard Time)

Hello,

Thank you for sharing such a comprehensive look into the backend complexities of your project. I appreciate the candor; it’s certainly not uncommon to see an ambitious endeavor spiral into a tangle of technical challenges.

That being said, your site has several standout features. From design elements like a minimalist aesthetic and marginal footnotes, to functional components like archiving and automated link / PLOS/PMCID abstract extraction—these are features that I and many other content creators would find beneficial for producing long-form content in a static blog format.

Given the intricacies you've pointed out, it got me thinking: could some of the simpler features be distilled into a more straightforward static site project? Perhaps a Hugo theme or an extension for another static site generator? It's a thought I'm throwing into the wind, but I feel that some of the non-backend-heavy features could find life in other projects. This is something I may explore if I can find the time.

Thanks again for your insights, and best of luck with the future of Gwern.net and your ongoing exploration of new technologies.

Gwern Branwen · Answer 3 · Fri Sep 29 2023 09:06:08 GMT+0800 (China Standard Time)

The sidenotes/margin-notes I believe can be reused more or less as-is. The sidenotes JS should be standalone, and the margin notes are nothing but a simple <span class="marginnote">margin note</span> wrapper that anyone can write in a Markdown file (or using the Pandoc span syntax, [margin note]{.marginnote}) without further ado.
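
To make the equivalence of the two spellings concrete, here is a toy converter from the Pandoc span form to the raw HTML form. Pandoc performs this conversion itself, so nothing like this is needed in practice; the regular expression and the function name `render_margin_notes` are illustrative assumptions, and the pattern deliberately handles only the exact `{.marginnote}` case.

```python
import html
import re

# Matches only the narrow form `[text]{.marginnote}`. Real Pandoc bracketed
# spans also allow nesting, several classes, and attributes; this ignores them.
MARGIN_NOTE = re.compile(r"\[([^\[\]]+)\]\{\.marginnote\}")


def render_margin_notes(markdown: str) -> str:
    """Rewrite `[text]{.marginnote}` into the equivalent raw HTML span."""
    def to_span(match: "re.Match[str]") -> str:
        text = html.escape(match.group(1), quote=False)
        return '<span class="marginnote">' + text + "</span>"

    return MARGIN_NOTE.sub(to_span, markdown)
```

For example, `render_margin_notes("See [margin note]{.marginnote} here.")` yields `See <span class="marginnote">margin note</span> here.`, while ordinary Markdown links are left untouched.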

> functional components like archiving and automated link / PLOS/PMCID abstract extraction

These could probably be split out of the backend relatively independently, but they are in the uncanny valley of being mostly glue code/special-cases/'schlep'. That makes it tricky to make them useful in general without a lot of overhead or architecture astronauting.

For example, the archiving code right now is much more specialized than, say, ArchiveBox, because the important parts are the logic of the whitelist and manual review to ensure quality and bookkeeping of what's been downloaded where.
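
As a rough picture of what that whitelist, review, and bookkeeping logic involves, consider the hypothetical ledger below. None of it is taken from the actual archiving code. In particular, it assumes that whitelisted domains are ones that need no local snapshot, which the description above does not specify, and all class, method, and path names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set
from urllib.parse import urlsplit


@dataclass
class ArchiveEntry:
    local_path: str         # where the snapshot was saved
    reviewed: bool = False  # set to True after a manual quality check


@dataclass
class ArchiveLedger:
    # Assumption: whitelisted domains are skipped rather than snapshotted.
    whitelist: Set[str] = field(default_factory=set)
    entries: Dict[str, ArchiveEntry] = field(default_factory=dict)

    def needs_archiving(self, url: str) -> bool:
        """True if the URL is neither whitelisted nor already downloaded."""
        host = urlsplit(url).hostname or ""
        return host not in self.whitelist and url not in self.entries

    def record(self, url: str, local_path: str) -> None:
        """Bookkeeping: remember where the snapshot of `url` lives."""
        self.entries[url] = ArchiveEntry(local_path)

    def approve(self, url: str) -> None:
        """Mark a snapshot as having passed manual review."""
        self.entries[url].reviewed = True

    def rewrite(self, url: str) -> str:
        """Point at the local copy only once it has been reviewed."""
        entry = self.entries.get(url)
        return entry.local_path if entry and entry.reviewed else url
```

Even in this toy form, most of the behaviour lives in policy (what is whitelisted, what counts as reviewed) rather than in downloading, which is why it is hard to package as a general-purpose tool.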

And the PLOS/PMC code mostly outsources the work to a suite of R libraries, which I regret to say are bitrotting and are probably going to break outright in a few years as they are now unmaintained, so there's not all that much value to splitting them out.

> Given the intricacies you've pointed out, it got me thinking: could some of the simpler features be distilled into a more straightforward static site project?

It's hard to say. Most of it builds on each other. If you have references and footnotes, you want to be able to quickly view them; hence the whole popup system to begin with; if you have many references, then to be useful they need to not be dead links everywhere, hence the archive system; with many different reference sources, it helps to annotate them by domain/filetype (if nothing else, to warn readers about PDFs, but the easiest way to do that is to just archive each URL and see if it turned out to be a PDF or not, as URLs are often misleading about the final filetype); if you have copies of references then you can view them in popups, but the experience of squinting at a PDF inside a popup is not great, especially when the abstract is often typeset even smaller, so you want to display the abstract at full size, so you get the annotation system, and you don't want to write them by hand, so you automate sources like Arxiv or Wikipedia; the more annotations you have, the more useful it becomes to collect them under tags, as otherwise you find yourself building up long lists of citations by hand and engaged in lots of copy-paste, so you need tag-directories and a tag metadata system; if they are hyperlinked to each other in addition to the tags, then the reverse citations become important, so you need bidirectional backlinks; and so on and so forth. If you try to stop partway, it's obviously bad. Like, you could have popups & annotations but only for within-website essays and Wikipedia and Arxiv, but then that's problematic for users because they have to learn that a very small subset of links will pop up and have annotations, and then why not all the others...?
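
Two links in that chain are small enough to sketch: deciding the file type from the archived bytes rather than from the URL, and deriving backlinks by inverting the forward link graph. Both functions are hypothetical illustrations in Python, not code from the site; the only outside fact relied on is that PDF files begin with the bytes `%PDF-`.

```python
from collections import defaultdict
from typing import Dict, List, Set


def looks_like_pdf(payload: bytes) -> bool:
    """Classify by content, not by URL: PDF files start with b'%PDF-'."""
    return payload.startswith(b"%PDF-")


def backlinks(forward: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert a page -> outgoing-links map into target -> linking-pages."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for page, targets in forward.items():
        for target in targets:
            reverse[target].add(page)
    return dict(reverse)
```

Given `{"/a": ["/b", "/c"], "/b": ["/c"], "/c": []}`, `backlinks` returns `{"/b": {"/a"}, "/c": {"/a", "/b"}}`. The dependency described above shows up here too: the backlink map is only as complete as the set of links that were annotated and archived in the first place.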

You can borrow the theme on its own, but to be honest, I consider that to be one of the least important parts. There are lots of minimalist themes and there's no arguing taste, so it's not really important to make a 'Gwern.net template' which has a bunch of black-and-white and boxes and a dropcap or SVG flourish here or there. (If anything, I think people should design & customize their own theme to express themselves rather than just copying my theme.)