Stencila

Home Page: http://stenci.la/

Programmable, reproducible, interactive documents

👋 Intro 🚴 Roadmap 📜 Docs 📥 Install 🛠️ Develop

🙏 Acknowledgements 💖 Supporters 🙌 Contributors



👋 Introduction

Stencila is a platform for creating and publishing dynamic, data-driven content. Our aim is to lower the barriers to creating truly programmable documents, and to make it easier to publish them as beautiful, interactive, and semantically rich articles and applications. Our roots are in scientific communication, but our tools are useful far beyond it.

This is v2 of Stencila, a rewrite in Rust focussed on the synergies between three recent and impactful innovations and trends:

We are embarking on a rewrite because CRDTs will now be the foundational synchronization and storage layer for Stencila documents. This requires fundamental changes to most other parts of the platform (e.g. how changes are applied to dynamic documents). Furthermore, a rewrite allows us to bake in, rather than bolt on, new modes of interaction between authors and LLM assistants, and to add mechanisms that mitigate the risks associated with using LLMs (e.g. by recording the actor, human or LLM, that made each change to a document). Much of the code in the v1 branch will be reused (after some tidy-ups and refactoring), so v2 is not a complete rewrite.

🎥 Showcase

Simultaneously editing the same document in different formats

Here, a Stencila Article has previously been saved to disk as a CRDT in main.sta. Then, the sync command of the CLI is used to simultaneously synchronize the CRDT with three files, in three different formats currently supported in v2: JATS XML, JSON, and Markdown. Changes made in one file (here, in VSCode) are merged into the in-memory CRDT and written to the other files.

You'd probably never want to do this just by yourself. But this demo illustrates how Stencila v2 will enable collaboration across formats on the same document. Any particular format (e.g. Markdown, LaTeX, Word) is just one of the potential user interfaces to a document.

file-sync-2023-09-29.mp4

🚴 Roadmap

Our general strategy is to iterate horizontally across the feature set, rather than fully developing features sequentially. This will better enable early user testing of workflows and reduce the risk of finding ourselves painted into an architectural corner. So expect initial iterations to have limited functionality and be buggy.

We'll be making alpha and beta releases of v2 early and often across all products (e.g. CLI, desktop, SDKs). We're aiming for a 2.0.0 release by the end of Q3 2024.

🟢 Stable • 🔶 Beta • ⚠️ Alpha • 🚧 Under development • 🧪 Experimental • 🧭 Planned • ❔ Maybe

Schema

The Stencila Schema is the data model for Stencila documents. Most of the schema is well defined but some document node types are still marked as under development. A summary by category:

| Category | Description | Status |
| --- | --- | --- |
| Works | Types of creative works (e.g. Article, Figure, Review) | 🟢 Stable; mostly based on schema.org |
| Prose | Types used in prose (e.g. Paragraph, List, Heading) | 🟢 Stable; mostly based on HTML, JATS, Pandoc etc |
| Code | Types for executable (e.g. CodeChunk) and non-executable code (e.g. CodeBlock) | 🔶 Beta; may change |
| Math | Types for math symbols and equations (e.g. MathBlock) | 🔶 Beta; may change |
| Data | Fundamental data types (e.g. Number) and validators (e.g. NumberValidator) | 🔶 Beta; may change |
| Style | Types for styling parts of a document (Span and Division) | 🚧 Under development; likely to change |
| Flow | Types for control flow within a document (e.g. If, For, Call) | 🚧 Under development; likely to change |

Storage and synchronization

In v2, documents can be stored as binary Automerge CRDT files, forked and merged, and imported from and exported to various formats. Collaboration, including real-time collaboration, is made possible by exchanging fine-grained changes to the CRDT over the network. In addition, we want to enable interoperability with a Git-based workflow.

| Functionality | Description | Status |
| --- | --- | --- |
| Documents read/write-able | Able to write a Stencila document to an Automerge binary file and read it back in | ⚠️ Alpha; needs more testing |
| Documents import/export-able | Able to import or export document as alternative formats, using tree diffing to generate CRDT changes | ⚠️ Alpha; needs more testing |
| Documents fork/merge-able | Able to create a fork of a document in another file and then later merge with the original | 🧭 Planned Q4 2023 |
| Documents diff-able | Able to view a diff, in any of the supported formats, between versions of a document and between a document and another file | 🧭 Planned Q4 2023 |
| Git merge driver | CLI can act as a custom Git merge driver | 🧭 Planned Q4 2023 |
| Relay server | Documents can be synchronized by exchanging changes via a relay server | 🧭 Planned Q4 2023 |
| Rendezvous server | Documents can be synchronized by exchanging changes peer-to-peer using TCP or UDP hole punching | ❔ Maybe |
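
To make the fork and merge idea more concrete, here is a toy sketch of a last-writer-wins map, one of the simplest CRDTs. It is not Automerge and not how Stencila stores documents; `LwwMap` and its methods are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LwwMap:
    """Toy last-writer-wins map: each key stores (counter, actor, value)."""

    actor: str
    entries: dict = field(default_factory=dict)
    counter: int = 0

    def set(self, key, value):
        # Tag each write with a logical counter and the actor that made it
        self.counter += 1
        self.entries[key] = (self.counter, self.actor, value)

    def get(self, key):
        return self.entries[key][2]

    def fork(self, actor):
        # A fork starts from a copy of the current state under a new actor id
        return LwwMap(actor=actor, entries=dict(self.entries), counter=self.counter)

    def merge(self, other):
        # For every key keep the entry with the highest (counter, actor) tag,
        # so replicas converge regardless of the order in which they merge
        for key, entry in other.entries.items():
            if key not in self.entries or entry[:2] > self.entries[key][:2]:
                self.entries[key] = entry
        self.counter = max(self.counter, other.counter)


original = LwwMap(actor="alice")
original.set("title", "Draft")

copy = original.fork(actor="bob")
copy.set("title", "Final")       # edit made in the fork
original.set("abstract", "TBD")  # concurrent edit made in the original

original.merge(copy)
copy.merge(original)
```

After both merges the two replicas hold the same entries. Recording the actor alongside each write is also what makes it possible to tell, later on, who made a given change.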

Formats

Interoperability with existing formats has always been a key feature of Stencila. We are bringing over codecs (a.k.a. converters) from the v1 branch and porting other functionality from encoda to Rust.

| Format | Encoding | Decoding | Notes |
| --- | --- | --- | --- |
| JSON | 🟢 | 🟢 | |
| JSON5 | 🟢 | 🟢 | |
| YAML | 🟢 | 🟢 | |
| Plain text | 🔶 | - | |
| HTML | 🚧 | 🧭 | |
| JATS | 🚧 | 🚧 | Planned for completion Q4 2023. Port decoding and tests from encoda |
| Markdown | 🚧 | 🧭 | Planned Q4 2023 v1 |
| R Markdown | 🧭 | 🧭 | Relies on Markdown; v1 |
| Jupyter Notebook | 🧭 | 🧭 | Relies on Markdown; v1 |
| Scripts | 🧭 | 🧭 | Relies on Markdown; v1 |
| Pandoc | 🧭 | 🧭 | Planned Q4 2023. v1 |
| LaTeX | 🧭 | 🧭 | Relies on Pandoc; v1; discussion |
| Org | 🧭 | 🧭 | Relies on Pandoc; PR |
| Microsoft Word | 🧭 | 🧭 | Relies on Pandoc; v1 |
| ODT | 🧭 | 🧭 | Relies on Pandoc |
| Google Docs | 🧭 | 🧭 | Planned Q1 2024 v1 |
| PDF | 🧭 | 🧭 | Planned Q1 2024, relies on HTML; v1 |
| Codec Plugin API | 🧭 | 🧭 | An API allowing codecs to be developed as plugins in Python, Node.js, and other languages |
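
As a rough illustration of what a codec does, the sketch below pairs a lossless JSON encoder and decoder with a lossy plain text encoder. The node shapes (dictionaries with `type`, `content` and `value` keys) are a simplifying assumption rather than the exact Stencila Schema serialization, and the function names are hypothetical, not part of any Stencila API.

```python
import json


def encode_json(node):
    """Lossless encoding: the whole node tree is serialized."""
    return json.dumps(node, indent=2)


def decode_json(text):
    """Lossless decoding: the inverse of encode_json."""
    return json.loads(text)


def encode_text(node):
    """Lossy encoding: keep only the text, dropping types and structure."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(encode_text(child) for child in node)
    if isinstance(node, dict):
        if "value" in node:
            return encode_text(node["value"])
        text = encode_text(node.get("content", []))
        # Separate block-level nodes with a blank line
        return text + "\n\n" if node.get("type") == "Paragraph" else text
    return str(node)


# A hypothetical, simplified document tree
article = {
    "type": "Article",
    "content": [
        {"type": "Paragraph", "content": [
            {"type": "Text", "value": "Hello "},
            {"type": "Emphasis", "content": [{"type": "Text", "value": "world"}]},
        ]},
    ],
}

round_tripped = decode_json(encode_json(article))
plain = encode_text(article)
```

The JSON round trip returns the original tree, whereas the plain text output no longer records that "world" was emphasized. That loss of information is one plausible reason a format might support encoding but not decoding.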

Kernels

Kernels execute the code in Stencila CodeChunks and CodeExpressions, as well as in control flow document nodes such as IfClause and For. In addition to supporting interoperability with existing Jupyter kernels, we will bring over microkernels from v1. Microkernels are lightweight kernels for executing code which do not require separate installation and allow for parallel execution. We'll also implement at least one kernel for an embedded scripting language so that it is possible to author a Stencila document which does not rely on any other external binary.

| Kernel | Purpose | Status |
| --- | --- | --- |
| Embedded lang kernel | Default language for executable code | 🧭 Planned Q4 2023. Probably Rune or Rhai but could be RustPython |
| Bash microkernel | Execute Bash code in documents | 🧭 Planned Q4 2023; v1 |
| Zsh microkernel | Execute Zsh code in documents | 🧭 Planned Q4 2023; v1 |
| Python microkernel | Execute Python code in documents | 🧭 Planned Q4 2023; v1 |
| R microkernel | Execute R code in documents | 🧭 Planned Q4 2023; v1 |
| Node.js microkernel | Execute JavaScript code in documents | 🧭 Planned Q4 2023; v1 |
| Deno microkernel | Execute TypeScript code in documents | ❔ Maybe; v1 |
| SQL microkernel | Execute SQL code in documents | 🧭 Planned Q1 2024; v1 |
| Jupyter kernel bridge | Execute code in Jupyter kernels | 🧭 Planned Q1 2024; v1 |
| HTTP kernel | Interact with RESTful APIs from within documents | ❔ Maybe; v1 |
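
The core job of a kernel can be sketched in a few lines: keep state between executions, run statements, and evaluate expressions. The `MicroKernel` class below is a hypothetical toy built on Python's own `exec` and `eval`; it does not reflect the actual microkernel protocol used in v1 or planned for v2.

```python
import contextlib
import io


class MicroKernel:
    """Toy kernel: runs code in one shared namespace and captures output."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        # Run statements (like a CodeChunk) and return whatever was printed
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            exec(code, self.namespace)
        return stdout.getvalue()

    def evaluate(self, expression):
        # Evaluate a single expression (like a CodeExpression)
        return eval(expression, self.namespace)


kernel = MicroKernel()
printed = kernel.execute("x = 6\nprint('x is', x)")
result = kernel.evaluate("x * 7")
```

Because both methods share one namespace, the variable `x` assigned in the first call is still available to the expression in the second, which is what lets later code in a document build on earlier code.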

Actors

In Stencila v2, non-human changes to the document will be performed, concurrently, by various actors. Actors will listen for changes to the document and react accordingly. For example, an LLM actor might listen for the insertion of a paragraph starting with "!add a code chunk to read in and summarize mydata.csv" and do just that. We'll be starting by implementing relatively simple actors, but to avoid being painted into a corner we will probably implement one LLM-based actor relatively early on.

| Actor | Purpose | Status |
| --- | --- | --- |
| MathML | Update the mathml property of Math nodes when the code property changes | 🧭 Planned Q4 2023 |
| Tailwind | Update the classes property of Styled nodes when the code property changes | 🧭 Planned Q4 2023 v1 |
| Parsers | Update the executionDependency etc properties of CodeExecutable nodes when the code or programmingLanguage properties change | 🧭 Planned Q4 2023 v1 |
| Reactor | For reactivity, maintain a dependency graph between nodes and update executionRequired of executable nodes when executionDependency or executionStatus of other nodes changes | 🧭 Planned Q4 2023 v1 |
| Executor | Execute nodes when their executionRequired property changes and update their executionStatus, output, etc properties | 🧭 Planned Q4 2023 |
| Actor Plugin API | An API allowing actors to be developed as plugins in Python, Node.js, and other languages | 🧭 Planned Q4 2023 to allow prototypes of Coder and Writer actors as plugins |
| Coder | An LLM actor that creates and edits CodeExecutable nodes | 🧭 Planned Q1 2024 |
| Writer | An LLM actor that creates and edits prose nodes | 🧭 Planned Q1 2024 |
| CitationIntent | An AI actor that suggests a CitationIntent for Cite nodes | ❔ Maybe |
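
The Reactor's role, as described above, is to maintain a dependency graph and flag downstream nodes when something changes. A minimal sketch of that idea follows, assuming nodes are identified by string ids. The `Reactor` class and its methods are hypothetical and only loosely echo the property names in the table.

```python
from collections import defaultdict, deque


class Reactor:
    """Toy reactor: tracks which nodes depend on which, and flags stale ones."""

    def __init__(self):
        self.dependents = defaultdict(set)  # node id -> ids that depend on it
        self.execution_required = set()

    def add_dependency(self, node, depends_on):
        self.dependents[depends_on].add(node)

    def node_changed(self, node):
        # Breadth-first walk: everything downstream of the change is stale
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for dependent in self.dependents[current]:
                if dependent not in self.execution_required:
                    self.execution_required.add(dependent)
                    queue.append(dependent)


reactor = Reactor()
reactor.add_dependency("plot", depends_on="summary")
reactor.add_dependency("summary", depends_on="load")
reactor.add_dependency("unrelated", depends_on="other")

reactor.node_changed("load")
```

After the change to `load`, both `summary` and `plot` are flagged while `unrelated` is left alone. An executor could then run only the flagged nodes rather than the whole document.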

Editors

Editors allow users to edit Stencila documents, either directly, or via an intermediate format.

| Interface | Purpose | Status |
| --- | --- | --- |
| File watcher | Edit documents via other formats and tools (e.g. code editors, Microsoft Word) and react on file change | ⚠️ Alpha |
| Code editor | Edit documents via other formats using a built-in code editor and react on key presses | 🧭 Planned Q4 2023 v1 |
| Visual editor | Edit documents using a built-in visual editor and react on key presses and widget interactions | 🧭 Planned Q1 2024 v1 |

Tools

Tools are what we call the self-contained Stencila products you can download and use locally on your machine to interact with Stencila documents.

| Tool | Purpose | Status |
| --- | --- | --- |
| CLI | Manage documents from the command line and read and edit them using a web browser | ⚠️ Alpha |
| Desktop | Manage, read and edit documents from a desktop app | 🧭 Planned Q1 2024, likely using Tauri |
| VSCode extension | Manage, read and edit documents from within VSCode | ❔ Maybe |

SDKs

Stencila's software development kits (SDKs) enable developers to create plugins that extend Stencila's core functionality, or to build other tools on top of it. At this stage we are planning to support Python, Node.js and R, but more languages may be added if there is demand.

| Language | Description | Status |
| --- | --- | --- |
| Python | Types and functions for using Stencila from within Python | 🚧 In progress, expected completion early Q4 2023 |
| TypeScript | JavaScript classes and TypeScript types for the Stencila Schema | |
| Node.js | Types and functions for using Stencila from within Node.js | 🚧 In progress, expected completion early Q4 2023 |
| R | Types and functions for using Stencila from within R | 🧭 Planned Q4 2023 |

📜 Documentation

At this stage, documentation for v2 is mainly reference material, much of it generated:

More reference docs, as well as guides and tutorials, will be added over the coming months. We will be bootstrapping the publishing of all docs (i.e. using Stencila itself to publish HTML pages) and expect to have an initial published set in Q4 2023.

📥 Install

Although v2 is in early stages of development, and functionality may be limited or buggy, we are releasing alpha versions of the Stencila CLI and SDKs. Doing so allows us to get early feedback and monitor what impact the addition of features has on build times and distribution sizes.

CLI

Windows

To install the latest release, download stencila-<version>-x86_64-pc-windows-msvc.zip from the latest release, extract it, and place the extracted executable somewhere on your PATH.

macOS

To install the latest release in /usr/local/bin,

```sh
curl -L https://raw.githubusercontent.com/stencila/stencila/main/install.sh | bash
```

To install a specific version, append -s vX.X.X. Or, if you'd prefer to do it manually, download stencila-<version>-x86_64-apple-darwin.tar.xz from one of the releases and then,

```sh
tar xvf stencila-*.tar.xz
cd stencila-*/
sudo mv -f stencila /usr/local/bin # or wherever you prefer
```

Linux

To install the latest release in ~/.local/bin/,

```sh
curl -L https://raw.githubusercontent.com/stencila/stencila/main/install.sh | bash
```

To install a specific version, append -s vX.X.X. Or, if you'd prefer to do it manually, download stencila-<version>-x86_64-unknown-linux-gnu.tar.xz from one of the releases and then,

```sh
tar xvf stencila-*.tar.xz
mv -f stencila ~/.local/bin/ # or wherever you prefer
```

Docker

The CLI is also available in a Docker image you can pull from Docker Hub,

```sh
docker pull stencila/stencila
```

and use locally like this for example,

```sh
docker run -it --rm -v "$PWD":/work -w /work --network host stencila/stencila --help
```

The same image is also published to the GitHub Container Registry if you'd prefer to use that,

```sh
docker pull ghcr.io/stencila/stencila
```

SDKs

TypeScript

Use your favorite package manager to install @stencila/types:

```sh
npm install @stencila/types   # or
yarn add @stencila/types      # or
pnpm add @stencila/types
```

🛠️ Develop

This repository is organized into the following modules. Please see their respective READMEs, where available, for guides to contributing.

  • schema: YAML files which define the Stencila Schema, an implementation of, and extensions to, schema.org, for programmable documents.

  • json: A JSON Schema and JSON-LD @context, generated from Stencila Schema, which can be used to validate Stencila documents and transform them to other vocabularies.

  • rust: Several Rust crates implementing core functionality and a CLI for working with Stencila documents.

  • python: A Python package, with classes generated from Stencila Schema and bindings to Rust functions, so you can work with Stencila documents from within Python.

  • typescript: A package of TypeScript types generated from Stencila Schema so you can create type-safe Stencila documents in the browser, Node.js, Deno etc.

  • node: A Node.js package, using the generated TypeScript types and with runtime validation and bindings to Rust functions, so you can work with Stencila documents from within Node.js.

  • docs: Documentation, including reference documentation generated from schema and the rust CLI tool.

  • examples: Examples of documents conforming to Stencila Schema, mostly for testing purposes.

🙏 Acknowledgements

Stencila is built on the shoulders of many open source projects. Our sincere thanks to all the maintainers and contributors of those projects for their vision, enthusiasm and dedication, and most of all for their hard work! The following open source projects in particular have an important role in the current version of Stencila. We sponsor these projects where, and to the extent, possible through GitHub Sponsors and Open Collective.

| Project | Summary |
| --- | --- |
| Automerge | A Rust library of data structures for building collaborative applications. |
| Clap | A Command Line Argument Parser for Rust. |
| NAPI-RS | A framework for building pre-compiled Node.js addons in Rust. |
| PyO3 | Rust bindings for Python, including tools for creating native Python extension modules. |
| Rust | A multi-paradigm, high-level, general-purpose programming language which emphasizes performance, type safety, and concurrency. |
| Serde | A framework for serializing and deserializing Rust data structures efficiently and generically. |
| Similar | A Rust library of diffing algorithms including Patience and Hunt–McIlroy / Hunt–Szymanski LCS. |
| Tokio | An asynchronous runtime for Rust which provides the building blocks needed for writing network applications without compromising speed. |

💖 Supporters

We wouldn’t be doing this without the support of these forward-looking organizations.

🙌 Contributors

Thank you to all our contributors (not just the ones that submitted code!). If you made a contribution but are not listed here please create an issue, or PR, like this.

Ackerley Tng Aleksandra Pawlik Alex Ketch Ben Shaw Colette Doughty Daniel Beilinson Daniel Ecer
Daniel Mietchen Daniel Nüst Danielle Robinson Dave David Moulton Finlay Thompson Fábio H. K. Mendes
J Hunt Jacqueline James Webber Jure Triglav Lars Willighagen Mac Cowell Markus Elfring
Michael Aufreiter Morane Gruenpeter MorphicResonance Muad Abd El Hay Nokome Bentley Oliver Buchtala Raniere Silva
Remi Rampin Rich Lysakowski Robert Gieseke Seth Vincent Stefan Fritsch Suminda Sirinath Salpitikorala Dharmasena Tim McNamara
Titus Tony Hirst Uwe Brauer Vanessasaurus Vassilis Kehayas alexandr-sisiuc asisiuc
campbellyamane ern0 - Zalka Ernő grayflow happydentist huang12zheng ignatiusm jmhuang
jon r kitten solsson taunsquared yasirs

License: Apache License 2.0