Standardizing code and markdown separators

Question

sho-87 opened this issue a year ago

Would it be possible to add both opening and closing separators for code cells, especially when converting from an existing ipynb? This is already done for markdown cells, because docstrings require a closing tag, but there are a few benefits to enforcing this format for code cells too when converting:

1. It simplifies visual scanning in some ways because it fully encapsulates a code block: you know that everything between two separators belongs to the same chunk of code.
2. If you want to have two code blocks in a row, you currently need to separate them with a # %%, which means the separator in that case functions as both a block-end and a block-start indicator.
3. It will make some future functionality easier to implement (whether in the core plugin or defined by the user via keybinds). I was trying to add keybinds to add new cells above/below the current one, move cells, swap the locations of two cells, or toggle between cell types (e.g. code -> markdown). It worked, but it was difficult to reason about because of (2). With the new format, a new cell above will always be inserted before a start separator, a new cell below will always be inserted after an end separator, toggling the cell type will always change two lines, etc. This will simplify keybindings like "new cell below", with no need to check for things like end-of-file or the presence of a next separator.
4. A stricter format would eliminate niche situations like "if there is one or more cells, it works as a notebook mode. Contents before the first cell are ignored, so use it as a heading", where markdown sometimes requires separators and sometimes doesn't, depending on its location and on the existence of other types of cells.
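
Under the current format, a "new cell below" keybinding has to find where the current cell ends, which is either the next separator or the end of the file. A minimal sketch of that search, assuming the buffer is a list of lines and that every cell starts with a line beginning "# %%" (docstring-style markdown cells are ignored here); new_cell_below is a hypothetical helper, not part of Jupynium:

```python
from typing import List, Tuple

SEPARATOR = "# %%"  # assumption: every cell starts with a line beginning "# %%"


def new_cell_below(lines: List[str], row: int) -> Tuple[List[str], int]:
    """Insert an empty code cell after the cell containing `row` (0-based).

    Returns the new list of lines and the index of the inserted separator.
    """
    insert_at = len(lines)  # default: the current cell runs to end of file
    for i in range(row + 1, len(lines)):
        if lines[i].startswith(SEPARATOR):
            insert_at = i  # the next separator ends the current cell
            break
    return lines[:insert_at] + [SEPARATOR, ""] + lines[insert_at:], insert_at


lines = ["# %%", "x = 1", "# %%", "y = 2"]
print(new_cell_below(lines, 1)[0])  # new cell lands before the second separator
print(new_cell_below(lines, 3)[0])  # no later separator, so it is appended at EOF
```

With a mandatory closing tag, the search loop would go away: the insert point would simply be the line after the current cell's end tag.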

So the proposed standard would be something like:

> myfile.ju.py

"""%%
Title
%%"""

# %%
code block 1
# <new closing tag>

# %%
code block 2
# <new closing tag>

The downside, of course, is that ju files become a bit harder to read because there will be more separators, but I think shortsighted mode helps a lot with this. It would also be fairly simple to add background highlights to separator lines to indicate the cells (the way jukit does it).

Kiyoon Kim · Answer 1 · Sat Feb 25 2023 19:51:37 GMT+0800 (China Standard Time)

I thought about separator highlighting, but I haven't found an easy solution for it yet.
The Python Jupynium server already parses the Jupynium file and knows where the separator lines are, so it could perhaps simply update the highlighting.
If I want to highlight without the server, I need to reimplement the parser logic in Lua, and then there are two parsers to keep in sync, which will be harder to maintain.
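
A standalone highlighter would need the same line-level rules as the server. A minimal sketch of that logic in Python (the Neovim side would need a Lua port, which is the maintenance cost mentioned above). The separator rules are an assumption based on Jupytext's percent format and the examples in this thread, and separator_type and separator_rows are hypothetical names:

```python
from typing import List, Optional, Tuple

# Assumed rules: "# %% [md]", "# %% [markdown]" and a line starting with three
# double quotes followed by %% open markdown cells; any other line starting
# with "# %%" opens a code cell. The markdown closing line is not highlighted.


def separator_type(line: str) -> Optional[str]:
    """Classify a single line; None means it is not a cell separator."""
    if line.startswith(("# %% [md]", "# %% [markdown]", '"""%%')):
        return "markdown"
    if line.startswith("# %%"):
        return "code"
    return None


def separator_rows(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (row, cell type) for every separator line, for highlighting."""
    rows = []
    for row, line in enumerate(lines):
        kind = separator_type(line)
        if kind is not None:
            rows.append((row, kind))
    return rows
```

Because each line is classified on its own, an edit only requires re-checking the rows that changed.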

Kiyoon Kim · Answer 2 · Sat Feb 25 2023 19:57:40 GMT+0800 (China Standard Time)

About your point 4: the reason a file without separators behaves like markdown is that I wanted it to work as intended when you open a markdown file, so you can use Jupynium as a markdown preview utility.
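
The two modes (no separators means markdown preview; otherwise notebook mode with everything before the first separator ignored) can be expressed as a single splitting rule. A minimal sketch, assuming only "# %%"-style separators (docstring-style markdown cells are left out for brevity); split_cells is a hypothetical name:

```python
from typing import List, Optional, Tuple


def separator_type(line: str) -> Optional[str]:
    # Assumed rules; docstring-style markdown cells are left out for brevity.
    if line.startswith(("# %% [md]", "# %% [markdown]")):
        return "markdown"
    if line.startswith("# %%"):
        return "code"
    return None


def split_cells(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a buffer into (cell type, body lines) pairs."""
    starts = []
    for row, line in enumerate(lines):
        kind = separator_type(line)
        if kind is not None:
            starts.append((row, kind))
    if not starts:
        # Markdown preview mode: no separators, so the whole file is one markdown cell.
        return [("markdown", list(lines))]
    cells = []
    for i, (row, kind) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else len(lines)
        cells.append((kind, lines[row + 1:end]))
    # Lines before the first separator (e.g. a shebang) are never added: they are ignored.
    return cells
```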

Kiyoon Kim · Answer 3 · Sat Feb 25 2023 20:09:23 GMT+0800 (China Standard Time)

This is the reasoning behind the current formatting.

It is almost 90% compatible with Jupytext. So if people already use something like Jupyter Ascending and want to open their Jupytext files, they can simply do that without converting the file format. Even if some lines aren't parsed correctly, they will be easy to fix. Also, Jupytext is a more standardised format, so I don't want to diverge too much from it.
The reason I had to deviate slightly from the Jupytext format is parsing. Jupytext and Jupynium have quite different aims: Jupytext provides offline parsing and conversion, whereas I wanted to make real-time sync possible without re-parsing the whole file on every keystroke, which would be very expensive. So the Jupynium format follows this strict requirement:

Given a one-line change, it needs to know whether the user wants to 1. make a cell, 2. delete a cell, 3. change the cell type, or 4. just modify cell content.
Of course, you can also split a cell (a variation of making a cell).
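
The one-line requirement is what keeps real-time sync cheap: the decision depends only on the old and new text of the edited line. A minimal sketch of such a classifier, assuming the separator rules below and treating a freshly inserted line as an edit from an empty string; it illustrates the rule and is not Jupynium's actual code:

```python
from typing import Optional


def separator_type(line: str) -> Optional[str]:
    # Assumed separator rules (Jupytext percent style plus docstring markdown).
    if line.startswith(("# %% [md]", "# %% [markdown]", '"""%%')):
        return "markdown"
    if line.startswith("# %%"):
        return "code"
    return None


def classify_line_change(old: str, new: str) -> str:
    """Decide what a single-line edit means by looking at that line only."""
    before, after = separator_type(old), separator_type(new)
    if before is None and after is None:
        return "modify content"
    if before is None:
        return "make cell"  # a new separator splits the current cell in two
    if after is None:
        return "delete cell"  # the separator is gone, so two cells merge
    if before != after:
        return "change cell type"
    return "modify content"  # e.g. adding a title after "# %%"; same cell type
```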

If the format requires parsing a larger block, as in your suggestion, we will likely need to parse more context, and it will be slow. Even if we implement it efficiently, the parser will be much more complicated, which makes it harder to maintain and harder for others to contribute to.
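
To see why paired tags need more context, consider the proposed format with a stand-in closing tag. The issue left the tag unspecified as <new closing tag>; "# /%%" below is purely hypothetical, as is the helper name cell_of_each_line:

```python
from typing import List, Optional

OPEN = "# %%"
CLOSE = "# /%%"  # hypothetical closing tag, not real Jupynium or Jupytext syntax


def cell_of_each_line(lines: List[str]) -> List[Optional[int]]:
    """Map every line to the index of the cell that owns it (None = outside)."""
    owners: List[Optional[int]] = []
    current: Optional[int] = None
    count = 0
    for line in lines:
        if line.startswith(OPEN):
            current = count  # an opening tag starts a new cell
            count += 1
            owners.append(current)
        elif line.startswith(CLOSE):
            owners.append(current)  # the closing tag belongs to its cell
            current = None
        else:
            owners.append(current)
    return owners


before = ["# %%", "a = 1", "# /%%", "stray = 0", "# %%", "b = 2", "# /%%"]
after = before[:2] + before[3:]  # delete only the first closing tag
print(cell_of_each_line(before))
print(cell_of_each_line(after))
```

Deleting one line moves "stray = 0" from outside any cell into cell 0, so the sync logic would have to scan forward to the next opening tag to learn which untouched lines changed meaning.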

Kiyoon Kim · Answer 4 · Sat Feb 25 2023 20:13:41 GMT+0800 (China Standard Time)

I saw your dotfiles, and I'm guessing you're suggesting this because you were adding keymaps to add cells, modify cell types, etc. I'd say the benefits of the current formatting are:

1. You don't need keymaps: in most cases, manipulating cells by just typing isn't too difficult, in my opinion. It is also less error-prone, because users don't need to remember to close the cell.
2. If you want automation, it may require parsing the previous and next cell separators. Still, it won't be too complicated.
3. If features like that are useful, we can add an API to get and set the current cell type, etc.
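
A get/set API of this kind only needs the previous separator, which can be found by scanning upward from the cursor. A minimal sketch, assuming a buffer of lines with only "# %%"-style separators; find_cell_start, get_cell_type and set_cell_type are hypothetical names, not an existing Jupynium API:

```python
from typing import List, Optional

MARKDOWN_SEPARATORS = ("# %% [md]", "# %% [markdown]")  # assumed markdown forms


def find_cell_start(lines: List[str], row: int) -> Optional[int]:
    """Scan upward from `row` (0-based) to the separator that opens its cell."""
    for i in range(row, -1, -1):
        if lines[i].startswith("# %%"):
            return i
    return None  # row is in the header before the first separator


def get_cell_type(lines: List[str], row: int) -> Optional[str]:
    start = find_cell_start(lines, row)
    if start is None:
        return None
    return "markdown" if lines[start].startswith(MARKDOWN_SEPARATORS) else "code"


def set_cell_type(lines: List[str], row: int, cell_type: str) -> List[str]:
    """Return a copy of `lines` with the current cell's separator rewritten.

    Only the separator line changes (any title after it is dropped);
    the cell body is left untouched.
    """
    start = find_cell_start(lines, row)
    if start is None:
        raise ValueError("row is not inside a cell")
    updated = list(lines)
    updated[start] = "# %% [md]" if cell_type == "markdown" else "# %%"
    return updated
```

Because the cell type lives entirely on the separator line, toggling it is a one-line change, which fits the one-line rule described in the previous answer.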

Simon Ho · Answer 5 · Sun Feb 26 2023 00:57:30 GMT+0800 (China Standard Time)

ah that all makes sense - was not aware of the desire to keep format consistency with Jupytext

re: keymapping vs direct manipulation... it wasn't that manipulating cells is too difficult. it was mainly that other editors like JupyterLab/VS Code already have keybinds (e.g. a/b) for quickly adding cells above and below the current one, which users may already be used to. so I was trying to mimic that workflow in my own setup

Simon Ho · Answer 6 · Sun Feb 26 2023 01:20:37 GMT+0800 (China Standard Time)

for parsing and highlighting, I noticed that some plugins like headlines just handle the highlighting and rely on a treesitter ft to do all the parsing. I guess that would be too big of a dependency in this case?

Kiyoon Kim · Answer 7 · Sun Feb 26 2023 01:28:27 GMT+0800 (China Standard Time)

I think a treesitter dependency is okay, and since treesitter already parses the code, it makes sense to use it. I haven't looked into it in detail, but it should be easy.

We need to find comment nodes whose content is # %% etc.
We need to check that the comment node starts at the beginning of the line.

However, one difficulty is that treesitter parsers are language-specific. Since Jupynium can also be used with Julia, R, etc., this probably makes it harder to implement than simple line parsing.
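
For Python buffers only, the two checks above can be approximated without treesitter using the standard library's tokenize module. This is a swapped-in technique, not what Jupynium uses: a comment token starting at column 0 plays the role of a top-level comment node. The function name is hypothetical:

```python
import io
import tokenize
from typing import List


def separator_rows_via_tokens(source: str) -> List[int]:
    """Return 0-based rows of '# %%' comments that start at column 0."""
    rows = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        row, col = tok.start  # row is 1-based, column is 0-based
        if tok.type == tokenize.COMMENT and tok.string.startswith("# %%") and col == 0:
            rows.append(row - 1)
    return rows


src = 'x = """\n# %% inside a string\n"""\n# %%\ny = 1\ndef f():\n    # %% indented\n    return 1\n'
print(separator_rows_via_tokens(src))  # only the real separator on the fourth line
```

Unlike plain line matching, a "# %%" inside a string literal or an indented comment is not reported. The trade-offs match the concern above: this works only for Python, docstring-style markdown cells are string tokens rather than comments, and tokenize raises tokenize.TokenError on some incomplete input (such as an unterminated triple-quoted string), which is common while the user is still typing.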

Kiyoon Kim · Answer 8 · Tue Feb 28 2023 01:22:39 GMT+0800 (China Standard Time)

@sho-87 Added syntax highlighting (by modifying the shortsighted code) in #56