asg017 / dataflow

An experimental self-hosted Observable notebook editor, with support for FileAttachments, Secrets, custom standard libraries, and more!

Home Page: https://alexgarcia.xyz/dataflow/


Reload FileAttachments when FA is added/removed

asg017 opened this issue · comments

/*
FileAttachments:
  a: ./a.txt
  b: ./b.txt
  c: ./c.txt
*/

files = [FileAttachment("a"), FileAttachment("b"), FileAttachment("c")]

When a.txt gets deleted, the rendered notebook should reflect that (so authors know if they accidentally broke their notebook). When b.txt gets added, the rendered notebook should reflect that (so authors know if they "fixed" their notebook).

But if c.txt updates (i.e. a file metadata or contents change), then FileAttachment shouldn't refresh; that's the job of LiveFileAttachment.

Interesting. I suppose the intended difference is "one could be compiled in, another cannot?"

But probably we can compile whatever version of file is available at compilation time, and otherwise make every FileAttachment live?

Yeah, I have a few reasons why I want FileAttachments to "refresh" when changes are made:

  1. To ensure that what you see when running dataflow run notebook.ojs will be similar to what you see after running dataflow compile notebook.ojs. So if you accidentally delete a FA or change the path or something, you should know right away.
  2. Faster iteration when changing data sources, so when you change a FA's path, the notebook can instantly reflect the new content (also kinda tied to 1).

But there is a technical limitation here: the way FileAttachments currently update is that the FileAttachment cell gets redefined, meaning all cells that reference a FileAttachment will reload, even if the FileAttachment they reference didn't update. And the FileAttachment cell only updates when either 1) a new FA is defined, 2) a previously defined FA is deleted, or 3) a previously defined FA has a new path. It does NOT change when the underlying FA file has updated; that logic is entirely inside LiveFileAttachment.
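To make those three triggers concrete, here is a rough sketch of how a config diff could be computed. This is not Dataflow's actual code: `diffAttachments` and `shouldRedefine` are hypothetical names, and the config is assumed to be already parsed into a plain name-to-path object.

```javascript
// Hypothetical helpers, not part of Dataflow. A config is assumed to be
// a plain object mapping FA names to paths, e.g. { a: "./a.txt" }.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Compare two configs and report the three kinds of change that
// redefine the FileAttachment cell.
function diffAttachments(prev, next) {
  const added = [];
  const removed = [];
  const repathed = [];
  for (const name of Object.keys(next)) {
    if (!has(prev, name)) added.push(name); // 1) a new FA is defined
    else if (prev[name] !== next[name]) repathed.push(name); // 3) new path
  }
  for (const name of Object.keys(prev)) {
    if (!has(next, name)) removed.push(name); // 2) a defined FA is deleted
  }
  return { added, removed, repathed };
}

// True when the shared FileAttachment cell would be redefined.
// Changes to file contents never show up here, only config changes.
function shouldRedefine(diff) {
  return diff.added.length + diff.removed.length + diff.repathed.length > 0;
}

const diff = diffAttachments(
  { a: "./a.txt", b: "./b.txt" },
  { a: "./new-a.txt", c: "./c.txt" }
);
console.log(diff, shouldRedefine(diff));
```

Note that any non-empty diff redefines the one shared cell, which is why every cell that references any FileAttachment reloads.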

I think in an ideal world, Dataflow would only have one file attachment cell, FileAttachment, that works like this:

/*
FileAttachments:
  a: ./a.txt
  b: ./b.txt
*/

aCell = FileAttachment("a").text()

bCell = FileAttachment("b").text()

aCell should only update when:

  • The 'a' file attachment is removed from the configuration comment.
  • The 'a' file attachment path gets updated (e.g. from ./a.txt to ./new-a.txt).
  • ./a.txt gets updated, i.e. it has new file contents.

^ And whenever any of those 3 events happen, ONLY aCell updates; bCell shall remain the same. This doesn't happen right now because the entire FileAttachment builtin refreshes, which would cause both aCell and bCell to refresh.

The only way that I think this could be done is if we create implicit cells for every file attachment and define them as async generators that update whenever the file attachment file updates (similar to the LiveFileAttachment logic right now). Then, whenever that FileAttachment is referenced somewhere, we inject that implicit FA cell as a dependency of the cell that references that FA, so that only that cell updates when the FA file contents update. Kinda a long-winded and complicated solution, but it would be very smooth!
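A toy sketch of the dependency idea, to show why only aCell would update. Everything here is made up for illustration: `Graph` is a tiny synchronous stand-in rather than the Observable runtime, the `disk` object stands in for real files, and a manual `run` call replaces the async generator and file watcher.

```javascript
// Hypothetical mini-runtime, NOT the Observable runtime's API.
class Graph {
  constructor() {
    this.cells = new Map(); // name -> { inputs, compute, value }
  }

  // Define a cell whose value is computed from the values of its inputs.
  define(name, inputs, compute) {
    this.cells.set(name, { inputs, compute, value: undefined });
    this.run(name);
  }

  // Recompute one cell, then every cell that lists it as an input.
  // Returns the names that were recomputed, in order.
  run(name, ran = []) {
    const cell = this.cells.get(name);
    const args = cell.inputs.map((input) => this.cells.get(input).value);
    cell.value = cell.compute(...args);
    ran.push(name);
    for (const [other, candidate] of this.cells) {
      if (candidate.inputs.includes(name)) this.run(other, ran);
    }
    return ran;
  }
}

// Stand-in for file contents on disk (assumption for the sketch).
const disk = { "./a.txt": "hello", "./b.txt": "world" };

const graph = new Graph();
// One implicit cell per file attachment; the names are invented.
graph.define("FileAttachment_a_", [], () => disk["./a.txt"]);
graph.define("FileAttachment_b_", [], () => disk["./b.txt"]);
// Each user cell depends only on the implicit cell it references.
graph.define("aCell", ["FileAttachment_a_"], (text) => text.toUpperCase());
graph.define("bCell", ["FileAttachment_b_"], (text) => text.toUpperCase());

// Simulate ./a.txt changing: only the "a" chain is recomputed.
disk["./a.txt"] = "goodbye";
const recomputed = graph.run("FileAttachment_a_");
console.log(recomputed); // [ 'FileAttachment_a_', 'aCell' ]
console.log(graph.cells.get("aCell").value); // GOODBYE
```

bCell is never touched because nothing in its input list changed, which is the behavior the shared FileAttachment builtin can't give us today.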

> entire FileAttachment builtin refreshes

Oh, I see.

> create implicit cells for every file attachment

This actually sounds easy, if hacky %) since the file names are constants anyway, we can just replace FileAttachment("a") with FileAttachment_a_ before parsing, make that a cell and that's it :)
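Something like this, as a rough sketch of the pre-parse rewrite. `rewriteFileAttachments` is a hypothetical name, and the sketch assumes FA names contain only letters, digits, and underscores so they can be embedded in an identifier. A plain regex will also rewrite matches inside strings and comments, so a real version would probably want to work on the parsed AST instead.

```javascript
// Hypothetical pre-parse rewrite, not Dataflow's implementation.
// Replaces FileAttachment("a") with the identifier FileAttachment_a_
// and collects the names so implicit cells can be created for them.
function rewriteFileAttachments(source) {
  const names = new Set();
  // \b keeps LiveFileAttachment("a") from matching; \1 requires the
  // closing quote to be the same character as the opening quote.
  const pattern = /\bFileAttachment\(\s*(["'])([A-Za-z0-9_]+)\1\s*\)/g;
  const rewritten = source.replace(pattern, (_match, _quote, name) => {
    names.add(name);
    return `FileAttachment_${name}_`;
  });
  return { rewritten, names: [...names] };
}

const result = rewriteFileAttachments(
  'aCell = FileAttachment("a").text()\nbCell = FileAttachment("b").text()'
);
console.log(result.rewritten);
console.log(result.names);
```

After the rewrite, each cell references a plain identifier, so the existing dependency resolution should pick up FileAttachment_a_ as an ordinary cell input.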