Creating `newAssets` based on 1 input source file.

Question

ericdmoore opened this issue 4 years ago.

Subject of the feature

Create a standard (or at least accepted community) mechanism for plugins to attach new assets based on values in a source asset.

Problem

If you parse an MD/HTML doc and want to generate thumbnail images for the referenced images, you need to emit new assets to the destination system (usually the file system, 'fs').

If you want to generate an SVG file when parsing an MD doc with a mermaid code fence block, there should be a standard way to attach the new SVG generated from the parsed code block.

Expected behavior

For these categories of problems, I would expect plugin authors to rally behind attaching new vfiles to the source file's vfile.data.newAssets (or perhaps vfile.newAssets), which would be an array of additional vfiles.

Then a new plugin, say vfile-newAsset-generate, would pluck off the new vfile array, recurse into the tree, and emit each of them to the destination location.

In my mind, the user would configure an object with function closures that grant access to the fs, S3, an RDBMS, etc.

Where perhaps the input options look like:

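import type {VFile} from 'vfile'

interface INewAssetOptions {
  // resolves true if an asset already exists at `path` in the destination
  read: (path: string) => Promise<boolean>
  // resolves true if `vf` was written to the destination
  write: (vf: VFile) => Promise<boolean>
}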
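A minimal sketch of what vfile-newAsset-generate could look like as a unified plugin, assuming assets live in `file.data.newAssets` and that `write` has the shape from `INewAssetOptions`. The name `vfileNewAssetGenerate` and the extra `file.data.newAssetResults` key are hypothetical; nothing like this ships with unified or vfile. This sketch only calls `write`; an existence check via `read` could be folded into `write`.

```javascript
// Hypothetical `vfileNewAssetGenerate` attacher (not an existing package).
// Its transformer passes every asset in `file.data.newAssets` to the
// user-supplied `write`, then recurses into that asset's own
// `data.newAssets`, so derived files can carry derived files too.
function vfileNewAssetGenerate(options) {
  const {write} = options

  async function emitAll(file, results) {
    const assets = (file.data && file.data.newAssets) || []
    for (const asset of assets) {
      // Sequential on purpose: keeps ordering predictable for the sketch.
      results.push({path: asset.path, ok: await write(asset)})
      await emitAll(asset, results)
    }
    return results
  }

  // unified accepts transformers that return a promise; the tree is
  // returned unchanged because only side files are emitted here.
  return async function transformer(tree, file) {
    file.data.newAssetResults = await emitAll(file, [])
    return tree
  }
}
```

It would be registered like any other plugin, e.g. `.use(vfileNewAssetGenerate, {read, write})`, after the plugins that fill `newAssets`.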

Alternatives

I have not found obvious alternatives (looking for them is my usual process). That is to say, they may be out there, but they are non-obvious (at least to me).

Eric D Moore · Answer 1 · Sat Oct 24 2020 10:37:19 GMT+0800 (China Standard Time)

The read function could be used inside the write function to skip writing if the asset already exists, resolving false (as in: a failed write to the destination).
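A small sketch of that idea, assuming the `read`/`write` signatures from `INewAssetOptions`; the wrapper name `writeIfAbsent` is hypothetical, and an in-memory `Map` stands in for the real destination.

```javascript
// Hypothetical wrapper: build a `write` that first asks `read` whether the
// asset already exists and, if so, resolves false instead of writing.
function writeIfAbsent({read, write}) {
  return async function guardedWrite(vf) {
    if (await read(vf.path)) {
      return false // already at the destination: report "nothing written"
    }
    return write(vf)
  }
}

// Usage with an in-memory Map standing in for fs/S3/a database.
const store = new Map([['chart-1.svg', '<svg>old</svg>']])
const write = writeIfAbsent({
  read: async (p) => store.has(p),
  write: async (vf) => {
    store.set(vf.path, vf.value)
    return true
  }
})
```

Note that check-then-write is not atomic: two concurrent builds could both see a missing asset and both write it, so destinations with conditional writes may be a better place for that guarantee.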

Eric D Moore · Answer 2 · Sat Oct 24 2020 10:37:53 GMT+0800 (China Standard Time)

Lastly, would this be a unified utility or a vfile utility?

Christian Murphy · Answer 3 · Sat Oct 24 2020 22:09:20 GMT+0800 (China Standard Time)

An interesting idea 🤔

Create a standard (or at least accepted community) mechanism for plugins to attach new assets based on values in a source asset.

vfile already supports adding additional attributes directly to the vfile object.
Though in this case I'd recommend using vfile.data (https://github.com/vfile/vfile#vfiledata), as you already noted above.

Is there a particular plugin which conflicts with your desire to add a new property?
Are there multiple plugins looking to adopt this pattern, which would allow it to be standardized across them?

Then a new plugin say vfile-newAsset-generate would pluck off the new vfile array.
In my mind the user would configure an object with function closures that grant access to the fs, s3, RDBMS, etc

So vfile-newAsset-generate itself would be:

const vfileAssetGenerate = (vfile, callback) => vfile?.data?.assets?.forEach(callback)

Is that right? And everything else would come from another plugin/provider?

and recurse into the tree and emit each of them to the destination location.

Could you expand on what you mean here?

Where perhaps the input options look like:

interface INewAssetOptions {
  read: (path: string) => Promise<boolean>
  write: (vf: VFile) => Promise<boolean>
}

Would all assets be sent to the same destination? (e.g. everything to fs or everything to S3, not a mix)
how would sources that are not path based work? (e.g. S3 which is bucket+key, or RDBMS which would be DB+table+column+primary key)

Titus · Answer 4 · Sat Oct 24 2020 22:46:03 GMT+0800 (China Standard Time)

Some loose thoughts:

For a while now I’ve been interested in, but never really worked on, a higher-level thing above unified that could take care of this.
The idea (codenamed uniflow / unicorn), is to take unified a level higher, to where multiple files can be processed. This is very similar to Gulp (which also has fs, ftp, s3, etc adapters), but for ASTs instead of streams/buffers. (I have had some chats w/ Blaine from Gulp about that).

Here’s a dump of a static site generator example I wrote 2 years ago that I believe I got working:

var path = require('path')
var Handlebars = require('handlebars')
var postcss = require('postcss')
var env = require('postcss-preset-env')
var cssnano = require('cssnano')
var stylelint = require('stylelint')
var stylelintConfig = require('stylelint-config-standard')
var browserify = require('browserify')
var concat = require('concat-stream')
var report = require('vfile-reporter')
var vfile = require('to-vfile')
var unified = require('unified')
var remarkParse = require('remark-parse')
var math = require('remark-math')
var remark2rehype = require('remark-rehype')
var rehypeParse = require('rehype-parse')
var katex = require('rehype-katex')
var raw = require('rehype-raw')
var document = require('rehype-document')
var slug = require('rehype-slug')
var autolink = require('rehype-autolink-headings')
var highlight = require('rehype-highlight')
var minify = require('rehype-preset-minify')
var rehypeStringify = require('rehype-stringify')
var unicorn = require('../../packages/unicorn')
var glob = require('../../packages/unicorn-glob')
var layouts = require('../../packages/unicorn-layouts')
var filter = require('../../packages/unicorn-filter')
var batch = require('../../packages/unicorn-batch')
var watch = require('../../packages/unicorn-watch')
var matter = require('../../packages/vfile-frontmatter')
var mkdirp = require('../../packages/vfile-mkdirp')

var all = unicorn()
  .map(vfile.read)
  .use(filter, '.css', unicorn().map(style))
  .use(filter, '.js', unicorn().map(script))
  .use(
    filter,
    '.md',
    unicorn()
      .map(matter, {strip: true})
      .map(
        unified()
          .use(remarkParse)
          .use(math)
          .use(remark2rehype, {allowDangerousHTML: true})
          .use(raw)
          .use(rehypeStringify).process
      )
      .use(layouts, {
        base: 'layouts',
        data: {
          description: 'Hello! Welcome to my website.',
          generator: '🦄',
          url: 'https://unicorn.js.org'
        },
        compile: compile
      })
      .map(rename)
  )
  .use(
    filter,
    '.html',
    unicorn().map(
      unified()
        .use(rehypeParse)
        .use(slug)
        .use(autolink, {
          properties: {className: ['anchor']},
          content: {type: 'text', value: '#'}
        })
        .use(katex)
        .use(highlight)
        .use(document, {
          title: '🦄',
          css: [
            'https://cdn.jsdelivr.net/npm/katex@0.10.0-beta/dist/katex.min.css',
            '/index.css'
          ],
          js: '/index.js'
        })
        .use(minify)
        .use(rehypeStringify).process
    )
  )
  .map(move)
  .map(mkdirp)
  .map(vfile.write)
  .map(x => {
    x.stored = true
  })
  .use(function() {
    return function(set) {
      console.log(report(set.contents))
    }
  })

unicorn()
  .use(glob, 'src/**/*.*')
  .use(batch, {parralel: true, size: 1}, all)
  .use(watch, all)
  .process(__dirname, function(err) {
    if (err) {
      console.error(err)
    }
  })

function compile(layout) {
  return Handlebars.compile(String(layout))
}

function rename(file) {
  if (path.basename(file.dirname) === 'posts') {
    file.dirname += path.sep + file.stem
  }

  file.basename = 'index.html'
}

function style(file) {
  return postcss()
    .use(stylelint({config: stylelintConfig}))
    .use(env())
    .use(cssnano)
    .process(file.contents, {from: file.path})
    .then(ok)

  function ok(res) {
    res.messages.forEach(warn)
    file.contents = res.css
    return file
  }

  function warn(message) {
    var origin = [message.plugin, message.rule].join(':')
    var point = {line: message.line, column: message.column + 1}

    if (message.type === 'warning') {
      file.message(
        message.text.slice(0, message.text.lastIndexOf('(') - 1),
        point,
        origin
      )
    }
  }
}

function script(file, next) {
  var fp = path.resolve(file.cwd, file.path)

  browserify([{id: fp, source: String(file.contents)}], {basedir: file.cwd})
    .plugin('tinyify')
    .bundle()
    .pipe(concat(onconcat))
    .on('error', onerror)

  function onconcat(buf) {
    file.contents = buf
    next()
  }

  function onerror(err) {
    next(err)
  }
}

function move(file) {
  file.dirname = file.dirname.replace(/^src/, 'build')
}

For generating assets, support for a base directory would be nice, as relying on cwd is probably too fragile.
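One way to read this, sketched under assumptions: compute each output path relative to an explicit base directory (similar in spirit to the `base` property on Gulp's vinyl files) instead of relying on `cwd`, which the `move` function above does implicitly by rewriting a leading `src`. The helper `destinationFor` and its parameters are hypothetical.

```javascript
const path = require('path')

// Hypothetical helper: map `filePath` (under `base`) to the same relative
// location under `outDir`, so the result does not depend on process.cwd().
function destinationFor(filePath, base, outDir) {
  const relative = path.relative(base, filePath)
  // Refuse files that are not inside `base` (e.g. '../x' or another drive).
  if (relative === '..' || relative.startsWith('..' + path.sep) || path.isAbsolute(relative)) {
    throw new Error(`${filePath} is not inside ${base}`)
  }
  return path.join(outDir, relative)
}

// A derived asset next to a post ends up in the matching build folder.
console.log(destinationFor('/site/src/posts/hello/diagram.svg', '/site/src', '/site/build'))
// on POSIX: /site/build/posts/hello/diagram.svg
```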

A different thing: I’ve recently done some digging into Word files (.docx), which are essentially zipped XML files. EPUB is similar. The link with this idea is that it’s the opposite: a group of files treated as one.

Eric D Moore · Answer 5 · Mon Oct 26 2020 11:49:01 GMT+0800 (China Standard Time)

@ChristianMurphy, great clarifications!

Also, I feel a bit convicted by the recent HN post on the XY problem, so I am attempting to step back for a moment and discuss the problem more broadly.

To date, I have wanted to make 3 plugins for the unified collective, and 2 of them needed this pattern:

Read in a source file, parse, transform, and output.
But in the output phase, I need to output the input file (of course) plus some newly derived files based on the input.

No doubt my use cases and thinking have been largely shaped by a gulp.js mindset. But when using unified, it would often just be cleaner if unified could handle that last part too.

Problem Space Revisited

Example 1 - mermaidjs

mermaidjs: I wanted to be able to parse an MD file that has a mermaid definition in a code fence (I think that's what it's called).

something like:

<mydocWithAMermaidChart.md>

My Markdown File

with text and an embedded diagram which really only reads correctly as markdown.

graph LR
    A[Square Rect] -- Link text --> B((Circle))
    A --> C(Round Rect)
    B --> D{Rhombus}
    C --> D

So the usage pattern would be:

parse the document
find a content section that should be represented as an <img src="___.svg" alt="mermaid diagram" />
generate the new assets as vfiles
and add them as new entries in the input vfile's vfile.data.newAssets
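Those steps could be sketched as a unified transformer over the mdast tree. This is a hypothetical plugin, not the existing mermaid plugin: `render` is an assumed option that turns mermaid source into an SVG string (rendering itself is out of scope), the `<stem>-mermaid-<n>.svg` naming is invented, and plain vfile-like objects stand in for real `VFile`s. The plugin only changes the tree and `file.data`; writing is left to a later step.

```javascript
// Hypothetical plugin: replace fenced code blocks whose language is
// `mermaid` with an image, and queue the rendered SVG on
// `file.data.newAssets`. `render` is an assumed option; nothing here
// touches the file system.
function extractMermaid2Svg({render}) {
  return async function transformer(tree, file) {
    const stem = file.stem || 'document'
    let count = 0

    async function walk(node) {
      if (!Array.isArray(node.children)) return
      for (let index = 0; index < node.children.length; index++) {
        const child = node.children[index]
        if (child.type === 'code' && child.lang === 'mermaid') {
          count += 1
          const assetPath = `${stem}-mermaid-${count}.svg`
          file.data.newAssets = file.data.newAssets || []
          // Plain vfile-like object; a real plugin would create a VFile.
          file.data.newAssets.push({path: assetPath, data: {}, value: await render(child.value)})
          // mdast images are phrasing content, so wrap the image in a paragraph.
          node.children[index] = {
            type: 'paragraph',
            children: [{type: 'image', url: assetPath, alt: 'mermaid diagram'}]
          }
        } else {
          await walk(child)
        }
      }
    }

    await walk(tree)
    return tree
  }
}
```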

Note: yes, I'm aware that there is a mermaid plugin already. I was going to refactor it into a side-effect-free, functionally pure plugin.

Example 2 - thumbnail images

I am developing a plugin (that I suppose I could harden into a FOSS contribution, hence my homework questions here) where I will:

parse an HTML file,
find the picture > img elements and pluck out some data attributes describing what sizes and formats the thumbnails should be
generate those using the wonderful sharp image processing package
and add those images back as new entries in vfile.data.newAssets
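A sketch of the planning half of those steps, assuming two hypothetical attributes, `data-widths` and `data-formats`, on the `img` inside a `picture` (hast exposes them as `properties.dataWidths` and `properties.dataFormats`). The function name and output naming scheme are assumptions; the sharp call appears only in a comment to keep the example dependency-free.

```javascript
// Hypothetical planning step for a thumbnail plugin (names are assumptions):
// find `picture > img` in a hast tree and read data-widths="320,640" and
// data-formats="webp,jpeg", which hast exposes as `properties.dataWidths`
// and `properties.dataFormats`. Each resulting spec could then be rendered
// with sharp, e.g. `sharp(input).resize({width}).toFormat(format).toBuffer()`,
// and pushed onto `file.data.newAssets` as a new vfile.
function planThumbnails(tree) {
  const specs = []

  // Split a comma-separated attribute value into trimmed, non-empty parts.
  function list(value) {
    return String(value || '')
      .split(',')
      .map((part) => part.trim())
      .filter(Boolean)
  }

  function walk(node, parent) {
    const props = node.properties || {}
    const inPicture = Boolean(parent && parent.tagName === 'picture')
    if (node.tagName === 'img' && inPicture && typeof props.src === 'string') {
      const dot = props.src.lastIndexOf('.')
      const stem = dot === -1 ? props.src : props.src.slice(0, dot)
      for (const width of list(props.dataWidths)) {
        for (const format of list(props.dataFormats)) {
          specs.push({src: props.src, width: Number(width), format, path: `${stem}-${width}.${format}`})
        }
      }
    }
    for (const child of node.children || []) walk(child, node)
  }

  walk(tree, null)
  return specs
}
```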

Example 3 - CSS removal

I am just brainstorming here. I have been toying with the idea of using rehype-css-inline and rehype-remove-unused-css. When a site is about to be deployed, I might leave the CSS inlined (depending on the site, clearly), but irrespective of the site, I would almost always want it separated into its own "tree-shaken" CSS file (forgive me if the JS bundler metaphor is a bad one).
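A rough sketch of only the separation step, assuming unused rules have already been removed by another plugin. `extractInlineCss`, `href`, and `keepInline` are hypothetical names, and the new asset is a plain vfile-like object rather than a real `VFile`.

```javascript
// Hypothetical sketch: gather the text of every inline <style> element in a
// hast tree into one separate CSS asset on `file.data.newAssets`.
// `href` and `keepInline` are assumed option names. With keepInline false,
// the first <style> becomes a <link rel="stylesheet"> and the rest are dropped.
function extractInlineCss(tree, file, {href = 'styles.css', keepInline = true} = {}) {
  const chunks = []
  let linked = false

  function walk(node) {
    if (!Array.isArray(node.children)) return
    const kept = []
    for (const child of node.children) {
      if (child.type === 'element' && child.tagName === 'style') {
        chunks.push(child.children.map((text) => text.value || '').join(''))
        if (keepInline) {
          kept.push(child)
        } else if (!linked) {
          kept.push({type: 'element', tagName: 'link', properties: {rel: ['stylesheet'], href}, children: []})
          linked = true
        }
      } else {
        walk(child)
        kept.push(child)
      }
    }
    node.children = kept
  }

  walk(tree)
  if (chunks.length > 0) {
    file.data.newAssets = file.data.newAssets || []
    file.data.newAssets.push({path: href, data: {}, value: chunks.join('\n')})
  }
  return tree
}
```

Keeping `keepInline` as the default matches the idea above of leaving the inlined CSS in place for some sites while always emitting the separate file.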

Q & A

Is there a particular plugin which conflicts with your desire to add a new property? - @ChristianMurphy

There is no plugin that conflicts with this proposed new data field. I was more asking as a way of seeing if this is mildly useful, and just curious to see how community contributions were handled. I was hoping to not have to deal with some IANA style RFI regarding what object keys are in use by the community.

Are there multiple plugins looking to adopt this pattern, which would allow it to be standardized across them? - @ChristianMurphy

I can only speak for myself: I am considering adding 3 plugins using this pattern, 2 data writers and a single data reader, the vfile-newAsset-generate named above.

Everything else would come from another plugin/provider? - @ChristianMurphy

I would imagine that, if this is even mildly useful, it could be documented "as a pattern that works", where a rough summary of the doc changes would state: "If your plugin needs to generate new files based on the content of an input file, do this: make new vfiles for your derived assets, push them into vfile.data.newAssets, and make sure you add vfile-newAsset-generate to your plugin pipeline or bundle it into your published preset."

Would all assets be sent to the same destination? (e.g. everything to fs or everything to S3, not a mix)

I was thinking all vfiles found in the newAssets key would go to the same destination. Users wanting to multicast them would just stack the plugin in the pipeline and configure each instance with different read/write functions to emit files to fs, S3, Redis, etc.

how would sources that are not path based work? (e.g. S3 which is bucket+key, or RDBMS which would be DB+table+column+primary key)

// Sketch written from memory; extractMermaid2Svg and vfileNewAssetGenerate are
// the (not yet existing) plugins discussed here, and vfileInput is the input vfile.
const aws = require('aws-sdk') // AWS SDK for JavaScript v2
const {Client} = require('pg')
const unified = require('unified')
const parse = require('remark-parse')
const stringify = require('remark-stringify')

const credentials = new aws.SharedIniFileCredentials()
const s3c = new aws.S3({credentials})
const pgClient = new Client()

// unified merges options when the *same* plugin is used twice, so each
// destination gets its own attacher that delegates to the shared plugin.
function toS3(options) {
  return vfileNewAssetGenerate.call(this, options)
}
function toPostgres(options) {
  return vfileNewAssetGenerate.call(this, options)
}

async function main() {
  await pgClient.connect()
  try {
    const file = await unified()
      .use(parse)
      .use(extractMermaid2Svg)
      .use(toS3, {
        // emit new mermaid chart SVGs to S3
        read: async (key) => {
          try {
            await s3c.headObject({Bucket: 'mybucket', Key: key}).promise()
            return true
          } catch (err) {
            if (err.code === 'NotFound') return false
            throw err
          }
        },
        write: async (vf) => {
          // `vf.contents` in vfile@4 (`vf.value` in vfile@5+)
          const result = await s3c
            .putObject({Bucket: 'mybucket', Key: vf.path, Body: vf.contents})
            .promise()
          return Boolean(result.ETag)
        }
      })
      .use(toPostgres, {
        // emit new mermaid chart SVGs to postgres
        read: async (p) => {
          const result = await pgClient.query({
            name: 'doesExist',
            text: 'SELECT 1 FROM Assets WHERE path = $1',
            values: [p]
          })
          return result.rowCount > 0
        },
        write: async (vf) => {
          const result = await pgClient.query({
            name: 'derivedAsset',
            text: 'INSERT INTO Assets(path, contents) VALUES($1, $2) RETURNING *',
            values: [vf.path, vf.contents]
          })
          return result.rowCount === 1
        }
      })
      .use(stringify)
      .process(vfileInput)

    // do stuff with transformed input
    console.log({file})
  } finally {
    await pgClient.end()
  }
}

main().catch(console.error)

Feedback

I feel honored that I am only 2 years behind @wooorm on having these same ideas ;) I am still looking through your code dump. Do you have it in a repo anywhere, so I could look at the surrounding system too?
DOCX is an intriguing example: you start with a single vfile that is a zipped bundle, and then end up with a collection. Do you think you would open the docx file, unzip it, and then add each child file to the parent docx vfile's data.newAssets?
@wooorm, regarding naming, do you feel this is a vfile or a unified utility?

Eric D Moore · Answer 6 · Mon Oct 26 2020 23:51:32 GMT+0800 (China Standard Time)

I started a repo with some READMEs to sketch out where I was thinking about going.

see: https://github.com/ericdmoore/rehype-all-the-thumbs

Titus · Answer 7 · Tue Oct 27 2020 02:04:05 GMT+0800 (China Standard Time)

Nice! A quick response regarding what you’re working on now: I did the inverse. For the unified website, sharp generates images, and rehype checks which versions exist (this also allows dark-mode images, which are different and optional). rehype: https://github.com/unifiedjs/unifiedjs.github.io/blob/src/generate/plugin/rehype-pictures.js. Sharp: https://github.com/unifiedjs/unifiedjs.github.io/blob/src/generate/asset.js#L127 (a slightly different version I made later for my own website: https://github.com/wooorm/wooorm.github.io/blob/src/generate/rehype-pictures.js)
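A sketch of that inverse flow under stated assumptions (this is not the code from the linked repositories): images are generated beforehand, and the HTML step only lists the variants for which `exists` reports a file. The `<stem>-<width>.<format>` and `<stem>-dark-<width>.<format>` naming scheme and the `pictureSources` name are hypothetical.

```javascript
// Hypothetical sketch: build <source> descriptions for a <picture> from the
// image variants that actually exist. Dark variants come first because the
// browser uses the first <source> that matches. Assumes each format name is
// also its MIME subtype (true for webp, png, avif, jpeg; not for "jpg").
function pictureSources(stem, widths, formats, exists) {
  const sources = []
  for (const dark of [true, false]) {
    for (const format of formats) {
      const srcset = widths
        .map((width) => [`${stem}${dark ? '-dark' : ''}-${width}.${format}`, width])
        .filter(([file]) => exists(file))
        .map(([file, width]) => `${file} ${width}w`)
      if (srcset.length === 0) continue // optional variant missing: skip it
      sources.push({
        type: `image/${format}`,
        media: dark ? '(prefers-color-scheme: dark)' : undefined,
        srcset: srcset.join(', ')
      })
    }
  }
  return sources
}
```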

Christian Murphy · Answer 8 · Tue Feb 16 2021 05:19:20 GMT+0800 (China Standard Time)

Thanks for starting the discussion @ericdmoore!
We're in the process of unifying ideas into discussions (unifiedjs/collective#44).
If you'd like to continue this thread or start a new one, https://github.com/unifiedjs/unified/discussions will be the home for ideas going forward.
Thanks again!