feathersjs-ecosystem / feathers-blob

Feathers service for blob storage, like S3.

Home Page: http://feathersjs.com

Why data URIs?

thebarndog opened this issue · comments

I really like feathers-blob because it enables me to not have to use an intermediary library (like the Amazon S3 iOS library) on mobile; everything can go straight to the server.

However, some questions/issues have popped up along the way:

  1. I like the service, but how do you upload multipart requests to the blob service? So far I've only been able to upload images or videos that have been base64-encoded as data URIs.
  2. When you get a blob object, the entire request waits until the blob is read from the stream, which in my case with videos takes a very long time. How can I download just the blob metadata first and only fetch the data URI afterwards?
  3. Why data URIs? In my particular case I'm writing an iOS app, and in order to store videos or images I have to get the extension of the file (which is pretty hard if you just have a data object) and then base64-encode it (which is easy). Why not just store the base64 representation of a file?
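For context, a minimal sketch (using only Node's Buffer, with a hypothetical helper name) of what the data-URI workflow amounts to on the client: the raw bytes are base64-encoded and wrapped with a MIME type before being sent to the service:

```js
// Hypothetical helper: wrap raw file bytes in a base64 data URI,
// the shape feathers-blob currently expects on create.
function toDataURI(contentType, buffer) {
  return `data:${contentType};base64,${buffer.toString('base64')}`;
}

// First three bytes of a JPEG file, for illustration
const uri = toDataURI('image/jpeg', Buffer.from([0xff, 0xd8, 0xff]));
// uri === 'data:image/jpeg;base64,/9j/'
```

This is also why the extension question above is painful: the MIME type has to be known up front to build the URI at all.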

Hey @startupthekid,

To be honest, the reason I implemented this using data URIs is that it was the easiest way I could get it working given my understanding of Feathers, but it's certainly not ideal, and you're touching on all the reasons it's not sufficient when large files are involved. 😄

Personally I'd much rather this be implemented with streams (abstract-blob-store is based on streams), but I'm not sure how to do this with how Feathers is currently implemented (http's req and res streams are hidden). For an example of how this looks using only http, here's a similar module I half-wrote late one night.

There might be a way to do this given the current non-streaming interface provided by Feathers; maybe a create request could add a new part to an existing blob being created. Hmm. If you want to give this a go, I'll happily accept any pull requests.

@daffl @ekryski am I missing anything on how to stream requests/responses in Feathers? If not, I wonder how we could re-surface a streaming interface here in Feathers. 🐳 For inspiration, I look to similar systems like multilevel, multiplex-rpc, and muxrpc.

As a strawman, I'm thinking of something along the lines of:

```js
const service = {
  create: function (body, params, cb) {
    // `body` is assumed to be a readable stream here
    body.pipe(fs.createWriteStream('./blob'))
      .on('error', function (err) { cb(err) })
      // writable streams emit 'finish', not 'end'
      .on('finish', function () { cb(null, { id: 'blob' }) })
  },
  get: function (id, params) {
    return fs.createReadStream('./blob')
  }
}

// ...

videoStream.pipe(client.create(function (err, meta) {
  console.log('meta', meta)
}))
```

@ahdinosaur I agree, it's a tricky problem. I like your idea of wrapping the client in a pipe. An extension of that might be to have a blob service plus a stream wrapper that passes the request and response objects in (maybe via the params) and returns the modified service. That way the user would still just do app.use("/blobs", blobService), but internally it'd be wrapped by our streaming function, so it could support streaming without a visible change.

After thinking a lot about it and reading a ton of source code, I found this: feathersjs-ecosystem/feathers-hooks#40 which seems to be more along the lines of what we're looking for. We need a provider independent way of adding the request object to the params, which would probably be a plugin.

@ahdinosaur For now I'm sort of shimming streaming by using app middleware that attaches the request to the req.feathers object, so that it gets passed into the hooks and subsequently into the params for the service methods. That can serve as a temporary solution of sorts.
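The shim described above could look roughly like this (a sketch; `attachRequest` and the `params.request` key are assumed names, not part of feathers-blob):

```js
// Express-style middleware: anything placed on `req.feathers` is
// merged by Feathers into the `params` passed to service methods.
function attachRequest(req, res, next) {
  req.feathers = req.feathers || {};
  req.feathers.request = req; // service methods can then read params.request
  next();
}

// Hypothetical usage: app.use('/blobs', attachRequest, blobService);
```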

@startupthekid that's what I was going to recommend for supporting streams right now. It feels like a bit of a hack, but until we test out "streaming" data over sockets or using a different transport, it's probably the best way.

I have a preliminary implementation here: https://gist.github.com/startupthekid/4f253baecfdc1f42eb30acd3bb1ac9c7
Everything's the same except the get and create methods (and I removed the init function). I haven't written tests yet, but if you could give this a look and identify any potential problems, that'd be much appreciated. I'm not an expert with Node streams yet, so I'm sure there are things that could be improved.

Ideally the write stream should be piped into right from the request but I wasn't sure how to do that without losing content type and extension information.

@ekryski @ahdinosaur I updated the gist with my finished implementation and tests for the blob service. For now I'm not streaming from the request object because I don't want to lose the extension information, but I may change that later.

I should mention that the create function takes in a data URI but streams only the binary data in the get function, so the consumer doesn't have to deal with URI parsing. After the data is streamed, I just resolve the blob id.
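For the inverse direction, here's a sketch (a hypothetical helper, not taken from the gist) of splitting a base64 data URI back into its content type and raw bytes, which is what a create method has to do before writing to the store:

```js
// Hypothetical parser: recover the MIME type and raw bytes from a
// base64-encoded data URI.
function parseDataURI(uri) {
  const match = /^data:([^;]+);base64,(.*)$/.exec(uri);
  if (!match) throw new Error('not a base64 data URI');
  return {
    contentType: match[1],
    buffer: Buffer.from(match[2], 'base64')
  };
}
```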

@startupthekid in an HTTP POST request, you can send the extension information of the body as the Content-Type header.

Right right, silly me. I'll update the create method to just stream directly from the request.
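A sketch of that Content-Type approach (the helper and its lookup table are hypothetical): map the header value to an extension instead of carrying it inside a data URI:

```js
// Hypothetical lookup from MIME type to file extension; extend as needed.
const EXT_BY_TYPE = {
  'image/jpeg': 'jpg',
  'image/png': 'png',
  'video/mp4': 'mp4'
};

function extensionFor(contentTypeHeader) {
  // Drop any parameters, e.g. 'image/jpeg; charset=binary'
  const type = contentTypeHeader.split(';')[0].trim().toLowerCase();
  return EXT_BY_TYPE[type] || null;
}
```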

Hmm, still thinking about the best way to support large files without boiling the ocean (i.e. staying compatible with feathers@2). Leaning towards my gut feeling: the client calls a series of create methods, one per chunk of the blob, with some common identifier per blob. I reckon it'll also be good to bundle a client-side implementation that makes this easy in ./client.

some notes:

Ah, what if we modelled this like an ES6 iterator? We call create either to send a full data URI blob (same as now) or to get an identifier to which we can update more values until done is true.

for example:

```js
blobs.create({
  contentType: "image/jpeg"
})
// -> {
//   id: "123.jpeg",
//   contentType: "image/jpeg",
//   size: 0,
//   done: false
// }

blobs.update("123.jpeg", {
  value: "base64.....",
  done: false
})
// -> { id: "123.jpeg", size: 32, done: false }

blobs.update("123.jpeg", {
  value: "base64.....",
  done: true
})
// -> { id: "123.jpeg", size: 84, done: true }
```

As a side note: with a streaming approach, it becomes more expensive to derive the id from the content hash without a proper rename operation (see content-addressable-blob-store for how that module does it using fs.rename), so hmm.
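A client-side sketch of the iterator-style protocol above (the helper is hypothetical; `service` is assumed to implement the create/update shapes shown): open the blob with one create, then send each chunk via update, flagging the last one with `done: true`:

```js
// Hypothetical client helper for the proposed chunked-upload protocol.
async function uploadInChunks(service, contentType, buffer, chunkSize) {
  // One create to open the blob and obtain its identifier
  const blob = await service.create({ contentType });
  let result = blob;
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    const chunk = buffer.slice(offset, offset + chunkSize);
    result = await service.update(blob.id, {
      value: chunk.toString('base64'),
      done: offset + chunkSize >= buffer.length // last chunk closes the blob
    });
  }
  return result; // e.g. { id, size, done: true }
}
```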

Oh, I like that, sort of an ad-hoc multipart upload thing going on. As far as renames go, currently I just use shortid and that's worked out pretty well so far.

@ahdinosaur has there been any progress on doing something like that? I'd be glad to help out if needed.

@startupthekid nah, I haven't needed this yet for any paid project, and my personal mad science is going into other experiments, so you're welcome to jump in on this. Do you feel comfortable implementing the API described in #2 (comment)?

FWIW @DenJohX put together a pretty nice guide for supporting file uploads in a more Feathers-idiomatic way: http://docs.feathersjs.com/guides/file-uploading.html. It supports large files and I think accomplishes what you're after, @startupthekid. Correct me if I'm wrong there.

I think we can probably close this for now. If we need to we can open a new issue or re-open to discuss supporting a better streaming interface or uploading over sockets.