postrank-labs / goliath

Goliath is a non-blocking Ruby web server framework

How to stream content through a Goliath app?

janko opened this issue

I'm writing a Goliath wrapper around tus-ruby-server, and I'm not able to solve the last piece of the puzzle, which is streaming data through a Goliath app. Let's assume I have the following app:

# app.rb
require "goliath"

class App < Goliath::API
  def response(env)
    body = Enumerator.new { |y| 3.times { sleep 1; y << "chunk" } }
    [200, {}, body]
  end
end

$ ruby app.rb

I would expect a request to this app to stream the response body, sending "chunk" to the client after each one-second sleep. However, that's not the behaviour I get: the whole response is returned to the client only after 3 seconds, and until then the client doesn't even receive the response headers. I tested this using HTTP.rb:

require "http"

response = HTTP.get("http://localhost:9000") # lasts for 3 seconds
p response.headers
response.body.each { |chunk| p chunk }

Contrary to that, when I execute examples/stream.rb (which uses EventMachine timers for streaming), it works correctly: I get the response headers immediately, and data is printed every second.
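
In case it helps other readers, here is a minimal sketch in the spirit of examples/stream.rb (the real example may differ in details): the data is pushed from an EventMachine periodic timer, so each chunk goes out in its own reactor tick.

class Stream < Goliath::API
  def response(env)
    count = 0

    timer = EM.add_periodic_timer(1) do
      env.stream_send("chunk")
      count += 1

      if count == 3
        timer.cancel
        env.stream_close
      end
    end

    [200, {}, Goliath::Response::STREAMING]
  end
end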

I don't understand what the core difference is between that example and mine. If I understand the codebase correctly, the following is what gets called in my case. It iterates over the response body and sends the data to the connection, so why does the client receive the data only after everything has been written, rather than while it's being written?

    def post_process(results)
      begin
        status, headers, body = results
        return if status && status == Goliath::Connection::AsyncResponse.first

        callback do
          begin
            @response.status, @response.headers, @response.body = status, headers, body
            @response.each { |chunk| @conn.send_data(chunk) }  # <=========

            # ...
    end

OK, I managed to get it working with EM.defer!

class App < Goliath::API
  def response(env)
    EM.defer do
      3.times do
        sleep 1
        env.stream_send("chunk")
      end
    end

    [200, {}, Goliath::Response::STREAMING]
  end
end

Is this the right way? If so, what do you think about making this the default behaviour? I'm not yet sure what the difference is between EM.defer and EM::Deferrable#callback, so I don't know whether there is any downside to using EM.defer in all cases.
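
As far as I can tell, EM.defer's two-argument form makes the distinction visible: the operation block runs on a thread from EventMachine's internal thread pool, and the optional callback is invoked back on the reactor thread with the operation's return value, whereas EM::Deferrable#callback only registers a hook to run when a deferrable succeeds and doesn't move any work off the reactor. A sketch of the two-argument form, using the same streaming helpers as above:

class App < Goliath::API
  def response(env)
    operation = proc do
      # runs on a pool thread, so blocking work doesn't stall the reactor
      sleep 1
      "chunk"
    end

    callback = proc do |result|
      # runs back on the reactor thread with the operation's return value
      env.stream_send(result)
      env.stream_close
    end

    EM.defer(operation, callback)

    [200, {}, Goliath::Response::STREAMING]
  end
end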

For other readers: I found this wiki and the RDoc to explain EM.defer really nicely. It seems to be exactly my use case: I have a file being retrieved from S3 (which is slow), and I want to stream it into the response body.

Since other web servers (Puma, Unicorn) stream the response body to the client by default as the body is iterated over, I think that would be a good default for Goliath too.
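
For comparison, this is the behaviour I mean, as a plain Rack sketch (a hypothetical config.ru, not Goliath code): Puma and Unicorn write each chunk to the socket as #each yields it, without waiting for the whole body.

# config.ru
run lambda { |env|
  body = Enumerator.new { |y| 3.times { sleep 1; y << "chunk" } }
  [200, { "Content-Type" => "text/plain" }, body]
}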

One more modification: to guarantee the order of execution, we should send the status and headers before calling EM.defer:

class App < Goliath::API
  def response(env)
    env[STREAM_START].call(200, {})

    EM.defer do
      3.times do
        sleep 1
        env.stream_send("chunk")
      end
    end

    nil
  end
end

The env[STREAM_START].call(status, headers) is normally called by Goliath when we return the Goliath::Response::STREAMING body, but since I didn't see any place to call EM.defer {} other than in Goliath::API#response, we have to call it manually and return nil so that Goliath doesn't call it again.

@janko-m I assume you've read through our streaming wiki page as well? :)

See: https://github.com/postrank-labs/goliath/wiki/Streaming

@igrigorik Yes, I've seen the wiki (it shows the examples/stream.rb I mentioned). However, I couldn't make use of that construct, because my actual use case isn't a firehose to the client; it's returning a large response body in a streaming fashion.

I have an object that responds to #each (an Enumerator) which downloads chunks from an AWS S3 object. So I don't need to write to the response body every X seconds (in which case I could set an EventMachine timer); instead, I need to write chunks to the response body as soon as they're downloaded from S3.
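
Roughly, such an object can be built like this (a sketch assuming the aws-sdk-s3 gem; the bucket and key are placeholders, and the exact streaming API may differ between SDK versions):

require "aws-sdk-s3"

def s3_body(bucket:, key:)
  Enumerator.new do |yielder|
    client = Aws::S3::Client.new
    # When given a block, get_object yields the object body in chunks as
    # they are read, instead of loading the whole object into memory.
    client.get_object(bucket: bucket, key: key) do |chunk|
      yielder << chunk
    end
  end
end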

I found this bit in the EventMachine::Connection#send_data documentation:

Data is buffered to be sent at the end of this event loop tick (cycle).

Now it makes sense why this code doesn't stream the response body to the client:

    def post_process(results)
      begin
        status, headers, body = results
        return if status && status == Goliath::Connection::AsyncResponse.first

        callback do
          begin
            @response.status, @response.headers, @response.body = status, headers, body
            @response.each { |chunk| @conn.send_data(chunk) }  # <=========

            # ...
    end
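
To see the buffering in isolation, here is a standalone EventMachine sketch (not Goliath code, just my understanding of the quoted behaviour): all send_data calls made within one reactor tick are buffered and flushed together at the end of that tick.

require "eventmachine"

module SlowBody
  def post_init
    # All three writes happen in the same reactor tick (the sleeps also
    # block the reactor), so the data is flushed to the peer in one go.
    3.times { |i| sleep 1; send_data("chunk #{i}\n") }
    close_connection_after_writing
  end
end

EM.run do
  EM.start_server("127.0.0.1", 9001, SlowBody)
end

Connecting with e.g. nc 127.0.0.1 9001 prints nothing for about three seconds and then all three chunks at once, which matches what I saw with Goliath.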

In the end I needed to avoid #post_process anyway, not because of this buffering limitation, but because calling body.each in my use case downloads from S3, which is something I shouldn't be doing inside the event loop. I actually wanted to use EM.defer and send the data from a thread, not as a workaround but to avoid impacting request throughput.

Anyway, I just wanted to post my findings; feel free to close this.

👍 glad you got it working.