memorysafety / river

This repository is the home of the River reverse proxy application, based on the pingora library from Cloudflare.

Home Page: https://www.memorysafety.org/initiative/reverse-proxy/

First Request Path Demo

jamesmunns opened this issue

This is a brainstorming issue for tracking which demo(s) to prioritize for the Kickstart milestone.

  • pingora-proxy/examples/gateway has an interesting authorization check (for request_filter), as well as a bit of path->upstream routing (in upstream_peer)
  • pingora-proxy/examples/load_balancer uses a load balancing crate in the workspace (and this example goes further)
  • pingora-proxy/examples/modify_response has some basics, including adding and removing headers
  • Demonstrate warm reload, maybe with file watching?
  • Implement compression (from an uncompressed server)
  • A/B testing (random upstream or add random header)
  • Using domain to route to different servers (e.g. shared hosting)
  • Basic WAF elements, such as path-based null-routing (see the sketch after this list)
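
As a concrete starting point, here is a rough sketch of what the null-routing and path->upstream routing demos might look like, modeled on the pingora-proxy gateway example. The handler name and backend addresses are hypothetical, and exact trait signatures may differ between pingora versions:

use async_trait::async_trait;
use pingora::prelude::*;

pub struct DemoProxy;

#[async_trait]
impl ProxyHttp for DemoProxy {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    // WAF-style null-routing: reject matching paths before any proxying happens.
    async fn request_filter(&self, session: &mut Session, _ctx: &mut ()) -> Result<bool> {
        if session.req_header().uri.path().starts_with("/blocked") {
            let _ = session.respond_error(403).await;
            return Ok(true); // true = response already written, stop here
        }
        Ok(false)
    }

    // Path -> upstream routing, as in the gateway example.
    async fn upstream_peer(&self, session: &mut Session, _ctx: &mut ()) -> Result<Box<HttpPeer>> {
        let addr = if session.req_header().uri.path().starts_with("/api") {
            ("192.0.2.10", 8080) // hypothetical API backend
        } else {
            ("192.0.2.20", 8080) // hypothetical default backend
        };
        Ok(Box::new(HttpPeer::new(addr, false, String::new())))
    }
}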

One thing to think about is how we shape this in our config. Right now, our config for a basic proxy looks like this (pseudo-Rust):

struct BasicProxy {
    name: String,
    listeners: Vec<Listener>, // downstream: where we accept connections
    connector: Connector,     // upstream: the backend we proxy to
}

One question is: are certain rules, like matching/filtering/routing, specified as part of the basic proxy config (e.g. as a field in the basic proxy struct), or do we specify rules externally, using something like the "name" field to select which proxy a rule applies to?
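
To make the two options concrete, here is a quick sketch (all types beyond BasicProxy are hypothetical stubs, just to frame the question):

// Stub types so the sketch is self-contained.
struct Listener;
struct Connector;
struct Rule;

// Option A: rules are part of the proxy config itself.
struct BasicProxy {
    name: String,
    listeners: Vec<Listener>,
    connector: Connector,
    rules: Vec<Rule>, // matching/filtering/routing attached directly
}

// Option B: rules live outside the proxy and reference it by name.
struct RuleSet {
    applies_to: String, // matched against BasicProxy::name
    rules: Vec<Rule>,
}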

For example, NGINX has some level of scoping, but separates named upstreams from servers.

For example:

http {
    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        server 192.0.0.1 backup;
    }

    server {
        location / {
            proxy_pass http://backend;
        }
    }
}

(example from https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/)

This specifies:

  • An HTTP scope
  • A list of three upstreams, one serving as "backup", all in the group backend
  • A server, proxying / to the backend group
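
If we wanted the same separation in the pseudo-Rust config, it could look something like this (purely a hypothetical shape for discussion, not a proposal for the actual types):

use std::collections::HashMap;

struct Upstream {
    address: String,
    backup: bool, // like nginx's `backup` flag
}

struct Config {
    // Named groups, like nginx's `upstream backend { ... }` block.
    upstream_groups: HashMap<String, Vec<Upstream>>,
    // Proxies reference a group by name, like `proxy_pass http://backend;`.
    proxies: Vec<ProxyEntry>,
}

struct ProxyEntry {
    name: String,
    upstream_group: String, // e.g. "backend"
}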

In contrast, Caddyfile syntax seems to be much looser; I need to look at it more to compare 1:1 with nginx. However, here is their JSON syntax:

{
	"apps": {
		"http": {
			"servers": {
				"hello": {
					"listen": [":443"],
					"routes": [
						{
							"match": [{
								"host": ["example.com"]
							}],
							"handle": [{
								"handler": "static_response",
								"body": "Hello, privacy!"
							}]
						}
					]
				}
			}
		}
	}
}

This shows the nesting of:

  • apps
  • http (apps)
  • servers (http)
  • hello (server)
  • listen/routes (hello)
  • match (hello: route)
  • static "Hello, privacy!" response (hello: handler)

For a zoomed-in view of just apps::http, see https://caddyserver.com/docs/json/apps/http/

Concerning the configuration structure:

The pseudo-Rust you provided looks like there would be a mapping of n Listeners to 1 Connector.

From what I understand, a connector is an abstraction representing a pool of connections to a single backend server, which in reality can consist of redundant but identical servers.
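
If that understanding is right, a minimal sketch of the mental model might be (hypothetical types, not pingora's actual ones):

use std::net::{SocketAddr, TcpStream};

// One logical backend: several redundant but identical servers,
// plus reusable connections to whichever of them answered.
struct Connector {
    addresses: Vec<SocketAddr>, // interchangeable servers
    idle_pool: Vec<TcpStream>,  // connections kept open for reuse
}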

In your nginx example, the choice between different backends is based on the path, not the listener. The same is true for the Caddy example: there is no direct mapping from Listener to backend. This is also how Apache httpd handles it.

Then, there is also the notion of SNI and virtual hosts, where the SNI information and Host headers, as well as the port number, might be taken into account when choosing the backend.

In Apache httpd it is further possible to choose the backend based on criteria other than the Host header or request location. For example, using mod_rewrite, it is possible to choose the backend based on complex expressions, leveraging regular expressions and environment variables. This is probably quite an advanced use case and not relevant for an initial version, but might be worth keeping in mind.
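
Taken together, the backend choice can be seen as a function over several request properties. A hypothetical sketch (the field names and the Custom variant are made up, the latter to illustrate the mod_rewrite-style escape hatch):

// Everything a server might consult when picking a backend.
struct RoutingInput<'a> {
    sni: Option<&'a str>,  // from the TLS handshake
    host: Option<&'a str>, // Host header
    port: u16,             // local port the request arrived on
    path: &'a str,         // request path
}

enum Matcher {
    SniIs(String),
    HostIs(String),
    PortIs(u16),
    PathPrefix(String),
    // mod_rewrite-style escape hatch: an arbitrary predicate over the input.
    Custom(fn(&RoutingInput) -> bool),
}

fn matches(m: &Matcher, input: &RoutingInput) -> bool {
    match m {
        Matcher::SniIs(s) => input.sni == Some(s.as_str()),
        Matcher::HostIs(h) => input.host == Some(h.as_str()),
        Matcher::PortIs(p) => input.port == *p,
        Matcher::PathPrefix(pre) => input.path.starts_with(pre),
        Matcher::Custom(f) => f(input),
    }
}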

As a side note: The ModSecurity rule engine mentioned in #8 can be used to formulate complex behaviour, including manipulation of environment variables, which can then be used by mod_rewrite to make decisions about which backend to use. But this is not used by the OWASP Core Rule Set.

In Apache httpd this would give something along the lines of:

|- server
|  |- listeners
|  |- backends (<Proxy> stanzas)
|  |- local directories
|  |- SNI-based decisions
|  |- Host-based decisions
|  |- Port-based decisions
|  |- Path-based decisions
|  |- other decision criteria
|  |- VirtualHost
|  |  |- backends (<Proxy> stanzas)
|  |  |- local directories
|  |  |- Path-based decisions
|  |  |- other decision criteria
|  |- VirtualHost
|  |  |- backends (<Proxy> stanzas)
|  |  |- local directories
|  |  |- Path-based decisions
|  |  |- other decision criteria
|  |- ...

Naming the configuration BasicProxy suggests that there will be implementations of more complex configuration items for more complex proxies in the future. On the one hand, I see the value of having a very simple config for the most common use cases. On the other hand, I wonder whether this introduces hurdles to adding complexity later: migrating the configuration from BasicProxy to a ComplexProxy structure could be prone to errors and misconfigurations if the more complex features are ever needed down the road.

Hey @studersi, thanks for the info! A couple points:

WRT BasicProxy: it's definitely just a first take at this; I expect multiple breaking changes before we reach any sort of "user friendly stability". At some point "avoiding breaking changes" will be prioritized, but that is not today.

Additionally, yes, at the moment only a single upstream/server/connector is allowed in BasicProxy, largely because I haven't designed any way to specify pooling, load balancing, etc.

In pingora, the "atomic unit" is the Service: an entity that performs the proxying action and exclusively owns its Listeners.
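
For reference, wiring a Service up looks roughly like this in pingora's own quickstart (the listen address is made up, and this reuses the hypothetical DemoProxy sketch from earlier in the thread):

use pingora::prelude::*;

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    // One Service performs the proxying and exclusively owns its Listeners.
    let mut proxy = http_proxy_service(&server.configuration, DemoProxy);
    proxy.add_tcp("0.0.0.0:6188"); // this listener belongs to this service only

    server.add_service(proxy);
    server.run_forever();
}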

Thanks for the clarification. I'll be looking forward to how it turns out :-)