Purging from multiple (redundant) back-ends
rjackson opened this issue · comments
What happens when we have 4 mediawiki pods, and the mediawiki service is our backend host? Do they all have PURGE access?
MediaWiki currently cannot purge Varnish, potentially caused by two obstacles:
- Our VCL only permits a round-robin DNS record to PURGE (see lines 9 to 11 at 711b69b)
- MediaWiki sends its PURGE requests to a round-robin DNS record
These are only issues if Varnish and MediaWiki do not automatically resolve round-robin DNS records into a list of acceptable backends. I don't currently know whether either service does this – this needs looking into.
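For reference, "resolving a round-robin record into a list of backends" means expanding one hostname into every A record behind it. A quick Python sketch of that expansion (the hostname used here is just illustrative):

```python
# Minimal sketch: resolve a (possibly round-robin) DNS name into the full
# list of IPv4 addresses it maps to - what Varnish or MediaWiki would need
# to do to address every backend rather than just one of them.
import socket

def resolve_all(hostname: str, port: int = 80) -> list[str]:
    """Return every distinct IPv4 address the name resolves to."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # getaddrinfo returns one tuple per record; dedupe while keeping order
    addresses: list[str] = []
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses

print(resolve_all("localhost"))  # e.g. ['127.0.0.1']
```

If either service only calls the resolver once and takes the first address, the round-robin record collapses to a single backend, which is exactly the failure mode described above.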
If these are valid issues, the latter can be worked around by:
- Only hosting a single Varnish instance. This is undesirable, as it would eliminate redundancy and become a bottleneck.
- Fixing MediaWiki to multi-cast PURGE requests to every host behind the round-robin DNS record. This is worth looking into (maybe MediaWiki already does this, and my concern is moot)
- Introduce a new service to relay & multi-cast all requests coming from MediaWiki to Varnish. https://github.com/newsdev/varnish-invalidator looks like it'll be perfect for this.
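The relay idea in the last bullet boils down to: accept one PURGE, fan it out to every Varnish backend. A hypothetical Python sketch of that fan-out (the helper names and backend IPs are made up for illustration; varnish-invalidator's actual implementation will differ):

```python
# Hypothetical sketch of a PURGE fan-out: send one request to each
# Varnish backend IP, so every cache drops the object. Backend IPs and
# function names here are illustrative, not from our stack.
import http.client

def build_purge_targets(path: str, backends: list[str],
                        port: int = 80) -> list[tuple[str, int, str]]:
    """One (host, port, path) tuple per backend to PURGE."""
    return [(ip, port, path) for ip in backends]

def multicast_purge(path: str, backends: list[str],
                    port: int = 80) -> dict[str, int]:
    """Send PURGE <path> to every backend; return HTTP status per host."""
    statuses: dict[str, int] = {}
    for host, host_port, target in build_purge_targets(path, backends, port):
        conn = http.client.HTTPConnection(host, host_port, timeout=5)
        try:
            conn.request("PURGE", target)
            statuses[host] = conn.getresponse().status
        finally:
            conn.close()
    return statuses
```

Running something like this as a sidecar is essentially what the relay-service option amounts to: MediaWiki keeps sending a single PURGE, and the relay handles the multiplication.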
The former may be solvable by limiting PURGE to all internal IPs, but only if we have complete confidence that Google's HTTP load balancers won't forward PURGE requests (the LB will have an internal IP, but will pass through external requests). This needs looking into.
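For illustration, restricting PURGE to internal IPs would look something like the following VCL. This is a sketch only: the ACL name is made up, and the RFC 1918 ranges are an assumption that would need narrowing to our actual pod CIDR (and this whole approach is moot if the LB forwards PURGE from outside):

```vcl
# Illustrative VCL 4.x ACL: allow PURGE only from assumed-internal ranges.
acl purge_allowed {
    "10.0.0.0"/8;
    "172.16.0.0"/12;
    "192.168.0.0"/16;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge_allowed) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
}
```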
Possibly useful reading, from a quick Google:
https://blog.steve.fi/discovering_back_end_servers_automatically_.html
https://hub.docker.com/r/basi/varnish-consul-template/
https://info.varnish-software.com/blog/varnish-backends-in-the-cloud
Can we work around this by:
- Allowing PURGE from everything except the load balancer? (Or allowing it globally, if the load balancer will not pass through PURGE requests.)
- Having a single Varnish instance? (Check whether MediaWiki will multi-cast PURGE requests; if not, we'll roll with a single instance.)
This looks perfect for multi-casting a PURGE request to multiple Varnish back-ends: https://github.com/newsdev/varnish-invalidator
If we run one of these in our stack, we only need to configure Varnish to allow purges from that single instance.
Gah, not quite so simple. The load balancing is managed by the Service, so the DNS record resolves to a single IP: the service's clusterIP.
Will have to look into the options for allowing a pod to cast to all pods behind a service.
Look into:
- Sharing a storage backend (so a PURGE should impact all Varnishes, even if only one of them handles the request): https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html
- A StatefulSet with a headless Service (DNS should resolve to all Pod endpoints): https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
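The headless-Service option hinges on one field: setting `clusterIP: None` makes the Service's DNS name return every ready Pod's address instead of a single virtual IP. A sketch of what that manifest could look like (names and ports are assumptions, not our actual manifests):

```yaml
# Illustrative headless Service: with clusterIP set to None, the DNS name
# resolves to all Pod A records rather than one virtual clusterIP.
apiVersion: v1
kind: Service
metadata:
  name: varnish
spec:
  clusterIP: None        # headless: DNS returns every ready Pod IP
  selector:
    app: varnish
  ports:
    - name: http
      port: 80
```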
Much thanks for talking through this beef, @underyx
The former would require a ReadWriteMany (i.e. NFS) volume shared across all Varnish instances, which partly eliminates the benefits of scaling horizontally (it would be a bottleneck and a single point of failure).
The latter works. I attempted to implement that previously, but I completely cocked it up. I've been bashing my head against a wall struggling to bash the right keys to make it work, because I'm as dense as a bag of bricks (except without the air).
I believe I've cracked it now.
Not quite solved yet.
MediaWiki now has an array of all Varnish servers. However, it still only sends PURGE to one of those instances.
If we can't control this via MediaWiki config, looks like we'll be adding https://github.com/newsdev/varnish-invalidator to our stack.
Okay. PURGE requests not being sent to all Varnish instances was solved via tfwiki/mediawiki@66ca813
What I'm seeing now is Varnish receiving the PURGE request, but not actually doing anything with it. This is a new issue to track separately: #2