darkweak / souin

An HTTP cache system, RFC compliant, compatible with @tyktechnologies, @traefik, @caddyserver, @go-chi, @bnkamalesh, @beego, @devfeel, @labstack, @gofiber, @go-goyave, @go-kratos, @gin-gonic, @roadrunner-server, @zalando, @zeromicro, @nginx and @apache

Home page: https://docs.souin.io

Cache corruption

ssendev opened this issue

I have Caddy compiled with:

Dockerfile

FROM docker.io/library/caddy:2-builder AS caddy-builder
RUN xcaddy build \
    --with github.com/darkweak/souin/plugins/caddy

FROM docker.io/library/caddy:2 AS caddy
COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy
COPY Caddyfile /etc/caddy/Caddyfile

Caddyfile

{
        order cache before rewrite
        cache {
                badger {
                        configuration {
                                ValueDir /data/cache
                        }
                }
                default_cache_control no-store
                stale 10m
                ttl 1000h
                key {
                        hide
                }
        }
}

example.com {
        cache {
                mode bypass_request
        }
        reverse_proxy web:8080
}

I'm mainly using it to cache dynamically resized images. After some time, the images start to load only partially (just the top part) or not at all, with a zero-byte content length. Regular pages are affected too.

The cache can be corrupted by opening the Firefox developer tools with "Disable cache" checked and then repeatedly pressing F5. It works best when varying the reload frequency: holding the key down, spamming it as fast as possible, and then gradually slowing down to about once per second. It only seems to work when loading an HTML page that requests more resources; I could not corrupt the cache when reloading a single image on its own.

Or maybe the config or the chosen cache backend is not well suited to caching responses of several MB for a long time.

Regarding the config, I'm not sure the cache directory is specified correctly, since some data seems to be written to /srv/souin_dir. Maybe that's just temporary data, and the cache in /data/cache will still be used when the container is recreated.

The README contains:

cache @match {
    ttl 5s
    badger {
        path /tmp/badger/first-match
        configuration {
            # Required value
            ValueDir <string>

However, the badger configuration specification says that only one of path or Configuration should be specified, and that path refers to a configuration file, not a directory.

Hello @ssendev, did you try nuts instead of badger? We discussed badger issues with large responses at length with @adammakowskidev. Maybe we should remove badger in favor of the otter library, which is faster than badger (not a requirement) and could be more stable.

He got this image corruption on his side

Hi.
Otter looks very interesting. I would be happy to run the tests.
I did dozens of hours of testing. Only redis cache works properly, but the performance is very poor. In my opinion, Souin requires a lot of work and fixes.

Yes, that's what my images looked like too. While images were the first problem I spotted, I also had truncated JS files. With the F5 spamming, it would also corrupt HTML responses of a few KB. Interestingly, spamming an HTML page that is not cacheable could even produce a zero-byte hit, which then stayed cached.

This also happens with nuts.

In my testing just now, an HTML page that normally responds with

HTTP/2 200 
alt-svc: h3=":443"; ma=2592000
cache-control: no-store
cache-status: Souin; fwd=uri-miss; detail=NO-STORE-DIRECTIVE; key=
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Fri, 29 Dec 2023 09:32:15 GMT
etag: W/"3e56-LMcWXMp5FQA2JBaxtCm/SIIDhTQ"
server: Caddy
vary: Accept-Encoding
X-Firefox-Spdy: h2

responded with the following after I pressed F5 three times:

HTTP/2 200 
age: 1
alt-svc: h3=":443"; ma=2592000
cache-status: Souin; hit; ttl=3599999; key=
date: Fri, 29 Dec 2023 09:32:53 GMT
server: Caddy
content-length: 0
X-Firefox-Spdy: h2

@ssendev this point was already discussed in #337 (comment). Because of a bug in Go (golang/go#52183), no error is returned in the request context, and when there is no error we consider the response valid. As mentioned in #337 (comment), maybe we should not store empty responses. WDYT?

@ssendev maybe commit b36e5f3 fixes the storage of empty responses.

Can you try with --with github.com/darkweak/souin/plugins/caddy@033229a6a8842b6ebf860e98c99b57f7a37b595d?

That's the only version where caching works for me.

Cache pollution happened a lot with both badger and nutsdb on older versions. I haven't checked newer versions because I couldn't find the cause or reproduce it, so I'm stuck on an old tag for now. Somewhat related: caddyserver/cache-handler#27

@mattvb91 we have to work together to check if everything works now with your setup.

@darkweak b36e5f3 doesn't fix the empty response issue, and the issue also happens on port 80 with HTTP/1.1.

@mattvb91 at first I thought it was fixed, but then I noticed that it's probably a release from before mode bypass_request existed: it never cached a response, because I had "Disable cache" checked (otherwise Firefox wouldn't send a request at all). When I unchecked the box, nothing worked anymore and I only got connection resets, even when starting a new container with a cleared cache directory. Maybe that commit isn't meant to run with Caddy v2.7.6?

But I noticed something interesting about the corrupted images: the corruption is somewhat deterministic and depends on which page is being loaded. The same cached image could show two distinct glitches depending on which other requests were in flight at the same time.

I then tried saving the image to look at its bytes. That was a little annoying, because Firefox wouldn't just give me the image it had but downloaded it again, so only a single request was in flight and the saved image no longer depended on the page I was on; it was always the same. After a few tries, though, I managed to save two different glitches, ran xxd on them, and diffed the output (in hindsight unnecessary, since I could have compared them against the original image). What I discovered was that the glitched image now contained JavaScript that was also served by Caddy.

So the error is probably not in the caching library; more likely, a buffer is being overwritten while the response is served.
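Finding JavaScript bytes inside a cached image is consistent with a common Go bug class: keeping a slice that still points into a buffer which is later reset and reused for another response. The sketch below only illustrates that class; simulateReuse is hypothetical, and this is not a claim about where the bug in Souin actually lives. It shows a stored slice from bytes.Buffer.Bytes() being overwritten once the buffer is reused, while an explicit copy survives.

```go
package main

import (
	"bytes"
	"fmt"
)

// simulateReuse is a hypothetical illustration, not Souin code. It stores
// one response body two ways, then reuses the same buffer for a second
// response and returns what each stored value now contains.
func simulateReuse() (aliased, copied string) {
	store := map[string][]byte{}
	buf := new(bytes.Buffer)

	// First response: a stand-in for an image body.
	buf.WriteString("IMAGEDATA")
	store["aliased"] = buf.Bytes()             // shares buf's backing array
	store["copied"] = bytes.Clone(buf.Bytes()) // independent copy (Go 1.20+)

	// The buffer is reset (as if returned to a pool) and reused for a
	// JavaScript response; Reset keeps the same underlying storage.
	buf.Reset()
	buf.WriteString("JS")

	return string(store["aliased"]), string(store["copied"])
}

func main() {
	aliased, copied := simulateReuse()
	fmt.Println(aliased) // JSAGEDATA
	fmt.Println(copied)  // IMAGEDATA
}
```

The aliased entry changes because bytes.Buffer.Reset retains the backing array, so the next write lands on top of the bytes the cache was still pointing at. Copying before storing (or before returning a buffer to a pool) avoids that, at the cost of one allocation per response.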

@ssendev how did you build the caddy module with the commit b36e5f3e79c7b19e07eb1d5b2020b261515ae7a3?

I will implement the otter storage and we'll see if the corruption issue happens.

Dockerfile

FROM docker.io/library/caddy:2.7.6-builder AS caddy-builder
RUN xcaddy build \
    --with github.com/darkweak/souin/plugins/caddy@b36e5f3e79c7b19e07eb1d5b2020b261515ae7a3 \
    --with github.com/caddy-dns/acmedns

FROM docker.io/library/caddy:2.7.6 AS caddy
COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy
COPY Caddyfile /etc/caddy/Caddyfile

Caddyfile

{
	order cache before rewrite
	cache {
		default_cache_control no-store
		ttl 1000h
		badger {
			configuration {
				ValueDir /data/cache/badger
			}
		}
	}
}

:80 {
	cache {
		mode bypass_request
	}
	reverse_proxy http://192.168.0.2:9100
}

@ssendev use xcaddy build --with github.com/darkweak/souin/plugins/caddy@b36e5f3e79c7b19e07eb1d5b2020b261515ae7a3 --with github.com/darkweak/souin@b36e5f3e79c7b19e07eb1d5b2020b261515ae7a3. Without the second --with github.com/darkweak/souin@b36e5f3e79c7b19e07eb1d5b2020b261515ae7a3, xcaddy builds the github.com/darkweak/souin/plugins/caddy module with the dependency declared in its go.mod file, and the github.com/darkweak/souin version referenced there is the latest release (v1.6.44). So you have to override the core version too, using --with github.com/darkweak/souin@{HASH_COMMIT}.

@darkweak This seems to fix the empty response issue.

@darkweak I tried the latest master but the responses are still overwriting each other.

# caddy build-info | grep souin
dep	github.com/darkweak/souin	v1.6.45-0.20240102214624-7fb48f52de3d	h1:tdyD2U3iDn6eKrUabwwx7ZHDHhqDiZtEkkeCVc6uVvU=
dep	github.com/darkweak/souin/plugins/caddy	v0.0.0-20240102214624-7fb48f52de3d	h1:00kruvspxDUmD/W3urJ4w3UZpTsPbz2BaxkSS2l0k3o=

"Only redis cache works properly"

I seem to have a similar issue even with Redis now. The cause seems to be ESI, which makes Souin send a wrong response length, so clients effectively abort the request. I used RedisInsight to confirm that what's stored in Redis is 100% correct.

Also, surrogate keys don't seem to work with Redis.