helm / chartmuseum

helm chart repository server

Home Page: https://chartmuseum.com

index.yaml regenerated on every request even though no package was updated

IMBlues opened this issue

index.yaml is regenerated on every request to /index.yaml even though no packages have changed.

When we compare the LastModified timestamps between objects from the cache and from storage:
https://github.com/helm/chartmuseum/blob/master/pkg/storage/storage.go#L64

The objects from the cache come back out of order due to map ranging:
https://github.com/helm/chartmuseum/blob/master/pkg/chartmuseum/server/multitenant/index.go#L107

func (server *MultiTenantServer) getRepoObjectSlice(entry *cacheEntry) []cm_storage.Object {
	var objects []cm_storage.Object
	// RepoIndex.Entries is a map, and Go randomizes map iteration order,
	// so the resulting slice order differs between calls.
	for _, entry := range entry.RepoIndex.Entries {
		for _, chartVersion := range entry {
			object := cm_repo.StorageObjectFromChartVersion(chartVersion)
			objects = append(objects, object)
		}
	}
	return objects
}

Also, the LastModified timestamp of objects from storage seems to have higher precision (I have only tested this on the S3 backend):

LastModified from cache: 2018-06-11T03:31:52.000Z
LastModified from storage: 2018-06-11T03:31:52.094Z

Together, these cause index.yaml to be regenerated on every request.
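To make the mismatch concrete, here is a minimal, self-contained Go sketch (illustration only, not chartmuseum code) using the two timestamps above: Equal() fails on the 94ms difference, while truncating both values to whole seconds makes them compare equal.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Second-precision timestamp from the cache vs. millisecond-precision
	// timestamp from the storage backend (values from the example above).
	fromCache, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	fromStorage, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")

	// Equal compares at full precision, so the 94ms drift is enough to
	// flag the object as updated on every request.
	fmt.Println(fromCache.Equal(fromStorage)) // false

	// Truncating both values to whole seconds makes them compare equal.
	fmt.Println(fromCache.Truncate(time.Second).Equal(fromStorage.Truncate(time.Second))) // true
}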

@IMBlues thanks for the detailed issue and PR! A few things:

  1. How are you determining that "index.yaml will regenerate every request"? I have not experienced this, and there should be tests to confirm. It is the intent that the check for regeneration happens every time, but that regeneration only happens when necessary. It should only happen one time between any number of objects uploaded and /index.yaml being requested.

  2. If you run chartmuseum with --debug there should be something in the logs such as "Change detected between cache and storage" to indicate this event. Are you seeing this occur more than once after uploading a package?

  3. Why does the order returned by getRepoObjectSlice() matter? Is this an optimization for calculating the diff?

1 & 2. I can confirm this behavior because "Change detected between cache and storage" shows up on every request to /index.yaml with --debug.
3. I reviewed the code and found that the order of the objects does not actually matter, but the LastModified of each object does:

The timestamp from storage (S3 backend, provided by Ceph RGW) has higher precision than the one from the cache, so this comparison will always return true:

if !o1.LastModified.Equal(o2.LastModified) {
    diff.Updated = append(diff.Updated, o2)
}

Printing them:

o1.LastModified: 2018-06-11T03:31:52.000Z (from cache)
o2.LastModified: 2018-06-11T03:31:52.094Z (from storage)

I have not tested it on Amazon S3, so I'm not sure whether it's related to Ceph RGW.

@IMBlues sorry we left this unfinished. Were you able to determine if this is related to Ceph?

No, I didn't test it on another S3 backend, and this PR does solve my problem. I think we should close it now; I will reopen it if I have more time to investigate.

@IMBlues if the fix works for you, then we should add a config option that lets us get to the "ignore milliseconds" functionality. Maybe something like a --timestamp-comparison=ms (default)?

You could then use something like --timestamp-comparison=sec to fix your issue. Let me know what you think.

@jdolitsky I think adding a flag is better than what I did (directly ignoring milliseconds in the code). Maybe I'll try it later this week and make another PR.

Hello,

We also see this problem when using Ceph as a backend for chartmuseum.

We also hit this issue with the Dell ECS S3 storage backend. The PR previously provided by @IMBlues apparently has the subtraction operation reversed: since o1 (in the cache) can never be newer than o2 (in S3), a new file will never be detected. I am providing a new PR with the same fix but with the subtraction order reversed.
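As a rough illustration of the subtraction-direction point (the variable names and the 500ms threshold below are made up for this sketch, not taken from the PR): when the object in storage is newer than the cached one, the cache-minus-storage difference is negative, so a one-sided "> threshold" check can never fire; reversing the subtraction (or comparing the absolute difference) does.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical case: the object in storage is newer than the cached entry.
	cached := time.Date(2018, 6, 11, 3, 31, 52, 0, time.UTC)
	stored := cached.Add(2 * time.Second)

	threshold := 500 * time.Millisecond

	// Wrong direction: the difference is negative, so this never detects the new file.
	fmt.Println(cached.Sub(stored) > threshold) // false

	// Reversed direction (or an absolute difference) detects the change.
	fmt.Println(stored.Sub(cached) > threshold) // true
}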

Same issue with the OTC S3 backend. I would love to see a fix in the upcoming release; otherwise I'd have to build chartmuseum on my own, just because of ~3 lines of code.

@IMBlues @mateuszkwiatkowski @airadier @jopfeffe Would one of you be able to test out master branch with new option --storage-amazon-nearestsecond / STORAGE_AMAZON_NEARESTSECOND=1? This should fix the issue.

You can use Docker image chartmuseum/chartmuseum:latest@sha256:75dd03cc7111cd5bc6fac87429a09c9f91e30d0acaeaf2cb2d7ac3568344d83d

Once I can get confirmation that this is finally fixed (as well as #280), I'll go ahead and release chartmuseum v1.12.0. Thanks in advance!

We are experiencing the same issue with our OpenStack Swift based storage. The index is always rebuilt and takes around 30 seconds to complete on every helm dep update. @jdolitsky is it possible to make this argument affect Swift backends too?

EDIT:
As suggested earlier, I propose that a new CLI flag --timestamp-comparison=<sec|msec> be implemented in a more generic way to account for backends that report timestamps with reduced precision/accuracy.

@rajiteh - please see https://github.com/chartmuseum/storage/blob/master/amazon.go#L109

We will need to add a NearestSecond option to the openstack backend similarly.

And yes, you're probably right about this (--timestamp-comparison). Please feel free to propose this change in chartmuseum/storage.
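For anyone curious what such a backend option might look like, here is a hedged sketch of a listing helper that optionally drops sub-second precision. The Object type, listObjects function, and nearestSecond parameter are invented for this illustration; the actual chartmuseum/storage code (see the amazon.go link above) may differ.

package main

import (
	"fmt"
	"time"
)

// Object is a simplified stand-in for a storage backend's object metadata.
type Object struct {
	Path         string
	LastModified time.Time
}

// listObjects is a hypothetical listing helper: when nearestSecond is set,
// sub-second precision is dropped so the timestamps line up with
// second-precision entries in the cached index.
func listObjects(raw []Object, nearestSecond bool) []Object {
	out := make([]Object, 0, len(raw))
	for _, o := range raw {
		if nearestSecond {
			o.LastModified = o.LastModified.Truncate(time.Second)
		}
		out = append(out, o)
	}
	return out
}

func main() {
	raw := []Object{{
		Path:         "mychart-0.1.0.tgz",
		LastModified: time.Date(2018, 6, 11, 3, 31, 52, 94000000, time.UTC),
	}}
	fmt.Println(listObjects(raw, true)[0].LastModified) // 2018-06-11 03:31:52 +0000 UTC
}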

It turns out the issue I had with openstack is fixed by this change: chartmuseum/storage@a49401b

So my issue in particular should be resolved in 1.12.0 release.

However, I already went ahead and implemented the timestamp argument.

@rajiteh thank you very much, looking at your change

You can use Docker image chartmuseum/chartmuseum:latest@sha256:75dd03cc7111cd5bc6fac87429a09c9f91e30d0acaeaf2cb2d7ac3568344d83d

Works in my case.

Thank you very much for the effort. Looking forward to the 1.12.0 release.

@jopfeffe thanks for trying this out! glad to hear

Hi all, a new version of chartmuseum has been released (v0.12.0) with a solution for this:

--storage-timestamp-tolerance=1s

Big thanks to @rajiteh for putting this together! Thanks all
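For reference, here is a minimal sketch of what a tolerance-based comparison in the spirit of --storage-timestamp-tolerance=1s could look like (illustration only; the changed helper and its signature are invented here, not chartmuseum's actual implementation): timestamps are treated as unchanged when they differ by less than the configured tolerance.

package main

import (
	"fmt"
	"time"
)

// changed reports whether two timestamps differ by more than the tolerance.
func changed(cached, stored time.Time, tolerance time.Duration) bool {
	diff := stored.Sub(cached)
	if diff < 0 {
		diff = -diff
	}
	return diff > tolerance
}

func main() {
	cached, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.000Z")
	stored, _ := time.Parse(time.RFC3339, "2018-06-11T03:31:52.094Z")

	// 94ms of drift is within a 1s tolerance, so no index rebuild is triggered.
	fmt.Println(changed(cached, stored, time.Second)) // false
}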