fluxcd / stats

Flux project usage statistics

Track website visits data

dholbach opened this issue · comments

We have enabled Netlify Analytics, which gives some very basic information about how many folks visited the page, but it only keeps the last 30 days of data, so it'd be nice to save this data somewhere more permanent.

https://github.com/marketplace/actions/netlify-analytics-collector looks like a good fit - thanks @mewzherder for digging this out.

We can capture from the source and re-emit to CSV, then decide what else to do when we get there. (But at least we won't be losing rich data or fine detail while we figure it out.)

I have some experience with "data warehousing" from past lives. My idea for now is just to go in, grab the data, store it in a git repository, and append to it each time the process runs, every week or fortnight. A weekly cadence is ideal: since Netlify only keeps 30 days of history, running weekly leaves us more time to notice that something went wrong and fix it before data is lost if the process stops working.
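A minimal sketch of what that scheduled job could look like, assuming a hypothetical hack/collect-stats.sh script that emits CSV rows (the script name, output path, and schedule are placeholders, not decisions):

```yaml
name: collect-stats
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, well inside Netlify's 30-day retention window
  workflow_dispatch: {}   # allow manual runs for testing

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Collect analytics and append to the CSV
        run: ./hack/collect-stats.sh >> data/visits.csv   # hypothetical collection script
      - name: Commit the updated CSV back to the repo
        run: |
          git config user.name "flux-stats-bot"
          git config user.email "flux-stats-bot@users.noreply.github.com"
          git add data/visits.csv
          git commit -m "Append weekly analytics snapshot" || echo "nothing to commit"
          git push
```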

We should be able to do all of this in a GitHub Action, perhaps attach the content to the fluxcd/community repo that already exists? Or a new repo just for stats.

(Ah: I see the linked article already outputs to Google Spreadsheets; that might be more useful than CSV in git. Maybe better to have both, so we can build on what's already done here.)

  • add NETLIFY_SITE to website secrets
  • add NETLIFY_TOKEN too (trying to set it up with a bot account); both secrets get consumed by the collection step, as sketched below
  • add spreadsheet to Flux drive
  • figure out access to ^ that is not tied to personal accounts
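For reference, a hedged sketch of how those two secrets would be wired into the collection step; the env var names and the script are assumptions here, so check the collector action's README for its real interface:

```yaml
# Sketch only: the env var names and the script are placeholders, not the
# netlify-analytics-collector action's actual inputs.
- name: Pull Netlify analytics
  env:
    NETLIFY_SITE: ${{ secrets.NETLIFY_SITE }}
    NETLIFY_TOKEN: ${{ secrets.NETLIFY_TOKEN }}
  run: ./hack/collect-stats.sh > data/netlify-visits.csv
```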

Re: storing in a branch: might it be easier to do that just in f/website?

If we want to use a branch in f/website, that will work; it's easy enough to make sure that branch doesn't trigger the website workflow. But it's also very easy to grant GitHub Actions in one repo write access to another repo, so I would suggest creating f/stats and using that as the target for the "data warehousing."
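To illustrate the cross-repo write part: a job in f/website can check out and push to fluxcd/stats if it is given a token with write access to that repo (the STATS_PUSH_TOKEN secret name below is hypothetical):

```yaml
- uses: actions/checkout@v4
  with:
    repository: fluxcd/stats                 # check out the stats repo, not f/website
    token: ${{ secrets.STATS_PUSH_TOKEN }}   # hypothetical PAT with write access to fluxcd/stats
    path: stats
- name: Push collected data to the stats repo
  run: |
    cd stats
    git config user.name "flux-stats-bot"
    git config user.email "flux-stats-bot@users.noreply.github.com"
    git add data/
    git commit -m "Update stats" || echo "nothing to commit"
    git push
```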

This way we could limit the risk of compromising content in f/website, which could be very bad: imagine someone figures out how to use the write access to overwrite the Flux install link with some malicious script. We would notice quickly, but it would be much better for that to never happen.

I imagine we will want to keep the stats outside of the website repo anyway, as we have more of them to collect: project stars from GitHub, for example, and everything that is currently being tracked manually. We can certainly roll that kind of aggregation toil into what we're doing here so nobody has to curate anything by hand.
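As one example of that aggregation, star counts are available from the public GitHub REST API and could be appended by the same scheduled job (the output path below is a placeholder):

```yaml
- name: Record the current star count for flux2
  run: |
    # GET /repos/{owner}/{repo} returns stargazers_count in its JSON body
    stars=$(curl -s https://api.github.com/repos/fluxcd/flux2 | jq .stargazers_count)
    echo "$(date -I),fluxcd/flux2,${stars}" >> data/stars.csv
```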

Is it difficult to get a new repo provisioned as fluxcd/stats and to get admin access to it? I know enough about how to set up the permissions so that a job in f/website could write to the stats repo; there is an "Actions permissions" section in the repo settings.

Workflow permissions
Choose the default permissions granted to the GITHUB_TOKEN when running workflows in this repository. You can specify more granular permissions in the workflow using YAML.

Allow select actions
Only actions that match specified criteria, plus actions defined in a repository within kingdonb, can be used. Learn more about allowing specific actions to run.

Those are the upstream docs related to workflow permissions. The easiest thing to do (safely) is to create a new repo and leave the default configuration: the GITHUB_TOKEN has read and write access to its own repo by default, so we can run all the stats collection from there and aggregate the results there as well. (But if you've already added the secrets to f/website, it's just as easy to grant a job here permission to write there. That's what I'm trying to get across...)
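For completeness, the "more granular permissions in the workflow using YAML" mentioned in the settings text is just a permissions block at the top of the workflow (or per job), for example:

```yaml
# Give the GITHUB_TOKEN write access to repository contents for this
# workflow only; all other scopes stay at their defaults.
permissions:
  contents: write
```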

But setting up permissions so that jobs can only write to a specific branch is much harder. I'm not sure if that's a thing.

Ok, I have no input on this. Maybe somebody else from the @fluxcd/maintainers can comment?

@kingdonb Will you take care of this, now that fluxcd/stats has been created?

I'll move f/community@github-repo-stats.

Yes, it's on my radar to investigate this today. I will test any questionable parts in a separate repo first to minimize the noise 👍

I have disabled the parts that connect to Google Drive; I have my own machinery in Google Drive and I'm looking at the best way to integrate it. (If we have a machine account or an app, that will make things smoother, but I've been using a "dev" Google App for my Flux Bug Scrub sheet generation for some time, and it has test scaffolds built around it to some extent...)