githubexporter / github-exporter

:octocat: Prometheus exporter for github metrics


Incorrect `github_repo_pull_request_count` when more than one page of pull requests exists

steinbrueckri opened this issue · comments

Description

I've discovered an issue with how `github_repo_pull_request_count` is calculated in the exporter. The count only reflects the last page of pull requests when GitHub's API paginates the results. As a result:

  • For repositories with fewer than 30 open pull requests, `github_repo_pull_request_count` reports the correct number.
  • For repositories with more than 30 open pull requests, `github_repo_pull_request_count` incorrectly reports only the count from the last page. For example, if there are 36 open pull requests, it reports 6.

This discrepancy leads to inaccurate metrics for repositories with a high number of pull requests.
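The reported value is consistent with counting only the final page: with GitHub's default page size of 30, the last page of 36 open pull requests holds 36 mod 30 = 6 items, which matches the numbers above. A tiny sketch of that arithmetic (assuming the default `per_page` of 30):

```go
package main

import "fmt"

func main() {
	const perPage = 30 // GitHub's default page size
	total := 36        // open PRs in the example above

	// Items on the final page when the total isn't a multiple of perPage.
	lastPage := total % perPage
	if lastPage == 0 && total > 0 {
		lastPage = perPage
	}
	fmt.Println(lastPage) // prints 6 — exactly the buggy metric value
}
```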

Steps to Reproduce

  1. Use the exporter on a repository with fewer than 30 open pull requests and note the correct count.
  2. Use the exporter on a repository with more than 30 open pull requests (e.g., 36).
  3. Observe that `github_repo_pull_request_count` only reflects the number of pull requests on the last page (e.g., 6 instead of 36).

Expected Behavior

`github_repo_pull_request_count` should accurately report the total number of open pull requests in a repository, regardless of how many pages of results there are.

Actual Behavior

`github_repo_pull_request_count` only reports the number of pull requests on the last page because GitHub's API pagination is not handled.

Possible Solution / Workaround

I've applied a temporary workaround that modifies the request to fetch pull requests with a higher `per_page` limit, aiming to reduce the impact of pagination. However, this is not a scalable solution, especially for repositories with more pull requests than the `per_page` limit. Here's the patch:

```diff
diff --git a/exporter/gather.go b/exporter/gather.go
index 7a72db2..de73e5c 100644
--- a/exporter/gather.go
+++ b/exporter/gather.go
@@ -111,7 +111,7 @@ func getReleases(e *Exporter, url string, data *[]Release) {
 func getPRs(e *Exporter, url string, data *[]Pull) {
 	i := strings.Index(url, "?")
 	baseURL := url[:i]
-	pullsURL := baseURL + "/pulls"
+	pullsURL := baseURL + "/pulls?per_page=100"
 	pullsResponse, err := asyncHTTPGets([]string{pullsURL}, e.APIToken())

 	if err != nil {
```
This workaround increases the `per_page` parameter to 100, but a more robust solution that iterates through all pages to calculate the total count is needed.

Additional Context

Another thing: the test that checks `github_repo_pull_request_count` also fails to catch this, I guess because it doesn't make a real HTTP request.

@henrymcconville Can you maybe take a look at this issue?

bump 😬

Hey @steinbrueckri, I don't have much capacity for the next few weeks to implement a proper fix, but if you want to raise a PR to increase the default `per_page` limit, I'll happily approve. Thanks!

@henrymcconville First of all, thank you for your feedback. I really appreciate the work you do on maintaining this project. For me, this issue is not urgent at the moment because I am currently running a version with the workaround described above. Please address it whenever you have the time.

I understand that maintaining an open-source project can be very demanding, and I truly value the effort you put into it. However, I believe that spending quality time with your family and taking care of personal matters is far more important. Your well-being and family should always come first.

I tried to fix the issue myself, but my Go skills are not sufficient. Therefore, I'm glad that you have this on your radar and are willing to look into it when you can. Thanks again for all your hard work. 🤗