microsoft / ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

Allow defaulting the preferred queue based on information in the request

iamwillbar opened this issue

Issue #72 is a prerequisite for this.

In the configuration for a crawler you should be able to specify patterns that will set the preferred queue for a request based on information in the request. The two scenarios we have so far are:

  1. Changing the priority based on the type of request
  2. Changing the priority based on the URL of the request

For example, you could imagine a configuration that looks like this:

priorityMappings: [
 {"url": "contoso", "preferredQueue": "later"},
 {"type": "Repo", "preferredQueue": "soon"},
 {"type": "Repo", "url": "(hello|world)", "preferredQueue": "immediate"}
]

When a request is queued and does not already specify a preferred queue, the priority mappings would be evaluated in order, and the first mapping whose criteria all match would define the preferred queue for that request. URL matching must support regular expressions.
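
To make the proposed evaluation order concrete, here is a minimal sketch of first-match-wins resolution. It is not ghcrawler code: the function names `mappingMatches` and `resolvePreferredQueue`, the request shape (`type`, `url`, `preferredQueue`), exact case-sensitive comparison for `type`, and unanchored regular-expression matching for `url` are all assumptions made for illustration.

```javascript
// Hypothetical sketch; none of these function names exist in ghcrawler.
const priorityMappings = [
  { url: 'contoso', preferredQueue: 'later' },
  { type: 'Repo', preferredQueue: 'soon' },
  { type: 'Repo', url: '(hello|world)', preferredQueue: 'immediate' },
];

// A mapping matches when every criterion it specifies matches the request.
// Criteria that the mapping omits are treated as wildcards.
function mappingMatches(mapping, request) {
  if (mapping.type !== undefined && mapping.type !== request.type) {
    return false;
  }
  if (mapping.url !== undefined) {
    // Assumption: the pattern is an unanchored regular expression.
    if (typeof request.url !== 'string' || !new RegExp(mapping.url).test(request.url)) {
      return false;
    }
  }
  return true;
}

// An explicit preferredQueue on the request always wins. Otherwise the
// mappings are scanned in order and the first full match decides.
// Returns undefined when nothing matches, leaving the default to the caller.
function resolvePreferredQueue(request, mappings) {
  if (request.preferredQueue) {
    return request.preferredQueue;
  }
  const match = mappings.find((mapping) => mappingMatches(mapping, request));
  return match ? match.preferredQueue : undefined;
}
```

One consequence of first-match-wins worth noting: with the mappings in the order shown above, the third entry can never be selected, because any request of type Repo already matches the second entry. Under this sketch, the more specific mapping would have to be listed before the more general one to take effect.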