# scraper

A general-purpose web scraper API built on Azure Container Apps.

## How to clone and deploy

The instructions use the Azure CLI exclusively, since that gives you the most control over the entire deployment.
1. **Resource Group**: make sure you are using the right account/subscription from the CLI.
   1. Show the currently selected account:
      ```shell
      az account show
      ```
   2. If you want to change the subscription, look up the subscription id with `az account list --output table`, then:
      ```shell
      az account set --subscription <subscription-id>
      ```
   3. Create the resource group. List the available locations with `az account list-locations --output table`:
      ```shell
      az group create --name scraper --location <location>
      ```
   4. Ensure CLI extensions install automatically:
      ```shell
      az config set extension.use_dynamic_install=yes_without_prompt
      ```
2. **Log Analytics**: set up a new Log Analytics workspace for logs from the app.
   1. Create it, and copy the `customerId` value from the returned payload:
      ```shell
      az monitor log-analytics workspace create -g scraper -n scraper
      ```
   2. Get the shared key:
      ```shell
      az monitor log-analytics workspace get-shared-keys -g scraper -n scraper
      ```
3. **Container App Environment**: create and configure the app environment with the logs workspace created above.
   1. List the available locations for container app environments:
      ```shell
      az provider show -n Microsoft.App --query "resourceTypes[?resourceType=='managedEnvironments'].locations"
      ```
   2. Create it with the `customerId` and shared key from the previous section:
      ```shell
      az containerapp env create -g scraper -n scraper --logs-workspace-id <customerId> --logs-workspace-key <sharedKey> --location <location>
      ```
4. **Container Registry**: deployments to an app come from images in a container registry, which is populated automatically from CI.
   1. Create it. It's probably a good idea to use the same location as the app environment. Note the `loginServer` value in the returned payload:
      ```shell
      az acr create -g scraper -n scraper --sku Basic --location <location>
      ```
   2. Log in to it:
      ```shell
      az acr login -n scraper
      ```
   3. Enable admin mode (so we can get the passwords):
      ```shell
      az acr update -n scraper --admin-enabled true
      ```
   4. Retrieve the username/password for the registry:
      ```shell
      az acr credential show -n scraper
      ```
5. **Container App**: finally!
   1. Create it:
      ```shell
      az containerapp create -g scraper -n scraper --environment scraper
      ```
   2. Enable HTTP ingress. Note: we don't really need HTTPS on the container itself, since Azure automatically provides a proper HTTPS endpoint in front of the app:
      ```shell
      az containerapp ingress enable -g scraper -n scraper --type external --allow-insecure --target-port 80
      ```
   3. Set the container registry to use. The `--server` argument is the `loginServer` from the previous section:
      ```shell
      az containerapp registry set -g scraper -n scraper --server <loginServer> --username scraper --password <password>
      ```
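The Azure steps above can also be sketched as a single script. This is a sketch only: it assumes the resource names used throughout, an already authenticated `az` CLI session, and an arbitrary example location. The `--query ... -o tsv` calls capture the `customerId`, shared key, and `loginServer` values instead of copying them by hand:

```shell
#!/usr/bin/env bash
# Sketch of the full provisioning flow; requires a logged-in az CLI.
set -euo pipefail

RG=scraper
LOCATION=westeurope   # assumption: substitute any supported location

az group create --name "$RG" --location "$LOCATION"

# Log Analytics workspace, capturing the id and key for the next step.
az monitor log-analytics workspace create -g "$RG" -n "$RG"
WORKSPACE_ID=$(az monitor log-analytics workspace show -g "$RG" -n "$RG" --query customerId -o tsv)
WORKSPACE_KEY=$(az monitor log-analytics workspace get-shared-keys -g "$RG" -n "$RG" --query primarySharedKey -o tsv)

az containerapp env create -g "$RG" -n "$RG" \
  --logs-workspace-id "$WORKSPACE_ID" --logs-workspace-key "$WORKSPACE_KEY" \
  --location "$LOCATION"

# Registry, capturing the login server and admin password.
az acr create -g "$RG" -n "$RG" --sku Basic --location "$LOCATION"
LOGIN_SERVER=$(az acr show -n "$RG" --query loginServer -o tsv)
az acr update -n "$RG" --admin-enabled true
ACR_PWD=$(az acr credential show -n "$RG" --query "passwords[0].value" -o tsv)

# Finally, the app itself.
az containerapp create -g "$RG" -n "$RG" --environment "$RG"
az containerapp ingress enable -g "$RG" -n "$RG" --type external --allow-insecure --target-port 80
az containerapp registry set -g "$RG" -n "$RG" --server "$LOGIN_SERVER" --username "$RG" --password "$ACR_PWD"
```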
6. **GitHub**: on to the setup on the repo side.
   1. Clone the repo.
   2. Create an Actions repository secret named `AZURE_CONTAINER_PWD` with the container registry password from step 4.4.
   3. Create credentials that allow CI to update the resource group, and copy the entire response payload:
      ```shell
      az ad sp create-for-rbac --name scraper --role contributor --scopes "/subscriptions/<subscription>/resourceGroups/scraper" --sdk-auth
      ```
   4. Create an Actions repository secret named `AZURE_CREDENTIALS` with the copied value.
   5. If you changed any resource names, update the `build.yml` file accordingly.
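For orientation, a minimal workflow along these lines would build an image, push it to the registry, and roll the app over to it. This is a sketch only; the repo's actual `build.yml` is authoritative, and the action versions, branch name, and Dockerfile location here are assumptions:

```yaml
# Sketch of what a build.yml like this typically contains.
name: build
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Build and push image
        run: |
          docker login scraper.azurecr.io -u scraper -p ${{ secrets.AZURE_CONTAINER_PWD }}
          docker build -t scraper.azurecr.io/scraper:${{ github.sha }} .
          docker push scraper.azurecr.io/scraper:${{ github.sha }}
      - name: Deploy to the container app
        run: az containerapp update -g scraper -n scraper --image scraper.azurecr.io/scraper:${{ github.sha }}
```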
Now you can run the build workflow and see it live!
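Once the workflow has run, one way to sanity-check the deployment is to look up the public FQDN Azure assigned to the app and hit it over HTTPS. A sketch, assuming the resource names above; the scraper's actual routes aren't covered here:

```shell
# Fetch the app's ingress FQDN and request the root path.
FQDN=$(az containerapp show -g scraper -n scraper --query properties.configuration.ingress.fqdn -o tsv)
curl -i "https://$FQDN/"
```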