Mark1002 / gcp-crawler-proxy

crawler proxy base on gcp.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

GCP crawler proxy

Introduction

This project is a crawler proxy project base on GCP. Use Squid, the forword proxy server as the docker image running on VM group, and use GCP tcp proxy load balancer as the access entrypoint.

Build and deploy

  1. build sqiud forword proxy docker image.
$ ./script/build_docker_push.sh
  1. create tf backend gcs and add backend.cof
$ ./script/create_state_bucket.sh 

backend.conf

bucket  = "<tf backend gcs>"
prefix  = "<terraform state prefix>"
  1. init terraform
$ terraform init -backend-config=backend.conf
  1. create terraform.tfvars to apply your env.
project_id = "<GCP PROJECT ID>"
service_account = "<SVC NAME>@<GCP PROJECT ID>.iam.gserviceaccount.com"
region = "us-central1"
target_size = 100
  1. apply terraform
$ terraform apply

Usage

import requests

proxies = {
  'https': 'http://35.190.69.208:8085', # your gcp tcp proxy address
}

res = requests.get('https://ifconfig.me/', proxies=proxies)
print(res.text)

reference

  1. https://harry-lin.blogspot.com/2019/05/docker-azuredockersquid-proxy.html
  2. https://github.com/sameersbn/docker-squid#configuration
  3. https://medium.com/google-cloud/squid-proxy-cluster-with-ssl-bump-on-google-cloud-7871ee257c27
  4. https://cloud.google.com/load-balancing/docs/tcp/setting-up-tcp#configuring_the_load_balancer

Open in Cloud Shell

About

crawler proxy base on gcp.


Languages

Language:HCL 52.2%Language:Shell 17.6%Language:Go 14.5%Language:Dockerfile 10.6%Language:Makefile 5.0%