- Python (selenium) Lambda Chromium Automation
- This bot automates actions on the Lost 112 page and runs on AWS Lambda.
- Build the image for the linux/amd64 platform
docker buildx build --platform linux/amd64 -f ./Dockerfile -t wallet .
- Log in to ECR (the region and registry host must match the ones used to tag and push the image)
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 712218945685.dkr.ecr.us-west-2.amazonaws.com
- Tag the Docker image
docker tag wallet:latest 712218945685.dkr.ecr.ap-northeast-2.amazonaws.com/wallet:latest
- Push the image to ECR
docker push 712218945685.dkr.ecr.ap-northeast-2.amazonaws.com/wallet:latest
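The build, login, tag, and push commands above all depend on the same account ID, region, and repository name. As a minimal sketch (the helper names `ecr_registry` and `ecr_push_commands` are hypothetical, and the values in the usage section are placeholders), the commands can be generated from one set of values so that the login region and the push region cannot drift apart:

```python
def ecr_registry(account_id, region):
    """Return the ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands for build, login, tag, and push, in order."""
    registry = ecr_registry(account_id, region)
    image_uri = f"{registry}/{repository}:{tag}"
    return [
        # Build for the platform that the Lambda function runs on.
        f"docker buildx build --platform linux/amd64 -f ./Dockerfile -t {repository} .",
        # Log in to the same registry that the image is pushed to.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {repository}:{tag} {image_uri}",
        f"docker push {image_uri}",
    ]


if __name__ == "__main__":
    # Placeholder account ID and region; substitute your own values.
    for command in ecr_push_commands("123456789012", "us-west-2", "wallet"):
        print(command)
```

The script only prints the commands; running them is still a manual step.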
The whole process is explained here. The technologies used are:
- Python 3.6
- Selenium
- Chrome driver
- Small chromium binary
Install Docker and the dependencies:
make fetch-dependencies
- Installing Docker
- Installing Docker compose
- bin for selenium
- selenium youtube
- setting selenium API for AWS Lambda
To make local development easy, you can use the included docker-compose file. Have a look at the example in lambda_function.py: it looks up “21 buttons” on Google and prints the first result.
Run it with: make docker-run
If your goal is to use Selenium to download files rather than just scrape content from web pages, you will need to specify a download_location when initializing the WebDriverWrapper. The download location should be a writable Lambda directory such as /tmp. For example, the first line of code in lambda_handler would become:
driver = WebDriverWrapper(download_location='/tmp')
This causes files to be downloaded into the download_location automatically, without requiring a confirmation dialog. Because the download happens asynchronously, you might need to make the handler sleep until the file has finished downloading.
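Instead of sleeping for a fixed time, the handler can poll the download directory. The following is a minimal sketch, not part of WebDriverWrapper: the function name `wait_for_download` and its parameters are hypothetical. It relies on Chrome giving in-progress downloads a `.crdownload` suffix, and it assumes the directory is used only for downloads (for instance a dedicated subdirectory of /tmp), since any other file in it would be reported as a finished download.

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Block until `directory` holds finished downloads and no partial ones.

    Returns the sorted paths of the finished files, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        # Chrome keeps a ".crdownload" suffix while a download is in progress.
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = [n for n in names if not n.endswith(".crdownload")]
        if finished and not partial:
            return [os.path.join(directory, n) for n in sorted(finished)]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no finished download in {!r} after {} seconds".format(directory, timeout)
            )
        time.sleep(poll_interval)
```

Keep the `timeout` value below the Lambda function timeout so that the handler can report the failure itself rather than being killed.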
To download a file from a link that opens in a new tab (i.e. target='_blank'), you will need to call enable_download_in_headless_chrome in your scraping script after navigating to the desired page but before clicking to download. This replaces every target='_blank' with target='_self'. For example:
# Navigate to download page
driver._driver.find_element_by_xpath('//a[@href="/downloads/"]').click()
# Enable headless chrome file download
driver.enable_download_in_headless_chrome()
# Click the download link
driver._driver.find_element_by_class_name("btn").click()
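To illustrate the target rewrite described above, here is a sketch of how such a replacement could be done by hand with Selenium's `execute_script`. It is not the wrapper's actual implementation: the function name `retarget_blank_links` and the JavaScript snippet are assumptions made for this example.

```python
# JavaScript run in the page: point every new-tab link at the current tab.
REWRITE_BLANK_TARGETS_JS = """
var links = document.querySelectorAll('a[target="_blank"]');
for (var i = 0; i < links.length; i++) {
    links[i].setAttribute('target', '_self');
}
return links.length;
"""


def retarget_blank_links(driver):
    """Rewrite target='_blank' links to target='_self'.

    `driver` is any object with Selenium's execute_script method.
    Returns the number of links that were rewritten.
    """
    return driver.execute_script(REWRITE_BLANK_TARGETS_JS)
```

With the wrapper used in this project, the underlying Selenium driver would be passed in, for example `retarget_blank_links(driver._driver)`.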
Everything is wrapped in a simple Makefile, so use:
make build-lambda-package
- Upload the resulting build.zip file to your AWS Lambda function
- Set Lambda environment variables (same values as in docker-compose.yml)
PYTHONPATH=/var/task/src:/var/task/lib
PATH=/var/task/bin
- Adjust the Lambda function parameters to match your needs. For the given example:
- Timeout: 10 seconds or more
- Memory: 250 MB or more
- Set the CMD override on AWS Lambda (leave the ENTRYPOINT override empty):
lambda_function.lambda_handler
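A missing PYTHONPATH or PATH entry usually shows up only as an import error or a missing binary at run time. As a minimal sketch (the helper `missing_path_entries` is hypothetical and not part of this project), the handler could check the environment variables listed above at start-up and fail with a clearer message:

```python
import os

# Entries that the deployment steps above expect in each variable.
REQUIRED_PATH_ENTRIES = {
    "PYTHONPATH": ["/var/task/src", "/var/task/lib"],
    "PATH": ["/var/task/bin"],
}


def missing_path_entries(environ=None):
    """Return {variable: [missing entries]} for the required variables."""
    environ = os.environ if environ is None else environ
    missing = {}
    for variable, required in REQUIRED_PATH_ENTRIES.items():
        present = environ.get(variable, "").split(":")
        absent = [entry for entry in required if entry not in present]
        if absent:
            missing[variable] = absent
    return missing
```

An empty dictionary means that every expected entry is present.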