Running in AWS Lambda Containers
Hi,
Now that AWS supports containers in Lambda is there a plan / has anyone attempted to get this repo to work using a container instead of the provided binaries/layers?
Thanks,
Yep. I got it to work.
You need to use `/tmp` for downloads (it is the only writable directory in a Lambda environment). If you want headless Chrome to be able to download files, you also need to call the `driverEnableHeadlessDownloads` function (see the link in its docstring for the source). I pinned Selenium to `selenium==3.141` and use Python 3.7.9. The ChromeDriver and headless Chrome versions are also pinned (see the Dockerfile below for versions).
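Because `/tmp` is the only writable location, one way to keep downloads separate from other scratch files is to create a dedicated subdirectory there before starting Chrome. This is a minimal sketch, not part of the setup described above: the helper name `makeDownloadDir` and the subdirectory name `downloads` are hypothetical.

```python
import os


def makeDownloadDir(name: str = "downloads", base: str = "/tmp") -> str:
    """Create (if needed) and return a writable download directory under `base`.

    Hypothetical helper: on Lambda, `base` should stay under /tmp, the only
    writable path in the execution environment.
    """
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)  # no error if it already exists (warm starts)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"Download directory is not writable: {path}")
    return path
```

The returned path could then be used both as `download.default_directory` in the Chrome preferences and as the `downloadDir` argument of `driverEnableHeadlessDownloads`, so the two settings stay in sync.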
I use the following Python/Selenium functions to set up the driver:

```python
import logging
import pathlib

from selenium import webdriver

logger = logging.getLogger(__name__)

# Lambda only allows writes under /tmp, so downloads go there
chromeDownloadPath = "/tmp"


def driverEnableHeadlessDownloads(driver: webdriver.Chrome, downloadDir: str) -> webdriver.Chrome:
    """
    Need this voodoo function to allow serverless chrome downloads.
    From: https://github.com/shawnbutton/PythonHeadlessChrome/blob/master/driver_builder.py

    Parameters
    ----------
    driver: selenium webdriver
    downloadDir: directory used for downloads

    Returns
    -------
    selenium webdriver
    """
    # Register a custom endpoint that forwards a DevTools command to Chrome
    driver.command_executor._commands["send_command"] = (
        "POST",
        "/session/$sessionId/chromium/send_command",
    )
    # Headless Chrome blocks downloads unless they are explicitly allowed
    params = {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": downloadDir},
    }
    driver.execute("send_command", params)
    return driver


def makeDefaultChromeOptions() -> webdriver.ChromeOptions:
    """
    Set up default chrome options

    Returns
    -------
    selenium ChromeOptions
    """
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1280x1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--homedir=/var/task")
    options.add_argument(
        "user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/61.0.3163.100 Safari/537.36"
    )
    return options


class Driver:
    """Context manager that starts headless Chrome and shuts it down afterwards."""

    def __init__(self, chromeDriver: str, prefs: dict, headlessChromeBinary: str):
        if not pathlib.Path(chromeDriver).exists():
            raise FileNotFoundError(f"Chrome driver not found at {chromeDriver}")
        self.chromeDriver = chromeDriver
        self.prefs = prefs
        self.options = makeDefaultChromeOptions()
        self.options.add_experimental_option("prefs", prefs)
        self.options.binary_location = headlessChromeBinary
        self.driver = None

    def __enter__(self):
        logger.info(
            f"Setting up headless chrome-based browser with preferences {self.prefs}"
        )
        # Selenium 3.141: the first positional argument is the chromedriver path
        self.driver = webdriver.Chrome(self.chromeDriver, options=self.options)
        driverEnableHeadlessDownloads(self.driver, chromeDownloadPath)
        return self.driver

    def __exit__(self, excType, excVal, excTb):
        logger.info("Shutting down driver")
        if self.driver is not None:
            # quit() ends the session and stops chromedriver; close() only closes the window
            self.driver.quit()


chromePrefs = {
    "download.default_directory": chromeDownloadPath,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": False,
}
```
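Enabling downloads only lets Chrome write the file; the handler still has to wait until the download has finished before reading it. Chrome gives in-progress downloads a `.crdownload` suffix and renames them when they complete, so one simple approach is to poll the download directory for a new, finished file. This is a hedged sketch: `waitForDownload`, its timeout, and its poll interval are hypothetical choices, and it assumes a single file is being downloaded.

```python
import os
import time
from typing import Set


def waitForDownload(downloadDir: str, before: Set[str], timeout: float = 60.0, poll: float = 0.5) -> str:
    """Return the path of the first new, completed file in downloadDir.

    `before` is the set of file names present before the download was triggered.
    Raises TimeoutError if no completed file appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        newNames = set(os.listdir(downloadDir)) - before
        # Ignore partial downloads and hidden temp files Chrome may create
        done = [n for n in newNames if not n.endswith(".crdownload") and not n.startswith(".")]
        if done:
            return os.path.join(downloadDir, sorted(done)[0])
        time.sleep(poll)
    raise TimeoutError(f"No completed download in {downloadDir} after {timeout}s")
```

Usage would be to record `before = set(os.listdir(chromeDownloadPath))`, trigger the download through the driver, and then call `waitForDownload(chromeDownloadPath, before)`. Taking a snapshot first avoids picking up files left in `/tmp` by an earlier warm invocation.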
This is the Dockerfile I use for deployment:
```dockerfile
FROM public.ecr.aws/lambda/python:3.7

RUN mkdir -p /opt/bin && mkdir -p /opt/extensions && mkdir /var/task/.downloads \
    && curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip \
    > /opt/bin/headless-chromium.zip \
    && unzip /opt/bin/headless-chromium.zip -d /opt/bin && rm /opt/bin/headless-chromium.zip \
    && curl -SL https://chromedriver.storage.googleapis.com/2.43/chromedriver_linux64.zip > /opt/bin/chromedriver.zip \
    && unzip /opt/bin/chromedriver.zip -d /opt/bin && rm /opt/bin/chromedriver.zip \
    && chmod 777 /opt/bin/chromedriver

# Add poetry files
ADD poetry.lock /var/task
ADD pyproject.toml /var/task

RUN pip install --upgrade pip \
    && pip install poetry --no-cache-dir \
    # Export requirements from poetry project
    && poetry export -f requirements.txt --output /var/task/requirements.txt \
    && pip uninstall -y poetry \
    && pip install -r requirements.txt --target /var/task --no-cache-dir \
    && pip install awslambdaric --target /var/task --no-cache-dir

ADD awsLambda /var/task
CMD [ "main.handler" ]
```
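`CMD [ "main.handler" ]` tells the Lambda runtime to import the module `main` (presumably `main.py` from the `awsLambda` directory copied into `/var/task`) and call its `handler(event, context)` function. The actual `main.py` is not shown here, so the following is only a hypothetical skeleton. It checks that the binaries installed by the Dockerfile exist before doing any work, and it assumes the serverless-chrome zip unpacks to `/opt/bin/headless-chromium`.

```python
import pathlib

# Paths created by the Dockerfile; the Chromium file name is an assumption
CHROMEDRIVER = "/opt/bin/chromedriver"
HEADLESS_CHROMIUM = "/opt/bin/headless-chromium"


def handler(event, context):
    """Hypothetical Lambda entry point, referenced as `main.handler`."""
    missing = [p for p in (CHROMEDRIVER, HEADLESS_CHROMIUM) if not pathlib.Path(p).exists()]
    if missing:
        return {"statusCode": 500, "missing": missing}
    # The real work would go here, e.g.:
    # with Driver(CHROMEDRIVER, chromePrefs, HEADLESS_CHROMIUM) as driver:
    #     driver.get(event["url"])
    return {"statusCode": 200, "missing": []}
```

Failing fast with a clear list of missing paths makes a broken image easier to diagnose than a chromedriver startup error.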
And this is the Pulumi code I use to create the Lambda function:

```python
from pulumi_aws import lambda_

# `role` is an IAM role resource defined elsewhere in the Pulumi program
lambdaFunction = lambda_.Function(
    resource_name="myLambda",
    image_uri="XXXXXXXXX.dkr.ecr.XXXXX.amazonaws.com"
    "/myLambda:latest-prod",
    memory_size=1024,
    role=role.arn,
    package_type="Image",
    description="This lambda does things.",
    timeout=500,
    tags={
        "environment": "prod",
        "creator": "pulumi",
        "project": "myLambda",
        "project-url": "https://github.com/XXXXXXX/XXXXXXX",
        "maintainer": "myname",
        "maintainer-email": "mymail@myprovider.com",
    },
)
```
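The `image_uri` follows the standard ECR format `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` (the account and region are elided above). If you deploy several stages, building the URI from its parts avoids hard-coding it. This is a sketch with a hypothetical helper `ecrImageUri`, and the example values are made up.

```python
def ecrImageUri(account: str, region: str, repository: str, tag: str) -> str:
    """Build an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values, not a real account or region choice
print(ecrImageUri("123456789012", "eu-west-1", "myLambda", "latest-prod"))
# 123456789012.dkr.ecr.eu-west-1.amazonaws.com/myLambda:latest-prod
```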
I test the Lambda function locally with the AWS Lambda Runtime Interface Emulator (`aws-lambda-rie`, which I keep in `~/.aws-lambda-rie`) and the `awslambdaric` Python module. After building the Docker image, I run:

```shell
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
    --entrypoint /aws-lambda/aws-lambda-rie \
    --env-file .temp/.env \
    docker.io/myorg/myimg \
    /var/lang/bin/python -m awslambdaric main.handler  # 'main' is my Lambda module (main.py), 'handler' is the handler function
```
Running `curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'` in a terminal invokes the Lambda. To see its output, I usually run `docker ps`, note down the container ID, and then run `docker logs <container-id>`.
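The same invocation can be scripted from Python with only the standard library, for example in a local smoke test. This sketch assumes the emulator is listening on `localhost:9000` as in the `docker run` command. `buildInvokeRequest` and `invokeLocal` are hypothetical helper names.

```python
import json
import urllib.request

INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def buildInvokeRequest(event: dict, url: str = INVOKE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl call, with the event as a JSON body."""
    data = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def invokeLocal(event: dict, timeout: float = 600.0) -> str:
    """Send the event to the locally running container and return the raw response body."""
    with urllib.request.urlopen(buildInvokeRequest(event), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Requires the container from `docker run` to be up:
# print(invokeLocal({}))
```

The timeout is set above the function's 500-second Lambda timeout so that the client does not give up before the handler does.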
Hope this helps someone!
@tomardern
I managed to get it working. Please see my repository: https://github.com/umihico/docker-selenium-lambda
> chromePrefs = { "download.default_directory": chromeDownloadPath, "download.prompt_for_download": False, "download.directory_upgrade": True, "safebrowsing.enabled": False, }

Which download path do I have to provide? Is it /var/task/.Download?