Running in AWS Lambda Containers

Question

tomardern opened this issue 4 years ago

Hi,

Now that AWS supports container images in Lambda, is there a plan to support them, or has anyone managed to get this repo working in a container instead of using the provided binaries/layers?

Thanks,

Jasper Ginn · Answer 1 · Thu Mar 11 2021 16:17:35 GMT+0800 (China Standard Time)

Yep. I got it to work.

You need to use "/tmp" for downloads (it is the only writable directory in the Lambda filesystem), and if you want headless Chrome to be able to download files you need to call the 'driverEnableHeadlessDownloads' function (the source is linked in its docstring). I pinned Selenium to selenium==3.141 and use Python 3.7.9. The chromedriver and headless Chrome versions are also pinned (see the Dockerfile below for versions).

I use the following python/selenium functions to set up the driver:

import logging
import pathlib

from selenium import webdriver

logger = logging.getLogger(__name__)


def driverEnableHeadlessDownloads(driver: webdriver.Chrome, downloadDir: str) -> webdriver.Chrome:
    """
    Need this voodoo function to allow serverless chrome downloads.
    From: https://github.com/shawnbutton/PythonHeadlessChrome/blob/master/driver_builder.py
    Parameters
    ----------
    driver: selenium webdriver
    downloadDir: directory used for downloads
    Returns
    -------
    selenium webdriver
    """
    # Register the Chromium-specific endpoint that accepts DevTools commands
    driver.command_executor._commands["send_command"] = (
        "POST",
        "/session/$sessionId/chromium/send_command",
    )
    # Tell the page to allow downloads and where to store them
    params = {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": downloadDir},
    }
    driver.execute("send_command", params)
    return driver


def makeDefaultChromeOptions() -> webdriver.ChromeOptions:
    """
    Set up default chrome options
    Returns
    -------
    selenium ChromeOptions
    """
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    # Chrome expects width,height separated by a comma
    options.add_argument("--window-size=1280,1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--homedir=/var/task")
    options.add_argument(
        "user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/61.0.3163.100 Safari/537.36"
    )
    return options
    
class Driver:
    def __init__(self, chromeDriver: str, prefs: dict, headlessChromeBinary: str):
        if not pathlib.Path(chromeDriver).exists():
            raise FileNotFoundError(f"Chrome driver not found at {chromeDriver}")
        self.chromeDriver = chromeDriver
        self.prefs = prefs
        self.options = makeDefaultChromeOptions()
        self.options.add_experimental_option("prefs", prefs)
        self.options.binary_location = headlessChromeBinary
        self.driver = None

    def __enter__(self):
        logger.info(
            f"Setting up headless chrome-based browser with preferences {self.prefs}"
        )
        self.driver = webdriver.Chrome(self.chromeDriver, options=self.options)
        driverEnableHeadlessDownloads(self.driver, "/tmp")
        return self.driver

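    def __exit__(self, excType, excVal, excTb):
        logger.info("Shutting down driver")
        # quit() ends the session and the chromedriver process;
        # close() would only close the current window
        self.driver.quit()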
        
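# chromeDownloadPath is not defined in this snippet; the note above says to use "/tmp" for downloads
chromePrefs = {
    "download.default_directory": chromeDownloadPath,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": False,
}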
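Since downloads land in "/tmp", the handler usually needs to know when a file has finished arriving before it reads it. The following is a minimal sketch of one way to do that, not part of the original answer: it assumes Chrome marks in-progress downloads with a ".crdownload" suffix, and the helper name wait_for_new_files and its parameters are hypothetical. It compares the directory against a snapshot taken before the download was triggered, because "/tmp" can already contain unrelated files.

```python
import os
import time
from typing import Iterable, List


def wait_for_new_files(download_dir: str, known: Iterable[str],
                       timeout: float = 60.0, poll: float = 0.5) -> List[str]:
    """Wait until new, fully written files appear in download_dir.

    known: file names that were already present before the download started.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    known_names = set(known)
    deadline = time.monotonic() + timeout
    while True:
        new_names = sorted(set(os.listdir(download_dir)) - known_names)
        # Assumption: Chrome names partial downloads "<file>.crdownload"
        partial = [n for n in new_names if n.endswith(".crdownload")]
        if new_names and not partial:
            return [os.path.join(download_dir, n) for n in new_names]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"No finished download in {download_dir} after {timeout}s"
            )
        time.sleep(poll)
```

A typical use would be to take the snapshot with set(os.listdir("/tmp")) right after the driver has started, trigger the download, and then call wait_for_new_files("/tmp", snapshot).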

This is the Dockerfile I use for deployment:

FROM public.ecr.aws/lambda/python:3.7

RUN mkdir -p /opt/bin && mkdir -p /opt/extensions && mkdir /var/task/.downloads \
        && curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip \
         > /opt/bin/headless-chromium.zip \
        && unzip /opt/bin/headless-chromium.zip -d /opt/bin && rm /opt/bin/headless-chromium.zip \
        && curl -SL https://chromedriver.storage.googleapis.com/2.43/chromedriver_linux64.zip > /opt/bin/chromedriver.zip \
        && unzip /opt/bin/chromedriver.zip -d /opt/bin && rm /opt/bin/chromedriver.zip \
        && chmod 777 /opt/bin/chromedriver

# Add poetry files
ADD poetry.lock /var/task
ADD pyproject.toml /var/task

RUN pip install --upgrade pip \
        && pip install poetry --no-cache-dir \
        # Export requirements from poetry project
        && poetry export -f requirements.txt --output /var/task/requirements.txt \
        && pip uninstall -y poetry \
        && pip install -r requirements.txt --target /var/task --no-cache-dir \
        && pip install awslambdaric --target /var/task --no-cache-dir

ADD awsLambda /var/task

CMD [ "main.handler" ]
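Because the Driver class fails if the chromedriver binary is missing, it can help to check the image contents before starting a browser. This is a small sketch under stated assumptions, not something from the original answer: "/opt/bin/chromedriver" comes from the chmod line in the Dockerfile, while "/opt/bin/headless-chromium" is an assumed name for the file extracted from the serverless-chrome zip, and the helper missing_executables is hypothetical.

```python
import os
from typing import Iterable, List

# Assumed locations; only the chromedriver path is confirmed by the Dockerfile
DEFAULT_BINARIES = ("/opt/bin/chromedriver", "/opt/bin/headless-chromium")


def missing_executables(paths: Iterable[str] = DEFAULT_BINARIES) -> List[str]:
    """Return the paths that are not regular files or are not executable."""
    return [
        p for p in paths
        if not (os.path.isfile(p) and os.access(p, os.X_OK))
    ]
```

Calling missing_executables() at the top of the handler and logging the result gives a clearer error than a failed browser start.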

And this is my Pulumi function to create the Lambda:

from pulumi_aws import lambda_

# 'role' is an IAM role resource defined elsewhere in the Pulumi program
lambdaFunction = lambda_.Function(
    resource_name="myLambda",
    image_uri="XXXXXXXXX.dkr.ecr.XXXXX.amazonaws.com"
    "/myLambda:latest-prod",
    memory_size=1024,
    role=role.arn,
    package_type="Image",
    description="This lambda does things.",
    timeout=500,
    tags={
        "environment": "prod",
        "creator": "pulumi",
        "project": "myLambda",
        "project-url": "https://github.com/XXXXXXX/XXXXXXX",
        "maintainer": "myname",
        "maintainer-email": "mymail@myprovider.com",
    },
)

I test the Lambda function locally with the AWS Lambda Runtime Interface Emulator (aws-lambda-rie) and the awslambdaric Python module. After building the image from the Dockerfile, I run:

docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  --env-file .temp/.env \
   docker.io/myorg/myimg \
   /var/lang/bin/python -m awslambdaric main.handler ## 'main' is my lambda file, 'handler' is the lambda name

Running curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}' in a terminal invokes the Lambda. To see the output, I usually run docker ps, note the container ID, and then run docker logs on it.
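The same local invocation can be scripted from Python, which is handy for repeated test runs. This is a rough sketch using only the standard library; the function names build_invocation and invoke are hypothetical, and the host and port assume the container was started with -p 9000:8080 as in the docker run command above.

```python
import json
import urllib.request


def build_invocation(payload: dict, host: str = "localhost",
                     port: int = 9000) -> urllib.request.Request:
    """Build the POST request that the local emulator expects."""
    url = f"http://{host}:{port}/2015-03-31/functions/function/invocations"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke(payload: dict, timeout: float = 600.0):
    """Send the event to the running container and decode the JSON reply."""
    with urllib.request.urlopen(build_invocation(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, invoke({}) sends the same empty event as the curl command.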

Hope this helps someone!

Umihiko Iwasa · Answer 2 · Thu Apr 15 2021 12:03:40 GMT+0800 (China Standard Time)

@tomardern
I got it working. Please see my repository: https://github.com/umihico/docker-selenium-lambda

KARAN SANJAY KAJROLKAR · Answer 3 · Thu Nov 10 2022 19:35:33 GMT+0800 (China Standard Time)

chromePrefs = {
            "download.default_directory": chromeDownloadPath,
            "download.prompt_for_download": False,
            "download.directory_upgrade": True,
            "safebrowsing.enabled": False,
        }

May I know which download path I have to provide?
Is it /var/task/.Download?