serverless-chrome gets incomplete source in Lambda

Question

mynameissue opened this issue 3 years ago · comments

Hello,
I scrape the Johns Hopkins University COVID-19 map in a local environment with Python and Selenium to get the number of cases by country, among other figures.
However, when I tried to do the same thing in AWS Lambda, it failed.
The problem is that I can't get the values I want: when I fetch the HTML of the COVID-19 map, there is almost nothing inside the body tag (the output is included below).
At first I thought this was because the serverless-chrome build in my Lambda doesn't support WebGL. However, after I read issue #108 and enabled WebGL, the problem still occurs. (I checked whether the browser supports WebGL on this website.)
As far as I can tell, the only difference between the local environment and Lambda is whether a regular Chrome or the serverless-chrome browser is used.
Could anyone help me resolve this, please?

This is the body element that serverless-chrome returned.

<body>
    <script src="https://js.arcgis.com/4.19/init.js" data-amd="true"></script>
    <script src="assets/amd-loading-3b41833a646bb19c89df9de8fb3f1a27.js" data-amd-loading="true"></script>
    <div id="initialLoadingContainer" class="loader-icon-container">
        <div class="loader is-active padding-leader-3 padding-trailer-3">
            <div class="loader-bars"></div>
        </div>
    </div>
</body>
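
The body above contains only two script tags and the `initialLoadingContainer` spinner, which suggests the dashboard's JavaScript never finished rendering. As a minimal sketch for telling this "loading shell" apart from a rendered page, the helper below uses only the standard library. The function name `looks_like_loading_shell` and the rule "loader id present and no visible text" are assumptions made for illustration, not part of serverless-chrome or Selenium.

```python
from html.parser import HTMLParser


class _BodyScanner(HTMLParser):
    """Collects element ids and visible text found in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.text_chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        element_id = dict(attrs).get("id")
        if element_id:
            self.ids.add(element_id)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace and anything inside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.text_chunks.append(data.strip())


def looks_like_loading_shell(html, loader_id="initialLoadingContainer"):
    """Return True if the page still shows the loader and no visible text.

    Hypothetical heuristic: the loader id is taken from the body shown above.
    """
    scanner = _BodyScanner()
    scanner.feed(html)
    scanner.close()  # flush any text still buffered by the parser
    return loader_id in scanner.ids and not scanner.text_chunks
```

Calling `looks_like_loading_shell(browser.page_source)` in the handler would make it explicit in the logs whether the page was read before, or instead of, a completed render.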

This is the code running on Lambda.

from selenium import webdriver
from bs4 import BeautifulSoup
import time
import os

def lambda_handler(event, context):

    URL = "https://www.arcgis.com/apps/dashboards/85320e2ea5424dfaaa75ae62e5c06e61"
    
    options = webdriver.ChromeOptions()

    # Headless Chromium flags for the Lambda sandbox
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--window-size=880x996")
    options.add_argument("--no-sandbox")
    options.add_argument("--homedir=/tmp")
    options.binary_location = "/opt/python/bin/headless-chromium"

    # Shared memory, cache, and logging flags
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")

    # Image loading, extensions, and proxy settings
    options.add_argument('--blink-settings=imagesEnabled=false')
    options.add_argument('--disable-extensions')
    options.add_argument('--proxy-server="direct://"')
    options.add_argument('--proxy-bypass-list=*')
    options.add_argument('--start-maximized')

    # Flags added while trying to enable WebGL (issue #108),
    # plus data and cache paths under /tmp, the writable directory in Lambda
    options.add_argument('--ignore-gpu-blacklist')
    options.add_argument('--enable-webgl')
    options.add_argument('--disable-web-security')
    options.add_argument('--use-gl=osmesa')
    options.add_argument('--data-path=/tmp/data-path')
    options.add_argument('--disk-cache-dir=/tmp/cache-dir')

    browser = webdriver.Chrome(
        "/opt/python/bin/chromedriver",
        options=options
    )
    time.sleep(10)

    browser.get(URL)
    time.sleep(60)  
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print(html)
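
The handler above relies on a fixed `time.sleep(60)` before reading `page_source`, so it cannot tell "read too early" apart from "never rendered". A small polling helper, sketched below with only the standard library, would make that distinction visible. The name `wait_until` and its parameters are hypothetical; Selenium's own `WebDriverWait(...).until(...)` provides comparable behavior. This is a diagnostic aid, not a confirmed fix for the issue.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns a truthy value or timeout elapses.

    Returns True if the predicate succeeded, False if the deadline passed.
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Possible use inside lambda_handler (assumes `browser` from the code above):
#
#   rendered = wait_until(
#       lambda: "initialLoadingContainer" not in browser.page_source,
#       timeout=60,
#   )
#   print("dashboard rendered:", rendered)
```

If `wait_until` returns False after the full timeout, the loader element never went away, which would point to the dashboard failing to boot in serverless-chrome rather than to the page being read too soon.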

DiMiTriFrog · Answer 1 · Wed Jun 30 2021 05:52:08 GMT+0800 (China Standard Time)

Same problem, any solution?

Tech Tribe · Answer 2 · Thu May 12 2022 11:23:24 GMT+0800 (China Standard Time)

Any updates here?