xvrh / puppeteer-dart

A Dart library to automate the Chrome browser over the DevTools Protocol. This is a port of the Puppeteer API

How to use puppeteer.connect on an instance, that runs on a google cloud function?

Wizzel1 opened this issue

I am trying to connect to a puppeteer instance that runs on a google cloud function v2.

This is the setup inside the function:

import getPort from 'get-port';
import puppeteer from "puppeteer-extra";

const port = await getPort();
const browser = await puppeteer.launch({
    args: [
        `--remote-debugging-port=${port}`,
    ],
});

console.log(browser.wsEndpoint());

This logged an endpoint. I copied that endpoint and tried to connect to it via

await puppeteer.connect(myEndpoint);

but it threw an error:

SocketException: OS Error: The remote computer refused the network connection.
, errno = 1225, address = 127.0.0.1, port = 59250

How can I fix that?

This is a very interesting idea and I would love to know if it's possible.

The difficulty is not really about puppeteer but rather about how Cloud Functions work: how do you expose an external port, and how do you keep a long-lived function running?

From the error message, it seems Chromium bound the debugging port to the local address 127.0.0.1, so you cannot connect to it from your client.
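One way to check this before calling puppeteer.connect is to look at the host part of the logged endpoint. The sketch below is a minimal, hypothetical helper (isLoopbackEndpoint is not part of puppeteer-dart) that uses only dart:io; the endpoint in the usage note is made up to resemble the one in the error message.

```dart
import 'dart:io';

/// Returns true when [wsEndpoint] points at a loopback host (127.0.0.1, ::1
/// or "localhost"), i.e. an address that is only reachable from the machine
/// on which the browser itself is running.
bool isLoopbackEndpoint(String wsEndpoint) {
  final host = Uri.parse(wsEndpoint).host;
  if (host == 'localhost') return true;
  // tryParse returns null for host names that are not literal IP addresses.
  final address = InternetAddress.tryParse(host);
  return address != null && address.isLoopback;
}
```

For example, isLoopbackEndpoint('ws://127.0.0.1:59250/devtools/browser/abc') returns true, which means a client on another machine would end up dialing its own loopback interface rather than the cloud function.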

Let us know if you make any progress on this project :-)

⚠️ EDIT: Solved this step, keeping it for reference

@xvrh thanks for taking the time to look at my issue. In the meantime I have decided to create my own server with dart_frog (mainly because I am more experienced in dart).

I am currently stuck on creating an image for the server.

I initially ran into this error message:

ProcessException: No such file or directory
Command: unzip .local-chromium/901912_chrome-linux.zip -d .local-chromium/901912

but I found #161 where you posted some code to download chromium to the container.

Unfortunately, I am not able to get this working, because I have never worked with docker before.

Since a custom Dockerfile is required for dart_frog in this case, I copy-pasted this Dockerfile as a starting point.

I have tried to add your recommendation from #161 but apparently, I am doing something wrong because I get the error
=> ERROR [build 10/10] RUN dart bin/download_chromium.dart

This is my current dockerfile:

FROM debian:buster

# All the dependencies recommended to install Chromium
RUN  apt-get update \
     && apt-get install -y ...
# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server
RUN dart bin/download_chromium.dart

# Build minimal serving image from AOT-compiled `/server` and required system
# libraries and configuration files stored in `/runtime/` from the build stage.
FROM scratch
COPY --from=build /runtime/ /
COPY --from=build /app/build/bin/server /app/bin/
# Uncomment the following line if you are serving static files.
# COPY --from=build /app/build/public /public/

# Start the server.
CMD ["/app/bin/server"]

@xvrh my bad, I forgot to create the download_chromium.dart file. I created it in the root directory of my project. (Please tell me if you would advise otherwise)

This is my dockerfile now:

FROM debian:buster
# All the dependencies recommended to install Chromium
RUN  apt-get update \
     && apt-get install -y ... 

# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline && dart download_chromium.dart
RUN dart compile exe build/bin/server.dart -o build/bin/server

# Build minimal serving image from AOT-compiled `/server` and required system
# libraries and configuration files stored in `/runtime/` from the build stage.
FROM scratch
COPY --from=build /runtime/ /
COPY --from=build /app/build/bin/server /app/bin/
# Uncomment the following line if you are serving static files.
# COPY --from=build /app/build/public /public/

# Start the server.
CMD ["/app/bin/server"]

However when I run this image and try to use puppeteer in it, I am getting an error:
FileSystemException: Creation of temporary directory failed, path = '/tmp' (OS Error: No such file or directory, errno = 2)

Do you know if that is related to puppeteer?

I think you need to rework the Dockerfile a bit. You can't use a FROM scratch image because Chrome requires more dependencies.

Here is my experiment:

FROM dart:stable

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

ENV CHROME_FORCE_NO_SANDBOX=true

COPY . .
RUN dart pub get
RUN dart bin/download_chromium.dart

ENTRYPOINT ["dart", "bin/server.dart"]

EXPOSE 8080

You can make a few changes if you wish:

  • You can pre-compile the server with RUN dart compile exe -o /server bin/server.dart and have ENTRYPOINT ["/server"]
  • The RUN dart bin/download_chromium.dart step is not well placed, because every time you build the Docker image it downloads Chromium again (several hundred MB). We should try to rewrite this script as a shell script and move the command above "COPY . .". Alternatively, remove this command and just make sure the ".local-chromium" folder is present in your directory.
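For reference, bin/download_chromium.dart can be very small. The sketch below assumes the downloadChrome() helper exported by package:puppeteer and its default cache folder (".local-chromium", the folder named in the error earlier in this thread); the exact signature and return type may differ between package versions, so check it against the version you have installed.

```dart
import 'package:puppeteer/puppeteer.dart';

Future<void> main() async {
  // Downloads Chromium into the default cache folder unless it is already
  // present, then prints where the executable ended up.
  final info = await downloadChrome();
  print(info.executablePath);
}
```

Running this during the image build means the first request served by the container does not have to wait for the download.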

@xvrh I would prefer not to mess with the default folder structure because I do not have the experience to fix potential bugs.
Or are you saying that

You can pre-compile the server with RUN dart compile exe -o /server bin/server.dart and have ENTRYPOINT ["/server"]

is a requirement?

I have built the image with this file:

# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

ENV CHROME_FORCE_NO_SANDBOX=true

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server

# # Build minimal serving image from AOT-compiled `/server` and required system
# # libraries and configuration files stored in `/runtime/` from the build stage.
# FROM scratch
# COPY --from=build /runtime/ /
# COPY --from=build /app/build/bin/server /app/bin/
# # Uncomment the following line if you are serving static files.
# # COPY --from=build /app/build/public /public/

# # Start the server.
# CMD ["/app/bin/server"]


ENTRYPOINT ["dart", "build/bin/server.dart"]

EXPOSE 8080

but when I now call the puppeteer entry point I get this error:
Exception: Websocket url not found

In #128 you recommended running puppeteer.launch with noSandboxFlag: true, so this is how I launch puppeteer:

    browser = await puppeteer.puppeteer.launch(
      headless: false,
      noSandboxFlag: true,
      slowMo: const Duration(milliseconds: 20),
      defaultViewport:
          const puppeteer.DeviceViewport(width: 1920, height: 1080),
    );

@Wizzel1 Can you share your whole project so I can try it myself?

@xvrh sorry, a bit more context: this is the documentation on how to run a default dart_frog server, and this is the documentation for the custom Dockerfile.

@Wizzel1 I just quickly tried it, and it works for me if you remove the headless: false in "routes/index.dart".

The first request is a bit slow because it has to download Chromium (since you removed the RUN download_chromium.dart step from the Dockerfile).

@xvrh you are right, removing headless: false works for me too.
This brings up 2 new questions:

  1. Is it not possible at all to run puppeteer in headful mode on a server?
  2. Is it possible to connect a headful chromium instance to a headless one?

  1. I think you can. But what is your use case? Is it to run Chrome extensions?
    Here is an adapted Dockerfile:

Dockerfile:
# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 xvfb \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

ENV CHROME_FORCE_NO_SANDBOX=true

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

EXPOSE 8080

entrypoint.sh:
#!/bin/sh

export DISPLAY=:99.0
Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &

dart build/bin/server.dart

  2. To launch a Chromium process, you use puppeteer.launch() (with or without headless).
    Internally it will:
  • Start a process (with dart:io Process.start).
  • Wait until the internal devtool server of Chrome is started.
  • Connect to this server with a WebSocket.
  • Create a client for this server and return it (the Browser class).

You can also connect to an existing devtool server with puppeteer.connect. It only performs the last 2 steps. You can use this function in a mobile app or on a web page because it doesn't start a process; it just opens a WebSocket connection.
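As a rough illustration of the connect-only path, the sketch below attaches to a browser that is already running elsewhere. The endpoint is a hypothetical placeholder (it must be an address the client can actually reach), and the parameter and method names are those I understand package:puppeteer to expose; treat it as a sketch rather than a reference.

```dart
import 'package:puppeteer/puppeteer.dart';

Future<void> main() async {
  // Hypothetical endpoint: replace it with the value reported by the server.
  const endpoint = 'ws://chrome.example.com:9222/devtools/browser/<id>';

  // connect() only opens a WebSocket; no Chrome process is started locally.
  final browser = await puppeteer.connect(browserWsEndpoint: endpoint);

  final page = await browser.newPage();
  await page.goto('https://example.com', wait: Until.networkIdle);
  print(await page.title);

  // disconnect() drops the WebSocket and leaves the remote Chrome running.
  browser.disconnect();
}
```

Calling browser.close() instead of browser.disconnect() would shut the remote browser down, which is usually not what a thin client wants.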

@xvrh My use case is the following:

I want to automate some tasks with puppeteer on a server but sometimes the user has to enter an OTP or solve a recaptcha.

In this case I want to notify the user, connect a local puppeteer to the one that is running on the server, solve the recaptcha and close the window, so that the instance on the server can continue its job.

It's not possible to do puppeteer.connect(headful: true) and open a window painting a remote browser.

I can think of a few alternatives but they all have risks and drawbacks:

  • Take screenshots on the server (with page.screenshot), present them to the user, record the user's actions (clicks, keyboard inputs...) and replay each action on the server.

  • Use a real Remote Desktop Viewer technology to remotely take control of the Chrome window on the server.

  • When the server detects a captcha during authentication, it gives up. The local client is notified and tries to perform the same action with a headful Chrome. The user solves the captcha. Once that is done, the local puppeteer collects the credentials (cookies etc.) and sends them to the server so it can continue the job.

@xvrh That's a bummer.
Thanks for clarifying.
I would prefer option 1 or 2 in this case I think because I can't be sure that the recaptcha triggers again when I close the server-instance and open the page again locally, right?

Do you know if step 2 is possible with dart / flutter out of the box by any chance?

I can't be sure that the recaptcha triggers again when I close the server-instance and open the page again locally

You can't be sure of anything with a captcha; its purpose is to prevent automation :-) On some websites, it can be presented at any time if a behaviour is considered suspicious.

Do you know if step 2 is possible with dart / flutter out of the box by any chance?

I think this is beyond the scope of Flutter. You have to look at technologies like https://en.wikipedia.org/wiki/Virtual_Network_Computing and the software using them: https://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software

You can't be sure of anything with a captcha; its purpose is to prevent automation :-)

Good point! :D

I think this is beyond the scope of Flutter. You have to look at technologies like https://en.wikipedia.org/wiki/Virtual_Network_Computing and the software using them: https://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software

I think I will try capturing the cookies when the user first sets up his account. Or maybe I should do the whole automation locally so the user can take control of the browser if needed, though this isn't the user experience I would have liked to provide.

One thing I don't fully understand is this:

In the documentation of this package you say

You can still use puppeteer-dart on Flutter either with:
Flutter on Mobile BUT with the actual Chrome instance running on a server and accessed from the mobile app using puppeteer.connect

which made me think that what I want to do should be possible. Can you explain how my use case differs from your example here?

Flutter on Mobile BUT with the actual Chrome instance running on a server and accessed from the mobile app using puppeteer.connect

What I describe is: the Chrome process runs on the server (headless or headful) and the script commanding it runs on a mobile app. There is no browser running in the mobile app, just a Websocket connection.

In the mobile app, you use the puppeteer client to send commands like:

  • Page.screenshot. The real browser takes the screenshot and responds to the client with the PNG bytes.
  • Input.dispatchMouseEvent. The real browser simulates a mouse event in the current page.
  • Runtime.evaluate. The real browser injects the piece of JavaScript and returns the result of the execution to the client.
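In puppeteer-dart these protocol commands are normally reached through the Page API rather than sent by hand. The sketch below shows one call per command; driveRemotePage is a hypothetical helper name, the click coordinates are arbitrary, and the page is assumed to come from a browser obtained with puppeteer.connect.

```dart
import 'dart:math';

import 'package:puppeteer/puppeteer.dart';

/// Sends one command of each kind to a page living in a remote browser.
Future<void> driveRemotePage(Page page) async {
  // Page.screenshot: the remote browser renders the page and sends back
  // the PNG bytes over the WebSocket.
  final png = await page.screenshot();
  print('screenshot: ${png.length} bytes');

  // Input.dispatchMouseEvent: a click is simulated at viewport coordinates
  // inside the remote page (the values here are arbitrary).
  await page.mouse.click(Point(100, 200));

  // Runtime.evaluate: the JavaScript runs in the remote page and only the
  // result travels back to the client.
  final title = await page.evaluate<String>('() => document.title');
  print('title: $title');
}
```

Nothing in this function needs a local browser window, which is why the same code can run inside a mobile app.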

@xvrh I see. Thanks for answering all my questions. No matter how this project turns out, being able to use puppeteer in dart is invaluable for me, so thanks for maintaining this repository!

@Wizzel1 I am also looking to host a Chromium instance on the server, from your project:

try {
  browser = await puppeteer.puppeteer.launch(
    noSandboxFlag: true,
  );
  return Response();
}

You're returning an empty response, and it works? How can an empty response connect to a Flutter app via WebSocket?

@sukhcha-in No, since the package author mentioned that what I am trying to do is not possible, I didn't look further into connecting chrome via websockets. Sorry

@Wizzel1 Alright, thank you for the info. So the only way is to create a server and perform specific actions using routes.

@Wizzel1 Even though this didn't work out for you, this Dockerfile is exactly what I needed for my Dart Frog project. Thanks!