This project provides a Dockerized web scraping service built on Puppeteer. The image targets Linux x86-64 (linux/amd64) and can run under x86 emulation on ARM platforms. A Python client is included for testing the scraping functionality.
- Docker installed on your system
- Python 3.x installed on your system
Clone the repository to your local machine:
git clone https://github.com/yourusername/puppeteer-web-scraper.git
Then build the Docker image from the repository root:
docker build --platform linux/amd64 -t puppeteer-web-scraper .
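Because the image is built for linux/amd64, it runs natively on x86-64 hosts and under emulation on ARM hosts. The following is a minimal sketch for checking which case applies on the current machine. The helper name `needs_emulation` is hypothetical and not part of this repository, and the set of machine names is an assumption covering common x86-64 identifiers.

```python
import platform

# Machine names commonly reported for x86-64 hosts (assumed list).
X86_64_NAMES = {"x86_64", "amd64"}


def needs_emulation(machine: str) -> bool:
    """Return True if a linux/amd64 image would not run natively on this machine type."""
    return machine.lower() not in X86_64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    if needs_emulation(machine):
        print(f"{machine}: the linux/amd64 image will run under emulation")
    else:
        print(f"{machine}: the linux/amd64 image will run natively")
```

Emulated containers generally start and run more slowly, so this check can help explain longer scrape times on ARM hosts.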
To run the Docker container, use the following command:
docker run --rm --platform linux/amd64 -e PORT=5129 -p 5129:5129 puppeteer-web-scraper
This command starts the web scraper server inside the container on port 5129 and publishes it on the same port of the host.
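Before running the client, it can help to confirm that the published port is accepting connections. The sketch below only checks TCP reachability and assumes the container is published on 127.0.0.1:5129 as in the command above. It makes no assumption about the server's HTTP routes, and `is_port_open` is a hypothetical helper, not part of this repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5129 is the port published by the docker run command.
    state = "reachable" if is_port_open("127.0.0.1", 5129) else "not reachable"
    print(f"Scraper server on port 5129 is {state}")
```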
The Python client script test_scraper.py tests the scraping functionality. It accepts two optional positional arguments:
- URL: The URL to scrape (optional, defaults to https://www.example.com).
- Container: The CSS selector of the container to scrape (optional).
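The argument handling described above can be sketched as follows. This is an illustration of how two optional positional arguments with a default URL might be parsed, not the actual contents of test_scraper.py. The names `parse_args` and `DEFAULT_URL` are hypothetical.

```python
import sys
from typing import List, Optional, Tuple

# Default taken from the argument description; the constant name is an assumption.
DEFAULT_URL = "https://www.example.com"


def parse_args(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Map positional arguments to (url, container), applying the documented default."""
    url = argv[0] if len(argv) >= 1 else DEFAULT_URL
    container = argv[1] if len(argv) >= 2 else None
    return url, container


if __name__ == "__main__":
    url, container = parse_args(sys.argv[1:])
    print(f"URL: {url}")
    print(f"Container: {container if container is not None else '(none)'}")
```

Because the arguments are positional, a container selector can only be given together with a URL.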
Using the Default URL and No Container:
python test_scraper.py
Specifying a URL:
python test_scraper.py "https://www.google.com"
Specifying a URL and a Container:
python test_scraper.py "https://www.example.com" ".main-content"
A complete workflow, with the client run from a second terminal once the server is up:
docker build --platform linux/amd64 -t puppeteer-web-scraper .
docker run --rm --platform linux/amd64 -e PORT=5129 -p 5129:5129 puppeteer-web-scraper
python test_scraper.py "https://www.example.com" ".main-content"
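If the port needs to change, the build and run commands can be generated rather than edited by hand. The sketch below only assembles the same argument lists used in this README. The function names are hypothetical, and it assumes the server honours the PORT environment variable for values other than 5129, which this README does not confirm.

```python
import shlex
from typing import List

IMAGE = "puppeteer-web-scraper"
PLATFORM = "linux/amd64"


def build_command(image: str = IMAGE, context: str = ".") -> List[str]:
    """Return the docker build argument list for the amd64 image."""
    return ["docker", "build", "--platform", PLATFORM, "-t", image, context]


def run_command(image: str = IMAGE, port: int = 5129) -> List[str]:
    """Return the docker run argument list, using the same port inside and outside."""
    return [
        "docker", "run", "--rm", "--platform", PLATFORM,
        "-e", f"PORT={port}", "-p", f"{port}:{port}", image,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing runs by accident.
    print(shlex.join(build_command()))
    print(shlex.join(run_command()))
```

Each list can be passed directly to `subprocess.run(cmd, check=True)` to execute the command without going through a shell.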
Contributions are welcome! Please submit a pull request or open an issue to discuss changes.