This is a Puppeteer script, written in TypeScript, that scrapes data from a website. It automates browser actions to interact with the site, solves reCAPTCHA challenges, and downloads a PDF file. It relies on additional Puppeteer plugins for stealth (avoiding bot detection) and for reCAPTCHA solving.
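The README does not include the script's source, so the following is only a sketch, under stated assumptions, of the sequence described above: open the target page, enter the identifier, solve the reCAPTCHA, submit, and fetch the PDF. It codes against a minimal `ScraperPage` interface that mirrors the Puppeteer `Page` methods `goto`, `type`, and `click`, plus `solveRecaptchas()`, which `puppeteer-extra-plugin-recaptcha` adds to pages. Using that plugin is an assumption, and `ScraperPage`, `runFlow`, and every selector in `SELECTORS` are hypothetical.

```typescript
// Hypothetical minimal subset of the Puppeteer Page API used by the flow.
// solveRecaptchas() assumes puppeteer-extra-plugin-recaptcha is registered;
// modeling it as an interface lets the flow run without a real browser.
interface ScraperPage {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
  solveRecaptchas(): Promise<unknown>;
}

// Hypothetical selectors: the real ones depend on the target page's markup.
const SELECTORS = {
  identifierInput: "#identifier-input",
  submitButton: "#submit-button",
  downloadButton: "#download-pdf",
} as const;

// Sketch of the scraping sequence: open the page, enter the identifier,
// solve the reCAPTCHA, submit the form, then click the (hypothetical)
// button that stands in for however the real script retrieves the PDF.
async function runFlow(
  page: ScraperPage,
  url: string,
  identifier: string,
): Promise<void> {
  await page.goto(url);
  await page.type(SELECTORS.identifierInput, identifier);
  await page.solveRecaptchas();
  await page.click(SELECTORS.submitButton);
  await page.click(SELECTORS.downloadButton);
}
```

In a real run, `page` would come from `puppeteer-extra` with `puppeteer.use(StealthPlugin())` and `puppeteer.use(RecaptchaPlugin({ provider: { id: "2captcha", token } }))` registered, assuming those are the plugins the project uses. Passing the page in as an interface keeps the step order testable without launching a browser.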
Before running the script, make sure the following are installed:
- Node.js and npm: download and install Node.js
- pnpm: the setup commands below use pnpm (install it with `npm install -g pnpm`)
- Git: download and install Git
To set up the project:
- Clone the repository: `git clone https://github.com/PrantaDas/puppetter-web-scrapping.git`
- Navigate to the project directory: `cd puppetter-web-scrapping`
- Install dependencies: `pnpm install`
- Create a `.env` file in the root of the project.
- Add the following environment variables to the `.env` file:
  - `URL`: the target URL, e.g. `https://www.gob.mx/curp`
  - `IDENTIFIER`: a sample CURP or other identifier
  - `CAPTCHA_TOKEN`: your 2Captcha token
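The README does not say how the script reads `.env` (the `dotenv` package is a common choice, but that is an assumption). As a sketch of what the configuration step involves, the hypothetical `parseEnv` below reads `KEY=value` lines, skips blank and comment lines, and strips trailing `# comments`; the hypothetical `loadConfig` then fails fast if `URL`, `IDENTIFIER`, or `CAPTCHA_TOKEN` is missing or empty. Both functions and the `ScraperConfig` type are illustrative only and handle simple unquoted values.

```typescript
// The variables this README asks for in .env.
const REQUIRED_KEYS = ["URL", "IDENTIFIER", "CAPTCHA_TOKEN"] as const;
type ConfigKey = (typeof REQUIRED_KEYS)[number];
type ScraperConfig = Record<ConfigKey, string>;

// Hypothetical minimal .env parser: one KEY=value per line, blank lines and
// full-line comments skipped, trailing " # comment" stripped, whitespace
// trimmed. Quoted and multi-line values are not handled.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1);
    const commentStart = value.search(/\s#/); // "#" preceded by whitespace
    if (commentStart !== -1) value = value.slice(0, commentStart);
    result[key] = value.trim();
  }
  return result;
}

// Hypothetical validator: throws if any required key is missing or blank,
// otherwise returns a typed config object.
function loadConfig(env: Record<string, string | undefined>): ScraperConfig {
  const missing = REQUIRED_KEYS.filter((k) => (env[k] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as ScraperConfig;
  for (const key of REQUIRED_KEYS) config[key] = (env[key] ?? "").trim();
  return config;
}
```

If `dotenv` or another loader already populates the environment, `loadConfig(process.env)` performs the same check without the custom parser. Failing before the browser launches avoids spending a 2Captcha request on a run that cannot succeed.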
Run the script with the following command: `pnpm dev`