maxwell-martin / netflix-movies-scraper

This is a Node.js and Puppeteer project that scrapes movies from Netflix, gets movie information from the OMDB API, and exports all data to a CSV file.

Netflix Movies Scraper

This project scrapes "all" movies from Netflix based on the main Netflix genres, gathers information about each movie from the OMDB database, and writes all of the data to a CSV file in the project folder. To do this, I use Node.js and Puppeteer, and I gather movie information by making requests to the OMDB API. Sometimes a movie title taken from Netflix is shared by another movie that is older or newer. When movie data is requested from OMDB for such a title, OMDB returns the latest movie, which can cause some inaccurate results.
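The title ambiguity described above can be narrowed when a release year is known. The sketch below is not this project's code: `buildOmdbUrl` and its option names are hypothetical, and it assumes you have your own OMDB API key. It only builds the request URL, using OMDB's `t` (title), `y` (year), `type`, and `apikey` query parameters.

```javascript
// Hypothetical helper, not taken from this project: build an OMDB
// "by title" request URL. `apiKey`, `title`, and `year` are illustrative names.
function buildOmdbUrl({ apiKey, title, year }) {
  const params = new URLSearchParams({
    apikey: apiKey,
    t: title,
    type: 'movie', // restrict matches to movies (not series or episodes)
  });
  // Only add the year filter when the caller actually knows the year.
  if (year !== undefined) {
    params.set('y', String(year));
  }
  return `https://www.omdbapi.com/?${params.toString()}`;
}

// Title only: OMDB decides which movie with this title to return.
console.log(buildOmdbUrl({ apiKey: 'YOUR_KEY', title: 'The Lion King' }));

// Title plus year: asks specifically for the 1994 release.
console.log(buildOmdbUrl({ apiKey: 'YOUR_KEY', title: 'The Lion King', year: 1994 }));
```

The year still has to come from somewhere, which is what the feature-get-movie-years branch mentioned below was trying to solve.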

DISCLAIMER and Future Development

This project was solely for fun, with the hope of making it easier to choose a movie to watch. As of 12/20/2019, I do not plan to continue with this project. The branch feature-get-movie-years is my latest attempt to get more accurate movie information. It does not currently work.

Netflix appears to implement a flagging policy that locks profiles (and maybe accounts) out of the service after too many requests (e.g. failing login multiple times, opening too many Netflix tabs, or making reverse-engineered API requests too often or for too large a payload). When this happens, Netflix displays an error message that says, "Netflix Site Error. We are unable to process your request. Please go to the Netflix home page by clicking the button below." Netflix has never admitted to a flagging policy that I know of, but I believe the policy exists. While doing this project, I experienced this Netflix error more than once. It resolved for me within hours. Lastly, Netflix does not allow scraping. If you visit netflix.com/robots.txt, you will see that it only allows specific bots to scrape the site. Because I like having a Netflix account, I am no longer going to attempt to scrape the site. If you choose to use my program, you do so at your own risk.

How do you use this?

  1. If you do not have Node.js, download it.
  2. Clone the repository.
  3. If you downloaded the repository as a ZIP file instead of cloning it, unzip the downloaded directory.
  4. Change into the directory via command line.
  5. Type: npm ci. This will clean install the project and download the required modules. You should see a folder called 'node_modules' now inside the directory.
  6. Option 1: Install the project globally with NPM - Type in the command line: npm install -g . (including the trailing dot). Then type netflix-movies-scraper and press Enter.
  7. Option 2: Run the project without installing it globally - Type in the command line: node .\index.js (on macOS or Linux, node ./index.js) and press Enter.
  8. You will be prompted via the command line to enter your Netflix username, password, and profile name. Answer each question by typing in your response and pressing Enter. After the last question, the program will begin scraping. If you are worried about entering your username and password, please read the code. A valid username and password are required so that Puppeteer can log in to Netflix via the headless browser and begin scraping.
  9. View the scraping status in the command line.
  10. When the scraper is done, you will have a CSV file inside the project directory called netflix-movies-as-of-DATE.csv.
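
To give a rough idea of the export in the last step, here is a minimal sketch of building the dated file name and the CSV text. It is an illustration, not the project's actual code: the helper names, the column names, and the YYYY-MM-DD date format are all assumptions (the README only gives the pattern netflix-movies-as-of-DATE.csv).

```javascript
// Hypothetical sketch: only the "netflix-movies-as-of-DATE.csv" pattern comes
// from the README. The date format and all helper names are assumptions.
function csvFileName(date) {
  const stamp = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `netflix-movies-as-of-${stamp}.csv`;
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the usual CSV convention).
function escapeCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of movie objects into CSV text with a header row.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvField).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column])).join(',')
  );
  return [header, ...lines].join('\n');
}

// Example with made-up column names.
const movies = [{ title: 'Okja', genre: 'Action, Adventure' }];
console.log(csvFileName(new Date()));
console.log(toCsv(movies, ['title', 'genre']));
```

Writing the result to disk would then be a single call to Node's `fs.writeFileSync(csvFileName(new Date()), toCsv(movies, columns))`. Quoting matters here because movie metadata such as genre lists often contains commas.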

Whose code and which websites/articles did I view when making this program?

The names and links below are my attempt to give credit to those whose public information/code helped me. I also spent a lot of time on Stack Overflow figuring out how to do things. I have only included links from Stack Overflow related to Puppeteer, Node.js, or an NPM module.

License: MIT License

Languages

Language: JavaScript 100.0%