ccraddock / scrape-parler-R

Scrape the conservative alternative social media network "Parler" in R.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

scrape-parler-R

This repository scrapes parts of the conservative alternative social media network "Parler."

Currently, scrape_parler.R allows you to save the "Parleys" section of Parler's "Discover" page.

scrape_parler_news.R adapts castlelemongrab/parlance's CLI and scrapes the "Affiliates" newsfeed section.

To clone this repository, use:

git clone http://github.com/inh2102/scrape_parler.

Getting Started

Sign up for a Parler account. Be sure to use an email address and phone number you can verify.

Using Selenium

This script relies on the usage of Selenium remote webdrivers. scrape_parler.R uses the RSelenium package, and it's easiest to use Docker to handle remote drives on your machine. Be sure to follow all of the installation instructions, and after you have opened the application, type the following into Terminal/Command Line:

docker pull selenium/standalone-firefox.

When you're ready to start a remote webdriver instance, use:

docker run -d -p 4445:4444 selenium/standalone-firefox:2.53.1.

If you'd like to kill all running instances of Docker, use:

docker stop $(docker ps -q).

Scraping Parler Posts

Once you have cloned this repository, open the scrape_parler.R file (which handles references to the functions.R file - the meat of this repo).

The packages() call will ensure you have installed the three R packages this script uses (RSelenium, rvest, tidyverse).

The df <- scrape_parler(scrolls=10) call begins the scraping, which will take a few minutes. The scrolls argument controls the number of times Selenium scrolls down the page and grabs new posts; the default value of 10 returns 50 posts on my machine (the maximum Parler seems to allow for my feed).

The R console provides periodic status updates on the scraping process -- including required inputs for username (email) and password. After a few moments, a CAPTCHA image will display in your R viewer (scroll to center on the letters), and you must enter the case-sensitive CAPTCHA to proceed. Finally, Parler will text your mobile phone a 6-digit code that you must enter.

Once the scraping has completed, R will return df, which is a list of top posts and Parler's top trending hashtags.

Scraping Parler Affiliates' 'News'

get_parler_news.R adapts the castlelemongrab/parlance CLI to R and scrapes the affiliates' newsfeed of articles.

From parlance:

Then, log in to Parler using an ordinary web browser. Use your browser's development tools and/or cookie storage interface to find Parler's MST (Master Session >Token) and JST (a short-lived session token). Use the init subcommand to create an authorization file using the MST and JST values from your browser. If your >browser supplies you with URI-encoded versions of these values, you should decode them prior to use to avoid duplicate HTTPS requests and/or warning messages from >the tool. Any automation of the above login process is unlikely to be accepted.

Steps (Chrome macOS):

  1. Right-click your browser while on Parler and "Inspect."

  1. Select "Application" on the top bar.

  1. Find Parler's "Cookies" on the left, and locate the key values for jst and mst.

Once you have the long mst and jst cookie string values, feed them into do_credentials() and proceed.

About

Scrape the conservative alternative social media network "Parler" in R.


Languages

Language:R 100.0%