tunsheng / RegexScraper

Web scraper for Facebook

INTRODUCTION

The installation process has three parts: Setup Terminal, Setup Workspace, and Run with.

Setup Terminal

NOTE: REPLACE yourusername WITH THE NAME OF YOUR ACCOUNT.

FOR WINDOWS 10

  1. If you are using Windows 10, follow this link to set up Windows Subsystem for Linux.
  2. Copy setup_win10.ps1 from the Installer folder to your C:\Users\yourusername\Documents folder.
  3. Run PowerShell from the Windows Start Menu. In PowerShell, type:
    cd C:\Users\yourusername\Documents
    powershell -ExecutionPolicy ByPass -File setup_win10.ps1
  4. You are done. Skip to the Setup Workspace section.

FOR WINDOWS [alternative]

  1. Copy setup.ps1 (the Windows shell script) from the repository to your C:\Users\yourusername\Documents folder.
  2. Run PowerShell from the Windows Start Menu.
  3. In PowerShell, type:
    cd C:\Users\yourusername\Documents
  4. To start downloading, type in PowerShell:
    powershell -ExecutionPolicy ByPass -File setup.ps1
  5. If everything goes well, you are done.

FOR macOS & LINUX

  1. Copy setup.sh to your Downloads folder.
  2. Open terminal.
  3. In the terminal, type:
    cd ~/Downloads
  4. To start downloading, in the terminal type:
    sh setup.sh
  5. If everything goes well, then you are done.

Setup Workspace

NOTE: For macOS/Linux, replace C:\Users\yourusername with the symbol ~ and each backslash (\) with a forward slash (/) in the path. Instead of launching cmder, you will launch a terminal.

IMPORTANT: Use the link from the ABOUT page of the Facebook profile.

  1. Replace the word yourusername with the name of your account wherever it appears below.
  2. Create a folder inside C:\Users\yourusername\Documents. For example, name it ServiceList.
  3. Copy iterate.sh, getBusiness.sh, and getInfo.pl to the C:\Users\yourusername\Documents\ServiceList folder.
  4. Create a text file called list.txt in the C:\Users\yourusername\Documents\ServiceList folder and enter the links you want, one per line.
  5. Now proceed to the relevant Run with section.
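
On macOS, Linux, or WSL, the folder and list.txt steps above can be sketched in the shell as follows. This is only an illustration: the WORKSPACE variable and the two links are placeholders that do not come from the repository, and the three scripts still have to be copied into the folder by hand.

```shell
# Hypothetical workspace location; set WORKSPACE beforehand to use another folder.
WORKSPACE="${WORKSPACE:-$HOME/Documents/ServiceList}"

# Create the workspace folder (no error if it already exists).
mkdir -p "$WORKSPACE"

# Write one link per line. These URLs are placeholders; use the
# ABOUT-page links of the profiles you actually want to scrape.
cat > "$WORKSPACE/list.txt" <<'EOF'
https://www.facebook.com/examplepage1/about
https://www.facebook.com/examplepage2/about
EOF

# Show how many links were written.
wc -l < "$WORKSPACE/list.txt"
```

A list written this way has Unix line endings, so it should not need the --win-input flag.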

Run with cmder

  1. Run cmder.exe and go to the C:\Users\yourusername\Documents\ServiceList folder by typing:
    cd C:\Users\yourusername\Documents\ServiceList
  2. Start the scraper (assuming you have already created a list of links):
    sh iterate.sh --input list.txt
    If you have a list of HTML, use:
    sh iterate.sh --html --input list.txt
    If your list has a different name (say SHREK.txt), use:
    sh iterate.sh --input SHREK.txt
    If you created list.txt using Windows Notepad, use:
    sh iterate.sh --win-input --input list.txt
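
Notepad normally saves files with Windows (CRLF) line endings, which is presumably why a separate --win-input flag exists. If you are unsure how your list was saved, a check like the one below may help. It is a sketch and not part of the repository: the file names sample_list.txt and sample_list_unix.txt and the link are made up for the demonstration.

```shell
# Build a small sample list with CRLF line endings, as Notepad would save it.
printf 'https://www.facebook.com/examplepage1/about\r\n' > sample_list.txt

# Count the lines that contain a carriage return character.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" sample_list.txt)

if [ "$crlf_lines" -gt 0 ]; then
    echo "CRLF line endings found: use --win-input"
else
    echo "Unix line endings: plain --input is enough"
fi

# Alternative: strip the carriage returns into a Unix-style copy.
tr -d '\r' < sample_list.txt > sample_list_unix.txt
```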

Run with Windows Subsystem for Linux

  1. Open your Linux terminal and go to your directory by typing:
    cd /mnt/c/Users/yourusername/Documents/ServiceList
  2. [Optional] Convert the DOS-formatted scripts to UNIX-compatible scripts:
    ./dos2unix.sh iterate.sh
    ./dos2unix.sh getBusiness.sh
    ./dos2unix.sh getInfo.pl
  3. If you created your list in the terminal, type:
    ./iterate.sh --win-subsys --input list.txt
  4. If you created your list in Notepad, type:
    ./iterate.sh --win-subsys --win-input --input list.txt
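
The directory used in the first step follows from how WSL exposes Windows drives: drive C: is normally mounted at /mnt/c, and backslashes become forward slashes. The helper below is a hypothetical illustration of that mapping (the function name win_to_wsl is not part of the repository); inside WSL, the wslpath utility performs the same conversion.

```shell
# Hypothetical helper: convert a Windows path such as C:\Users\me
# into the matching WSL path /mnt/c/Users/me.
win_to_wsl() {
    # Lower-case the drive letter to form the mount point.
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Drop the "C:" prefix and turn backslashes into forward slashes.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\yourusername\Documents\ServiceList'
# prints /mnt/c/Users/yourusername/Documents/ServiceList
```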

Languages

Shell 65.1%, Perl 21.4%, PowerShell 13.6%