ellisonpatterson / archive

YouTube Annotation Archive

For cloudrac3r's work, see README.md in the node folder.


YouTube Annotation Archive

Provides scripts for archiving YouTube annotations. See the wiki for information about how it works.

YouTube will permanently delete the annotations on every video on January 15, 2019. The purpose of this project is to archive as much annotation data as possible before that happens.

The current process is to scrape as many channel IDs as possible, then to scrape video IDs from those channels, then to download annotation data for those videos.
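The three stages above can be sketched as follows. This is a minimal, hypothetical illustration in Node.js, not the project's actual worker code: `runPipeline`, `fetchVideoIds`, and `fetchAnnotations` are invented names, the fetchers are injected stubs rather than real YouTube requests, and channel discovery itself is out of scope (the seed list stands in for the scraped channel IDs).

```javascript
// Hypothetical sketch of the channel -> video -> annotation pipeline.
// The two fetcher functions are injected placeholders, not real APIs.
async function runPipeline(seedChannels, { fetchVideoIds, fetchAnnotations }) {
  // Stage 1: the scraped channel IDs, deduplicated with a Set.
  const channelIds = new Set(seedChannels);

  // Stage 2: expand every channel into its video IDs (also deduplicated).
  const videoIds = new Set();
  for (const channelId of channelIds) {
    for (const videoId of await fetchVideoIds(channelId)) {
      videoIds.add(videoId);
    }
  }

  // Stage 3: download annotation data once per unique video.
  const annotations = new Map();
  for (const videoId of videoIds) {
    annotations.set(videoId, await fetchAnnotations(videoId));
  }
  return annotations;
}

// Usage with stub data: UC1 is listed twice and video "b" appears on both
// channels, but each channel and each video is processed only once.
const fakeChannels = { UC1: ["a", "b"], UC2: ["b", "c"] };
runPipeline(["UC1", "UC2", "UC1"], {
  fetchVideoIds: async (id) => fakeChannels[id] || [],
  fetchAnnotations: async (videoId) => `<document id="${videoId}"/>`,
}).then((result) => console.log([...result.keys()])); // logs [ 'a', 'b', 'c' ]
```

Deduplicating at each stage matters because popular videos can be reached from several channels, and downloading the same annotations twice wastes requests.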

Usage

Installing and running a worker (Node.js):

With Docker:

Download the Dockerfile from the /docker folder, change into the directory you saved it in, and run the following command to build the image:

$ docker build -t archive .

Use the following commands to create a container from the image and start it, which begins the archiving process:

$ docker create --name=archive-worker archive:latest
$ docker container start archive-worker

On Ubuntu:

# Install dependencies
$ sudo apt-get install curl python-software-properties
$ curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
$ sudo apt-get install nodejs gcc g++ make git

$ git clone https://github.com/omarroth/archive
$ cd archive/node
$ npm install
$ cd worker
$ node index.js

Installing and running a worker (Crystal):

On Ubuntu:

# Install dependencies
$ curl -sSL https://dist.crystal-lang.org/apt/setup.sh | sudo bash
$ sudo apt-get update
$ sudo apt-get install crystal git libssl-dev libxml2-dev libyaml-dev libgmp-dev libreadline-dev librsvg2-dev

$ git clone https://github.com/omarroth/archive
$ cd archive
$ shards
$ crystal build src/worker.cr --release
$ ./worker -u https://archive.omar.yt -t 20
$ ./worker -h
    -u URL, --batch-url=URL          Master server URL
    -t THREADS, --max-threads=THREADS
                                     Number of threads for downloading annotations
    -h, --help                       Show this help
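The `-t THREADS` option caps how many annotation downloads run at once. The Crystal worker's actual implementation is not shown here; the following is only a hypothetical Node.js sketch of the general idea of bounded concurrency, where `mapWithLimit` and the stub `task` are invented names and no real network requests are made.

```javascript
// Hypothetical sketch: run `task` over `items` with at most `limit` tasks
// in flight, similar in spirit to a --max-threads style option.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index until none remain. `next++` executes
  // synchronously on Node's single thread, so no two runners claim the same item.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  // Start at least one runner and never more runners than items.
  const count = Math.max(1, Math.min(limit, items.length));
  const runners = [];
  for (let r = 0; r < count; r++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // results keep the input order
}

// Usage: five fake "downloads" with a limit of 2 concurrent tasks.
let active = 0;
let peak = 0;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
mapWithLimit(["v1", "v2", "v3", "v4", "v5"], 2, async (videoId) => {
  active++;
  peak = Math.max(peak, active);
  await sleep(10); // stands in for a network request
  active--;
  return videoId.toUpperCase();
}).then((out) => console.log(out, "peak concurrency:", peak));
```

A fixed pool of runners pulling from a shared index keeps exactly `limit` requests busy without spawning one promise per video up front, which matters when a batch contains thousands of video IDs.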

About

License: GNU Affero General Public License v3.0

Languages: Crystal 55.7%, JavaScript 43.9%, Shell 0.3%, Dockerfile 0.2%