zhangjh / node_tumblr_spider

A crawler written in Node.js to download Tumblr videos.

node_tumblr_spider

A spider written in Node.js to crawl Tumblr resources

Read in Chinese

GPL Licence Open Source Love Edit By zhangjh

Description

  • node_tumblr_spider is a spider project that crawls Tumblr resources.
  • You can configure the "torrent user" (the starting user) and the resource type to crawl, such as video. The spider starts from the torrent user, then crawls the users that the torrent user has reblogged, and continues until all of those users have been downloaded.
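
The traversal described above can be sketched as a queue-driven crawl. This is only an illustration, not the project's actual code: the `reblogGraph` data, the `crawlOrder` function, and the `getReblogged` callback are hypothetical names, the in-memory array stands in for the redis queue, and breadth-first order is an assumption the README does not state.

```javascript
// Hypothetical reblog graph standing in for Tumblr:
// each user maps to the users they have reblogged.
const reblogGraph = {
  seed: ["alice", "bob"],
  alice: ["bob", "carol"],
  bob: [],
  carol: ["seed"],
};

// Crawl from the starting user, then follow reblogs until no new users remain.
// `getReblogged` is a placeholder for whatever looks up a user's reblog sources.
function crawlOrder(startUser, getReblogged) {
  const queue = [startUser];         // in-memory stand-in for the redis queue
  const seen = new Set([startUser]); // prevents crawling the same user twice
  const order = [];
  while (queue.length > 0) {
    const user = queue.shift();
    order.push(user);                // a real spider would download resources here
    for (const next of getReblogged(user)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

const order = crawlOrder("seed", (user) => reblogGraph[user] || []);
console.log(order.join(" -> ")); // prints "seed -> alice -> bob -> carol"
```

The `seen` set matters because reblog relationships can form cycles (here `carol` points back to `seed`); without it the crawl would never finish.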

Social links

Feel free to contact me.

Usage

1. Install

Visit the official Node.js website and install the latest Node.js environment.

On Windows, it is recommended to install babun, a better terminal replacement for cmd.

2. Dependencies

The spider uses redis as its queue, so you must install redis first.

download -> tar xvf xxx.tgz -> make -> cd src && ./redis-server

3. Download Project

Download the zip archive, or run git clone https://github.com/zhangjh/node_tumblr_spider.git

4. Install dependencies

    cd node_tumblr_spider
    npm install -d

5. Start to crawl

    npm run start

6. About configuration

You can modify the configuration in ./conf/config.js.

    USER - the name of the torrent user (the starting user to crawl from)
    DOWNLOAD_PRE - the download directory prefix; defaults to `./download/${user}`
    REDIS_HOST - the redis server's host
    REDIS_PORT - the redis server's port
    LOG_MODE - if set to false, the spider will not show download progress info
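
As a rough illustration of how these options might fit together, here is a minimal sketch of a config module. Only the key names come from the list above; the values, the `example-user` name, and the `module.exports` layout are assumptions, and the real ./conf/config.js may be organized differently.

```javascript
// Hypothetical sketch of ./conf/config.js; only the key names are from the README.
const USER = "example-user"; // placeholder name for the starting user

const config = {
  USER,
  // Mirrors the documented default prefix `./download/${user}`.
  DOWNLOAD_PRE: `./download/${USER}`,
  // Assumed values for a redis server running locally on its default port.
  REDIS_HOST: "127.0.0.1",
  REDIS_PORT: 6379,
  // Set to false to hide download progress output.
  LOG_MODE: true,
};

module.exports = config;
```

Deriving `DOWNLOAD_PRE` from `USER` keeps each starting user's downloads in a separate directory when the user is changed.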

Demo

About

License: GNU General Public License v3.0

