yuriak / UniversalDataCrawler

The release version of DufeDataCrawler

Distributed Universal Data Crawler

Features

  • Extensible plugin architecture (plugins can be added at runtime)
  • Custom commands for plugins
  • Easy integration with Kafka and HDFS
  • Web UI panel for controlling the crawler
  • Distributed deployment across clusters
  • Built as an enhancement of WebCollector
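
The runtime add-on feature can be pictured as loading a class from a JAR that was not on the classpath at startup. The following is only a minimal sketch using the standard `URLClassLoader`; the `RuntimePluginLoader` class and its `load` method are hypothetical names for illustration, and the project's own loading mechanism may work differently.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of UniversalDataCrawler.
class RuntimePluginLoader {

    // Loads className from the given JAR (falling back to the parent class
    // loader) and instantiates it through its public no-argument constructor.
    static <T> T load(Path jar, String className, Class<T> type) throws Exception {
        URL[] urls = { jar.toUri().toURL() };
        // The loader is intentionally left open: closing it could stop the
        // loaded plugin from resolving further classes from its JAR later.
        URLClassLoader loader =
                new URLClassLoader(urls, RuntimePluginLoader.class.getClassLoader());
        Class<?> cls = Class.forName(className, true, loader);
        // asSubclass throws ClassCastException if the class is not a T.
        return cls.asSubclass(type).getDeclaredConstructor().newInstance();
    }
}
```

A caller would pass the path of a plugin JAR, the fully qualified plugin class name, and the expected base type. Checking the type with `asSubclass` before instantiation keeps a misconfigured class name from producing an object of the wrong kind.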

Usage

  • Fork the repository and add all libraries to your classpath
  • Implement your parsing logic by extending the Plugin class
  • Package your plugin class as a JAR file and add it to the plugin path
  • Register your plugin in the config file
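
As a rough illustration of the second step, the sketch below shows what a plugin with custom parsing logic might look like. The README does not document the real `Plugin` API, so the abstract `Plugin` class here (with its `name` and `parse` methods) is a hypothetical stand-in, and `LinkExtractorPlugin` is an invented example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the project's Plugin base class; the real
// class in UniversalDataCrawler may declare different methods.
abstract class Plugin {
    abstract String name();

    // Receives the URL and raw HTML of a fetched page, returns extracted items.
    abstract List<String> parse(String url, String html);
}

// Example plugin: collects the values of double-quoted href attributes.
class LinkExtractorPlugin extends Plugin {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    @Override
    String name() {
        return "link-extractor";
    }

    @Override
    List<String> parse(String url, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}

class PluginDemo {
    public static void main(String[] args) {
        Plugin plugin = new LinkExtractorPlugin();
        String html = "<a href=\"/a\">A</a> <a href=\"/b\">B</a>";
        System.out.println(plugin.name() + ": " + plugin.parse("http://example.com", html));
    }
}
```

A regular expression keeps the sketch dependency-free; a real plugin would more likely use an HTML parser. The compiled class would then be packaged as a JAR, placed on the plugin path, and registered in the config file as described in the steps above.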

Note: this project is no longer maintained.

Languages

Java 83.7%, JavaScript 10.4%, CSS 5.8%