VRDate / spider-flow

A next-generation crawler platform that lets you define crawl flows graphically and build crawlers without writing code.

Home page: https://www.spiderflow.org

Introduction | Features | Plugins | DEMO site | Documentation | Update Log | Project Screenshots | Other Open Source Projects | Disclaimer

Introduction

The platform defines crawlers as flow charts, which makes it highly flexible and configurable.
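To make the flow-chart idea concrete, here is a minimal sketch of how a crawler could be modelled as a chain of nodes that share a variable map. It is an illustration only: `FlowSketch`, `Node`, `run`, and `runDemo` are hypothetical names, not spider-flow's actual classes, and the "fetch" step uses a canned HTML string instead of a real HTTP request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical illustration; none of these names come from spider-flow. */
class FlowSketch {

    /** One box in the flow chart: a name, an action on the shared variables, and a successor. */
    static final class Node {
        final String name;
        final Consumer<Map<String, Object>> action;
        Node next; // null marks the end of the flow

        Node(String name, Consumer<Map<String, Object>> action) {
            this.name = name;
            this.action = action;
        }

        /** Connects this node to the following one and returns it, so calls can be chained. */
        Node then(Node following) {
            this.next = following;
            return following;
        }
    }

    /** Walks the chain from start, applying each node's action; returns the visited node names. */
    static List<String> run(Node start, Map<String, Object> vars) {
        List<String> visited = new ArrayList<>();
        for (Node n = start; n != null; n = n.next) {
            n.action.accept(vars);
            visited.add(n.name);
        }
        return visited;
    }

    /** Builds a three-node flow (start, fetch, extract) and returns the resulting variables. */
    static Map<String, Object> runDemo() {
        Node start = new Node("start", vars -> vars.put("url", "https://example.com/"));
        // Assumption: a canned response stands in for a real HTTP fetch.
        Node fetch = new Node("fetch",
                vars -> vars.put("html", "<html><head><title>Hello</title></head></html>"));
        Node extract = new Node("extract", vars -> {
            String html = (String) vars.get("html");
            int from = html.indexOf("<title>") + "<title>".length();
            vars.put("title", html.substring(from, html.indexOf("</title>")));
        });
        start.then(fetch).then(extract);

        Map<String, Object> vars = new HashMap<>();
        run(start, vars);
        return vars;
    }

    public static void main(String[] args) {
        System.out.println(runDemo().get("title")); // prints "Hello"
    }
}
```

A real flow chart can also branch and loop; the linear chain here only shows how nodes pass shared variables along.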

Features

  • Support XPath/JsonPath/CSS selector/regular expression extraction, and combinations of these
  • Support JSON/XML/binary formats
  • Support multiple data sources and SQL select/selectInt/selectOne/insert/update/delete
  • Support crawling pages rendered dynamically by JavaScript (or loaded via Ajax)
  • Support proxies
  • Support automatic saving to a database or file
  • Common string, date, file, and encryption/decryption functions
  • Support plugin extensions (custom executors, custom methods)
  • Task monitoring and task logs
  • Support HTTP interfaces
  • Support automatic cookie management
  • Support custom functions
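As a rough picture of what XPath and regular expression extraction mean in practice, the sketch below uses only the JDK's `javax.xml.xpath` and `java.util.regex` packages. It does not show spider-flow's own extraction functions; `ExtractSketch`, `byXPath`, and `byRegex` are hypothetical names, and the sample page is invented. JsonPath and CSS selectors need third-party libraries, so they are left out here.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical illustration; these helpers are not part of spider-flow. */
class ExtractSketch {

    /** Evaluates an XPath expression against well-formed XML/XHTML and returns its string value. */
    static String byXPath(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    /** Returns capture group 1 of the first match, or null; the regex must contain a group. */
    static String byRegex(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumption: a small, well-formed sample page; real HTML often needs a lenient parser.
        String page = "<html><body><h1 id=\"t\">spider-flow</h1>"
                + "<a href=\"/docs\">Docs</a></body></html>";
        System.out.println(byXPath(page, "//h1[@id='t']/text()")); // prints "spider-flow"
        System.out.println(byRegex(page, "href=\"([^\"]+)\""));    // prints "/docs"
    }
}
```

The JDK XPath engine only accepts well-formed markup, which is one reason crawler tools usually put an HTML-tolerant parser in front of it.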

Plugins

Project screenshots

Crawler list

Crawler test

Debug

Logs

Other open source projects

Disclaimer

Do not use spider-flow for any work that may violate laws, regulations, or ethical norms. Please use spider-flow responsibly, respect the robots protocol (robots.txt), and do not use it for any illegal purpose. By choosing to use spider-flow, you agree to these terms; the author bears no legal risk or loss arising from your violation of them, and you bear all consequences.

License: MIT License


Languages

Java 100.0%, Dockerfile 0.0%