A scalable crawler framework for C#. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence, which simplifies the development of a specific crawler. It is easy to use and supports .NET Framework 2.0 through 4.7, .NET Standard 2.0, .NET Core 2.0, Xamarin (Xamarin.Forms, Xamarin.Android, Xamarin.iOS, Xamarin.Mac, Xamarin.Gtk), WPF, Silverlight, and Windows Forms. Thank you for using it.
This project is no longer maintained. Please see:
https://github.com/dotnetcore/DotnetSpider
So
- Simple core with high flexibility.
- Simple API for HTML extraction.
- Annotations on plain model classes (POJO-style) to customize a crawler, with no configuration needed.
- Multi-threading and distributed crawling support.
- Easy to integrate.
- .NET Framework 2.0 to 4.7
- .NET Standard 2.0
- .NET Core 2.0
- Xamarin: Xamarin.Forms, Xamarin.Android, Xamarin.iOS, Xamarin.Mac, Xamarin.Gtk
- WPF, Silverlight, Windows Forms, etc.
- WebMagicSharp:
  Install-Package WebMagicSharp -Version 0.0.1.0
  dotnet add package WebMagicSharp --version 0.0.1
- WebMagicSharp.Extensions:
  Install-Package WebMagicSharp.Extensions -Version 0.0.1
  dotnet add package WebMagicSharp.Extensions --version 0.0.1
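The four lifecycle stages named at the top (downloading, URL management, content extraction, persistence) can be pictured with a small, self-contained sketch. This is not WebMagicSharp's API: `LifecycleSketch`, `Crawl`, and the delegate parameters are hypothetical names chosen for illustration, and an in-memory dictionary stands in for real HTTP downloads so the example runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical names throughout: none of these types come from WebMagicSharp.
public static class LifecycleSketch
{
    // Crawls breadth-first from `seed` and returns the number of pages processed.
    // `download` maps a URL to its HTML (or null on failure);
    // `persist` stores one extracted title per page.
    public static int Crawl(string seed, Func<string, string> download,
                            Action<string, string> persist, int maxPages)
    {
        var queue = new Queue<string>();   // URL management: pending URLs
        var seen = new HashSet<string>();  // URL management: de-duplication
        queue.Enqueue(seed);
        seen.Add(seed);
        int processed = 0;

        while (queue.Count > 0 && processed < maxPages)
        {
            string url = queue.Dequeue();
            string html = download(url);   // downloading
            if (html == null) continue;

            // Content extraction: the page title (regex is enough for a sketch).
            Match title = Regex.Match(html, "<title>(.*?)</title>",
                RegexOptions.IgnoreCase | RegexOptions.Singleline);
            persist(url, title.Success ? title.Groups[1].Value.Trim() : "");  // persistence
            processed++;

            // Content extraction: outgoing links feed back into URL management.
            foreach (Match link in Regex.Matches(html, "href=\"(.*?)\"", RegexOptions.IgnoreCase))
            {
                string next = link.Groups[1].Value;
                if (seen.Add(next)) queue.Enqueue(next);
            }
        }
        return processed;
    }

    public static void Main()
    {
        // In-memory "site" standing in for real HTTP responses.
        var site = new Dictionary<string, string>
        {
            { "/a", "<title>Page A</title><a href=\"/b\">b</a><a href=\"/a\">self</a>" },
            { "/b", "<title>Page B</title><a href=\"/a\">a</a>" }
        };
        var store = new Dictionary<string, string>();

        int count = Crawl("/a",
            url => { string html; return site.TryGetValue(url, out html) ? html : null; },
            (url, title) => { store[url] = title; },
            10);

        Console.WriteLine(count + " pages; /b -> " + store["/b"]);
    }
}
```

Passing the downloader and the persistence step as delegates keeps each stage replaceable, which is the kind of flexibility the feature list describes; a real framework would typically add politeness delays, retries, and a proper HTML parser in place of the regular expressions used here.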
In writing WebMagicSharp, I referred to the projects below:
- WebMagic
  The main reference for WebMagicSharp.
- Scrapy
  A crawler framework in Python.
- Spiderman
  Another crawler framework in Java.