itpleb / NCrawler

.NET-based web crawler

NCrawler

A simple, efficient, multithreaded web crawler written in C# with pipeline-based processing. It includes HTML, text, PDF, and IFilter document processors, as well as language detection (Google). Adding pipeline steps to extract, use, and alter information is straightforward.

A complete rewrite of the 2010 version of NCrawler using more modern programming practices. Now at v4.

About

License: Apache License 2.0


Languages

C# 69.8%, HTML 28.3%, PowerShell 1.9%, Batchfile 0.1%