ast-interview / spiderframe

A web spider/crawler written in C

spiderframe is a lightweight web crawler for capturing internet content such as news, comments, and video descriptions. It is written in C, uses libcurl to download web pages, and uses pcre regular expressions to extract text. Neither DOM parsing nor XPath is supported.
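
The regex-only extraction approach can be illustrated with a minimal sketch. This is not spiderframe's own code: it uses POSIX <regex.h> as a stand-in for pcre so that it builds without extra libraries, and extract_first and the title pattern are hypothetical names chosen for illustration. It copies the first capture group out of a page that has already been downloaded.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of spiderframe): copy the text of the
 * first capture group of `pattern`, as matched in `html`, into `out`.
 * POSIX extended regex is used here as a stand-in for pcre.
 * Returns 0 on a match, 1 if nothing matched, -1 on a bad pattern. */
int extract_first(const char *html, const char *pattern,
                  char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first capture group */
    int found = 0;

    assert(html != NULL && pattern != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (regexec(&re, html, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1;          /* truncate to fit the buffer */
        memcpy(out, html + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }

    regfree(&re);
    return found ? 0 : 1;
}
```

For example, calling extract_first(page, "<title>([^<]*)</title>", buf, sizeof buf) on a page containing <title>Daily News</title> leaves "Daily News" in buf. Because the matching is purely textual, nested or malformed markup is not understood, which is the trade-off of skipping DOM and XPath.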

To make:
cd libjson/
make
make install

cd ..
make

To run:
write your configuration in sf_conf.xml and url.txt
run objs/sf
check the output files named pattern*.txt

Languages

C 98.0%, Makefile 1.6%, Shell 0.2%, C++ 0.2%