MaryJaneInChain / WechatHomepageSpider

A web crawler for WeChat homepages, written in Perl.

WechatHomepageSpider

Synopsis

A web crawler for WeChat homepages, written in Perl.

Dependencies

  • Switch
  • Encode
  • LWP
  • JSON
  • HTML::TreeBuilder
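
Before running the spider, it can help to confirm that the modules above are installed. The following is a minimal sketch, not part of WechatHomepageSpider: `missing_modules` is a hypothetical helper that tries to load each module and reports the ones that fail. The `cpanm` hint assumes App::cpanminus is available; any CPAN client would do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Modules listed under Dependencies in this README.
my @required = qw(Switch Encode LWP JSON HTML::TreeBuilder);

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @modules = @_;
    my @missing;
    for my $module (@modules) {
        # Turn Foo::Bar into Foo/Bar.pm so require() receives a file path
        # and no string eval is needed.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(@required);
if (@missing) {
    print "Missing modules: @missing\n";
    print "Install them from CPAN, for example: cpanm @missing\n";
} else {
    print "All dependencies are installed.\n";
}
```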

How to use


#!/usr/bin/env perl
# example.pl

use strict;
use warnings;
use WechatHomepageSpider;

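my $biz = 'Mzi42kJfmSAA==';             # biz value of the target homepage (sample)
my $hid = 1;                            # hid value of the target homepage
my $agent_name = 'SampleSpider/0.1';    # agent name the spider uses for its requests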

WechatHomepageSpider::scan_homepage($biz, $hid, $agent_name);
WechatHomepageSpider::download_homepage($biz, $hid, $agent_name);

Then make the script executable and run it:

$ chmod 755 ./example.pl
$ ./example.pl

Or run it through the interpreter directly:

$ perl ./example.pl

The downloaded files are laid out as follows:

Homepage Title ─┬─ Channel 1
                ├─ Channel 2
                ├─ Channel 3
                └─ Channel 4 ─┬─ Article 1.txt 
                              ├─ Article 1
                              ├─ Article 2.txt
                              └─ Article 2     ─┬─ cover.jpg
                                                ├─ img_0.jpg
                                                └─ img_1.jpg
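
To check what a run produced, the tree above can be walked with a short script. This is a sketch under the assumption that the layout is exactly as drawn: one directory per channel, one `.txt` file per article, and one folder of `.jpg` images per article. `summarize_homepage` and `dir_entries` are hypothetical helpers, not functions provided by WechatHomepageSpider.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# List the entries of a directory, skipping "." and "..".
sub dir_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @entries;
}

# Summarize a downloaded homepage directory.
# Returns a hash ref: channel name => { articles => N, images => M }.
sub summarize_homepage {
    my ($root) = @_;
    my %summary;
    for my $channel (dir_entries($root)) {
        my $channel_dir = File::Spec->catdir($root, $channel);
        next unless -d $channel_dir;
        my %count = (articles => 0, images => 0);
        for my $entry (dir_entries($channel_dir)) {
            my $path = File::Spec->catfile($channel_dir, $entry);
            if (-f $path && $entry =~ /\.txt\z/) {
                $count{articles}++;    # article text file
            } elsif (-d $path) {
                # article folder: cover.jpg, img_0.jpg, ...
                $count{images} += grep { /\.jpg\z/i } dir_entries($path);
            }
        }
        $summary{$channel} = \%count;
    }
    return \%summary;
}

# Usage: perl summarize.pl "Homepage Title"
if (@ARGV) {
    my $summary = summarize_homepage($ARGV[0]);
    for my $channel (sort keys %$summary) {
        printf "%s: %d article(s), %d image(s)\n",
            $channel, @{ $summary->{$channel} }{qw(articles images)};
    }
}
```

Counting `.txt` files rather than article folders means an article without images is still counted.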

TODO List

  • Handle special characters
  • Multithreading
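
One possible starting point for the first item, assuming "special characters" refers to characters in homepage, channel, or article titles that are unsafe in file names. `sanitize_filename` is a hypothetical helper and does not exist in WechatHomepageSpider; the set of replaced characters is a guess based on what common file systems reject.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not part of WechatHomepageSpider:
# make a title safe to use as a file or directory name.
sub sanitize_filename {
    my ($title) = @_;
    $title =~ s{[\\/:*?"<>|]}{_}g;    # characters rejected by common file systems
    $title =~ s/[\x00-\x1f]//g;       # control characters
    $title =~ s/^\s+//;               # leading whitespace
    $title =~ s/[\s.]+$//;            # trailing whitespace and dots
    return length $title ? $title : 'untitled';
}

print sanitize_filename('What is "A/B" testing?'), "\n";
```

Replacing rather than deleting the unsafe characters keeps titles readable and makes accidental name collisions less likely.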

License: MIT License


Languages

Language: Perl 100.0%