magictomagic / policy_crawler_v1

Crawl the latest policies

Policy Crawler

Background

Many marketing professionals, salespeople, and company executives need the latest policy updates so they can align their market strategies with them. Many companies assign dedicated staff or departments to crawl and organize this information from the web, then provide it to their own employees or sell it online. This project crawls the latest publicly released information from major government websites in eastern and southern mainland China. For now, the information is distributed for free via RSS feeds; other distribution models may be considered at later stages.

Architecture

DDD (Domain-Driven Design)

Package dependency constraints

As with Maven, specific packages should only be used in specific layers, and layers may only invoke each other in a specific direction.

  • pdm - too new; wait a few years before relying on it
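The layering rule above can be checked mechanically. The following is a minimal sketch, not this project's actual tooling: the layer names and the `ALLOWED` map are hypothetical DDD-style placeholders, and the check only inspects absolute top-level imports using the standard `ast` module.

```python
import ast

# Hypothetical layers and the layers each one is allowed to import.
# These names are assumptions for illustration, not the repo's real packages.
ALLOWED = {
    "interfaces": {"application", "domain"},
    "application": {"domain"},
    "domain": set(),
    "infrastructure": {"domain"},
}


def imported_layers(source):
    """Return the set of known layers imported by the given source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return {name for name in names if name in ALLOWED}


def find_violations(layer, source):
    """List layers that `layer` imports but is not allowed to depend on."""
    return sorted(
        dep
        for dep in imported_layers(source)
        if dep != layer and dep not in ALLOWED[layer]
    )
```

For example, `find_violations("domain", "import infrastructure.db")` reports `infrastructure`, because in this assumed map the domain layer may not depend on any other layer. A check like this could run in CI until a package manager enforces the constraint directly.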

TODO

  • Redesign the database to store: region, openapi summary, picture

  • Serve different feeds via path parameters, such as: .../zhejiang/feed

  • Download multimedia

Image download, for example:
    https://fzggw.zj.gov.cn/art/2022/9/1/art_1599545_58934718.html
    https://fzggw.zj.gov.cn/art/2022/11/1/art_1599545_58935041.html
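The per-region feed item in the list above (a path such as .../zhejiang/feed) could look roughly like this. It is a sketch under stated assumptions: `region_from_path`, `build_feed`, and the record fields `region`, `title`, and `link` are hypothetical names, and the RSS 2.0 document is built with the standard library rather than whatever the project actually uses.

```python
import xml.etree.ElementTree as ET


def region_from_path(path):
    """Extract the region from a path ending in '<region>/feed', else None."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 2 and parts[-1] == "feed":
        return parts[-2]
    return None


def build_feed(region, policies):
    """Build an RSS 2.0 string containing only the policies for `region`.

    `policies` is assumed to be a list of dicts with the hypothetical
    keys 'region', 'title', and 'link'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Latest policies: {region}"
    # Placeholder link; a real feed would use the service's public URL.
    ET.SubElement(channel, "link").text = f"/{region}/feed"
    ET.SubElement(channel, "description").text = f"Policy updates for {region}"
    for policy in policies:
        if policy["region"] != region:
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = policy["title"]
        ET.SubElement(item, "link").text = policy["link"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {
            "region": "zhejiang",
            "title": "Sample policy title (placeholder)",
            "link": "https://fzggw.zj.gov.cn/art/2022/9/1/art_1599545_58934718.html",
        }
    ]
    print(build_feed(region_from_path("/zhejiang/feed"), sample))
```

Keeping the region as a plain filter on stored records is one reason the database redesign in the same list adds a region field.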

License

GPL

Languages

HTML 78.6%, Python 21.4%