sanlunainiu / Google-Patents-Scraper

Automatically download all PDF files of search results and their patent families found on Google Patents.

Home Page: https://wenyalintw.github.io/project/google-patents-scraper/

Google Patents Scraper

(1) Automatically download all PDF files of search results and their patent families.
(2) Generate an overview report of the search results.

Application Demo

Introduction

This application scrapes Google Patents in two steps:

  • Set Proxy (Optional)
  • Search & Download Patents

Set Proxy (Optional)

  • Set a proxy to avoid having your current IP address blocked by Google Patents
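
As a rough illustration of what routing requests through a proxy involves, here is a minimal sketch using only the Python standard library. The helper name `build_proxy_opener` and the address `http://127.0.0.1:8080` are placeholders for illustration, not part of this application, which may configure its proxy differently.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return a URL opener that sends HTTP and HTTPS traffic through proxy_url."""
    # The same proxy is used for both schemes (an assumption of this sketch).
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder proxy address; replace it with a proxy you actually control.
opener = build_proxy_opener("http://127.0.0.1:8080")
```

Requests made with `opener.open(url)` would then go through the proxy instead of connecting directly from the current IP address.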

Search & Download Patents

  • Select an output directory to store the downloaded and generated files
  • Enter your search terms (same format as on Google Patents)
  • Download PDF files of the search results and their patent families

The PDF files and an auto-generated overview.md will then be stored in the selected directory.
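
Because search terms use the same format as on Google Patents, a query can be thought of as the text that ends up in the site's search URL. The sketch below assumes the `q` query parameter that appears in the browser address bar when searching on patents.google.com; the function name `build_search_url` is hypothetical, and this application may build its requests differently.

```python
from urllib.parse import urlencode

# Base address of Google Patents; the "q" parameter is an assumption
# based on the URL shown in the browser when searching on the site.
BASE_URL = "https://patents.google.com/"


def build_search_url(search_terms):
    """Return a Google Patents search URL for the given search terms."""
    # urlencode escapes spaces and special characters in the query.
    return BASE_URL + "?" + urlencode({"q": search_terms})


url = build_search_url("wireless charging")
```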

File Structure of Output Directory

├── PDFs
│   ├── CN104321947A.pdf
│   ├── ...
│   └── readme.txt
├── Family_PDFs
│   ├── CN104321947A's\ Family
│   │   ├── EP2850716B1.pdf
│   │   ├── ...
│   │   └── readme.txt
│   ├── ...
│   └── ...
└── overview.md
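
To locate files in this layout from your own scripts, something like the following sketch could be used. The helper names `pdf_path` and `family_pdf_path` are hypothetical and not part of this application; the folder names are taken from the tree above, assuming the backslash in `CN104321947A's\ Family` is only a shell escape for the space.

```python
from pathlib import Path


def pdf_path(output_dir, patent_number):
    """Path of a search result's PDF, e.g. <output_dir>/PDFs/CN104321947A.pdf."""
    return Path(output_dir) / "PDFs" / f"{patent_number}.pdf"


def family_pdf_path(output_dir, patent_number, family_member):
    """Path of a family member's PDF inside the patent's family folder."""
    family_dir = Path(output_dir) / "Family_PDFs" / f"{patent_number}'s Family"
    return family_dir / f"{family_member}.pdf"


# "output" is a placeholder for the directory selected in the application.
main_pdf = pdf_path("output", "CN104321947A")
family_pdf = family_pdf_path("output", "CN104321947A", "EP2850716B1")
```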

Built With

Modules besides Python built-ins

Getting Started

Prerequisites

Installation

  • Clone the repo
git clone https://github.com/wenyalintw/Google-Patents-Scraper.git
pip install -r /path/to/requirements.txt
  • Ready to go
cd src
python main.py

Acknowledgments

MIT License (2019), Wen-Ya Lin

License:MIT License


Languages

Language:Python 100.0%