lorenzbr / GooglePatentsPdfDownloader

Download patents as PDF documents from Google Patents

Google Patents PDF Downloader

Installation

You can install the development version from GitHub with:

pip install git+https://github.com/lorenzbr/GooglePatentsPdfDownloader.git

Make sure Google Chrome and the matching version of chromedriver.exe (see here) are installed. Selenium needs both to access the website.

Run GooglePatentsPdfDownloader

python -m GooglePatentsPdfDownloader
  patent      Patent number(s) to be downloaded

optional arguments:
  --driver    Path and file name of the Chrome driver exe
  --brave     Switch application from Google Chrome to Brave.
  --output    An output path where documents are saved. Default ./pdf
  --time      Waiting time in seconds for each request.
  --rm-kind   A list containing the patent kind codes which should be removed from patent numbers

Examples

Download a single patent to the current working directory. US4405829A1 is not found with its kind code, so the kind code A1 is removed from the patent number.

python -m GooglePatentsPdfDownloader US4405829A1 --rm-kind A1
python -m GooglePatentsPdfDownloader EP0551921B1
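
As a rough illustration of what removing a kind code means, the sketch below strips a trailing kind code from a patent number. The helper name strip_kind_code is hypothetical and is not part of GooglePatentsPdfDownloader; the package's own logic may differ.

```python
from typing import Iterable


def strip_kind_code(patent: str, kind_codes: Iterable[str]) -> str:
    """Return the patent number without a trailing kind code from kind_codes.

    Hypothetical helper for illustration only, not the package's implementation.
    """
    for code in kind_codes:
        # Only strip when the number actually ends with the given kind code
        if code and patent.endswith(code):
            return patent[: -len(code)]
    return patent


for number in ["US4405829A1", "EP0551921B1"]:
    print(number, "->", strip_kind_code(number, ["A1"]))
```

Numbers that do not end in one of the listed kind codes are returned unchanged, so the same list can be applied to a mixed batch.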

Download multiple patents using a list of inputs to directory ./patents.

python -m GooglePatentsPdfDownloader US4405829 EP0551921B1 --output "./patents"

Download multiple patents listed in a txt file to directory ./pdf, using the Brave browser.

python -m GooglePatentsPdfDownloader docs/data/patents.txt --brave
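
The layout of the txt file is not specified here. Assuming one patent number per line, such a file could be read as sketched below. read_patent_numbers is a hypothetical helper, not the package's API, and the temporary file only stands in for docs/data/patents.txt.

```python
import tempfile
from pathlib import Path


def read_patent_numbers(path):
    """Return the non-empty, whitespace-stripped lines of a text file.

    Assumes one patent number per line (an assumption, not a documented format).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo with a temporary file standing in for docs/data/patents.txt
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_file = Path(tmp_dir) / "patents.txt"
    txt_file.write_text("US4405829A1\nEP0551921B1\n\nEP1304824B1\n", encoding="utf-8")
    print(read_patent_numbers(txt_file))
```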

Examples (modular)

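from GooglePatentsPdfDownloader import PatentDownloader

# chrome_driver: path and file name of the Chrome driver exe; brave=True uses Brave instead of Google Chrome
patent_downloader = PatentDownloader(chrome_driver='chromedriver.exe', brave=True)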

# Download a single patent to the current working directory
# (US4405829A1 is not found with its kind code, so A1 is removed)
patent_downloader.download(patent="US4405829A1", remove_kind_codes=['A1'])
patent_downloader.download(patent="EP0551921B1")


# Download multiple patents using a list of inputs to the directory ./pdf_files
patent_downloader.download(
    patent=["US4405829A1", "EP0551921B1", "EP1304824B1"],
    output_path="./pdf_files",
    remove_kind_codes=["A1"]
)

# Download multiple patents using a txt file to the current working directory
patent_downloader.download(
    patent="docs/data/patents.txt", 
    output_path="",
    remove_kind_codes=["A1"]
)

License

This repository is licensed under the MIT license.

See here for further information.


Languages

Language:Python 100.0%