workfor-webapps / DataIngest

Extracts tabular data from PDF into Google Sheet to feed webapp metafinds.org

PDEA

PDF table data extraction and analysis repository

Contributors:

Running the scripts contained in this repository

Prerequisites:

For brevity, this document assumes you are running Linux. On Windows, WSL provides a similar terminal interface. The application is run primarily as a Docker container, so Docker must be installed on your local system to build and run the image. The following folders should be present in your [developer@workfor.com.au] Google Drive:

1- Get the client_secret file

  • a. Log in to the Google Cloud console
  • b. At the top left corner, click [select a project] and select dataingest
  • c. From the navigation bar at the top left, select [APIs and Services]
  • d. Click Credentials in the left side menu
  • e. Download the OAuth 2.0 Client IDs as a JSON file and rename it to "client_secret.json"
  • f. Copy this file to the same directory as the Dockerfile
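
A quick pre-flight check can confirm that the file landed in the right place before building. The sketch below is not part of the repository: the function name check_client_secret is hypothetical, and it only verifies that a Dockerfile and a non-empty client_secret.json exist in the same directory, not that the credentials themselves are valid.

```shell
# Hypothetical helper: verify client_secret.json sits next to the Dockerfile.
# Usage: check_client_secret [directory]   (defaults to the current directory)
check_client_secret() {
    dir="${1:-.}"
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "No Dockerfile found in $dir" >&2
        return 1
    fi
    # -s is true only when the file exists and is not empty
    if [ ! -s "$dir/client_secret.json" ]; then
        echo "client_secret.json is missing or empty in $dir" >&2
        return 1
    fi
    echo "client_secret.json found in $dir"
}
```

Running check_client_secret from the main directory should print a confirmation, or an error on stderr with a non-zero exit status.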

2- Docker build

a. In the main directory run: docker build -t pdea[:tag] .
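
The [:tag] part of the command is optional. As an illustration, the sketch below wraps the build in a small function with the tag as an argument; the function name build_image and the example tag "dev" are assumptions, not names used by the repository. When no tag is given, Docker itself defaults to "latest", which the function makes explicit.

```shell
# Hypothetical wrapper around the build step.
# Usage: build_image [tag]   (run from the directory containing the Dockerfile)
build_image() {
    tag="${1:-latest}"
    # -t names the image as pdea:<tag>; "." is the build context
    docker build -t "pdea:$tag" .
}
```

For example, build_image dev runs docker build -t pdea:dev . and the resulting image is then referred to as pdea:dev in the run step.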

3- Docker run

  • a. If the image was built successfully in the previous step, run: docker run -d --name dataingest -e PORT=8080 -p 8080:8080 pdea[:tag]
  • b. Check the logs of the running container with docker logs -f --details dataingest; the output should look something like this:

[Screenshot: container log output]
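
If the logs are empty or the command reports that the container does not exist, it can help to look at the container state first. The following is a minimal sketch, assuming the container name dataingest from the run step; the function name show_container_state is hypothetical. It lists the container (including stopped ones) and prints only the last 50 log lines rather than following the stream.

```shell
# Hypothetical helper: show whether the container is up, then its recent logs.
# Usage: show_container_state [container-name]
show_container_state() {
    name="${1:-dataingest}"
    # -a includes stopped containers; the name filter matches substrings
    docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}}'
    # --tail limits the output to the most recent lines
    docker logs --tail 50 "$name"
}
```

A status beginning with "Exited" suggests the application stopped at startup, in which case the log lines printed underneath are the place to look.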

4- Open the application

a. By default, the application is served at http://localhost:8080/
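
The container may need a few seconds before it accepts requests. As a sketch, assuming curl is installed, the function below polls the URL until it answers or a number of attempts is exhausted; the name wait_for_app and the default of 30 attempts one second apart are illustrative choices, not something the repository defines.

```shell
# Hypothetical helper: poll the app until it answers or the attempts run out.
# Usage: wait_for_app [url] [attempts]
wait_for_app() {
    url="${1:-http://localhost:8080/}"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: non-zero exit on HTTP errors (4xx/5xx),
        # -o /dev/null: discard the response body
        if curl -sf -o /dev/null "$url"; then
            echo "Application is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "No response from $url after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_app right after docker run gives a simple signal for when the page can be opened in a browser.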

5- Stopping and removing the container

  • a. Run docker stop dataingest
  • b. Run docker rm dataingest

Note 1: If you get a credentials error at any point, go to http://localhost:8080/authorize and follow the steps.

Note 2: If you change the code, you need to rebuild the Docker image and start a new container from the new image.
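
The rebuild cycle from Note 2 can be sketched as a single function. This is an assumption-based convenience, not a script shipped with the repository: the name redeploy is hypothetical, and it uses docker rm -f, which force-stops and removes the old container in one step instead of the separate stop and rm commands above.

```shell
# Hypothetical helper: rebuild the image and replace the running container.
# Usage: redeploy [tag]   (run from the directory containing the Dockerfile)
redeploy() {
    tag="${1:-latest}"
    # Remove the old container if present; ignore the error when none exists
    docker rm -f dataingest >/dev/null 2>&1 || true
    # Start the new container only if the build succeeds
    docker build -t "pdea:$tag" . &&
        docker run -d --name dataingest -e PORT=8080 -p 8080:8080 "pdea:$tag"
}
```

Because the run step is chained with &&, a failed build leaves no container running, so the error is visible immediately rather than an old image being served.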

Languages

Python 77.1%, JavaScript 10.8%, HTML 10.1%, Dockerfile 1.0%, CSS 1.0%