A dockerized environment, document conversion system, and automated scraping tool for extracting policy audit text data from .doc/.docx and .pdf files
-
Start by downloading Git Bash: https://git-scm.com/downloads
-
Then ensure that you have WSL 2 on your machine.
- Check to see if you have it:
# run this in PowerShell
wsl -l -v
# expected output (the VERSION column should read 2)
NAME STATE VERSION
<distro> <state> 2
-
If this is not what you see, you'll need to either install or upgrade WSL.
-
If the wsl command failed, run
wsl --install
in an administrator PowerShell.
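If it didn't fail but you have version 1, you'll need to upgrade by running wsl --set-version <distro> 2 in PowerShell, where <distro> is the name shown in the NAME column of wsl -l -v.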
-
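The WSL check described above can also be scripted from Git Bash. The following is a minimal sketch, not part of the phil repository: the function names `wsl_versions` and `all_wsl2` are hypothetical, and it assumes the usual `wsl -l -v` layout in which VERSION is the last column. Because `wsl.exe` typically emits UTF-16 text, the sketch strips NUL bytes and carriage returns before parsing.

```sh
# Hypothetical helpers, not shipped with phil.

# Reads `wsl -l -v` style output on stdin and prints the VERSION
# value (last column) of each listed distribution, one per line.
wsl_versions() {
  # Strip the NUL bytes and CRs that wsl.exe usually emits, skip the
  # header row, and ignore blank or short lines.
  tr -d '\000\r' | awk 'NR > 1 && NF >= 3 { print $NF }'
}

# Succeeds only if at least one distribution is listed and every
# listed distribution reports version 2.
all_wsl2() {
  versions=$(wsl_versions)
  [ -n "$versions" ] && ! printf '%s\n' "$versions" | grep -qv '^2$'
}

# Example (run in Git Bash on Windows):
#   wsl.exe -l -v | all_wsl2 && echo "WSL 2 is ready"
```

If `all_wsl2` fails, either no distribution is installed or at least one is still on version 1, which corresponds to the install and upgrade cases above.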
- In Git Bash (Windows) or your system terminal, run:
git clone https://www.github.com/hudnash/phil.git
- If your system doesn't have bash (very likely if you're on Windows), install Git Bash first.
- Create the data/zip directory by running the following command from inside the phil directory:
mkdir -p data/zip
- If you have an Intel or AMD processor, run:
sh x64phil.sh
- If you have an Apple Silicon chip or another arm64 CPU, run:
sh arm64phil.sh
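
If you are unsure which of the two scripts applies, the choice can be derived from `uname -m`. This is a minimal sketch under stated assumptions: the helper name `pick_phil_script` is hypothetical and not part of the repository, and `uname -m` reports the architecture the shell itself runs under, which may differ from the physical CPU under emulation.

```sh
# Hypothetical helper, not shipped with phil: maps an architecture
# string (as printed by `uname -m`) to one of the two scripts above.
pick_phil_script() {
  case "$1" in
    x86_64|amd64)  echo "x64phil.sh" ;;    # Intel or AMD
    arm64|aarch64) echo "arm64phil.sh" ;;  # Apple Silicon or other arm64
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Example, run from inside the phil directory:
#   script=$(pick_phil_script "$(uname -m)") && sh "$script"
```

Unknown architectures return a non-zero status rather than falling back to a default, so neither script runs by accident.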