andreysaf / pii-redaction-openai-pdftron

This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.

pii-redaction-openai-pdftron

Installation

Inside server/, create a new file called config.env and add your PDFTron demo key and your OpenAI API key:

PORT=9000
PDFTRONKEY=
OPENAI_API_KEY=
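
A blank key in config.env tends to surface later as a confusing API error, so it can help to check the settings at startup. The following is a minimal sketch, not code from the repository: the helper name missingConfigKeys is hypothetical, and it assumes the values end up in a plain object such as process.env once the server has loaded config.env.

```javascript
// Hypothetical helper: report which required settings are missing or blank.
// The key names come from config.env above; the function name is illustrative.
const REQUIRED_KEYS = ['PORT', 'PDFTRONKEY', 'OPENAI_API_KEY'];

function missingConfigKeys(env) {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Example: an empty PDFTRONKEY is reported as missing.
const missing = missingConfigKeys({
  PORT: '9000',
  PDFTRONKEY: '',
  OPENAI_API_KEY: 'sk-example',
});
console.log(missing); // [ 'PDFTRONKEY' ]
```

In the server, the same check could be run against process.env before the first request is handled.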

Then run the following in the terminal:

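# Terminal 1: start the client (from the repository root)
cd client
npm i
npm start

# Terminal 2: start the server (from the repository root)
cd server
npm i
npm start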

Walkthrough

The Node.js server acts as file storage. The PDFTron Node.js SDK extracts text, searches the document, and creates markup annotations. OpenAI detects names and addresses in the text provided by PDFTron.
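
The flow described above can be sketched as a small orchestration function. This is only an illustration under stated assumptions: the names redactDocument, extractText, detectPii, and markForRedaction are hypothetical, and each step is passed in as a function because the actual PDFTron and OpenAI calls are not shown in this section.

```javascript
// Sketch of the walkthrough: extract text, detect PII, mark each hit.
// All function names here are hypothetical placeholders for the real steps.
async function redactDocument(file, { extractText, detectPii, markForRedaction }) {
  // PDFTron step: pull the text out of the stored file.
  const text = await extractText(file);

  // OpenAI step: get the names and addresses found in that text.
  const terms = await detectPii(text);

  // PDFTron step: search for each term and create a markup annotation.
  const marked = [];
  for (const term of terms) {
    marked.push(await markForRedaction(file, term));
  }
  return marked;
}
```

Passing the steps in as arguments keeps the sketch independent of any particular SDK call and makes it easy to try with stub functions.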

PII Identification

getNamesAndAddressesFromOpenAI accepts the text extracted from a document and builds a prompt containing a natural-language instruction to extract names and addresses. The prompt can be modified to search for other information. For testing purposes the function is commented out; uncomment it and build your prompt as needed.

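// Ask the completion model to list the names and addresses in the extracted text.
const getNamesAndAddressesFromOpenAI = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `Extract names and addresses from this text: ${text}`,
    temperature: 0, // favor the most likely tokens for repeatable extraction
    max_tokens: 64, // caps the completion length; long lists may be cut off
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};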
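
Since the prompt can be adapted to other kinds of information, one option is to build it from a list of fields and to turn the completion text into search terms for the redaction step. The sketch below is illustrative only: buildPiiPrompt and parseSearchTerms are hypothetical helpers, and the parser assumes the model answers with one item per line, optionally prefixed with a label such as "Names:". Real model output may need different handling.

```javascript
// Hypothetical helper: build the extraction prompt from a list of PII fields
// instead of hard-coding them in the template string.
function buildPiiPrompt(text, fields = ['names', 'addresses']) {
  if (fields.length === 0) {
    throw new Error('At least one PII field is required');
  }
  const list =
    fields.length === 1
      ? fields[0]
      : `${fields.slice(0, -1).join(', ')} and ${fields[fields.length - 1]}`;
  return `Extract ${list} from this text: ${text}`;
}

// Hypothetical helper: turn the completion text into search terms.
// Assumes one item per line, with an optional leading label like "Names:".
// Lines are not split on commas because addresses often contain them.
function parseSearchTerms(completionText) {
  return completionText
    .split('\n')
    .map((line) => line.replace(/^\s*[\w ]+:\s*/, '').trim())
    .filter((line) => line.length > 0);
}

// Example with made-up model output.
const terms = parseSearchTerms('Names: John Smith\nAddress: 1 Main St, Springfield\n');
console.log(terms); // [ 'John Smith', '1 Main St, Springfield' ]
```

Each returned term could then be passed to the document search so that matches are marked for redaction.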

Summarization

Summarizing the contract works much like the PII search: Tl;dr is appended to the end of the text to be summarized, which prompts the model to continue with a summary. For testing purposes the function is commented out; uncomment it and build your prompt as needed.

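// Ask the completion model for a short summary by ending the prompt with "Tl;dr".
const summarizeTheContract = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    prompt: `${text} \n\nTl;dr`,
    temperature: 0.7, // allow some variation in wording
    max_tokens: 60, // caps the summary length; the last sentence may be cut off
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};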
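
Completion models limit the combined length of the prompt and the completion, so a long contract may need trimming before Tl;dr is appended. The following is a minimal sketch under that assumption: buildSummaryPrompt is a hypothetical helper, and maxChars is an illustrative character budget rather than a model-specific token limit.

```javascript
// Hypothetical helper: cap the input length, then append the "Tl;dr" cue.
// maxChars is an illustrative budget in characters, not a token limit.
function buildSummaryPrompt(text, maxChars = 8000) {
  const trimmed = text.trim();
  const body = trimmed.length > maxChars ? trimmed.slice(0, maxChars) : trimmed;
  return `${body}\n\nTl;dr`;
}

// Example: a very small budget keeps only the first characters.
console.log(JSON.stringify(buildSummaryPrompt('abcdef', 3))); // "abc\n\nTl;dr"
```

Truncating from the end is the simplest choice; summarizing the text in several chunks would keep more of a long document but requires multiple requests.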

Here is a sample summary of the file in the repository:

This is a contract between a company and a bank for the sale of goods. The company agrees to sell the goods to the bank for a sum of money, and the bank agrees to purchase the goods from the company. The contract includes terms and conditions for the sale and purchase of the goods.

License: MIT License


Languages

JavaScript 77.3%, HTML 14.6%, CSS 8.1%