ai api assistant chatbot gpt-4 hugging-face huggingface huggingface-transformers microsoft openai openai-api pdf pdf-generation windows word

💻 Ask your Documents 🤖

👋 Welcome to the Document QA system! This repository contains the code for a system that allows you to ask questions about your documents and get answers based on their contents. It supports a wide range of document formats, including PDF, Word, Excel, PowerPoint, text files, and even images!

🚀 Features

💻 Supports a variety of document formats, including PDF, Word, Excel, PowerPoint, text files, and images
🤖 Uses the Hugging Face Transformers library to create embeddings for document chunks
🔍 Uses the FAISS library to create an index for those embeddings, allowing for efficient similarity search
💬 Allows users to ask questions about their documents and get answers based on the contents of those documents
⚡️ Uses multiprocessing to parallelize the creation of the index for improved performance
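The parallel part of index creation can be sketched with the standard library's concurrent.futures: each document is split into chunks in a separate worker process, and embedding and indexing then run on the returned chunks. This is a minimal sketch, not the repository's code. The fixed-size character windows, the default sizes, and the names chunk_text and chunk_documents are all hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical chunking rule)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no chunk is a pure repeat of the tail.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_documents(texts, chunk_size=200, overlap=50, max_workers=None,
                    executor_cls=ProcessPoolExecutor):
    """Chunk many documents in parallel and return one chunk list per document.

    executor_cls defaults to processes for CPU-bound work; ThreadPoolExecutor
    can be passed instead for quick tests or I/O-bound extraction.
    """
    # partial() of a module-level function can be pickled and sent to worker processes.
    worker = partial(chunk_text, chunk_size=chunk_size, overlap=overlap)
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with their source documents.
        return list(pool.map(worker, texts))
```

On Windows and macOS, worker processes are started with the spawn method and re-import the main module, so chunk_documents should be called from inside an if __name__ == "__main__": block.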

📋 Requirements

Python 3.6 or higher
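The following Python packages:
- transformers
- langchain
- faiss-cpu (the FAISS library)
- PyMuPDF (imported as fitz)
- Pillow
- textract
- pandas
- python-pptx
- opencv-python (for image support)

concurrent.futures is part of the Python standard library (since Python 3.2), so it does not need to be installed separately.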
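Several of these packages are imported under a different name than the one passed to pip: PyMuPDF is imported as fitz, Pillow as PIL, python-pptx as pptx, opencv-python as cv2, and faiss-cpu as faiss. A quick check like the sketch below can confirm the environment before running the script. It is a helper sketch, not part of the repository; IMPORT_NAMES and missing_packages are hypothetical names.

```python
import importlib.util

# Hypothetical mapping from the pip package name to the module name you import.
IMPORT_NAMES = {
    "transformers": "transformers",
    "langchain": "langchain",
    "faiss-cpu": "faiss",
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "textract": "textract",
    "pandas": "pandas",
    "python-pptx": "pptx",
    "opencv-python": "cv2",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose top-level module cannot be found (nothing is imported)."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```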

🔧 Usage

Clone this repository to your local machine:

git clone https://github.com/AiGptCode/AskyourDocuments.git
cd AskyourDocuments

Install the required Python packages:

pip install transformers langchain faiss-cpu PyMuPDF pillow textract pandas python-pptx opencv-python

Set your Hugging Face API key as an environment variable:

export HUGGINGFACE_API_TOKEN=your-api-key
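If the script reads this variable through os.environ (an assumption; the source does not show how the token is read), a fail-fast check like the sketch below reports a missing token more clearly than a failed API call would. get_hf_token is a hypothetical helper name.

```python
import os


def get_hf_token(var_name="HUGGINGFACE_API_TOKEN"):
    """Return the token stored in var_name, or raise a readable error if it is unset or empty."""
    # Surrounding whitespace is stripped, since it is a common copy-paste mistake.
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; run 'export {var_name}=your-api-key' first."
        )
    return token
```

Calling get_hf_token() once at startup surfaces a missing token before any documents are processed.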

Run the AskyourDocuments.py script and, when prompted, enter the path to the directory containing your documents:

python AskyourDocuments.py

Ask a question about your documents and get an answer based on their contents.
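In a typical retrieval-based QA pipeline of this kind, the question is embedded, compared against the embedded chunks, and the closest chunks are passed to the language model as context. The sketch below shows only that retrieval step, with two stated substitutions: a toy bag-of-words vector stands in for the Hugging Face embedding model, and a brute-force loop stands in for the FAISS index. embed, inner_product, and top_k are hypothetical names, not the repository's functions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Transformers embedding: word counts scaled to unit length."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: count / norm for word, count in counts.items()}


def inner_product(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def top_k(question, chunks, k=3):
    """Exact (brute-force) nearest-neighbour search over every chunk."""
    query = embed(question)
    scored = [(inner_product(query, embed(chunk)), i) for i, chunk in enumerate(chunks)]
    # Highest score first; ties keep the original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for _, i in scored[:k]]


chunks = [
    "The invoice total is 420 euros.",
    "Meeting notes from Monday.",
    "Payment of the invoice is due in May.",
]
print(top_k("When is the invoice due?", chunks, k=1))
# ['Payment of the invoice is due in May.']
```

With real dense embeddings, the loop would be replaced by a FAISS index, for example by L2-normalising the vectors with faiss.normalize_L2, adding them to a faiss.IndexFlatIP, and calling its search method, which performs the same exact inner-product search in optimised native code.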

Note: If you want to include images in your search, make sure they are in a supported format (e.g., JPEG, PNG) and are located in the same directory as your other documents.
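The note above implies that the script scans a single directory and decides how to handle each file from its extension. The sketch below shows one way to do that grouping with pathlib. The extension table and the names EXTENSION_TYPES, group_by_type, and scan_directory are assumptions, not taken from the repository.

```python
from pathlib import Path

# Hypothetical extension table; the formats the script really accepts may differ.
# Only .pptx is listed for PowerPoint because python-pptx cannot read legacy .ppt files.
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".xls": "excel",
    ".xlsx": "excel",
    ".pptx": "powerpoint",
    ".txt": "text",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
}


def group_by_type(filenames):
    """Group file names by document type, skipping unsupported extensions."""
    groups = {}
    for name in sorted(filenames):
        # Extensions are compared case-insensitively, so "scan.PNG" counts as an image.
        kind = EXTENSION_TYPES.get(Path(name).suffix.lower())
        if kind is not None:
            groups.setdefault(kind, []).append(name)
    return groups


def scan_directory(directory):
    """Group the regular files directly inside directory; subfolders are not searched."""
    return group_by_type(p.name for p in Path(directory).iterdir() if p.is_file())
```

Each group could then be handed to the matching extractor, such as fitz (PyMuPDF) for PDFs, pandas for spreadsheets, and python-pptx for slides.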

🤝 Contributing

If you would like to contribute to this project, please follow these steps:

Fork this repository to your own GitHub account.
Create a new branch for your changes:

git checkout -b my-feature-branch

Make your changes, then stage and commit them:

git add .
git commit -m 'Add some feature'

Push your changes to your fork:

git push origin my-feature-branch

Open a pull request against the original repository.

📄 License

This project is licensed under the MIT License.

🎉 Acknowledgments

The Hugging Face Transformers library for providing pre-trained models and tokenizers
The FAISS library for providing efficient similarity search and clustering of dense vectors
The langchain library for providing utilities for creating and working with language models
The PyMuPDF library (imported as fitz) for providing utilities for working with PDF files
The Pillow library for providing utilities for working with image files
The textract library for providing utilities for extracting text from various file formats
The pandas library for providing utilities for working with tabular data in Python
The python-pptx library for providing utilities for working with PowerPoint files
Python's built-in concurrent.futures module for providing a high-level interface for asynchronously executing callables
The opencv-python library for providing utilities for working with image and video data (for image support)
