wayzeek / CustomGPT

CustomGPT is a cutting-edge, multilingual chatbot that streamlines text extraction and analysis from PDFs. Using advanced NLP and ML models, it facilitates dynamic conversations across various languages, enhancing productivity and engagement in data-rich environments.

🤖 CustomGPT - Chat with your Data 📚

CustomGPT is a multilingual chatbot that extracts and processes text from PDF documents and lets you interact with that text conversationally.
It relies on NLP and machine learning models to support conversations in multiple languages, which makes it suitable for businesses, educational institutions, or individuals who work with many documents.

📖 Introduction

CustomGPT uses conversational AI to change the way organizations and individuals handle document-based information.
It automatically extracts and analyzes text from PDFs, then lets you query that text through a chatbot interface, so static documents become something you can question directly.
Combining document processing with a dialogue system in this way aims to save time and keep users engaged with their data.

✨ Features

  • PDF Text Extraction: Uses PyPDF2 to extract text from PDFs with a variety of layouts and formats.
  • Advanced Text Processing: Uses tokenizers and spaCy text splitters for text segmentation, and a spaCy language detection module to identify the language of the text.
  • Multilingual Support: Powered by multiple instances of the transformer-based large language model Mistral-7B-Instruct-v0.2, accessed through the Hugging Face API, it supports interactions in the following languages:
    • English 🇬🇧
    • Spanish 🇪🇸
    • French 🇫🇷
    • German 🇩🇪
    • Italian 🇮🇹
    • Ukrainian 🇺🇦
    • Russian 🇷🇺
    • Chinese 🇨🇳
    • Japanese 🇯🇵
  • Interactive User Interface: Offers a user-friendly command-line interface, which may later evolve into a graphical interface.
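
As a rough illustration of the PDF text extraction feature, the sketch below reads every page of a PDF with PyPDF2's `PdfReader` and joins the page texts into one string. The function names `clean_page_text` and `extract_pdf_text`, and the whitespace cleanup itself, are assumptions made for this example; the project's actual extraction code may look different.

```python
import re


def clean_page_text(raw: str) -> str:
    """Collapse repeated spaces/tabs and excess blank lines (illustrative cleanup)."""
    text = re.sub(r"[ \t]+", " ", raw)      # squeeze horizontal whitespace
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line in a row
    return text.strip()


def extract_pdf_text(path: str) -> str:
    """Return the cleaned text of all pages of the PDF at `path`."""
    # Imported lazily so clean_page_text() works even without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Pages without a text layer (e.g. scanned images) yield no text; treat as empty.
    pages = [clean_page_text(page.extract_text() or "") for page in reader.pages]
    return "\n\n".join(p for p in pages if p)
```

With a PDF placed in the data directory, a call such as `extract_pdf_text("data/example.pdf")` (the file name is hypothetical) would return the document's text ready for splitting and language detection.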

🚀 Getting Started

⚙️ Installation

  • Step 1: Clone the repo
git clone https://github.com/wayzeek/CustomGPT.git
  • Step 2: Navigate to the directory
cd CustomGPT
  • Step 3: Install dependencies
bash install.sh
  • Step 4: Activate the virtual environment
source .venv/bin/activate
  • Step 5: Start the application
python3 main.py

🔍 Usage

Process PDFs

  • Step 1: Add your PDFs to the data directory
  • Step 2: Launch the application
python3 main.py
  • Step 3: Indicate whether your PDFs are structured with Markdown-style headings (chapters, titles, ...) or not
  • Step 4: Choose the chunk size, i.e. the average size of your paragraphs
  • Step 5: Wait for processing to finish, then enjoy chatting with your data!
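
To make steps 3 and 4 more concrete, here is a minimal sketch of how heading-based structure and a chunk size could drive text splitting. It is not the project's implementation (which, according to the feature list, relies on tokenizers and spaCy text splitters). The functions `split_by_headings` and `chunk_text` are hypothetical, headings are assumed to be Markdown-style lines starting with `#`, and the chunk size is assumed to be measured in characters.

```python
from __future__ import annotations

import re


def split_by_headings(text: str) -> list[str]:
    """Split text into sections, each starting at a Markdown-style heading line."""
    # Zero-width split before every line that begins with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most `chunk_size` characters.

    A single paragraph longer than `chunk_size` is kept whole in its own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size or not current:
            current = candidate      # still fits, or first paragraph of a chunk
        else:
            chunks.append(current)   # close the full chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

For a structured PDF, each section returned by `split_by_headings` could be passed to `chunk_text` separately so that chunks never span two chapters; for an unstructured one, `chunk_text` could be applied to the whole text. A smaller chunk size gives more, shorter passages for the chatbot to draw on.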

🤝 Contributing

  1. Fork the repo
  2. Create your feature branch (git checkout -b feature/amazingFeature)
  3. Commit your changes (git commit -am 'Add some amazingFeature')
  4. Push to the branch (git push origin feature/amazingFeature)
  5. Open a pull request

🏆 Credits

This is a solo project, built entirely by me.

⚖️ License

MIT License - see the LICENSE file for details

Languages

Python 98.5%, Shell 1.5%