metantonio / readme_ml_description

Note: Installation has been tested on Windows; adaptation to Unix-based systems is still in progress.

Backend Index

Frontend Index

0. Testing the Demo without the API

0.1 Python

Python version recommended: 3.11.6

pip install pandas
pip install scikit-learn
pip install joblib

0.2 How to run the Demo version

0.2.1 Create a Normal model for vectorizing descriptions

python train.py

0.2.2 Create/Re-train a Random Forest model

python random_forest_model.py

0.2.3 Predict a given description with the Random Forest model

python run_saved_model.py
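The three demo scripts are not reproduced in this README. As a rough illustration of the kind of pipeline they describe (vectorize descriptions, train a Random Forest, reload it to predict), here is a minimal sketch using the pandas, scikit-learn and joblib packages installed above. It assumes TF-IDF vectorization; the column names `description` and `label`, the `.joblib` file names and the toy data are hypothetical, and the real train.py, random_forest_model.py and run_saved_model.py may work differently.

```python
# Hypothetical sketch of a description classifier; not the project's actual scripts.
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

VECTORIZER_FILE = "vectorizer.joblib"  # hypothetical file names
MODEL_FILE = "random_forest.joblib"


def train_and_save(df: pd.DataFrame, out_dir: str) -> None:
    """Fit a TF-IDF vectorizer and a Random Forest, then persist both with joblib."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(df["description"])  # sparse TF-IDF matrix
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(features, df["label"])
    joblib.dump(vectorizer, os.path.join(out_dir, VECTORIZER_FILE))
    joblib.dump(model, os.path.join(out_dir, MODEL_FILE))


def predict(description: str, out_dir: str) -> str:
    """Reload the saved vectorizer and model, then classify one description."""
    vectorizer = joblib.load(os.path.join(out_dir, VECTORIZER_FILE))
    model = joblib.load(os.path.join(out_dir, MODEL_FILE))
    # transform (not fit_transform) reuses the vocabulary learned at training time
    return str(model.predict(vectorizer.transform([description]))[0])


if __name__ == "__main__":
    toy = pd.DataFrame({
        "description": ["red cotton t-shirt", "blue denim jeans",
                        "white cotton t-shirt", "slim fit jeans"],
        "label": ["shirt", "pants", "shirt", "pants"],
    })
    with tempfile.TemporaryDirectory() as tmp:
        train_and_save(toy, tmp)
        print(predict("black denim jeans", tmp))
```

Fitting the vectorizer once and only calling `transform` at prediction time keeps the feature columns aligned with what the model saw during training; refitting it on new text would produce a different vocabulary.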

1. API Installation and Use

1.1 Python Installation

1.1.1 Windows

1.1.1.A Python globally:

1.1.1.B Python local environment (optional but recommended):

If you want to control which Python version is used and switch between versions:

  • Install pyenv-win (PowerShell):
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/pyenv-win/pyenv-win/master/pyenv-win/install-pyenv-win.ps1" -OutFile "./install-pyenv-win.ps1"; &"./install-pyenv-win.ps1"

You may need to add pyenv to the PATH environment variable.

  • Install Python Version:
pyenv install 3.11.6
  • Use Python Version:
pyenv local 3.11.6

1.1.2 Unix

1.1.2.A Python local environment (optional but recommended):

curl https://pyenv.run | bash

You may need to restart the shell. Then add the pyenv commands to your shell startup files:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile

To enable the pyenv-virtualenv plugin, open ~/.bashrc in the editor:

vim ~/.bashrc

Add the following line at the end of the file (it must come after the pyenv init line):

eval "$(pyenv virtualenv-init -)"

Save the changes and restart the shell to load the new configuration:

exec "$SHELL"

Install python:

  • Install Python Version:
pyenv install 3.11.6
  • Use Python Version:
pyenv local 3.11.6
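Whichever platform you followed, `pyenv local` records the chosen version in a `.python-version` file in the current directory. The sketch below (an illustration, not part of the project) reads that file and compares it with the interpreter that is actually running. It does an exact string comparison, so a pin such as `3.11` or `system` will not count as a match.

```python
# Sketch: check that the running interpreter matches the version pinned with `pyenv local`.
import platform
from pathlib import Path
from typing import Optional


def pinned_version(project_dir: str = ".") -> Optional[str]:
    """Return the first version listed in .python-version, or None if there is none."""
    path = Path(project_dir) / ".python-version"
    if not path.is_file():
        return None
    versions = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    return versions[0] if versions else None


def matches_pinned(project_dir: str = ".") -> bool:
    """True when platform.python_version() equals the pinned version (e.g. 3.11.6)."""
    pinned = pinned_version(project_dir)
    return pinned is not None and platform.python_version() == pinned


if __name__ == "__main__":
    print("pinned:", pinned_version(), "| running:", platform.python_version())
```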

1.2 Installation of Poetry (virtual environment for python libraries)

1.2.1 Install Poetry

1.2.1.1 Windows

Use PowerShell with administrator permissions:

(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

Note: If py is not recognized, replace it with python.

  • You may need to add Poetry to the PATH environment variable:
%APPDATA%\pypoetry\venv\Scripts

1.2.1.2 Unix

Installation:

curl -sSL https://install.python-poetry.org | python3 -

Add poetry to PATH. Open the editor:

vim ~/.bashrc

Add the following line:

export PATH="$HOME/.local/bin:$PATH"

Save the changes, and now reload the file:

source ~/.bashrc

You should now be able to run the poetry command. Test it with:

poetry --version

1.3 Install Python dependencies into virtual environment

  • Create poetry.lock to resolve all dependencies and their sub-dependencies in the pyproject.toml file:
poetry lock --no-update

If that fails, use:

poetry lock
  • Install dependencies:
poetry install --with local

Note: On Unix, some libraries, such as the NVIDIA-related ones, may fail to install. Don't worry: after a few retries, it should stop failing.

poetry add pydantic

Note: if the dependencies are already installed but new libraries have been added, run:

poetry update

To ingest .doc and .docx files, I recommend installing the following library directly into the project's Python environment:

poetry run pip install docx2txt
  • Copy .env.example to create the .env file (then complete the necessary settings for the Python server):
cp .env.example .env
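`cp` is not available in every Windows shell. As a cross-platform alternative, here is a small sketch that creates `.env` from `.env.example` but never overwrites an `.env` you have already filled in (the function name is hypothetical):

```python
# Sketch: a cross-platform alternative to `cp .env.example .env` that never overwrites .env.
import shutil
from pathlib import Path


def ensure_env_file(project_dir: str = ".") -> bool:
    """Copy .env.example to .env if .env is missing; return True if a copy was made."""
    root = Path(project_dir)
    env_file = root / ".env"
    if env_file.exists():
        return False  # keep the settings you already filled in
    # Raises FileNotFoundError if .env.example is missing as well
    shutil.copyfile(root / ".env.example", env_file)
    return True
```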

2. Create, Delete and Update the Database with the Flask-SQLAlchemy ORM

2.1 Create Database for first time

  • Be sure to delete the migrations folder first. On Windows (PowerShell):
rmdir "./migrations" -Force -Recurse

On Unix:

rm -rf ./migrations
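If you prefer one command that works on both Windows and Unix, the same cleanup can be sketched with Python's standard library (the function name is hypothetical; run it from the project root before `flask db init`):

```python
# Sketch: cross-platform removal of the migrations folder.
import shutil
from pathlib import Path


def remove_migrations(project_dir: str = ".") -> bool:
    """Delete <project_dir>/migrations recursively; return False if it did not exist."""
    migrations = Path(project_dir) / "migrations"
    if not migrations.is_dir():
        return False
    shutil.rmtree(migrations)
    return True
```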

Then create the Flask DB migration repository (inside the virtual environment):

poetry run flask db init

Delete the old database (if it exists) and create a new one (make sure to replace the host, user and database name below with your own):

mysql -h localhost -u root -p -e "DROP DATABASE IF EXISTS mldescription;"
mysql -h localhost -u root -p -e "CREATE DATABASE mldescription;"
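Flask-SQLAlchemy then needs a connection URL pointing at that same database. With mysql-connector-python (listed in the troubleshooting section), SQLAlchemy's URL scheme is `mysql+mysqlconnector://`. The sketch below assembles such a URL from environment variables; the variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT` and `DB_NAME` are hypothetical, so check `.env.example` for the names this project actually uses.

```python
# Sketch: build a SQLAlchemy URL for MySQL; the environment variable names are hypothetical.
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a mysql+mysqlconnector:// URL from DB_* settings, with local defaults."""
    env = os.environ if env is None else env
    user = quote_plus(env.get("DB_USER", "root"))
    # Escape characters such as @ or : so they don't break the URL
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")
    name = env.get("DB_NAME", "mldescription")
    return f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{name}"
```

With the defaults and an empty password this yields `mysql+mysqlconnector://root:@localhost:3306/mldescription`, the kind of value you would assign to Flask-SQLAlchemy's `SQLALCHEMY_DATABASE_URI` setting.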

Create tables from models ("modelos") folder:

poetry run flask db migrate
poetry run flask db upgrade

2.2 Update DB models

After changing the models, update the tables from the models folder:

poetry run flask db migrate
poetry run flask db upgrade

3. Run the server with Poetry

  • Run the Flask server with Poetry for development (it restarts when it detects changes):
poetry run flask run -p 3341 -h 0.0.0.0
  • Run the Flask server with Poetry for stable development (it does not restart when it detects changes). Use this for testing or for uploading large documents:
poetry run flask run -p 3341 -h 0.0.0.0 --no-reload

Note: the backend server runs on port 3341.
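To see whether something is already listening on port 3341 (for example, an old server you forgot to stop, or to confirm the new one started), a quick TCP check is enough. This is a sketch: it only tells you that the port accepts connections, not that the Flask app is healthy. Because the server binds to `0.0.0.0`, checking `127.0.0.1` works on the same machine.

```python
# Sketch: check whether a TCP port accepts connections.
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 3341 busy:", port_in_use(3341))
```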

4. Install LLM Embeddings (still in development). This will enable ingesting documents without building ML models

For Windows 10/11

To install the Python library "llama-cpp-python", it has to be compiled, so you will need a C++ compiler.

To install a C++ compiler on Windows 10/11, follow these steps:

    1. Make sure you have at least 35 GB of free disk space. (Yes, I know, 35 GB just to compile one library, but every CPU is different... For GPU support you will also need CUDA, which takes about 12 GB more disk space; see GPU Installation.)
    2. Install Visual Studio 2022.
    3. Make sure the following components are selected:
    • Universal Windows Platform development
    • C++ CMake tools for Windows
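Before starting the long Visual Studio and llama-cpp-python installation, you can check the free space requirement from step 1 with a short sketch (here 1 GB is taken as 10**9 bytes; the function name is hypothetical):

```python
# Sketch: compare free disk space with the ~35 GB needed for the llama-cpp-python build.
import shutil

REQUIRED_GB = 35  # from step 1; add roughly 12 GB more if you also install CUDA


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 10**9
    return free_gb >= required_gb


if __name__ == "__main__":
    print("enough space for the build:", has_enough_space())
```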

Sometimes you may also need MinGW:

    1. Download the MinGW installer from the MinGW website.
    2. Run the installer and select the gcc component.

Once everything is installed, add llama-cpp-python (you can also install it globally with pip):

poetry run python setup.py
poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
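Because the compile output is long, it can be hard to tell whether the build actually succeeded. A quick check, run with `poetry run python`, is to look for the `llama_cpp` module (the import name of llama-cpp-python) and its installed version. A minimal sketch:

```python
# Sketch: report whether llama-cpp-python is importable and which version is installed.
from importlib import metadata, util
from typing import Optional


def llama_cpp_version() -> Optional[str]:
    """Return the installed llama-cpp-python version, or None if llama_cpp is not found."""
    if util.find_spec("llama_cpp") is None:
        return None
    try:
        return metadata.version("llama-cpp-python")
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but the distribution metadata is missing


if __name__ == "__main__":
    print("llama-cpp-python:", llama_cpp_version() or "not installed")
```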

5. Troubleshooting

Delete the virtual environment to reinstall all libraries from scratch

Make sure you are in the root of this project, then run this command to find the name of the virtual environment:

poetry env list

Then run:

poetry env remove <name.of.the.environment-py3.xx>

or, more aggressively (if needed):

poetry env remove python

Then reinstall by following the steps in 1.2 Installation of Poetry (virtual environment for python libraries).

Delete the vector database (Qdrant)

Remove all content inside the local_data folder (except .gitignore and readme.md). Don't worry: the vector database will be created again the next time you ingest a file.
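The cleanup above can be sketched in Python so that the two files that must stay are never deleted by accident. The function name is hypothetical, and the kept names are compared case-insensitively (so `README.md` would also be kept):

```python
# Sketch: empty local_data while keeping .gitignore and readme.md.
import shutil
from pathlib import Path
from typing import List

KEEP = {".gitignore", "readme.md"}  # compared case-insensitively


def clear_local_data(folder: str = "local_data") -> List[str]:
    """Delete everything in `folder` except the kept files; return the removed names."""
    removed = []
    # list() so we are not iterating the directory while deleting from it
    for entry in list(Path(folder).iterdir()):
        if entry.name.lower() in KEEP:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```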

pydantic.v1 not found

If this happens, the wrong version of pydantic was installed, or it was not installed at all. Run the following command to update pydantic:

poetry add pydantic

WinError 206

Error: Could not install packages due to an OSError: [WinError 206] The filename or extension is too long:

To fix this error on Windows, open regedit, navigate to Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem, and change the value of LongPathsEnabled from 0 to 1.
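To check the current value without opening regedit, Python's standard `winreg` module can read the same key. A minimal sketch: it returns None on non-Windows systems and treats a missing value as disabled.

```python
# Sketch: read LongPathsEnabled from the Windows registry.
import sys
from typing import Optional


def long_paths_enabled() -> Optional[bool]:
    """True/False on Windows depending on LongPathsEnabled; None elsewhere."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except FileNotFoundError:
        return False  # value not present: long paths are not enabled
    return value == 1


if __name__ == "__main__":
    print("LongPathsEnabled:", long_paths_enabled())
```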

ModuleNotFoundError

On some operating systems, a few libraries may fail to install into the virtual environment. One workaround is to install them with pip through Poetry. Some of the libraries where problems may appear:

poetry run pip install flask-sqlalchemy==2.5.1
poetry run pip install Flask-Migrate==3.1.0
poetry run pip install Flask==2.2.5
poetry run pip install flask_swagger==0.2.14
poetry run pip install flask_cors==3.0.10
poetry run pip install flask_admin==1.6.0
poetry run pip install flask_jwt_extended==4.4.0
poetry run pip install flask_bcrypt==1.0.1
poetry run pip install flask_apscheduler==1.13.1
poetry run pip install injector==0.21.0
poetry run pip install llama_index==0.9.3
poetry run pip install mysql-connector-python==8.2.0
poetry run pip install SQLAlchemy==1.4.45
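To find out which of these libraries are actually missing before reinstalling them one by one, you can run a check with `poetry run python`. This is a sketch: the import names below are my mapping from the pip package names above (for example, mysql-connector-python is imported as `mysql.connector`), so adjust the list if an import fails for another reason.

```python
# Sketch: list which of the problematic libraries cannot be found in the current environment.
import importlib.util
from typing import Iterable, List

# Assumed import names for the pip packages listed above
MODULES = (
    "flask_sqlalchemy", "flask_migrate", "flask", "flask_swagger", "flask_cors",
    "flask_admin", "flask_jwt_extended", "flask_bcrypt", "flask_apscheduler",
    "injector", "llama_index", "mysql.connector", "sqlalchemy",
)


def missing_modules(names: Iterable[str] = MODULES) -> List[str]:
    """Return the module names that importlib cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec("a.b") imports "a" first and raises if "a" is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_modules() or "none")
```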

Error when trying to install pyenv

After installing, you may get a security warning saying that pyenv.ps1 is not digitally signed.

First, open PowerShell in administrator mode and run:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Press A (Yes to All), then try to install pyenv again.

6. Endpoints documentation

See the docs folder.

0. Frontend Requirements:

1. Frontend installation:

1.1 Install the packages (you only need to do this the first time):

npm install --legacy-peer-deps

1.2 Edit the .env file (create it if it does not exist):

Edit the front-end variables section as needed.

1.2.1 Creation of .env file if it doesn't exist:

cp .env.example .env

2. Run front-end server:

npm run start

3. Open Browser:

Note that the default port is 3002: http://localhost:3002

Development test app (for testing, quick design work, etc., but not fully usable, because you need to log in to access protected endpoints): http://localhost:3002/iq-gpt-develop

4. Build Frontend Project:

This creates the bundle files needed to upload this app to any hosting server:

npm run build

5. Frontend Documentation

See the docs folder.

Localtunnel

Localtunnel is a Node.js tool that exposes a local port to the internet so you can share an app. If the lt command is not available, install it with npm install -g localtunnel.

Open a new terminal and expose port 3002 to share the web UI:

lt --port 3002

This should print a URL. Put that URL into the .env file to configure the front-end correctly.
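If you restart the tunnel often, the URL changes each time, so updating `.env` by script can save some editing. A minimal sketch, assuming a simple `KEY=value` file: `TUNNEL_URL` is a placeholder name (check the front-end section of `.env.example` for the variable this project actually reads), and the function does not handle quoting or `export` prefixes like a full dotenv parser would.

```python
# Sketch: set or replace a KEY=value line in a .env file.
from pathlib import Path


def set_env_var(env_path: str, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing KEY line or appending one."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first '=', ignoring surrounding spaces
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

For example, `set_env_var(".env", "TUNNEL_URL", "<url printed by lt>")` replaces the old value in place and leaves comments and other settings untouched.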

Your localtunnel password can be found at:

https://loca.lt/mytunnelpassword

Create an image with Docker:

A. Automatic process

docker-compose up

B. Manual process

Create Image:

docker build --tag ubuntu-qlx-gpt .

Run a container from the image:

docker run -it ubuntu-qlx-gpt

Theory of RAG

See the docs/theory/ folder.
