Welcome to Astra Assistant, your personal AI-powered assistant designed to enhance productivity and streamline tasks with a sophisticated user interface and advanced audio-visual capabilities.
- Speech Recognition 🎤: Utilize the latest models for accurate voice command processing.
- Text-to-Speech 🗣️: High-quality voice output using state-of-the-art TTS models.
- Image Analysis 🖼️: Analyze and interpret images directly within the app.
- Macro Customization ⌨️: Set up and customize macros to suit your workflow.
- User-Friendly UI 🖥️: Intuitive and responsive interface built with `customtkinter`.
- Python 3.8 or higher
- Required packages (install via `pip`):
  - `tkinter`
  - `customtkinter`
  - `dotenv`
  - `sounddevice`
  - `numpy`
  - `Pillow`
  - `logging`
  - `opencv-python`
  - `keyboard`
  - `speech_recognition`
  - `whisper`
  - `pygame`
1. Clone the repository:

   ```bash
   git clone https://github.com/Y4rd13/Astra.git
   cd Astra
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:
   - Create a `.env` file in the root directory.
   - Add your OpenAI API key:

     ```
     OPENAI_API_KEY=your_openai_api_key
     ```
To start the application, run:

```bash
python app.py
```
- Text Area: Displays messages from the assistant.
- User Input Box: Type your commands or questions here.
- Send Button: Click to send your text input.
- Record Button: Toggle audio recording to interact with the assistant via voice commands.
- Settings Button: Access and customize application settings.
Customize your experience by adjusting settings for:
- Sound Input Device: Choose and test your preferred microphone.
- Macros: Configure keyboard shortcuts for quick actions.
- Models: Select preferred TTS and STT models, and configure the AI model used by the assistant.
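As a rough, hypothetical illustration of the kind of values these settings cover (the actual schema is managed by `settings.py` and may differ):

```python
# Hypothetical defaults; the real keys and values in settings.py may differ.
DEFAULT_SETTINGS = {
    "sound_input_device": "Default microphone",
    "macros": {"toggle_recording": "ctrl+shift+a"},
    "models": {
        "stt": "whisper",       # speech-to-text model
        "tts": "tts-1",         # text-to-speech model
        "assistant": "gpt-4o",  # AI model used by the assistant
    },
}
```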
- `app.py`: Main application logic and UI setup.
- `settings.py`: Configuration management for user settings.
- `core.py`: Core functionality for handling commands and interactions.
- `stt.py`: Speech-to-text processing.
- `tts.py`: Text-to-speech synthesis.
- `typer.py`: Automated text typing and code parsing.
- `vision.py`: Image capture and analysis.
- Logging is configured to provide detailed runtime information.
- Logs are displayed in the console with timestamps for easy debugging.
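A minimal sketch of the kind of timestamped console logging described above (the exact format string used by the project may differ):

```python
import logging

# Console logging with timestamps; Astra's actual format may differ.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger("astra")
logger.info("Assistant started")
```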
We welcome contributions! Please read our Contributing Guidelines for more information.
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to the contributors and the open-source community for their invaluable support and tools.
Enjoy using Astra Assistant! If you encounter any issues, feel free to open an issue on GitHub.
- Core
  - Implement memory for the assistant to remember previous interactions.
  - Limit memory/chat-history to a certain number of messages according to the token limit of the current model (see the sketch after this list).
  - Implement LangChain for multiple languages as Agents for the assistant.
  - Implement local LLMs such as:
    - MistralAI
    - Codestral
  - Fine-tune Whisper models to recognize Astra's name.
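A minimal sketch of one way to cap the chat history by token count, assuming `tiktoken` is used for counting; the model name and limit are placeholders, not Astra's actual configuration:

```python
import tiktoken

def trim_history(messages, model="gpt-4o", max_tokens=8000):
    """Drop the oldest messages until the history fits within max_tokens."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # fallback for unknown models

    def count(msgs):
        return sum(len(enc.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    # Keep the first message (e.g. the system prompt) and drop old turns after it.
    while len(trimmed) > 1 and count(trimmed) > max_tokens:
        trimmed.pop(1)
    return trimmed
```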
- STT
  - Optimize response time for STT processing.
  - Voice Activity Detection: automatically detect when the user starts and stops speaking.
  - Always listening: enable the assistant to listen continuously for voice commands.
  - Wake Word Activation: activate upon detecting a designated wake word.
  - Realtime Transcription: transform speech to text in real time (fast-whisper).
  - Integrate Faster-Whisper for faster STT processing.
  - `vad_filter` integration: enable voice activity detection (VAD) to filter out parts of the audio without speech, using the Silero VAD model (see the sketch after this list).
  - Implement function calling to stop the audio (e.g. "Astra stop").
  - Give new user input priority over the current audio input.
  - Load `Astra_es_windows_v3_0_0.ppn` and `porcupine_params_es.pv` asynchronously from Hugging Face.
  - Update input audio device settings to allow users to select the desired microphone.
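A minimal sketch of transcription with Faster-Whisper and its built-in Silero-based `vad_filter`, as referenced in the list above; the model size, device, and file path are placeholders, not the project's actual configuration:

```python
from faster_whisper import WhisperModel

# "small" is a placeholder model size; Astra's actual choice may differ.
model = WhisperModel("small", device="cpu", compute_type="int8")

# vad_filter=True runs the Silero VAD model to skip audio segments without speech.
segments, info = model.transcribe(
    "recording.wav",
    vad_filter=True,
    vad_parameters={"min_silence_duration_ms": 500},
)

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```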
- Vision
  - Use semantic chunking for video analysis, instead of the current implementation.
- Fix `typer.py`
  - Correct the indentation issue when writing code.
  - Ensure the generated code is properly formatted.
  - Verify that writing code in different languages maintains the appropriate indentation.
- Audio Visualizer
  - Add a visualizer to display audio input levels (STT); see the sketch after this list.
  - Implement a visualizer for audio output (TTS).
  - Make the visualizer responsive.
  - Fix the visualizer so that sound is rendered across the entire plot line.
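A minimal sketch of reading input levels with `sounddevice`, which an STT-side visualizer could build on; the block rate, sample rate, and text-bar display are assumptions for illustration:

```python
import numpy as np
import sounddevice as sd

def level_callback(indata, frames, time, status):
    # RMS of the current audio block, scaled to a simple text bar.
    rms = float(np.sqrt(np.mean(indata ** 2)))
    bar = "#" * min(50, int(rms * 500))
    print(f"\r{bar:<50}", end="", flush=True)

# Default input device, mono, 16 kHz; adjust to the device chosen in settings.
with sd.InputStream(channels=1, samplerate=16000, callback=level_callback):
    sd.sleep(5000)  # show levels for 5 seconds
```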
- UI
  - Add a switch button to keep screen and/or cam vision active.
  - Add a button to attach a file to the chat (image, audio, video, etc.).
  - Add a button to clear the chat history.
  - Make the chat box responsive to the window size.
  - Add sound effects, played with threading to avoid blocking the UI (see the sketch after this list).
  - Add icons to the buttons.
  - Make an overlay widget.
  - Add transparency.
  - Implement "design settings" to allow users to customize the UI (dark mode, light mode, adjust transparency, etc.).
  - Add a welcome sound.
  - Add new icons.
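A minimal sketch of playing a UI sound effect on a background thread so the interface stays responsive, as mentioned in the list above; the file path is a placeholder:

```python
import threading
import pygame

pygame.mixer.init()

def play_sound_async(path):
    """Play a short sound effect without blocking the UI thread."""
    def _play():
        # Sound.play() itself is non-blocking, but loading the file on a
        # worker thread keeps the UI thread free as well.
        pygame.mixer.Sound(path).play()
    threading.Thread(target=_play, daemon=True).start()

play_sound_async("sounds/welcome.wav")  # placeholder path
```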
- Fix Default Macro
  - Ensure the `ctrl+shift+a` key combination works correctly (see the sketch after this list).
  - Allow customization of the macro through the settings.
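A minimal sketch of registering the default macro with the `keyboard` package; the callback name is a placeholder for whatever toggles recording in `app.py`:

```python
import keyboard

def toggle_recording():
    # Placeholder: in Astra this would start/stop the voice recording.
    print("Recording toggled")

# Register the default ctrl+shift+a macro; suppress=False lets the
# keystroke pass through to other applications as well.
keyboard.add_hotkey("ctrl+shift+a", toggle_recording, suppress=False)
keyboard.wait()  # keep the listener alive (the real app has its own main loop)
```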
- Essentials
  - Optimize Astra's response time.
  - General optimization: core (general methods) + stt (loading model) + tts (chunk processing).
  - Improve response time for STT.
  - Implement a setting to adjust noise reduction for STT.
  - Improve response time for the vision module.
- Other
  - Add more constants (image paths, sound paths, etc.) to `constants.py` to avoid hardcoding (see the sketch after this list).
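A hypothetical sketch of what centralizing paths in `constants.py` could look like; these directory names and constants are illustrative, not the project's actual layout:

```python
from pathlib import Path

# Hypothetical asset layout; adjust to the repository's real folders.
ASSETS_DIR = Path(__file__).parent / "assets"
IMAGES_DIR = ASSETS_DIR / "images"
SOUNDS_DIR = ASSETS_DIR / "sounds"

RECORD_IMAGE_PATH = IMAGES_DIR / "neuralgt-icon.png"
RECORD_ACTIVE_IMAGE_PATH = IMAGES_DIR / "neuralgt-icon-active.png"
WELCOME_SOUND_PATH = SOUNDS_DIR / "welcome.wav"
```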
The images for `record_image_path` and `stop_image_path` are not actually working correctly. The intended behavior is the following:
- By default, the image used is `neuralgt-icon.png`.
- When the user says the wake word, the image changes to `neuralgt-icon-active.png`.
- When the user stops speaking, the image changes back to `neuralgt-icon.png`.

In summary:
- The default image is `neuralgt-icon.png`, which also indicates that the wake word has not been said, or that the STT process has finished (the user is no longer speaking, after having said the wake word).
- When the wake word has been detected and the user is speaking, `neuralgt-icon-active.png` acts as an indicator that the wake word was heard and that the audio is being "listened to" and processed by the STT.

Flow (see the sketch below):
default -> `neuralgt-icon.png`
-> user says the wake word and starts speaking -> `neuralgt-icon-active.png`
-> user stops speaking -> back to the default state -> `neuralgt-icon.png`
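A minimal sketch of the icon swap described above, assuming a `customtkinter` button holds the record image; the widget names, callback names, and image paths are placeholders, not the app's actual code:

```python
import customtkinter as ctk
from PIL import Image

# Placeholder paths; in the app these would come from the configured image constants.
DEFAULT_ICON = ctk.CTkImage(Image.open("images/neuralgt-icon.png"), size=(32, 32))
ACTIVE_ICON = ctk.CTkImage(Image.open("images/neuralgt-icon-active.png"), size=(32, 32))

app = ctk.CTk()
record_button = ctk.CTkButton(app, text="", image=DEFAULT_ICON)
record_button.pack(padx=10, pady=10)

def on_wake_word_detected():
    # Wake word heard: show the active icon while the STT is listening.
    record_button.configure(image=ACTIVE_ICON)

def on_speech_finished():
    # User stopped speaking: return to the default icon.
    record_button.configure(image=DEFAULT_ICON)

# In the real app, on_wake_word_detected / on_speech_finished would be called
# from the wake-word and STT callbacks respectively.
app.mainloop()
```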