AutomindGPT is an AI assistant that combines large language models with speech-to-text and text-to-speech, so you can interact with it by voice or by text. The project integrates Whisper for transcription, ElevenLabs for speech synthesis, OpenAI models for conversation, and autogen for task automation.
- Voice Recording and Transcription: Record audio with PyAudio and transcribe it to text with a Whisper model (whisper-large-v3 in the example).
- Text-to-Speech: Convert text responses to audio using ElevenLabs.
- Advanced Language Models: Use OpenAI GPT models (gpt-4o-mini in the example configuration) for conversation and task handling.
- Input Sanitization: Strip punctuation and other non-word characters from user input before it is passed to the agents.
- Automated Task Execution: Use autogen's AssistantAgent and UserProxyAgent to automate tasks.

Clone the repository:

    git clone https://github.com/yourusername/automindgpt.git
    cd automindgpt
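
Set up the environment: Make sure the required API keys and environment variables are set before running the examples. The assistant configuration reads OPENAI_API_KEY from the environment, and the text-to-speech example needs an ElevenLabs API key. Refer to the FAQ for more details.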
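
A quick pre-flight check can catch a missing key before any audio is recorded. The following is a minimal sketch, not part of the project: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and treating `ELEVENLABS_API_KEY` as an environment variable is an assumption (this README shows it as a constant in the code).

```python
import os

# OPENAI_API_KEY is read by the assistant configuration in this README.
# ELEVENLABS_API_KEY as an environment variable is an assumption.
REQUIRED_VARS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.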

Install dependencies:

    pip install -r requirements.txt

Recording Audio:

    import wave

    import pyaudio
    from autogen import Cache      # used later when starting the chat
    from elevenlabs import play    # used later by the text-to-speech helper

    FORMAT = pyaudio.paInt16       # 16-bit samples
    CHANNELS = 2
    RATE = 16000                   # sample rate in Hz
    CHUNK = 1024                   # frames read per buffer
    RECORD_SECONDS = 15
    WAVE_OUTPUT_FILENAME = "file.wav"

    def recording(WAVE_OUTPUT_FILENAME):
        """Record RECORD_SECONDS of audio from the default input device into a WAV file."""
        audio = pyaudio.PyAudio()
        stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                            input=True, frames_per_buffer=CHUNK)
        print("recording...")
        frames = []
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("finished recording")

        stream.stop_stream()
        stream.close()
        audio.terminate()

        # Write the captured frames with the same parameters used for recording
        with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as waveFile:
            waveFile.setnchannels(CHANNELS)
            waveFile.setsampwidth(audio.get_sample_size(FORMAT))
            waveFile.setframerate(RATE)
            waveFile.writeframes(b''.join(frames))
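
The loop bound `int(RATE / CHUNK * RECORD_SECONDS)` rounds down, so the recording comes out slightly shorter than `RECORD_SECONDS`. The sketch below works through that arithmetic for the constants above; `recording_stats` is a hypothetical helper, and the 2-byte sample width assumes 16-bit samples (`paInt16`).

```python
def recording_stats(rate=16000, chunk=1024, seconds=15, channels=2, sample_width=2):
    """Return (chunks_read, actual_seconds, pcm_bytes) for the recording loop."""
    chunks = int(rate / chunk * seconds)          # same expression as the loop bound
    frames = chunks * chunk                       # frames actually captured
    actual_seconds = frames / rate                # duration of the captured audio
    pcm_bytes = frames * channels * sample_width  # raw audio data, excluding the WAV header
    return chunks, actual_seconds, pcm_bytes


chunks, actual_seconds, pcm_bytes = recording_stats()
print(f"{chunks} chunks, {actual_seconds:.3f} s, {pcm_bytes} bytes of PCM data")
```

With the defaults this gives 234 chunks, about 14.976 seconds of audio, and 958,464 bytes of sample data.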
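
Transcribing Audio:

    # `client` is not defined in the original snippet. Assumption: it is a Groq
    # client, since "whisper-large-v3" is a model name used by Groq's
    # transcription API. Groq() reads GROQ_API_KEY from the environment.
    from groq import Groq

    client = Groq()

    def transcribe(WAVE_OUTPUT_FILENAME):
        """Send the WAV file for transcription and return the recognized text."""
        with open(WAVE_OUTPUT_FILENAME, "rb") as file:
            transcription = client.audio.transcriptions.create(
                file=(WAVE_OUTPUT_FILENAME, file.read()),
                model="whisper-large-v3",
                prompt="Specify context or spelling",
                response_format="json",
                language="en",
                temperature=0.0,
            )
        print(transcription.text)
        return transcription.text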
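
Text-to-Speech:

    from elevenlabs import play
    from elevenlabs.client import ElevenLabs

    ELEVENLABS_API_KEY = "api-key"  # placeholder; substitute your own key

    def synthesis(text):
        """Convert text to speech with ElevenLabs and play it."""
        client = ElevenLabs(api_key=ELEVENLABS_API_KEY)
        audio = client.generate(
            optimize_streaming_latency="0",
            text=text,
            voice="06oPEcZqPWhZ2IeTcOJc",
            model="eleven_turbo_v2",
        )
        play(audio)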

Setting up the Assistant and User Proxy:

    import os

    from autogen import AssistantAgent, UserProxyAgent
    from autogen.coding import LocalCommandLineCodeExecutor

    llm_config = {
        "config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}],
    }

    # The classes are imported by name, so they are used without the `autogen.` prefix
    assistant = AssistantAgent(
        name="assistant",
        llm_config=llm_config,
    )

    user_proxy = UserProxyAgent(
        name="user_proxy",
        human_input_mode="TERMINATE",
        max_consecutive_auto_reply=10,
        is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
        code_execution_config={
            # Generated code runs locally in the "coding" directory
            "executor": LocalCommandLineCodeExecutor(work_dir="coding"),
        },
        llm_config=llm_config,
        system_message="""Reply TERMINATE if the task has been solved at full satisfaction.
    Otherwise, reply CONTINUE, or the reason why the task is not solved yet.""",
    )
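
The `is_termination_msg` lambda decides when the conversation stops, so it helps to see how it treats different messages. The sketch below is a standalone variant, not the project's code: it assumes messages are plain dictionaries and adds a guard for a `content` value of `None`, which the lambda above would fail on.

```python
def is_termination_msg(message):
    """Return True when the message content ends with the TERMINATE keyword."""
    content = message.get("content") or ""  # tolerate a missing or None content
    return content.rstrip().endswith("TERMINATE")


samples = [
    {"content": "The chart has been saved. TERMINATE\n"},
    {"content": "TERMINATE is not appropriate yet"},
    {"content": None},
    {},
]
for sample in samples:
    print(sample, "->", is_termination_msg(sample))
```

Only the first sample ends the chat. The keyword must be the last non-whitespace text in the message, so mentioning it mid-sentence does not stop the conversation.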

Sanitizing Input:

    import re

    def sanitize_input(user_input: str) -> str:
        """Remove every character that is not a word character or whitespace."""
        sanitized_input = re.sub(r'[^\w\s]', '', user_input)
        return sanitized_input
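
To make the effect of the pattern concrete, the sketch below repeats `sanitize_input` and runs it on a few inputs; the sample strings are illustrative only. Because the pattern drops all punctuation, it also changes ordinary text such as decimal numbers and ticker separators.

```python
import re


def sanitize_input(user_input: str) -> str:
    """Remove every character that is not a word character or whitespace."""
    return re.sub(r'[^\w\s]', '', user_input)


examples = [
    "Plot a chart of NVDA and TESLA stock price change YTD.",
    "rm -rf / ; echo done",
    "Use 3.5% growth",
    "snake_case café",
]
for text in examples:
    # repr() makes the remaining whitespace visible
    print(repr(text), "->", repr(sanitize_input(text)))
```

Underscores and accented letters survive because `\w` matches them in Python 3, while `3.5%` becomes `35`. Whitespace is left untouched, so removed symbols can leave runs of consecutive spaces.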

Initiating Chat:

    WELCOME = "Welcome to AutomindGPT! Select 1 for Voice or 2 for manual input."
    DEFAULT_PROMPT = "Plot a chart of NVDA and TESLA stock price change YTD."

    synthesis(WELCOME)
    choice = input(WELCOME + " ")

    if choice == "1":
        recording(WAVE_OUTPUT_FILENAME)  # writes the WAV file; returns nothing
        autogen_input_1 = transcribe(WAVE_OUTPUT_FILENAME)
        synthesis(f"Write your prompt for autogen: {autogen_input_1}")
        autogen_input = sanitize_input(autogen_input_1)
    elif choice == "2":
        autogen_input_1 = input(f"Write your prompt for autogen (default: '{DEFAULT_PROMPT}'): ")
        synthesis(f"Write your prompt for autogen: {autogen_input_1}")
        autogen_input = sanitize_input(autogen_input_1)
        if autogen_input_1 == "":
            autogen_input = DEFAULT_PROMPT
    else:
        # Without this branch, autogen_input would be undefined below
        raise SystemExit("Invalid choice: enter 1 or 2.")

    synthesis(autogen_input)

    # Cache is imported from autogen; cached LLM responses are stored on disk
    with Cache.disk() as cache:
        messages = user_proxy.initiate_chat(
            assistant,
            message=autogen_input,
            cache=cache,
            summary_method="reflection_with_llm",
        )
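
The menu logic can be separated from audio and network calls so that it is easy to exercise on its own. This is a minimal sketch under stated assumptions: `choose_mode` and `resolve_prompt` are hypothetical helpers that do not exist in the project, and re-asking up to three times is an illustrative choice rather than the project's behavior.

```python
DEFAULT_PROMPT = "Plot a chart of NVDA and TESLA stock price change YTD."


def choose_mode(read_line=input, max_attempts=3):
    """Ask until the user enters "1" (voice) or "2" (manual); return the choice."""
    for _ in range(max_attempts):
        choice = read_line("Select 1 for Voice or 2 for manual input: ").strip()
        if choice in ("1", "2"):
            return choice
        print("Please enter 1 or 2.")
    raise ValueError(f"no valid choice after {max_attempts} attempts")


def resolve_prompt(raw, default=DEFAULT_PROMPT):
    """Fall back to the default prompt when the user input is blank."""
    return raw if raw.strip() else default
```

In an interactive session, `choose_mode()` reads from the keyboard. Passing a different `read_line` callable, such as one that returns canned answers, lets the same logic run without user input.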
Feel free to open issues and submit pull requests. Contributions are welcome!
This project is licensed under the MIT License.