tomtyiu / CodeAutogenAI-LLM

Code AutoGen LLM

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

AutomindGPT

AutomindGPT is an AI-driven assistant that leverages advanced language models and speech-to-text/text-to-speech functionalities to provide an interactive user experience. This project integrates various tools and APIs to deliver seamless voice and text-based interactions.

Features

  • Voice Recording and Transcription: Record audio and transcribe it to text using the WhisperModel.
  • Text-to-Speech: Convert text responses to audio using ElevenLabs.
  • Advanced Language Models: Use OpenAI's GPT-4 for intelligent conversations and task handling.
  • Secure and Sanitized Input: Ensure all user inputs are sanitized for safety.
  • Automated Task Execution: Employs autogen's AssistantAgent and UserProxyAgent for task automation.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/automindgpt.git
    cd automindgpt
  2. Set up the environment: Ensure you have the necessary API keys and environment variables set up. Refer to FAQ for more details.

  3. Install dependencies:

    pip install -r requirements.txt

Usage

  1. Recording Audio:

    from autogen import Cache
    from elevenlabs import play
    
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 16000
    CHUNK = 1024
    RECORD_SECONDS = 15
    WAVE_OUTPUT_FILENAME = "file.wav"
    
    def recording(WAVE_OUTPUT_FILENAME):
        audio = pyaudio.PyAudio()
        stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
        print("recording...")
        frames = []
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("finished recording")
        stream.stop_stream()
        stream.close()
        audio.terminate()
        waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        waveFile.setnchannels(CHANNELS)
        waveFile.setsampwidth(audio.get_sample_size(FORMAT))
        waveFile.setframerate(RATE)
        waveFile.writeframes(b''.join(frames))
        waveFile.close()
  2. Transcribing Audio:

    def transcribe(WAVE_OUTPUT_FILENAME):
        with open(WAVE_OUTPUT_FILENAME, "rb") as file:
            transcription = client.audio.transcriptions.create(
                file=(WAVE_OUTPUT_FILENAME, file.read()),
                model="whisper-large-v3",
                prompt="Specify context or spelling",
                response_format="json",
                language="en",
                temperature=0.0
            )
            print(transcription.text)
            return transcription.text
  3. Text-to-Speech:

    ELEVENLABS_API_KEY = "api-key"
    
    def synthesis(text):
        client = ElevenLabs(api_key=ELEVENLABS_API_KEY)
        audio = client.generate(
            optimize_streaming_latency="0",
            text=text,
            voice="06oPEcZqPWhZ2IeTcOJc",
            model="eleven_turbo_v2"
        )
        play(audio)
  4. Setting up the Assistant and User Proxy:

    from autogen import AssistantAgent, UserProxyAgent
    from autogen.coding import LocalCommandLineCodeExecutor
    
    llm_config = {
        "config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}],
    }
    
    assistant = autogen.AssistantAgent(
        name="assistant",
        llm_config=llm_config
    )
    
    user_proxy = autogen.UserProxyAgent(
        name="user_proxy",
        human_input_mode="TERMINATE",
        max_consecutive_auto_reply=10,
        is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
        code_execution_config={
            "executor": LocalCommandLineCodeExecutor(work_dir="coding"),
        },
        llm_config=llm_config,
        system_message="""Reply TERMINATE if the task has been solved at full satisfaction.
        Otherwise, reply CONTINUE, or the reason why the task is not solved yet.""",
    )
  5. Sanitizing Input:

    import re
    
    def sanitize_input(user_input: str) -> str:
        sanitized_input = re.sub(r'[^\w\s]', '', user_input)
        return sanitized_input
  6. Initiating Chat:

    synthesis("Welcome to AutomindGPT! Select 1 for Voice or 2 for manual input.")
    choice = input("Welcome to AutomindGPT! Select 1 for Voice or 2 for manual input.")
    
    if choice == "1":
        record = recording(WAVE_OUTPUT_FILENAME)
        autogen_input_1 = transcribe(WAVE_OUTPUT_FILENAME)
        synthesis(f"Write your prompt for autogen: {autogen_input_1}")
        autogen_input = sanitize_input(autogen_input_1)
    
    elif choice == "2":
        autogen_input_1 = input(f"Write your prompt for autogen: (default: 'Plot a chart of NVDA and TESLA stock price change YTD.'):")
        synthesis(f"Write your prompt for autogen: {autogen_input_1}")
        autogen_input = sanitize_input(autogen_input_1)
        if autogen_input_1 == "":
            autogen_input = "Plot a chart of NVDA and TESLA stock price change YTD."
            synthesis(autogen_input)
    
    with Cache.disk() as cache:
        messages = user_proxy.initiate_chat(assistant, message=autogen_input, cache=cache, summary_method="reflection_with_llm")

Contributing

Feel free to open issues and submit pull requests. Contributions are welcome!

License

This project is licensed under the MIT License.

About

Code AutoGen LLM


Languages

Language:Python 100.0%