aicam / AI_course_generator

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

AI course generator

This project aims to create slides and transcript (text to speech) for a course based on a document. The document is assumed to be various formats including conversational, meeting, part of a book or research paper. Also, we consider different templates for different courses and the goal is to create a flexible service to generate slides with different templates.

Project framework & dependencies

The project is written as a single web server to receive document and return a JSON format course slides and speeches. The server is developed in Python using Django framework. Haystack library is used as the core library for creating RAG endpoint and OpenAI library is used to use latest features of ChatGPT even ones that are in Beta mode. Haystack supports various databases but here we used OpenSearch which can be easily switched to another database.

Course template

You can find available templates in mlops/templates directory. Templates are simple JSON files with the following structure:

{
  "name": "<name of the course>",
  "description": "<description of the course>",
  "num_slides": "<number of slides>",
  "slides": [
    {
      "header": {
        "component_name": "<name of the component used here>",
        "prompt": "<A specific prompt to add to the main prompt while creating this component output>",
        "params": ["<Array of parameters, each as a dictionary ex: {\"title\": \"slide_0_body_1_0\"}"],
        "rag_query": "<query passed to retrieval in RAG to find information>",
        "delimiter": "<if the component is multi-result generator, indicates delimiter to split result>",
        "output": "<this parameter is filled with the response and contains 'answer' and 'transcript'"
      },
      "body": [
        {
          ... same component structure
        }
  ]
}

As you can see, the main attribute is slides. It contains a list of slide and each slide has multiple components. Component is the smallest object of a slide, for example, in Powerpoint you can have a box with bullet points or a graph, each of these referred as a component here. Each slide has two sections, 1- header and 2- body. Header is a single component and is used to generate a title for the slide. Body can include as many components as needed.

Component structure

Each component is a result of a single API call to ChatGPT and one or multiple queries to the retriever. Each component is identified by its name, consequently, a function exist in template_processor files that handle the component. Each prompt in a component customize the response of ChatGPT for the specific template. The implemented function for a specific component (identified by name) includes a basic prompt that match the needs of the component but each template provides tailored prompts for its course that is appended to the basic prompt. params parameter indicates parameters that have been passed to this component by other components. Each component generates some parameters after being executed. These parameters are passed to the next running components to use. For example, if we create a list of chapter titles in the first slide, we want to have access to them in the second or third slide. It is important to mention that slides are added by order, in this regard, each component can only have access to the parameters that have been generated by their previous components. For example, a component in the slide 1 can not access parameters of the slide 2. params is an array of parameters. A single parameter is a dictionary with a key that indicates the name of the parameter and a value that points to the component that has this parameter. Addressing format is slide_<slide number>_body_<component index in the body>_<index of parameter>. rag_query is the query passed to the retriever to generate embeddings and search for similar documents in the OpenSearch database. delimiter is a simple separator (like \n) that can be used to create parameters.

Retrieval augmented generating (RAG) endpoint

POST mlops/query_rag is the endpoint that generate course based on running RAG on the uploaded document. The uploaded PDF file convert to txt file, then the txt file is broken down to chunks and upload on OpenSearch database, then using text-embedding-ada-002 the raw text is transformed into embedding vectors and stored in the database. Next, the template is run over the document to generate contents of the template. The response of the endpoint is the template json and slides component with the result. The result is stored in the output parameter in the component. Each output has one answer and one transcript. The answer is the component content that will be added to the slide and transcript is the speech associated to this component. The following is a sample response for the basic template and GPT 4 turbo as the model.

{
    "status": "ok",
    "result": {
        "name": "test",
        "description": "test",
        "num_slides": 2,
        "slides": [
            {
                "header": {
                    "component_name": "title",
                    "prompt": "A very short title less than 10 words describing the topic of the context. The short title does not need to be a full sentence but rather full of informative words.",
                    "params": [],
                    "rag_query": "important information",
                    "delimiter": "",
                    "output": {
                        "answer": "Accounting Principles, Bookkeeping, Internal Controls",
                        "transcript": "Accounting and bookkeeping are distinct yet interconnected disciplines. Bookkeeping involves recording transactions, while accounting encompasses interpreting, classifying, and summarizing financial data. Accounting principles guide these processes, ensuring accuracy and consistency. Internal controls are crucial mechanisms for safeguarding assets, enhancing reliability of financial reports, and ensuring compliance with laws and regulations. Understanding these concepts is fundamental for financial management and accountability.",
                        "params": []
                    }
                },
                "body": [
                    {
                        "component_name": "shortdescription",
                        "prompt": "A short description less than 50 words explaining the whole idea of the context. The short description should highlight keywords and concepts mentioned in the context.",
                        "params": [],
                        "rag_query": "important information",
                        "delimiter": "",
                        "output": {
                            "answer": "Explains key accounting principles, differentiates bookkeeping from accounting, introduces financial statements, and discusses internal controls' relevance, tracing accounting's evolution from ancient times to modern practices.",
                            "transcript": "Accounting encompasses a set of principles guiding financial reporting, distinct from bookkeeping's transactional recording. It's evolved from ancient record-keeping to today's complex standards, ensuring transparent, reliable financial statements, supported by crucial internal controls to safeguard assets and enhance operational integrity. Understanding this evolution is key to grasping modern accounting's role in business.",
                            "params": []
                        }
                    },
                    {
                        "component_name": "bulletpoints",
                        "prompt": "Read the whole context and divide the whole knowledge explained in the context into 5 chapters separated by \n and find a name less than 20 words for each chapter. Separate each chapter name by \n. Only write chapter names with \n.",
                        "params": [],
                        "rag_query": "Titles of the topics",
                        "delimiter": "\n",
                        "output": {
                            "answer": "Fundamentals of Accounting and Bookkeeping\nEssential Accounting Terms and Principles\nUnderstanding the Accounting Equation\nOverview of Basic Financial Statements\nConcepts and Importance of Internal Control",
                            "transcript": "In this chapter, we differentiate between bookkeeping and accounting, explore foundational accounting terms and principles, and grasp the accounting equation. We'll overview key financial statements and learn the significance of robust internal controls, guided by Lynn Fountain's expertise.",
                            "params": [
                                "Fundamentals of Accounting and Bookkeeping",
                                "Essential Accounting Terms and Principles",
                                "Understanding the Accounting Equation",
                                "Overview of Basic Financial Statements",
                                "Concepts and Importance of Internal Control"
                            ]
                        }
                    }
                ]
            },
            {
                "header": {
                    "component_name": "title-fixed",
                    "prompt": "",
                    "params": [
                        {
                            "title": "Fundamentals of Accounting and Bookkeeping"
                        }
                    ],
                    "rag_query": "Fundamentals of Accounting and Bookkeeping",
                    "delimiter": "",
                    "output": {
                        "answer": "Fundamentals of Accounting and Bookkeeping",
                        "transcript": "Understanding the difference between bookkeeping and accounting is crucial. Bookkeeping focuses on recording transactions methodically, laying the groundwork for accounting. Accounting, however, encompasses a broader scope, analyzing and summarizing financial data based on accounting principles, which informs the creation of financial statements and guides internal controls.",
                        "params": []
                    }
                },
                "body": [
                    {
                        "component_name": "shortdescription",
                        "prompt": "You are writing a short description less than 50 words about the chapter 'Fundamentals of Accounting and Bookkeeping'. First, find information related to the chapter and focus only on those. Second, find keywords and important information mentioned in the focused information about the chapter 'Fundamentals of Accounting and Bookkeeping'. And finally, write a short description less than 50 words with all those keywords.",
                        "params": [
                            {
                                "chapter-name": "Fundamentals of Accounting and Bookkeeping"
                            }
                        ],
                        "rag_query": "Fundamentals of Accounting and Bookkeeping",
                        "delimiter": "",
                        "output": {
                            "answer": "This chapter delineates the distinctions between bookkeeping and accounting, explicates key accounting principles, introduces the accounting equation, and discusses the importance of internal controls and financial statements in business entities.",
                            "transcript": "Bookkeeping records transactions; accounting interprets and analyzes financial data, applying principles to produce statements and ensure internal controls, crucial for business integrity and decision-making.",
                            "params": []
                        }
                    }
                ]
            }
        ]
    }
}

GPT 4 vision

GPT 4 vision is the latest ChatGPT model that can read images and understand every single detail (both as a graph or figure). POST mlops/query_gpt4_vision will parse the template and use ChatGPT 4 vision instead of RAG to generate slides and transcripts. In this scenario, after sending the PDF file to the endpoint, server split all pages and save them as JPEG files and create base64 encoding of them. These images will be appended to every request sent to the vision model. The following is a sample response from ChatGPT 4 vision.

{
    "status": "ok",
    "result": {
        "name": "test",
        "description": "test",
        "num_slides": 2,
        "slides": [
            {
                "header": {
                    "component_name": "title",
                    "prompt": "A very short title less than 10 words describing the topic of the context. The short title does not need to be a full sentence but rather full of informative words.",
                    "params": [],
                    "rag_query": "important information",
                    "delimiter": "",
                    "output": {
                        "answer": "Introduction to Accounting Principles and Controls",
                        "transcript": "Welcome to our discussion on accounting principles and internal controls—a foundational insight into the mechanics of financial reporting, the rigor of bookkeeping, and the safeguarding of company assets through established protocols. Let's decode these concepts essential for financial accuracy and integrity.",
                        "params": []
                    }
                },
                "body": [
                    {
                        "component_name": "shortdescription",
                        "prompt": "A short description less than 50 words explaining the whole idea of the context. The short description should highlight keywords and concepts mentioned in the context.",
                        "params": [],
                        "rag_query": "important information",
                        "delimiter": "",
                        "output": {
                            "answer": "This document provides an introduction to accounting principles, delineating the differences between bookkeeping and accounting, explaining the accounting equation, and outlining the preparation of financial statements. It details generally accepted accounting principles (GAAP), discusses internal controls per COSO framework, and underscores the importance of compliance, including the Sarbanes-Oxley Act requirements.",
                            "transcript": "This document serves as a foundational overview of accounting principles, highlighting the distinction between bookkeeping and accounting, and demystifying complex concepts like the accounting equation and financial statement preparation. It emphasizes the critical role of GAAP, the COSO framework for internal controls, and the necessity of adherence to compliance standards such as those mandated by the Sarbanes-Oxley Act.",
                            "params": []
                        }
                    },
                    {
                        "component_name": "bulletpoints",
                        "prompt": "Read the whole context and divide the whole knowledge explained in the context into 5 chapters separated by \n and find a name less than 20 words for each chapter. Separate each chapter name by \n. Only write chapter names with \n.",
                        "params": [],
                        "rag_query": "Titles of the topics",
                        "delimiter": "\n",
                        "output": {
                            "answer": "Introduction to Accounting and Internal Controls\nBookkeeping vs. Accounting: Definitions and Differences\nThe Accounting Equation and Financial Statements\nUnderstanding Debits, Credits, and the Double-Entry System\nInternal Control and COSO Framework Essentials",
                            "transcript": "Accounting encompasses bookkeeping but also includes analysis and reporting to support financial decision-making. The accounting equation, illustrating assets equaling liabilities plus equity, underpins financial statements. Double-entry system ensures every transaction is balanced by equal debits and credits. COSO framework provides a structure for robust internal controls within accounting processes.",
                            "params": [
                                "Introduction to Accounting and Internal Controls",
                                "Bookkeeping vs. Accounting: Definitions and Differences",
                                "The Accounting Equation and Financial Statements",
                                "Understanding Debits, Credits, and the Double-Entry System",
                                "Internal Control and COSO Framework Essentials"
                            ]
                        }
                    }
                ]
            },
            {
                "header": {
                    "component_name": "title-fixed",
                    "prompt": "",
                    "params": [
                        {
                            "title": "Introduction to Accounting and Internal Controls"
                        }
                    ],
                    "rag_query": "Introduction to Accounting and Internal Controls",
                    "delimiter": "",
                    "output": {
                        "answer": "Introduction to Accounting and Internal Controls",
                        "transcript": "Welcome to the course on Accounting Principles and Internal Controls, where we'll unlock the foundational elements of accounting, explore the crucial distinction between bookkeeping and accounting, delve into the historical evolution of the accounting profession, and understand the integral role of internal controls in ensuring the accuracy and reliability of financial information.",
                        "params": []
                    }
                },
                "body": [
                    {
                        "component_name": "shortdescription",
                        "prompt": "You are writing a short description less than 50 words about the chapter 'Introduction to Accounting and Internal Controls'. First, find information related to the chapter and focus only on those. Second, find keywords and important information mentioned in the focused information about the chapter 'Introduction to Accounting and Internal Controls'. And finally, write a short description less than 50 words with all those keywords.",
                        "params": [
                            {
                                "chapter-name": "Introduction to Accounting and Internal Controls"
                            }
                        ],
                        "rag_query": "Introduction to Accounting and Internal Controls",
                        "delimiter": "",
                        "output": {
                            "answer": "The chapter provides an overview of key differences between bookkeeping and accounting, basic accounting principles, financial statements, and the COSO internal control framework, setting the foundation for understanding accounting practices and internal controls within organizations.",
                            "transcript": "This chapter lays the groundwork by differentiating bookkeeping from accounting, introducing fundamental accounting principles, clarifying financial statements, and elucidating the COSO framework for internal controls—essential for grasping organizational accounting and control processes.",
                            "params": []
                        }
                    }
                ]
            }
        ]
    }
}

About


Languages

Language:Python 77.8%Language:Jupyter Notebook 20.9%Language:Dockerfile 1.3%