AI-as-a-Service: Architecting GenAI Application Governance on Azure with Azure API Management and Microsoft Fabric

This repo serves as a reference architecture for tracking usage of large and small language models on Azure. Many organizations want to understand AI metrics, including what models are being used, by whom, and how often. They also want to track tokens being consumed, and the prompts being passed in. This leads to the ability to create chargeback models for consuming applications and users and enables analysis to be done on prompt usage and best practices. Azure API Management recently announced a new policy to send token information to App Insights. This is a great feature, but doesn't enable long term usage analysis or handle other llm/slm deployments. This architecture provides a way to track all of the data needed to understand AI usage in a scalable and cost-effective manner.

Architecture Overview

Azure API Management serves as the cornerstone for this architecture as it enables different consumer access to the same api endpoint through the use of subscriptions or jwt tokens. APIM policy also allow the logging of request/response data to Event Hubs so that it can be processed outside of the request/response path. The data generated is suitable for analytics queries, so rather than land it in a traditional database, Microsoft Fabric becomes a cost-effective and scalable solution for storing the data. Power BI can then be used to create reports on the data in the Lakehouse.

The reference implementation consists of the following components:

Azure OpenAI: These are the models that are exposed as APIs using Azure API Management. You could deploy any combination of models through Azure AI Studio
Azure API Management: This is used to expose the OpenAI models as APIs and track usage data.
Event Hubs: This is used to ingest usage data from Azure API Management.
Microsoft Fabric: This is used to process and store the usage data in a scalable and cost-effective manner.

Flow:

A client makes a request to the model through Azure API Management using a subscription key.
Azure API Management forwards the request to the OpenAI model deployment.
Azure API Management logs the subscription id and request/response data to Event Hubs using a log-to-eventhub policy.
An Eventstream processor in Microsoft Fabric reads the data from Event Hubs.
The output of the stream is writen to a delta table in a Lakehouse.
The data in the delta table is then queried via a Power BI report or a Notebook.

Note, you can easily swap out the subscription key for tracking an individual user by using JWT tokens and associating the user with the token. This would allow you to track usage at the user level.

Setup

This tutorial assumes you have familiarity with the technologies used in this architecture and have deployed instances of each. If you are new to any of the technologies, please refer to the documentation provided by Microsoft.

Azure OpenAI

Create a model deployment in Azure OpenAI and note the endpoint.
Enable the System Assigned Identity on your Azure API Management instance and grant it 'Cognitive Services OpenAI User' role on your OpenAI instance. This will allow APIM to call the OpenAI endpoint without needing to rely on the subscription key for OpenAI.

Azure Event Hubs

Create an Event Hub called 'ai-usage' within your Event Hub instance.
Grant the APIM managed identity the 'Azure Event Hubs Data Sender' role on the 'ai-usage' Event Hub so that it can write to the Event Hub without needing a connection string.

Azure API Management

To create the EventLogger for APIM that is using the managed identity, we have to use the Rest API to create it. The easiest way to do this is the Try It feature in the docs. You will need to provide the following information:

loggerId: ai-usage
resourceGroupName: your_resource_group
serviceName: your_apim_instance_name
subscriptionId: your_subscription_id

body:

{
    "properties": {
        "loggerType": "azureEventHub",
        "description": "eventhub logger for ai usage",
        "credentials": {
            "endpointAddress":"your_eventhub_namespace.servicebus.windows.net",
            "identityClientId":"SystemAssigned",
            "name":"ai-usage"
        }
    }
}

Import the OpenAI inference specification into Azure API Management.
Update the API settings to rename the Subscription Header Name from 'Ocp-Apim-Subscription-Key' to 'api-key'. The OpenAI API expects the subscription key to be passed in as 'api-key'. Developers will set this value to their APIM subscription key.
Add the following policy to the 'All Operations' section of your OpenAI API in APIM. This will allow APIM to call Open AI using its managed identity, and then log the request/response data to Event Hubs.

<policies>
    <inbound>
        <base />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
        <set-variable name="requestBody" value="@(context.Request.Body.As<string>(preserveContent: true))" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <choose>
            <when condition="@(context.Response.StatusCode == 200)">
                <log-to-eventhub logger-id="ai-usage">@{
                    var responseBody = context.Response.Body?.As<string>(true);
                    var requestBody = (string)context.Variables["requestBody"];             
                    return new JObject(
                        new JProperty("EventTime", DateTime.UtcNow),
                        new JProperty("AppSubscriptionKey", context.Request.Headers.GetValueOrDefault("api-key",string.Empty)),                     
                        new JProperty("Request", requestBody),
                        new JProperty("Response",responseBody )  
                    ).ToString();
                }</log-to-eventhub>
            </when>
        </choose>
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Test that your APIM instance can call the OpenAI endpoint by using the 'Test' tab in the APIM portal.

Microsoft Fabric

At the time of this writing, connecting to an Event Hub from Fabric must be done using a Shared Access Key, so we cannot use a managed identity to connect to it. This means the connection string will be stored in the Event Hub stream configuration.

Ingestion

Create a new Workspace.
Create a Lakehouse to store the data.
In your Event Hub instance, create a create a Shared access policy for the ai-usage Event Hub that has 'Listen' permissions. Copy the Primary Key.
Create a new Event stream in the workspace
- The source will be the 'ai-usage' Event Hub.
- The destination will be a new managed Delta table in the Lakehouse called 'AIData'.
Invoke your OpenAI APIM endpoint several times to send some test data in. You should see the Delta table created and data in it.

Reporting

Switch to the SQL Analytics endpoint.
Run the following query to create a view that makes it easier to see the token usage by subscription key.

CREATE OR ALTER VIEW [dbo].[AIUsageView] AS
SELECT CAST(EventTime AS DateTime2) AS [EventTime],
[AppSubscriptionKey],
JSON_VALUE([Response], '$.object') AS [Operation],
JSON_VALUE([Response], '$.model') AS [Model],
[Request], 
[Response],
CAST(JSON_VALUE([Response], '$.usage.completion_tokens') AS INT) AS [CompletionTokens],
CAST(JSON_VALUE([Response], '$.usage.prompt_tokens') AS INT) AS [PromptTokens],
CAST(JSON_VALUE([Response], '$.usage.total_tokens') AS INT) AS [TotalTokens]
FROM 
[YOUR_LAKEHOUSE_NAME].[dbo].[AIData]

Refresh the Views to see the new one created.
Click on the Reporting tab and select 'Automatically update semantic model'
Create a new report using the AIUsageView as the data source.

Data Science

Create a new Notebook.
Load the managed delta table into a dataframe.

df = spark.sql("SELECT * FROM YOUR_LAKEHOUSE_NAME.AIData LIMIT 1000")
display(df)

sjuratov / AIAsAService