This project implements a Text Generation API using FastAPI. It leverages pre-trained models from the Hugging Face library to generate text responses based on user-provided prompts. The application supports multiple models (e.g., `gpt2`, `distilgpt2`) and allows fine-tuning of generation parameters.
The core of the application is built using FastAPI, a modern web framework for building APIs with Python. FastAPI is known for its speed, automatic OpenAPI documentation generation, and ease of use, which makes it an excellent choice for developing APIs quickly.
The project utilizes pre-trained language models from Hugging Face's Transformers library, specifically:
- GPT-2: A transformer-based model for natural language processing tasks, including text generation. It's known for producing human-like text based on given prompts.
- DistilGPT-2: A smaller and more efficient version of GPT-2, providing similar capabilities with reduced computational requirements.
These models are used to generate text, and they are pre-loaded into memory when the FastAPI application starts up. This ensures that text generation can occur quickly without the need to load the models on every request.
When a user sends a prompt to the API, the following happens:
- The user specifies a prompt along with several optional parameters, such as `max_length`, `temperature`, `top_p`, and `top_k`.
  - max_length: Determines the maximum length of the generated text.
  - temperature: Controls the randomness of the output (higher values make the output more random).
  - top_p and top_k: Parameters for controlling the sampling method, where `top_p` refers to nucleus sampling and `top_k` refers to the number of top tokens considered at each generation step.
- The API passes the prompt and parameters to the pre-trained models, which generate text based on the prompt.
- The generated text is then returned as a response to the user.
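Inside the API, that generation step amounts to a single pipeline call per model. The helper below is a minimal sketch (the names `generate` and `pipe` are illustrative, not the project's actual code); it relies only on the Hugging Face text-generation pipeline convention of returning a list of dicts with a `generated_text` key:

```python
def generate(pipe, prompt, max_length=100, num_return_sequences=1,
             temperature=1.0, top_p=0.9, top_k=50):
    """Run one text-generation pipeline and collect the generated strings.

    `pipe` is any callable with the Hugging Face text-generation
    pipeline interface: it takes a prompt plus sampling keyword
    arguments and returns a list of {"generated_text": ...} dicts.
    """
    outputs = pipe(
        prompt,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        do_sample=True,  # sampling parameters only take effect when sampling is enabled
    )
    return [out["generated_text"] for out in outputs]
```

Calling this once per loaded model and keying the results by model name yields exactly the response shape shown later in this README.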
The API supports multiple models for text generation. By default, it includes GPT-2 and DistilGPT-2, allowing users to select or compare outputs from different models. The generated text for each model is included in the response, giving users multiple options based on the same input prompt.
The application is designed to handle multiple requests concurrently. It uses asynchronous programming to ensure that requests are processed efficiently, particularly when generating text from large models. This prevents the server from blocking while waiting for the model to generate text, allowing it to serve other requests in parallel.
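Hugging Face pipelines are blocking calls, so one common pattern for the above (sketched here with `loop.run_in_executor`, which also works on Python 3.8; the project's own `async_gen_text` may be structured differently) is to push generation onto a worker thread:

```python
import asyncio
import functools


async def async_generate(pipe, prompt, **params):
    """Run a blocking text-generation pipeline without stalling the event loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so bind the
    # keyword parameters with functools.partial first.
    call = functools.partial(pipe, prompt, **params)
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, call)
```

While one request's generation runs in the thread pool, the event loop remains free to accept and dispatch other requests.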
The user has the ability to fine-tune the text generation process by adjusting various parameters:
- temperature: Controls randomness (higher values mean more unpredictable text).
- top_p and top_k: Control how the model samples words, with `top_p` implementing nucleus sampling and `top_k` considering only the top `k` most probable tokens.
- max_length: Defines the maximum number of tokens to generate.
- num_return_sequences: Specifies how many different sequences of text to generate for a single prompt.
These settings allow for a highly customizable text generation experience, tailored to the needs of different users or use cases.
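To build intuition for how `top_k` and `top_p` restrict the candidate tokens, here is a toy, self-contained illustration over a hand-made probability table (real decoding operates on model logits inside the library, so this is purely pedagogical):

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Toy illustration of how top_k / top_p shrink the candidate set.

    `probs` maps token -> probability. top_k keeps only the k most
    probable tokens; top_p keeps the smallest prefix (in descending
    probability order) whose cumulative mass reaches top_p. Survivors
    are renormalized so they again sum to 1.
    """
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]
    if top_p < 1.0:
        kept, total = [], 0.0
        for token, p in items:
            kept.append((token, p))
            total += p
            if total >= top_p:
                break
        items = kept
    z = sum(p for _, p in items)
    return {token: p / z for token, p in items}
```

For example, with probabilities `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, `top_k=2` keeps only `a` and `b`, and a low `top_p` similarly cuts off the long tail of unlikely tokens.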
```
├── gen
│   ├── main.py            # Application entry point, starts the FastAPI app
│   ├── repository
│   │   ├── gen.py         # Core logic for text generation: model loading and generation functions
│   ├── routers
│   │   ├── gen_text.py    # Defines the API route for text generation
│   └── schemas.py         # Request and response schema definitions used by the API
├── README.md              # Project documentation, setup instructions, and usage details
└── requirements.txt       # Lists the Python dependencies for the project
```
- Python (>= 3.8)
- Pipenv (optional, for virtual environment management)
```bash
git clone https://github.com/Khailas12/Text-Gen-with-FastAPI.git
cd Text-Gen-with-FastAPI
pip install -r requirements.txt
uvicorn gen.main:app --reload
```
Generate text using multiple models based on the input prompt.
```json
{
  "prompt": "string",
  "max_length": 100,
  "num_return_sequences": 1,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50
}
```
```json
{
  "generated_texts": {
    "gpt2": ["Generated text by GPT-2 model"],
    "distilgpt2": ["Generated text by DistilGPT-2 model"]
  }
}
```
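The request and response bodies above map naturally onto Pydantic models of roughly this shape (field names and defaults are taken from the JSON examples; the project's actual definitions live in `gen/schemas.py` and may differ):

```python
from typing import Dict, List

from pydantic import BaseModel


class GenerationRequest(BaseModel):
    prompt: str                   # required: the text to continue
    max_length: int = 100
    num_return_sequences: int = 1
    temperature: float = 1.0
    top_p: float = 0.9
    top_k: int = 50


class GenerationResponse(BaseModel):
    # Maps each model name to the list of texts it generated.
    generated_texts: Dict[str, List[str]]
```

Declaring the schemas this way is what lets FastAPI validate incoming JSON and render the request/response shapes in the automatic OpenAPI docs.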
- Multiple Models: Supports multiple text-generation models (`gpt2`, `distilgpt2`).
- Custom Parameters: Configure text generation with parameters like `temperature`, `top_p`, `top_k`, `max_length`, and `num_return_sequences`.
- Asynchronous Processing: Handles multiple requests concurrently for scalability.
- Resource Management: Models are loaded at startup and released during shutdown.
- Startup Event: Loads models into memory.
- Shutdown Event: Releases resources and clears loaded models.
- Implements the core logic for text generation using Hugging Face pipelines.
- Handles exceptions and provides sanitized responses.
- Supports concurrent execution with `async_gen_text`.
- Defines the API route for text generation.
- Add More Models: Extend the `models` dictionary in `gen/repository/gen.py`.
- Modify Default Parameters: Adjust `max_length`, `temperature`, or other parameters as needed.
Feel free to open issues or submit pull requests if you'd like to improve this project!
This project is licensed under the MIT License. See the LICENSE file for details.