imharshanand / yt_summary_ai

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Here is a detailed README file for your project:


YouTube Video Transcript Summarizer

Project Overview

The YouTube Video Transcript Summarizer is a Python-based tool that extracts transcripts from YouTube videos and summarizes them using a pre-trained T5 model. The project leverages several powerful libraries and APIs to perform these tasks efficiently, making it easier to get concise summaries of lengthy video content.

Key Features

  • YouTube API Integration: Fetches the latest videos or specific video transcripts from a YouTube channel using the YouTube Data API.
  • Transcript Extraction: Retrieves transcripts of YouTube videos using the youtube-transcript-api.
  • Summarization with T5 Model: Uses the T5 (Text-to-Text Transfer Transformer) model from the Hugging Face Transformers library to summarize the extracted transcripts.
  • GPU Acceleration: Utilizes NVIDIA GPUs for faster model inference if available.

Project Structure

  • Model Setup: The project downloads and saves the T5 model and tokenizer locally for reuse, minimizing the need to redownload the model.
  • Text Processing: Includes functions for cleaning and preparing transcript text to ensure high-quality input for the summarization model.
  • Summarization: Implements a function to generate summaries using the T5 model, which condenses the content of the transcripts into concise summaries.

Detailed Description

What We Did

  1. API Integration:

    • Integrated the YouTube Data API to fetch video details and transcripts.
    • Utilized youtube-transcript-api to extract video transcripts.
  2. Model Utilization:

    • Employed the T5 model from the Hugging Face Transformers library for text summarization.
    • Saved the model and tokenizer locally to avoid redundant downloads and improve efficiency.
  3. Environment Setup:

    • Ensured compatibility with both GPU and CPU for model inference.
    • Provided detailed instructions for setting up the required environment and dependencies.
  4. Summarization Pipeline:

    • Implemented a text processing pipeline to clean and prepare transcripts for summarization.
    • Developed a summarization function that generates concise summaries of the transcripts.
  5. User Interaction:

    • Allowed users to input a YouTube video URL to get the summary of the video transcript.
    • Provided clear and detailed output, including the extracted video ID, transcript, and the generated summary.

Installation

To set up the project, you need to install the required libraries. Use the following commands:

pip install transformers
pip install torch
pip install google-api-python-client
pip install youtube-transcript-api

Acknowledgments

  • Hugging Face Transformers: For providing the T5 model and tokenizer.
  • YouTube Data API: For enabling access to video details and transcripts.
  • YouTube Transcript API: For facilitating easy extraction of YouTube video transcripts.

About


Languages

Language:Jupyter Notebook 90.1%Language:Python 9.9%