jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!

Missing transcript fetch intermittently

provhatrahman opened this issue

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior:

  • check for manual subs
  • if no manual check for auto gen subs
  • print transcript

What code / cli command are you executing?

For example: I am running

from youtube_transcript_api import YouTubeTranscriptApi, NoTranscriptFound

input_language_code = 'en'
video_url = 'https://www.youtube.com/watch?v=rvxSwwCuXBs'
video_id = video_url.split("watch?v=")[-1]

def get_youtube_transcript(video_id):
    transcript_text = ""
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    try:
        # Try to find a manually created transcript in English
        manually_created_transcript = transcript_list.find_manually_created_transcript([input_language_code])
        full_transcript = manually_created_transcript.fetch()
        transcript_text = ' '.join(part['text'] for part in full_transcript)
        print(transcript_text)
    except NoTranscriptFound:
        # If manually created transcript is not found, try to find an auto-generated transcript
        print('no manual')
        auto_generated_transcript = transcript_list.find_generated_transcript([input_language_code])
        print(auto_generated_transcript)
        translated_transcript = auto_generated_transcript.translate('en')
        print(translated_transcript)
        full_transcript = translated_transcript.fetch()
        print(full_transcript)
        transcript_text = ' '.join(part['text'] for part in full_transcript)
    return transcript_text

# Call the function only after it has been defined
transcript = get_youtube_transcript(video_id)

Which Python version are you using?

Python 3.11.7

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.2

Expected behavior

Describe what you expected to happen.

I expected to receive the English transcript as a list, like this:

no manual
en ("English (auto-generated)")[TRANSLATABLE]
en ("English")
[{'text': "[Music]  what's up guys welcome back to my ...]

Actual behaviour

Instead, I received the following output:

no manual
en ("English (auto-generated)")[TRANSLATABLE]
en ("English")
[]

That is, it returns an empty list, even though the same code previously returned the auto-generated transcript.

Hi @provhatrahman, there is a bug in YouTube where an empty transcript is sometimes returned when you try to translate English subtitles to English. You can work around this by not calling translated_transcript = auto_generated_transcript.translate('en') when input_language_code == 'en'.
I'll close this now, as there is not much else to do here.
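
For anyone landing here later, the workaround above might look roughly like the sketch below. It is only an illustration, not code from the library: fetch_in_language and get_generated_transcript are hypothetical helper names, and comparing against the transcript's language_code attribute (rather than input_language_code) is an assumption made here so the check also covers other languages.

```python
def fetch_in_language(transcript, target_language_code):
    """Fetch `transcript`, translating only if it is not already in the target language.

    `transcript` is expected to behave like the library's Transcript objects:
    it exposes `language_code`, `translate(code)` and `fetch()`.
    """
    if transcript.language_code != target_language_code:
        # Only translate when source and target languages actually differ,
        # which avoids the en -> en translation that can come back empty.
        transcript = transcript.translate(target_language_code)
    return transcript.fetch()


def get_generated_transcript(video_id, language_code="en"):
    """Hypothetical wrapper: look up the auto-generated transcript and fetch it."""
    # Imported here so the helper above can be used and tested without the library.
    from youtube_transcript_api import YouTubeTranscriptApi

    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    generated = transcript_list.find_generated_transcript([language_code])
    return fetch_in_language(generated, language_code)
```

With this in place, get_generated_transcript('rvxSwwCuXBs') would fetch the auto-generated English transcript directly and never call translate('en') on it.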