jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and does not require an API key or a headless browser, as other Selenium-based solutions do!

Flaky transcript translation to english

jannostudio opened this issue · comments

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Translation often returns an empty list.
This is the transcript metadata for the video ID. It doesn't seem to be flaky and always seems to contain en as a possible translation language.

from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("R83VkHB7X68")

for transcript in transcript_list:
    # Create a formatted string for each translation language available
    translation_languages_formatted = ', '.join([f"{lang['language']} ({lang['language_code']})" for lang in transcript.translation_languages])
    
    # Print the transcript details in a structured format
    print(f"Video ID: {transcript.video_id}\n"
          f"Language: {transcript.language} ({transcript.language_code})\n"
          f"Generated: {'Yes' if transcript.is_generated else 'No'}\n"
          f"Translatable Languages: {translation_languages_formatted}\n"
          "--------------------------------------")

Video ID: R83VkHB7X68
Language: German (auto-generated) (de)
Generated: Yes
Translatable Languages: Afrikaans (af), Akan (ak), Albanian (sq), Amharic (am), Arabic (ar), Armenian (hy), Assamese (as), Aymara (ay), Azerbaijani (az), Bangla (bn), Basque (eu), Belarusian (be), Bhojpuri (bho), Bosnian (bs), Bulgarian (bg), Burmese (my), Catalan (ca), Cebuano (ceb), Chinese (Simplified) (zh-Hans), Chinese (Traditional) (zh-Hant), Corsican (co), Croatian (hr), Czech (cs), Danish (da), Divehi (dv), Dutch (nl), English (en), Esperanto (eo), Estonian (et), Ewe (ee), Filipino (fil), Finnish (fi), French (fr), Galician (gl), Ganda (lg), Georgian (ka), German (de), Greek (el), Guarani (gn), Gujarati (gu), Haitian Creole (ht), Hausa (ha), Hawaiian (haw), Hebrew (iw), Hindi (hi), Hmong (hmn), Hungarian (hu), Icelandic (is), Igbo (ig), Indonesian (id), Irish (ga), Italian (it), Japanese (ja), Javanese (jv), Kannada (kn), Kazakh (kk), Khmer (km), Kinyarwanda (rw), Korean (ko), Krio (kri), Kurdish (ku), Kyrgyz (ky), Lao (lo), Latin (la), Latvian (lv), Lingala (ln), Lithuanian (lt), Luxembourgish (lb), Macedonian (mk), Malagasy (mg), Malay (ms), Malayalam (ml), Maltese (mt), Māori (mi), Marathi (mr), Mongolian (mn), Nepali (ne), Northern Sotho (nso), Norwegian (no), Nyanja (ny), Odia (or), Oromo (om), Pashto (ps), Persian (fa), Polish (pl), Portuguese (pt), Punjabi (pa), Quechua (qu), Romanian (ro), Russian (ru), Samoan (sm), Sanskrit (sa), Scottish Gaelic (gd), Serbian (sr), Shona (sn), Sindhi (sd), Sinhala (si), Slovak (sk), Slovenian (sl), Somali (so), Southern Sotho (st), Spanish (es), Sundanese (su), Swahili (sw), Swedish (sv), Tajik (tg), Tamil (ta), Tatar (tt), Telugu (te), Thai (th), Tigrinya (ti), Tsonga (ts), Turkish (tr), Turkmen (tk), Ukrainian (uk), Urdu (ur), Uyghur (ug), Uzbek (uz), Vietnamese (vi), Welsh (cy), Western Frisian (fy), Xhosa (xh), Yiddish (yi), Yoruba (yo), Zulu (zu)

What code / cli command are you executing?

I am running

from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("R83VkHB7X68")

for transcript in transcript_list:
    if transcript.language_code == 'en':
        print("Fetching English transcript directly for video ID:", transcript.video_id)
        # An English transcript exists, so fetch it as-is
        english_transcript = transcript.fetch()
    elif 'en' in [lang['language_code'] for lang in transcript.translation_languages]:
        print("Translating transcript to English for video ID:", transcript.video_id)
        # Translate the transcript to English
        english_transcript = transcript.translate("en").fetch()
    else:
        print("English transcript not available for video ID:", transcript.video_id)
        english_transcript = None

    print(english_transcript)

Translating transcript to English for video ID: R83VkHB7X68
[]

Which Python version are you using?

Tested on Python 3.10.13 and 3.10.11

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.2

Expected behavior

Describe what you expected to happen.

I expected to receive the English transcript.

Actual behaviour

I either receive the translation as requested, or I receive an empty list:

[]

Just FYI, this problem has been occurring less and less over the last couple of days, and I can currently reproduce it only rarely. I assume it's an issue on YouTube's end.

Hi @jannostudio,
thanks for the update! In fact, I have received multiple reports of YouTube's translation API being a bit flaky. It most commonly happens when you're trying to translate from en to en (not that this would make sense), where it just returns an empty list. Unfortunately, I don't think there's anything we can do about this, as it is outside this module's control. Therefore, I will close this issue.
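
For anyone who runs into the empty list in the meantime, one possible caller-side workaround is to retry when the result comes back empty. This is only a minimal sketch, not something the library provides: `fetch_with_retry`, its parameters, and the linear backoff are hypothetical names and choices made for illustration, and it assumes the fetch call returns a list that is empty when the translation fails.

```python
import time
from typing import Callable, List, Optional


def fetch_with_retry(
    fetch: Callable[[], List[dict]],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Call `fetch` until it returns a non-empty list or attempts run out.

    Hypothetical helper: returns the first non-empty result, or None if
    every attempt came back empty. Exceptions raised by `fetch` are not
    caught and propagate to the caller.
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result:  # a non-empty list means the transcript was returned
            return result
        if attempt < attempts:
            # Linear backoff: wait a little longer after each empty response.
            sleep(delay_seconds * attempt)
    return None
```

With the code from the issue, the call would look like `fetch_with_retry(lambda: transcript.translate("en").fetch())`. Since the cause appears to be on YouTube's side, retrying may reduce how often an empty list is seen, but it cannot guarantee a result.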