Flaky transcript translation to English
jannostudio opened this issue
DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video, make sure to include the video ID.
To Reproduce
Translation often returns an empty list.
This is the transcript metadata for the video ID. The metadata doesn't seem to be flaky and always seems to list en as a possible translation language.
```python
from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("R83VkHB7X68")
for transcript in transcript_list:
    # Create a formatted string for each translation language available
    translation_languages_formatted = ', '.join([f"{lang['language']} ({lang['language_code']})" for lang in transcript.translation_languages])
    # Print the transcript details in a structured format
    print(f"Video ID: {transcript.video_id}\n"
          f"Language: {transcript.language} ({transcript.language_code})\n"
          f"Generated: {'Yes' if transcript.is_generated else 'No'}\n"
          f"Translatable Languages: {translation_languages_formatted}\n"
          "--------------------------------------")
```
```
Video ID: R83VkHB7X68
Language: German (auto-generated) (de)
Generated: Yes
Translatable Languages: Afrikaans (af), Akan (ak), Albanian (sq), Amharic (am), Arabic (ar), Armenian (hy), Assamese (as), Aymara (ay), Azerbaijani (az), Bangla (bn), Basque (eu), Belarusian (be), Bhojpuri (bho), Bosnian (bs), Bulgarian (bg), Burmese (my), Catalan (ca), Cebuano (ceb), Chinese (Simplified) (zh-Hans), Chinese (Traditional) (zh-Hant), Corsican (co), Croatian (hr), Czech (cs), Danish (da), Divehi (dv), Dutch (nl), English (en), Esperanto (eo), Estonian (et), Ewe (ee), Filipino (fil), Finnish (fi), French (fr), Galician (gl), Ganda (lg), Georgian (ka), German (de), Greek (el), Guarani (gn), Gujarati (gu), Haitian Creole (ht), Hausa (ha), Hawaiian (haw), Hebrew (iw), Hindi (hi), Hmong (hmn), Hungarian (hu), Icelandic (is), Igbo (ig), Indonesian (id), Irish (ga), Italian (it), Japanese (ja), Javanese (jv), Kannada (kn), Kazakh (kk), Khmer (km), Kinyarwanda (rw), Korean (ko), Krio (kri), Kurdish (ku), Kyrgyz (ky), Lao (lo), Latin (la), Latvian (lv), Lingala (ln), Lithuanian (lt), Luxembourgish (lb), Macedonian (mk), Malagasy (mg), Malay (ms), Malayalam (ml), Maltese (mt), Māori (mi), Marathi (mr), Mongolian (mn), Nepali (ne), Northern Sotho (nso), Norwegian (no), Nyanja (ny), Odia (or), Oromo (om), Pashto (ps), Persian (fa), Polish (pl), Portuguese (pt), Punjabi (pa), Quechua (qu), Romanian (ro), Russian (ru), Samoan (sm), Sanskrit (sa), Scottish Gaelic (gd), Serbian (sr), Shona (sn), Sindhi (sd), Sinhala (si), Slovak (sk), Slovenian (sl), Somali (so), Southern Sotho (st), Spanish (es), Sundanese (su), Swahili (sw), Swedish (sv), Tajik (tg), Tamil (ta), Tatar (tt), Telugu (te), Thai (th), Tigrinya (ti), Tsonga (ts), Turkish (tr), Turkmen (tk), Ukrainian (uk), Urdu (ur), Uyghur (ug), Uzbek (uz), Vietnamese (vi), Welsh (cy), Western Frisian (fy), Xhosa (xh), Yiddish (yi), Yoruba (yo), Zulu (zu)
```
What code / CLI command are you executing?
I am running:
```python
from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("R83VkHB7X68")
english_transcript = None  # stays None if no transcript can provide English
for transcript in transcript_list:
    if transcript.language_code == 'en':
        print("Fetching English transcript directly for video ID:", transcript.video_id)
        # Already English: fetch it directly, no translation needed
        english_transcript = transcript.fetch()
    elif 'en' in [lang['language_code'] for lang in transcript.translation_languages]:
        print("Translating transcript to English for video ID:", transcript.video_id)
        # Ask YouTube to translate this transcript to English, then fetch the result
        english_transcript = transcript.translate("en").fetch()
    else:
        print("English transcript not available for video ID:", transcript.video_id)
        english_transcript = None
print(english_transcript)
```
```
Translating transcript to English for video ID: R83VkHB7X68
[]
```
Which Python version are you using?
Tested on Python 3.10.13 and 3.10.11.
Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.2
Expected behavior
Describe what you expected to happen.
I expected to receive the English transcript.
Actual behaviour
I either receive the translation as requested, or I receive an empty list:
```
[]
```
Just FYI, this problem has been occurring less and less over the last couple of days, and I can currently only rarely reproduce it. I assume it's an issue on YouTube's end.
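If the empty result really is a transient problem on YouTube's side, as suspected here, one client-side mitigation is to retry the fetch a few times before giving up. This is a minimal sketch and not part of youtube-transcript-api: `fetch_with_retry`, `attempts` and `base_delay` are hypothetical names, and treating an empty list as a retryable failure is an assumption.

```python
import time
from typing import Any, Callable, Dict, List


def fetch_with_retry(
    fetch: Callable[[], List[Dict[str, Any]]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> List[Dict[str, Any]]:
    """Call `fetch` until it returns a non-empty list or attempts run out.

    Assumption: an empty list is a transient failure worth retrying.
    Returns the last result, which may still be empty.
    """
    result: List[Dict[str, Any]] = []
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    return result
```

Inside the loop from the issue this could be called as `fetch_with_retry(lambda: transcript.translate("en").fetch())`. A video whose translation is genuinely empty will only cost a few extra requests before the empty list is returned.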
Hi @jannostudio,
thanks for the update! In fact, I have received multiple reports of YouTube's translation API being a bit flaky. It most commonly happens when you try to translate from en to en (not that this would make sense), in which case it just returns an empty list. Unfortunately, I don't think there's anything we can do about this, as it is outside of this module's control. Therefore, I will close this issue.
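Since en to en translation is the most common trigger, a caller can avoid that case by preferring a transcript that is already English and only translating when none exists. This is a minimal sketch, assuming a `TranscriptList` from youtube-transcript-api 0.6.x (or any iterable of objects with `language_code`, `translation_languages`, `translate()` and `fetch()`). `english_transcript` and `_is_english` are hypothetical helper names, and treating regional codes such as `en-GB` as English is an assumption.

```python
from typing import Any, Dict, Iterable, List, Optional


def _is_english(code: str) -> bool:
    # Assumption: regional variants like "en-GB" or "en-US" count as English,
    # so we never ask YouTube to translate English into English.
    return code == "en" or code.startswith("en-")


def english_transcript(transcripts: Iterable[Any]) -> Optional[List[Dict[str, Any]]]:
    """Return an English transcript without ever translating en -> en."""
    candidates = list(transcripts)
    # 1. A transcript that is already English needs no translation at all.
    for t in candidates:
        if _is_english(t.language_code):
            return t.fetch()
    # 2. Otherwise translate the first transcript that offers "en" as a target.
    for t in candidates:
        codes = [lang["language_code"] for lang in t.translation_languages]
        if "en" in codes:
            return t.translate("en").fetch()
    # 3. No way to get English for this video.
    return None
```

For the video in this issue that would be `english_transcript(YouTubeTranscriptApi.list_transcripts("R83VkHB7X68"))`. This does not make a de to en translation any less flaky; it only removes the en to en case described above.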