Missing transcript fetch intermittently
provhatrahman opened this issue · comments
To Reproduce
Steps to reproduce the behavior:
- check for manual subs
- if no manual check for auto gen subs
- print transcript
What code / cli command are you executing?
I am running:
from youtube_transcript_api import YouTubeTranscriptApi

input_language_code = 'en'
video_url = 'https://www.youtube.com/watch?v=rvxSwwCuXBs'
video_id = video_url.split("watch?v=")[-1]

def get_youtube_transcript(video_id):
    transcript_text = ""
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    try:
        # Try to find a manually created transcript in English
        manually_created_transcript = transcript_list.find_manually_created_transcript([input_language_code])
        full_transcript = manually_created_transcript.fetch()
        transcript_text = ' '.join([part['text'] for part in full_transcript])
        print(transcript_text)
    except Exception:
        # If manually created transcript is not found, try to find an auto-generated transcript
        print('no manual')
        auto_generated_transcript = transcript_list.find_generated_transcript([input_language_code])
        print(auto_generated_transcript)
        translated_transcript = auto_generated_transcript.translate('en')
        print(translated_transcript)
        full_transcript = translated_transcript.fetch()
        print(full_transcript)
    return full_transcript

# Call the function only after it has been defined
transcript = get_youtube_transcript(video_id)
Which Python version are you using?
Python 3.11.7
Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.2
Expected behavior
I expected to receive the English transcript as a list, like this:
no manual
en ("English (auto-generated)")[TRANSLATABLE]
en ("English")
[{'text': "[Music] what's up guys welcome back to my ...]
Actual behaviour
Instead I received the following output:
no manual
en ("English (auto-generated)")[TRANSLATABLE]
en ("English")
[]
That is, it returns an empty list, even though the same code previously returned the auto-generated transcript.
Hi @provhatrahman, there is a bug in YouTube where an empty transcript is sometimes returned when you try to translate English subtitles to English. You can work around this by not calling translated_transcript = auto_generated_transcript.translate('en') when input_language_code == 'en'.
I'll close this now, as there is not much else to do here.
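The workaround described above could look roughly like the following. This is only a sketch: fetch_in_language is a hypothetical helper name, not part of youtube-transcript-api, and it assumes the object passed in behaves like the library's Transcript objects (a language_code attribute plus translate() and fetch() methods).

```python
def fetch_in_language(transcript, target_language_code):
    """Fetch a transcript, translating only if its language differs from the target.

    Hypothetical helper: `transcript` is assumed to expose `language_code`,
    `translate()` and `fetch()`, as youtube-transcript-api Transcript objects do.
    """
    if transcript.language_code != target_language_code:
        # Only request a translation when one is actually needed, so an
        # English transcript is never "translated" to English.
        transcript = transcript.translate(target_language_code)
    return transcript.fetch()


# Example usage (needs network access and youtube-transcript-api installed):
# from youtube_transcript_api import YouTubeTranscriptApi
# transcript_list = YouTubeTranscriptApi.list_transcripts('rvxSwwCuXBs')
# generated = transcript_list.find_generated_transcript(['en'])
# print(fetch_in_language(generated, 'en'))
```

With this check in place, the except branch in the original report would fetch the auto-generated English transcript directly instead of routing it through translate('en').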