TextFormatter not working with format_transcripts
angel-luis opened this issue
To Reproduce
Steps to reproduce the behavior:
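from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
# video_ids is a list of YouTube video ID strings
transcripts = YouTubeTranscriptApi.get_transcripts(video_ids, languages=['en', 'es'])
formatter = TextFormatter()
formatter.format_transcripts(transcripts)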
What code / cli command are you executing?
python my_file.py
Which Python version are you using?
Python 3.11.4
Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.1
Expected behavior
The same code works with JSONFormatter and PrettyPrintFormatter.
Actual behaviour
Instead I received the following error message:
line 71, in <genexpr>
return '\n'.join(line['text'] for line in transcript)
~~~~^^^^^^^^
TypeError: string indices must be integers, not 'str'
Here is a workaround that works for me:
def format_transcript(self, transcript, **kwargs):
    video_id = list(transcript[0].keys())[0]
    return '\n'.join(line['text'] for line in transcript[0][video_id])
Hi @angel-luis,
get_transcripts returns a tuple containing a dict of transcripts and a list of videos which could not be retrieved, i.e. ({str: [{'text': str, 'start': float, 'duration': float}]}, [str]). However, the param for format_transcripts should be a list of transcripts. So you will have to transform the output of get_transcripts to a list of transcripts before using format_transcripts. Like:
transcript_dict, _ = YouTubeTranscriptApi.get_transcripts(video_ids, languages=['en', 'es'])
formatter.format_transcripts(transcript_dict.values())
The code you provided will only format the transcript of the first video in the list. If your list actually just contains one video, you can simply use formatter.format_transcript(YouTubeTranscriptApi.get_transcript(video_ids[0])) instead.
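The shape mismatch can be sketched without any network access. Everything below is illustrative: the sample data is made up, and format_transcript / format_transcripts are stand-ins modelled on the join shown in the traceback, not the library's own code. The triple-newline separator between videos is an assumption.

```python
# Stand-ins for TextFormatter's methods (assumption: modelled on the join
# shown in the traceback, not copied from the library).
def format_transcript(transcript):
    # One transcript is a list of dicts, each with a 'text' key.
    return '\n'.join(line['text'] for line in transcript)


def format_transcripts(transcripts):
    # Several transcripts: an iterable of such lists.
    # The '\n\n\n' separator is an assumption made for illustration.
    return '\n\n\n'.join(format_transcript(t) for t in transcripts)


# Made-up data in the shape get_transcripts is described as returning:
# (dict of video id -> transcript, list of ids that could not be retrieved)
fake_result = (
    {
        'video_a': [
            {'text': 'Hello', 'start': 0.0, 'duration': 1.5},
            {'text': 'world', 'start': 1.5, 'duration': 1.0},
        ],
        'video_b': [
            {'text': 'Hola', 'start': 0.0, 'duration': 2.0},
        ],
    },
    [],
)

# Unpack the tuple first, then hand over only the transcripts themselves.
transcript_dict, unretrievable = fake_result
formatted = format_transcripts(transcript_dict.values())
print(formatted)
```

Unpacking the tuple and passing only the dict's values gives the formatter what it iterates over: one list of snippets per video.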
I agree that the docstrings aren't very clear here.
I find this inconvenient, as JSONFormatter can take the result of YouTubeTranscriptApi.get_transcripts(video_ids, languages=['en', 'es']) directly, but TextFormatter can't.
I'm still getting TypeError: string indices must be integers, not 'str' and TypeError: list indices must be integers or slices, not str. I think I can get the correct output soon.
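For anyone hitting the same two messages, here is a rough sketch of which input shape likely triggers each one. It assumes the single-transcript formatter does the join shown in the traceback; format_transcript and error_for are hypothetical stand-ins, and the data is made up.

```python
# Stand-in for the single-transcript join from the traceback (assumption).
def format_transcript(transcript):
    return '\n'.join(line['text'] for line in transcript)


def error_for(value):
    # Hypothetical helper: return the TypeError message, or None on success.
    try:
        format_transcript(value)
    except TypeError as exc:
        return str(exc)
    return None


snippets = [{'text': 'Hello', 'start': 0.0, 'duration': 1.5}]
transcript_dict = {'video_a': snippets}

# Iterating a dict yields its keys (strings), so line['text'] indexes a str.
dict_error = error_for(transcript_dict)

# Iterating dict.values() yields whole transcripts (lists),
# so line['text'] indexes a list.
values_error = error_for(transcript_dict.values())

# A single transcript (list of snippet dicts) is the shape that works.
single_error = error_for(snippets)

print(dict_error)
print(values_error)
print(single_error)
```

In short, the "string indices" error suggests a dict keyed by video id reached the single-transcript method, and the "list indices" error suggests a collection of transcripts did.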