jdepoix / youtube-transcript-api

This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!

Add parameter verify to deal with SSL certification error

bhishanpdl opened this issue

I am working on a Windows computer and get an SSL certificate error whenever I use requests.get on any website. Passing the parameter verify=path_to_certificate makes it work.

Can we add a similar parameter to YouTubeTranscriptApi?

For example, this did not work for me:

from youtube_transcript_api import YouTubeTranscriptApi

video_id = r'https://www.youtube.com/watch?v=poBfOPFGgUU'
YouTubeTranscriptApi.get_transcript(video_id)

Error

SSLError: HTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=https://www.youtube.com/watch?v=poBfOPFGgUU (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')))
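
Separately from the SSL failure, the request path in this traceback (`/watch?v=https://www.youtube.com/watch?v=poBfOPFGgUU`) shows that a full URL was passed where `get_transcript` expects only the video ID. A minimal sketch for pulling the ID out of a standard watch URL with the standard library follows; `extract_video_id` is a hypothetical helper for illustration, not part of youtube_transcript_api, and it only handles URLs of the `watch?v=<id>` form.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the `v` query parameter of a watch URL, or the input unchanged.

    Hypothetical helper: it only understands URLs shaped like
    https://www.youtube.com/watch?v=<id>.
    """
    parsed = urlparse(url_or_id)
    if parsed.scheme and parsed.netloc:
        values = parse_qs(parsed.query).get("v")
        if not values:
            raise ValueError(f"no 'v' parameter in URL: {url_or_id!r}")
        return values[0]
    # No scheme or host: assume the caller already passed a bare video ID.
    return url_or_id


video_id = extract_video_id("https://www.youtube.com/watch?v=poBfOPFGgUU")
print(video_id)
```

Fixing the ID does not address the certificate error itself; it only removes a second problem that would surface once verification succeeds.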

Suggestions

  • We can manually download the certificate from youtube.com and pass the path
from youtube_transcript_api import YouTubeTranscriptApi

video_id = r'https://www.youtube.com/watch?v=poBfOPFGgUU'

path_youtube_certificate = r'GTS Root R1.crt' # download this file from youtube.com and put in working directory
YouTubeTranscriptApi.get_transcript(video_id,verify=path_youtube_certificate)

I also tried downloading the certificate from YouTube and placing it in the working directory, but I still get the SSL certificate error. Can we circumvent this somehow and get the following code to work?

from youtube_transcript_api import YouTubeTranscriptApi
import os

try:
    os.environ["REQUESTS_CA_BUNDLE"] = r"GTS Root R1.crt" # downloaded from youtube.com and placed at PWD
    video_id = r'https://www.youtube.com/watch?v=poBfOPFGgUU'
    YouTubeTranscriptApi.get_transcript(video_id)
except BaseException as ex:
    print(str(ex))

Error

HTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=poBfOPFGgUU (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')))

I have the same issue. Has anybody else thought of a way to get around it?

Hi @bhishanpdl,
Could it be that setting REQUESTS_CA_BUNDLE is not working because you are not using an absolute path? Maybe try an absolute path.
Alternatively, instead of the high-level YouTubeTranscriptApi interface, you could use TranscriptListFetcher (which is what YouTubeTranscriptApi uses internally). It is currently neither exposed nor documented (which I should probably change), but you can still import it with from youtube_transcript_api._transcripts import TranscriptListFetcher. You can then use it like this:

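from youtube_transcript_api._transcripts import TranscriptListFetcher
import requests

video_id = 'poBfOPFGgUU'  # the video ID only, not the full URL

# Session whose requests are verified against the given CA certificate
session = requests.Session()
session.verify = "/path/to/issuer's certificate"
TranscriptListFetcher(session).fetch(video_id)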
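
The absolute-path suggestion can be sketched as follows. `set_ca_bundle` is a hypothetical helper (not part of requests or youtube_transcript_api): it resolves the certificate path, fails early if the file does not exist, and only then sets `REQUESTS_CA_BUNDLE`, which requests consults when `verify` is left at its default.

```python
import os
from pathlib import Path


def set_ca_bundle(cert_path: str) -> str:
    """Point REQUESTS_CA_BUNDLE at the absolute path of an existing file.

    Hypothetical helper for illustration. Returns the resolved path so the
    caller can log it or reuse it (for example as session.verify).
    """
    resolved = Path(cert_path).expanduser().resolve()
    if not resolved.is_file():
        # Fail here with a clear message instead of a later SSL error.
        raise FileNotFoundError(f"certificate file not found: {resolved}")
    os.environ["REQUESTS_CA_BUNDLE"] = str(resolved)
    return str(resolved)


# Usage (file name taken from the report above; call before any request is made):
# set_ca_bundle("GTS Root R1.crt")
```

Checking the file up front separates "the path was wrong" from "the certificate was not accepted", which otherwise both show up as the same verification failure.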

This worked for me using a .pem file:

from youtube_transcript_api import YouTubeTranscriptApi
import os

os.environ["REQUESTS_CA_BUNDLE"] = r"path_to_your_pem_file.pem" # downloaded from youtube.com
YouTubeTranscriptApi.get_transcript(video_id)
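
The thread does not say why the `.crt` file failed while a `.pem` file worked. One possible explanation, stated here as an assumption, is encoding: a certificate exported on Windows can be DER-encoded (binary), whereas the CA file that requests loads has to be PEM (Base64 text). A minimal sketch of a conversion using only the standard library `ssl` module follows; `ensure_pem` is a hypothetical helper name.

```python
import ssl
from pathlib import Path

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def ensure_pem(cert_path: str) -> Path:
    """Return a path to a PEM version of the certificate at cert_path.

    Hypothetical helper: if the file already starts with a PEM header it is
    returned unchanged; otherwise its bytes are treated as one DER-encoded
    certificate and written to a file with the same name and a .pem suffix.
    """
    source = Path(cert_path)
    data = source.read_bytes()
    if data.lstrip().startswith(PEM_HEADER):
        return source
    target = source.with_suffix(".pem")
    # DER_cert_to_PEM_cert Base64-encodes the bytes and adds header/footer.
    target.write_text(ssl.DER_cert_to_PEM_cert(data), encoding="ascii")
    return target
```

The returned path could then be used as the value of `REQUESTS_CA_BUNDLE` or `session.verify`. The helper does not check that the input really is a certificate, so it is worth opening the result in a text editor if verification still fails.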

I will close this now, as the issue seems resolved! 😊