re3turn / twicrawler

Crawl and upload videos and photos from the Twitter timeline to Google Photos

Instagram pages cannot be fetched without specifying a User-Agent

re3turn opened this issue

Unless a User-Agent is specified, requests.get() can no longer fetch Instagram pages.

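    def __init__(self, url: str) -> None:
        self.url = url
        # Send a browser-like User-Agent; without one the Instagram page is not returned.
        # Each fragment ends with a space so the adjacent literals join into a valid string.
        self.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) '
                                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                                      'Chrome/85.0.4183.102'}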

    def _get_json_data(self) -> dict:
        # Fetch the page, passing the User-Agent header built in __init__.
        res = requests.get(self.url, headers=self.headers)
        html = BeautifulSoup(res.content, 'html.parser')

        # Locate the <script> that assigns window._sharedData and capture the JSON object.
        pattern = re.compile(r'window\._sharedData = ({.*?});')
        script = html.find('script', text=pattern)
        data = pattern.search(script.text).group(1)  # type: ignore
        json_user_data = json.loads(data)

        return json_user_data
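
The extraction step in `_get_json_data` can be tried offline. The following is a minimal sketch, not the project's code. `SAMPLE_HTML` and `extract_shared_data` are hypothetical names introduced here, and the sample page content is invented for illustration. It applies the same `window._sharedData` pattern to a plain string using only the standard library, and raises a clear error when the script is missing instead of failing on `None`.

```python
import json
import re

# Hypothetical HTML standing in for a fetched page (assumption: the real
# page embeds its data as "window._sharedData = {...};" inside a <script>).
SAMPLE_HTML = (
    '<html><body>'
    '<script type="text/javascript">'
    'window._sharedData = {"entry_data": {"ProfilePage": [{"id": "1"}]}};'
    '</script>'
    '</body></html>'
)

# Same pattern as in the snippet above, with the dot and braces escaped.
SHARED_DATA_PATTERN = re.compile(r'window\._sharedData = (\{.*?\});')


def extract_shared_data(html: str) -> dict:
    """Return the JSON object assigned to window._sharedData.

    Raises ValueError when the assignment is not present, which is one
    possible symptom of a response that did not contain the expected page.
    """
    match = SHARED_DATA_PATTERN.search(html)
    if match is None:
        raise ValueError('window._sharedData not found in the response')
    return json.loads(match.group(1))


if __name__ == '__main__':
    data = extract_shared_data(SAMPLE_HTML)
    print(data['entry_data']['ProfilePage'][0]['id'])
```

Raising `ValueError` here is a design choice for the sketch: a response without the expected script then surfaces as an explicit failure rather than an `AttributeError` on `None`.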