- Unrestricted scraping (the site does not block the default client):
- import requests
- from bs4 import BeautifulSoup
- resp = requests.get("https://www.douban.com/people/62513788/status/2017535708/")
- html_str = resp.text
- document = BeautifulSoup(html_str, "html.parser")
- document
- Restricted scraping: add a header:
- import urllib.request
- from bs4 import BeautifulSoup
- url = "http://www.xiami.com/search/song-lyric/page/1?spm=a1z1s.3521881.0.0.&key=%E6%9E%97%E5%A4%95"
- req = urllib.request.Request(url)
- req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')
- resp = urllib.request.urlopen(req)
- mainpage = resp.read()
- document = BeautifulSoup(mainpage, "html.parser")
- document
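
The header-based approach above can be wrapped in a small reusable helper. The following is only a minimal sketch using the standard library: the names `DEFAULT_USER_AGENT`, `build_request`, and `fetch_html` are hypothetical and not part of the original notes, and the 10-second timeout is an assumed default.

```python
import urllib.request

# Hypothetical default: the same browser-like User-Agent string used in the notes.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a urllib Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch a page and return its raw bytes, or None on a network or HTTP error."""
    try:
        req = build_request(url, user_agent)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        print(f"request failed for {url}: {exc}")
        return None
```

The returned bytes can be passed to `BeautifulSoup(html, "html.parser")` exactly as in the notes, after checking that the result is not `None`. With `requests`, the same header can be sent as `requests.get(url, headers={"User-Agent": ...})`.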