clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

JSON API response's Content-Type is text/html;charset=UTF-8, but I want application/json

workji opened this issue · comments

I am crawling the background JSON data API of a Vue.js site.

  1. The request:
    yield SeleniumRequest(
        url=json_api_url,
        wait_time=3,
        callback=self.parse_api)
  2. The original response:
    {"data":{"list":[{"title":"adidas originals Yeezy 450 &quot;Cloud White&quot; H68038"},{"title":"adidas &quot;Have A Good Game&quot; H68038"}],"next":true,"total":2000},"result":1}
  3. The response I actually get:
    def parse_api(self, response):
        json_str = response.xpath('//body/text()').get()
        json_obj = json.loads(json_str)
    {"data":{"list":[{"title":"adidas originals Yeezy 450 "Cloud White" H68038"},{"title":"adidas "Have A Good Game" H68038"}],"next":true,"total":2000},"result":1}
  4. The problem:
    json_obj = json.loads(json_str)              <- raises an error
    json.decoder.JSONDecodeError: Expecting ',' delimiter: line
  5. The underlying cause:
    When the response's Content-Type is text/html,
    the HTML character entities ( &quot; ) are converted to ( " ), which breaks the JSON format.
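
The failure described above can be reproduced without Scrapy or Selenium. The sketch below is only an illustration: it assumes the original payload carries `&quot;` entities inside the title strings, and it uses `html.unescape` from the standard library as a stand-in for the entity decoding that happens when the body is rendered as HTML. The sample payload is shortened from the one in this issue.

```python
import html
import json

# Shortened sample payload; assumes the API sends &quot; inside string values.
origin = (
    '{"data":{"list":[{"title":"adidas &quot;Have A Good Game&quot; H68038"}],'
    '"next":true,"total":2000},"result":1}'
)

# The untouched body is valid JSON and parses without error.
parsed = json.loads(origin)

# Stand-in for HTML rendering: every &quot; becomes a bare double quote.
rendered = html.unescape(origin)

# The bare quotes now end the string value early, so parsing fails.
try:
    json.loads(rendered)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(parsed["data"]["list"][0]["title"])
print(type(error).__name__, error)
```

Once the entities are decoded, the text is ambiguous as JSON, so it cannot be reliably repaired after the fact; the parse has to happen on the undecoded body.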

So my question is: how can I change the Content-Type from [ text/html; ] to [ application/json; ], or how can I prevent ( &quot; ) from being converted to ( " )?
Thank you very much!
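
One possible direction, sketched under assumptions rather than confirmed by the library authors: if the JSON endpoint does not need JavaScript rendering, it could be requested with a plain `scrapy.Request` instead of `SeleniumRequest`, so the callback receives the raw body bytes and no browser touches them. The helper name `parse_api_body` is hypothetical, and the spider lines in the comments are not executed here. The idea is to parse the raw bytes first and decode HTML entities per field afterwards, if that is wanted at all.

```python
import html
import json


def parse_api_body(body: bytes) -> dict:
    """Parse a raw JSON body, whatever Content-Type header came with it."""
    # json.loads accepts UTF-8, UTF-16 or UTF-32 encoded bytes directly.
    return json.loads(body)


# Hypothetical call sites inside a spider (assumption, not executed here):
#   yield scrapy.Request(url=json_api_url, callback=self.parse_api)
#
#   def parse_api(self, response):
#       data = parse_api_body(response.body)

# Shortened sample payload; assumes the API sends &quot; inside string values.
raw = (
    b'{"data":{"list":[{"title":"adidas &quot;Have A Good Game&quot; H68038"}],'
    b'"next":true,"total":2000},"result":1}'
)

data = parse_api_body(raw)

# Decode entities only after parsing, one string value at a time.
title = html.unescape(data["data"]["list"][0]["title"])
print(title)
```

Decoding after parsing keeps the JSON structure intact, because the quotes produced by `html.unescape` end up inside an already parsed Python string.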