Why is my output rendered with the wrong encoding?
skanel opened this issue · comments
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import net.ruippeixotog.scalascraper.dsl.DSL.Parse._
object Scraper {
  val browser = JsoupBrowser()
  val doc = browser.get("http://camhr.com")

  def main(args: Array[String]): Unit = {
    // Extract the element with id "footer", if present (an Option[Element])
    val items = doc >?> element("#footer")
    print(items)
  }
}
What I see on the website is in English, but when I run this code the content I get is in Chinese.
Hi @skanel, it seems that the site you mentioned sends its content in Chinese when the HTTP client doesn't specify an Accept-Language header (which most, if not all, browsers send automatically).
If you create your browser like this:
import org.jsoup.Connection

val browser = new JsoupBrowser() {
  override def requestSettings(conn: Connection) =
    conn.header("Accept-Language", "en-US,en;q=0.8,pt;q=0.6")
}
You should be able to get all visible parts of the page in English.
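For reference, here is a minimal sketch of how the original scraper and the customized browser might fit together. It assumes the `requestSettings` override shown above is available in the scala-scraper version in use. The object name `EnglishScraper` and the `acceptLanguage` value are illustrative and not part of the original report.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import org.jsoup.Connection

// Hypothetical object name, used only for this sketch.
object EnglishScraper {
  // Illustrative preference list: US English first, then any English.
  val acceptLanguage = "en-US,en;q=0.8"

  // Browser that adds an Accept-Language header to every request it makes.
  val browser = new JsoupBrowser() {
    override def requestSettings(conn: Connection): Connection =
      conn.header("Accept-Language", acceptLanguage)
  }

  def main(args: Array[String]): Unit = {
    // The page is fetched here, not when the browser is constructed.
    val doc = browser.get("http://camhr.com")
    // None if the page has no element with id "footer".
    val footer = doc >?> element("#footer")
    println(footer.map(_.text).getOrElse("no #footer element found"))
  }
}
```

Fetching the page inside `main` rather than in a top-level `val` means a network failure surfaces as an ordinary exception from `main` instead of an error during object initialization.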