LispCookbook / cl-cookbook

The Common Lisp Cookbook

Home Page: http://lispcookbook.github.io/cl-cookbook/

How to do Dynamic Web Scraping?

cl-03 opened this issue · comments

commented

As more and more websites are dynamic, how can we crawl dynamic web pages?
The Common Lisp Cookbook doesn't cover this yet.

Hi, I am not sure this is a target for the Cookbook. This requires inspecting each website separately, knowing the web, using a scraper that hides a browser to evaluate JS, etc. It is also a wide topic.

commented

In the Web Scraping chapter of The Common Lisp Cookbook, I changed the URL to a different website as a test:

(ql:quickload '("dexador" "plump" "lquery" "lparallel"))
(defvar *url* "https://portal.astronergy.com/")
(defvar *request* (dex:get  *url* :basic-auth  '("chuangxiu.chen" . "CX4644cx") :verbose t))
(defvar *parsed-content* (lquery:$ (initialize *request*)))
(defvar *css-selector* "#content li") ; the selector can be changed as needed
(lquery:$ *parsed-content* "#content li")

Then it returns an empty array, so I can't perform any further operations on it.

CL-USER> (lquery:$ *parsed-content* "#content li")
#()

In my opinion, to help the development of Common Lisp, the book The Common Lisp Cookbook must be practical. Otherwise, it may be a joke that following the book can only handle simple situations, namely static sites.
Are there any useful handlers that we can apply between the request step (dex:get) and the page-parsing step (plump)?
As you said, a scraper "that hides a browser to evaluate JS" would let the dynamic page generate the complete HTML page. Would you mind giving me some guidance, such as related books, examples, or Common Lisp packages?

See tutorials using PhantomJS, headless Firefox or Chrome (https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Headless_mode), Selenium, TestCafe etc. There's a CL interface to Selenium I think (https://github.com/search?l=Common+Lisp&q=cl-selenium&type=Repositories).
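As a rough illustration of the headless-browser route, here is a minimal sketch that asks a headless Chrome/Chromium to run the page's JavaScript and print the resulting DOM, then parses that output with plump and queries it with clss (the selector engine that lquery builds on). It assumes a `chromium` binary on the PATH that supports the `--headless` and `--dump-dom` flags; the function names `fetch-rendered-html` and `scrape-rendered` are made up for this example. It does not handle the basic-auth login from the snippet above, and `--dump-dom` may print the DOM before late asynchronous requests have finished.

```lisp
(ql:quickload '("plump" "clss"))

(defun fetch-rendered-html (url &key (browser "chromium"))
  "Return the DOM of URL as a string, after a headless browser has run its JS.
BROWSER is an assumption: the name of a Chrome/Chromium binary on the PATH."
  ;; The command is passed as a list, so no shell quoting is involved.
  (uiop:run-program (list browser "--headless" "--disable-gpu" "--dump-dom" url)
                    :output :string))

(defun scrape-rendered (url selector)
  "Return the text of every node matching the CSS SELECTOR in the rendered page."
  (let ((document (plump:parse (fetch-rendered-html url))))
    ;; clss:select returns a vector of matching nodes.
    (map 'list #'plump:text (clss:select selector document))))

;; Example call, reusing the selector from the question:
;; (scrape-rendered "https://portal.astronergy.com/" "#content li")
```

The point of this design is that the request step is swapped out (a browser instead of dex:get) while the parsing step stays the same. Pages that need clicks, logins or waiting for specific elements are better served by a Selenium/WebDriver binding.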

> the book The Common Lisp Cookbook must be practical. Otherwise, it may be a joke that following the book can only handle simple situations, namely static sites.

Sure, but there is only so much we can do… and this topic is very wide, and not specific to CL. If you find a good Selenium binding then why not add a recipe.

I invite you to ask programming questions on another forum such as Stack Overflow.

commented

Thanks a lot. I'll give it a try.

see this, it works for Selenium 4: