skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library for analyzing and extracting data from HTML (server- and client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing library, but can also be used to scrape websites in a convenient fashion.

Home Page: https://docs.skrape.it

[QUESTION] Are there cases where BrowserFetcher does not fully support CSR?

pistolcaffe opened this issue

Describe what you want to achieve
I am creating a user guide page for my app and need to crawl that page from within the app (I need to crawl certain URLs in the app, as well as Notion pages):
https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4

However, even when I use BrowserFetcher, I cannot get the title of the loaded page.

Please let me know if there is another way to do this.

Code Sample

// imports below assume the skrape.it 1.x package layout
import it.skrape.core.htmlDocument
import it.skrape.fetcher.BrowserFetcher
import it.skrape.fetcher.response
import it.skrape.fetcher.skrape

fun main() {
    skrape(BrowserFetcher) {
        request {
            url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        }

        response {
            htmlDocument {
                // titleText resolves the <title> element of the parsed document
                println("title: $titleText")
            }
        }
    }
}

[expect] title: 인사이트 플로우 가이드
[but] title: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.

If that is not possible, please consider providing a waitUntil option similar to Playwright/Puppeteer (e.g. load, domcontentloaded, networkidle).
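
A purely hypothetical sketch of what such an option could look like in the request DSL (this property does not exist in skrape.it today; the name and values are made up for illustration):

skrape(BrowserFetcher) {
    request {
        url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        // hypothetical: wait until the network is idle before handing the DOM to the parser
        waitUntil = WaitUntil.NETWORK_IDLE
    }

    response {
        htmlDocument {
            println("title: $titleText")
        }
    }
}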

When using HtmlUnit directly, I got the following exception:

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: identifier is a reserved word: class (https://fundevstudio.notion.site/8402-8521e6e24e557272e4c0.js#1)
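
For reference, here is a rough sketch of how one might drive HtmlUnit directly as a workaround: suppress script errors, wait for background JavaScript, and then hand the rendered markup to skrape.it's string-based htmlDocument overload. The package names assume the com.gargoylesoftware HtmlUnit coordinates, and the timeout and options are illustrative only:

import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.html.HtmlPage
import it.skrape.core.htmlDocument

fun main() {
    val client = WebClient(BrowserVersion.CHROME)
    // don't abort rendering when the page's bundled JS trips up Rhino
    client.options.isThrowExceptionOnScriptError = false
    client.options.isCssEnabled = false

    val page: HtmlPage = client.getPage("https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4")
    // give client-side rendering up to 10 seconds to finish (illustrative timeout)
    client.waitForBackgroundJavaScript(10_000)

    // parse the rendered markup with skrape.it instead of the raw response body
    htmlDocument(page.asXml()) {
        println("title: $titleText")
    }

    client.close()
}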

Since HtmlUnit uses an outdated Rhino JavaScript engine, I think we may need to consider moving to a V8-based engine or something similar.

Of course, it is only speculation that this engine exception is the direct cause. If I find any additional information, I will add a comment.