skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library for analyzing and extracting data from HTML (server- and client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing library, but can also be used to scrape websites in a convenient fashion.

Home Page: https://docs.skrape.it

[QUESTION] Are there cases where BrowserFetcher does not fully support CSR?

pistolcaffe opened this issue

Describe what you want to achieve
I am creating a user guide page for my app and need to crawl that page from within the app (I need to crawl certain URLs in the app, as well as Notion pages):
https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4

However, even when I use BrowserFetcher, I cannot get the title of the loaded page.

Please let me know if there is another way to do this.

Code Sample

// imports below assume the skrape.it 1.x package layout
import it.skrape.core.htmlDocument
import it.skrape.fetcher.BrowserFetcher
import it.skrape.fetcher.response
import it.skrape.fetcher.skrape

fun main() {
    skrape(BrowserFetcher) {
        request {
            url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        }

        response {
            htmlDocument {
                // titleText resolves the <title> element of the parsed document
                println("title: $titleText")
            }
        }
    }
}

[expect] title: 인사이트 플로우 가이드
[but] title: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.

If that is not possible, please consider providing a waitUntil option similar to Playwright/Puppeteer (e.g. load, domcontentloaded, networkidle).
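
A purely hypothetical sketch of what such an option could look like in the request DSL (this property does not exist in skrape.it today; the name and values are made up for illustration):

skrape(BrowserFetcher) {
    request {
        url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        // hypothetical: wait until the network is idle before handing the DOM to the parser
        waitUntil = WaitUntil.NETWORK_IDLE
    }

    response {
        htmlDocument {
            println("title: $titleText")
        }
    }
}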

When using HtmlUnit directly, I got the following exception:

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: identifier is a reserved word: class (https://fundevstudio.notion.site/8402-8521e6e24e557272e4c0.js#1)
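
For reference, here is a rough sketch of how one might drive HtmlUnit directly as a workaround: suppress script errors, wait for background JavaScript, and then hand the rendered markup to skrape.it's string-based htmlDocument overload. The package names assume the com.gargoylesoftware HtmlUnit coordinates, and the timeout and options are illustrative only:

import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.html.HtmlPage
import it.skrape.core.htmlDocument

fun main() {
    val client = WebClient(BrowserVersion.CHROME)
    // don't abort rendering when the page's bundled JS trips up Rhino
    client.options.isThrowExceptionOnScriptError = false
    client.options.isCssEnabled = false

    val page: HtmlPage = client.getPage("https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4")
    // give client-side rendering up to 10 seconds to finish (illustrative timeout)
    client.waitForBackgroundJavaScript(10_000)

    // parse the rendered markup with skrape.it instead of the raw response body
    htmlDocument(page.asXml()) {
        println("title: $titleText")
    }

    client.close()
}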

Since HtmlUnit uses an outdated Rhino JavaScript engine, I think we may need to consider moving to a V8-based engine or something similar.

Of course, it is only speculation that this engine exception is the direct cause. If I find any additional information, I will add a comment.