go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.

Home Page: https://go-rod.github.io

Panic on .HTML()

bazuker opened this issue

Rod Version: v0.116.1

The following code panics when frame.HTML() is called, even though frame is confirmed not to be nil.

I can provide the full iframe HTML if necessary.

The code to demonstrate your question

	hasVerify, cloudflareIframe, err := page.Has("iframe[src*='https://challenges.cloudflare.com']")
	if err == nil && hasVerify {
		log.Println("human verification detected")
		cloudflareIframe.MustWaitStable()
		log.Println("trying to pass")
		cf, err := page.Element("iframe")
		if err != nil {
			return nil, fmt.Errorf("failed to get cloudflare iframe: %w", err)
		}
		log.Println("got iframe")
		frame, err := cf.Frame()
		if err != nil {
			return nil, fmt.Errorf("failed to unwrap cloudflare frame: %w", err)
		}
		log.Println("targeted", frame)
		fmt.Println(frame.HTML()) // <---- PANICS HERE
	}
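The snippet detects the challenge with the attribute selector iframe[src*='https://challenges.cloudflare.com'] but then re-queries with page.Element("iframe"), which returns the first iframe in document order. If the page contains other iframes, cf may not be the Cloudflare frame at all. Whether that contributed to this panic is not established in the thread. The plain-Go sketch below, with made-up iframe URLs and hypothetical helper names (nothing here is rod API), only illustrates how the two lookups can pick different nodes:

```go
package main

import (
	"fmt"
	"strings"
)

// firstIframe stands in for page.Element("iframe"): it returns the first
// iframe in document order, whatever its src is. (Hypothetical helper.)
func firstIframe(srcs []string) (string, bool) {
	if len(srcs) == 0 {
		return "", false
	}
	return srcs[0], true
}

// firstIframeWithSrcContaining stands in for the selector
// iframe[src*='<substr>']: the first iframe whose src contains substr.
// As in CSS, an empty substring matches nothing. (Hypothetical helper.)
func firstIframeWithSrcContaining(srcs []string, substr string) (string, bool) {
	if substr == "" {
		return "", false
	}
	for _, src := range srcs {
		if strings.Contains(src, substr) {
			return src, true
		}
	}
	return "", false
}

func main() {
	// Made-up iframe URLs: an unrelated iframe appears before the challenge.
	srcs := []string{
		"https://example.com/unrelated-widget",
		"https://challenges.cloudflare.com/example-challenge",
	}

	first, _ := firstIframe(srcs)
	cf, _ := firstIframeWithSrcContaining(srcs, "https://challenges.cloudflare.com")

	fmt.Println(`page.Element("iframe") picks:`, first)
	fmt.Println("attribute selector picks:", cf)
}
```

In the original snippet, calling .Frame() on cloudflareIframe (already returned by page.Has) would keep the lookup tied to the attribute selector instead of "the first iframe on the page".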

Log and stack trace

2024/06/26 21:38:49 human verification detected
2024/06/26 21:38:50 trying to pass
2024/06/26 21:38:50 got iframe
2024/06/26 21:38:50 targeted <page:6B33B27E>

panic recovered:
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:261 (0x102701377)
        panicmem: panic(memoryError)
/usr/local/go/src/runtime/signal_unix.go:881 (0x102701344)
        sigpanic: panicmem()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:350 (0x1029deaac)
        (*Page).getJSCtxID: obj, err := proto.DOMResolveNode{BackendNodeID: node.ContentDocument.BackendNodeID}.Call(p)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:249 (0x1029de0bb)
        (*Page).ensureJSHelper: jsCtxID, err := p.getJSCtxID()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:234 (0x1029ddeeb)
        (*Page).formatArgs: id, err := p.ensureJSHelper(obj)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:150 (0x1029ddb2b)
        (*Page).evaluate: args, err := p.formatArgs(opts)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:129 (0x1029dd6a7)
        (*Page).Evaluate: res, err = p.evaluate(opts)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:172 (0x1029dfa4b)
        (*Page).ElementByJS.func2: res, err = p.Evaluate(opts.ByObject())
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/lib/utils/sleeper.go:140 (0x10287b85b)
        Retry: stop, err := fn()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:167 (0x1029df8df)
        (*Page).ElementByJS: err = utils.Retry(p.ctx, p.sleeper(), func() (bool, error) {
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:143 (0x1029df607)
        (*Page).Element: return p.ElementByJS(evalHelper(js.Element, selector))
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page.go:106 (0x1029d9ef7)
        (*Page).HTML: el, err := p.Element("html")
/Users/bazuker/go/src/github.com/bazuker/hikebook/hikes_plugin.go:162 (0x102aba113)
        (*HikesPlugin).Run: fmt.Println(frame.HTML())
/Users/bazuker/go/pkg/mod/github.com/bazuker/browserbro@v1.0.2/pkg/manager/manager.go:190 (0x102ab8a8f)
        (*Manager).loadPlugins.func1: results, err := plugin.Run(params)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aaac23)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/recovery.go:102 (0x102aaac04)
        CustomRecoveryWithWriter.func1: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aa686f)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/bazuker/browserbro@v1.0.2/pkg/manager/middleware.go:19 (0x102ab855f)
        (*Manager).Run.loggerMiddleware.func4: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aa9ae3)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/gin.go:633 (0x102aa966c)
        (*Engine).handleHTTPRequest: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/gin.go:589 (0x102aa93b3)
        (*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/go/src/net/http/server.go:3137 (0x10299558b)
        serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/net/http/server.go:2039 (0x1029918c7)
        (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/runtime/asm_arm64.s:1222 (0x10271f473)
        goexit: MOVD    R0, R0  // NOP

Please fix the format of your markdown:

31 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]
31 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]

generated by check-issue

Just confirmed that almost any operation on that frame results in a panic. I tried frame.Element, frame.Has, etc.
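Until the root cause is fixed, one workaround is to turn the panic into an ordinary error at the call site, so a single bad frame fails that step instead of unwinding to gin's request-level recovery (the "panic recovered:" line in the log above). A minimal sketch, assuming a hypothetical helper named safeCall (not part of rod); recover only catches panics raised on the same goroutine, which matches the synchronous call chain in the trace. It hides the symptom and does not fix the underlying nil dereference:

```go
package main

import "fmt"

// safeCall runs fn and converts a panic into an ordinary error.
// It is a hypothetical helper, not part of rod.
func safeCall[T any](fn func() (T, error)) (result T, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	ok := func() (string, error) { return "<html></html>", nil }
	broken := func() (string, error) {
		var p *struct{ ID int }
		return fmt.Sprint(p.ID), nil // nil pointer dereference, as in the trace
	}

	for _, fn := range []func() (string, error){ok, broken} {
		html, err := safeCall(fn)
		fmt.Printf("html=%q err=%v\n", html, err)
	}
}
```

With rod, frame.HTML already has the shape func() (string, error), so it could be passed directly as safeCall(frame.HTML); other calls such as frame.Element("...") would need a small closure.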

This code works fine for me, whether headless or not:

package main

import (
	"fmt"

	"github.com/go-rod/rod"
	"github.com/go-rod/rod/lib/launcher"
)

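func main() {
	// Launch a local, visible browser and open a page that embeds a Cloudflare challenge.
	u := launcher.New().Headless(false).MustLaunch()
	page := rod.New().ControlURL(u).MustConnect().MustPage("https://dash.cloudflare.com/sign-up")

	// Select the Cloudflare iframe by its src and switch into its document.
	f := page.MustElement(`iframe[src*="https://challenges.cloudflare.com"]`).MustFrame()

	// Query an element inside the iframe and print its HTML.
	fmt.Println(f.MustElement("#success").MustHTML())
}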

@ysmood you can try it yourself on this page. Select a park, pick a date and time, and press Next. You will see the Cloudflare iframe that makes the code panic.

Also, I am running a managed version of Rod in Docker, in case that makes a difference.

My configuration:

	l = launcher.MustNewManaged(serviceURL).
		UserDataDir(userDataDir).
		Headless(false).
		Devtools(false).
		Leakless(true).XVFB("--server-num="+strconv.Itoa(serverID), "--server-args=-screen 0 1600x900x16")
	l.NoSandbox(true)
	l.Set("disable-web-security")
	l.Set("disable-blink-features", "AutomationControlled")
	l.Delete("enable-automation")
	l.Delete("disable-site-isolation-trials")

	br.browser.Client(l.MustClient())
	err = br.browser.Connect()
	if err != nil {
		return fmt.Errorf("failed to connect to browser: %w", err)
	}
	br.browser.MustIncognito()

How do you bypass the Cloudflare challenge?

@bazuker I can't even open the page. When I navigate to it in my personal browser, I get a blank page and an error in the console:

[Screenshot: CleanShot 2024-06-29 at 11 29 19@2x]

@ysmood I'm not sure where you are located, but this website definitely works in North America. It is the official booking website for British Columbia's recreational parks and trails.

It works fine for me:

[Video: CleanShot.2024-07-04.at.14.03.07-converted.mp4]