chromedp / chromedp

A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.


Setting extra HTTP header results in different Javascript evaluation in DevTools console

edoardottt opened this issue · comments

TL;DR: Setting an extra HTTP header changes the JavaScript evaluation in the DevTools console.

What versions are you running?

$ go list -m github.com/chromedp/chromedp
github.com/chromedp/chromedp v0.9.4
$ google-chrome --version
Google Chrome 121.0.6167.139
$ go version
go version go1.21.1 linux/amd64

What did you do? Include clear steps.

When I run this snippet:

var res string
	err := chromedp.Run(ctx, chromedp.Tasks{
		chromedp.Navigate(targetURL),
		chromedp.EvaluateAsDevTools(js, &res)},
	)

	return res, err

The result is a correct Javascript evaluation.

When I instead use this one (headers is map[string]interface{}{"Test":"test"}):

var res string
	err := chromedp.Run(ctx, chromedp.Tasks{
		network.Enable(),
		network.SetExtraHTTPHeaders(network.Headers(headers)),
		chromedp.Navigate(targetURL),
		chromedp.EvaluateAsDevTools(js, &res)},
	)

	return res, err

The result is err: encountered an undefined value

What did you expect to see?

I expected that setting a simple HTTP header like Test: test would give the same result as above: a correct JavaScript evaluation.

What did you see instead?

err != nil with err being encountered an undefined value

@edoardottt I'd need to see the rest of the script to really understand what you're doing. I quickly wrote this just now, which properly returns the javascript value:

package main

import (
	"context"
	"flag"
	"fmt"
	"os"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	urlstr := flag.String("url", "https://google.com/", "url")
	flag.Parse()
	if err := run(context.Background(), *urlstr); err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}
}

func run(ctx context.Context, urlstr string) error {
	ctx, cancel := chromedp.NewContext(ctx)
	defer cancel()

	const script = `"a string"`

	headers := network.Headers{
		"x-header": "a header",
	}
	var res string
	err := chromedp.Run(ctx,
		network.Enable(),
		network.SetExtraHTTPHeaders(headers),
		chromedp.Navigate(urlstr),
		chromedp.EvaluateAsDevTools(script, &res),
	)
	fmt.Fprintf(os.Stdout, "err: %v\ngot: %q\n", err, res)
	return err
}

Running it:

$ go run main.go 
err: <nil>
got: "a string"

@edoardottt I don't usually inject many headers when scraping with chromedp, since anything that requires manipulating headers directly probably doesn't "need" a full-blown Chrome instance. However, if I had to guess, maybe it's because you're sending a non-string type as the value? Please note that network.Headers is rightly a map[string]interface{}, which corresponds to the generic JSON object type.

From the PDL in the Chromium source tree, you can see that the Network.Headers type is defined like this:

  # Network domain allows tracking network activities of the page. It exposes information about http,
  # file, data and other requests and responses, their headers, bodies, timing, etc.
  domain Network
    depends on Debugger
    depends on Runtime
    depends on Security

    # Request / response headers as keys / values of JSON object.    
    type Headers extends object    

I am inferring here that Chrome is rejecting the set-header request because of badly formatted data. I would try changing the values you're sending to strings. This is likely causing a silent error that you're not catching in your script.
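One way to rule out the non-string guess above is to coerce every value to a string before building the header map. This is only a minimal sketch using the standard library: stringifyHeaders is a hypothetical helper name (not part of chromedp), and it assumes the returned map is then converted with network.Headers(...) as in the snippets in this thread.

```go
package main

import "fmt"

// stringifyHeaders is a hypothetical helper (not part of chromedp). It
// returns a copy of in where every value has been converted to a string,
// so the result can then be converted with network.Headers(out).
func stringifyHeaders(in map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for name, value := range in {
		if s, ok := value.(string); ok {
			out[name] = s
			continue
		}
		// fmt.Sprint renders numbers, booleans, etc. in their default format.
		out[name] = fmt.Sprint(value)
	}
	return out
}

func main() {
	headers := stringifyHeaders(map[string]interface{}{
		"Test":      "test",
		"X-Retries": 3, // assumed example of a non-string value
	})
	fmt.Println(headers["Test"], headers["X-Retries"])
}
```

If the values are already strings, as with Test: test, the helper leaves them unchanged, which would suggest the cause lies elsewhere.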

Apologies if this is the case, but chromedp/cdp more or less respects the defined protocol. I'll do some testing on my end to see if this is the likely cause.

I'm looking through your pphack repo that you linked here, and I don't see the specific payload you're trying to inject. Could you share an actual complex script and the actual header values you're injecting?

Hi @kenshaw, thank you so much for your reply.

I've added a new branch (https://github.com/edoardottt/pphack/tree/add-headers) in order to show how I use the headers in chromedp.

To test, I comment out these two lines (https://github.com/edoardottt/pphack/blob/add-headers/pkg/scan/chrome.go#L50-L51, network.Enable and network.SetExtraHTTPHeaders). Then I execute:

echo https://edoardottt.github.io/pp-test/ | go run cmd/pphack/main.go -H "test:test" -v

and the output is https://edoardottt.github.io/pp-test/?constructor.prototype. ... confirming that the err is nil and the JS evaluation is performed correctly.

If instead I keep those two lines (not commenting them out) and run the same command, I get no standard output and this error:

[ERR] encountered an undefined value

Using a proxy I can see Test: test in the HTTP headers, so I guess the headers are set correctly.

I'll look at this further. BTW -- if you haven't already, you should try turning on the debug logging to see the messages going back and forth, as it might be helpful:

ectx, ecancel := chromedp.NewExecAllocator(context.Background(), copts...)
pctx, pcancel := chromedp.NewContext(ectx, chromedp.WithDebugf(log.Printf))

(in your project's scan/chrome.go)

Are you expecting different results based on the User-Agent?

Regarding the debug output, I've looked at it, but I don't see anything weird, to be honest. If someone understands it better, I can provide those logs too. As far as I can tell, the request is sent with the proper headers.

Are you expecting different results based on the User-Agent?

No, I just want to add some extra headers

So -- I believe the network.Enable() call is being called, and yours is resetting the UA. From what I can tell from the output, chromedp is working as intended: judging by the CDP protocol messages, everything is sent and received correctly.

Specifically, the error you are getting occurs because the JS value window.xxxx is not present. That value can't be unmarshaled to a string, as undefined doesn't have a corresponding value in Go that it could be unmarshaled to. The error here should be more of an "invalid destination type" or some such. Note that you could capture the actual raw value and then check after the fact whether it is a string or something else.

So -- I believe the network.Enable() call is being called, and yours is resetting the UA.

So I have to use something like SetUserAgentOverride for this, but that's not the point here...

window.xxxx is not present

How? Why should setting an extra HTTP header like Test: test change the JS evaluation of a static website? I still don't understand.

I've checked and this behavior is present in other similar tools, e.g. https://github.com/kosmosec/proto-find/

I have no idea why it's not present. You can play around with this code:

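func Scan(ctx context.Context, headers map[string]interface{}, js, targetURL string) (string, error) {
	// Capture the raw remote object instead of a string, so that an
	// undefined result can be inspected rather than failing to unmarshal.
	var res *runtime.RemoteObject
	err := chromedp.Run(ctx, chromedp.Tasks{
		network.SetExtraHTTPHeaders(network.Headers(headers)),
		chromedp.Navigate(targetURL),
		chromedp.EvaluateAsDevTools(js, &res),
	})

	var s string
	// res stays nil if Run failed before the evaluation completed.
	if res != nil && res.Type == runtime.TypeString { // this is also just "string"
		s = string(res.Value) // raw JSON, so the string keeps its surrounding quotes
	}
	log.Printf("s: %q -- %v", s, err)
	return s, err
}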

Unfortunately, I'm not able to dig further into your code. Please update here if you find the issue.

Okay, thanks for your help though.

Why did you remove network.Enable? Is it not necessary?

Because the package has to enable it for the base actions to work out of the box. As such, there's no need for an additional call.

@kenshaw Using the debug logging, I've found something:

In the second one there's the error:

2024/02/05 09:42:53 <- {"method":"Runtime.exceptionThrown","params":{"timestamp":1.707122573427766e+12,"exceptionDetails":{"exceptionId":1,"text":"Uncaught","lineNumber":244,"columnNumber":2,"scriptId":"4","url":"https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js","stackTrace":{"callFrames":[{"functionName":"","scriptId":"4","url":"https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js","lineNumber":244,"columnNumber":2}]},"exception":{"type":"object","subtype":"error","className":"ReferenceError","description":"ReferenceError: jQuery is not defined\n    at https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js:245:3","objectId":"7735756507232330443.2.1","preview":{"type":"object","subtype":"error","description":"ReferenceError: jQuery is not defined\n    at https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js:245:3","overflow":false,"properties":[{"name":"stack","type":"string","value":"ReferenceError: jQuery is not defined\n    at https\u2026c2aaf056b237bc7cc0a6/jquery.query-object.js:245:3"},{"name":"message","type":"string","value":"jQuery is not defined"}]}},"executionContextId":2}},"sessionId":"5EB446D1FB128D2499FB60BC9B58875C"}

It seems that using the header changes something and jQuery is not loading properly. TBH, it's hard to believe it's a problem with the website, as it's static and always returns the same content.
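For anyone digging through similar debug output, here is a small sketch that pulls the script URL and the first line of the exception description out of a Runtime.exceptionThrown log line like the one above. It uses only the standard library. The function name exceptionFromLogLine and the example.com URL in main are assumptions for illustration, and it assumes each log line carries a single JSON message starting at the first '{'.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cdpMessage covers only the fields of a Runtime.exceptionThrown event that
// this sketch needs; all other fields in the message are ignored.
type cdpMessage struct {
	Method string `json:"method"`
	Params struct {
		ExceptionDetails struct {
			URL       string `json:"url"`
			Exception struct {
				Description string `json:"description"`
			} `json:"exception"`
		} `json:"exceptionDetails"`
	} `json:"params"`
}

// exceptionFromLogLine extracts the script URL and the first line of the
// exception description from one debug log line. It assumes the line
// contains a single JSON message starting at the first '{'.
func exceptionFromLogLine(line string) (url, desc string, ok bool) {
	start := strings.Index(line, "{")
	if start < 0 {
		return "", "", false
	}
	var msg cdpMessage
	if err := json.Unmarshal([]byte(line[start:]), &msg); err != nil {
		return "", "", false
	}
	if msg.Method != "Runtime.exceptionThrown" {
		return "", "", false
	}
	d := msg.Params.ExceptionDetails
	first, _, _ := strings.Cut(d.Exception.Description, "\n")
	return d.URL, first, true
}

func main() {
	// Shortened, assumed sample line; the URL is a placeholder.
	line := `2024/02/05 09:42:53 <- {"method":"Runtime.exceptionThrown","params":{"exceptionDetails":{"url":"https://example.com/plugin.js","exception":{"description":"ReferenceError: jQuery is not defined\n    at https://example.com/plugin.js:245:3"}}}}`
	if url, desc, ok := exceptionFromLogLine(line); ok {
		fmt.Printf("%s: %s\n", url, desc)
	}
}
```

Comparing the exceptions reported with and without the extra header may help narrow down which script fails to load.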

Ok, glad you were able to figure it out!

I've got that clue, but I'm not able to solve the issue, @kenshaw.
As I wrote, I guess it's related to chromedp, but I don't know how to fix that behavior.

Hence, the issue should not be closed