FlareSolverr / FlareSolverr

Proxy server to bypass Cloudflare protection

Site cookies not used in new request from same session

nopoz opened this issue · comments

Have you checked our README?

  • I have checked the README

Have you followed our Troubleshooting?

  • I have followed your Troubleshooting

Is there already an issue for your problem?

  • I have checked older issues, open and closed

Have you checked the discussions?

  • I have read the Discussions

Environment

- FlareSolverr version: 3.3.15
- Last working FlareSolverr version: n/a
- Operating system: Docker
- Are you using Docker: yes
- FlareSolverr User-Agent (see log traces or / endpoint): Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- Are you using a VPN: no
- Are you using a Proxy: no
- Are you using Captcha Solver: no
- If using captcha solver, which one: n/a
- URL to test this issue: any that require site cookies

Description

It appears that site cookies are not passed in subsequent requests within the same session. For example, if I "get" a page in a session, I can see the cookie in the solution output. If I then make a new "post" or "get" to the same site using the same session, the cookie from the previous "get" is not used. The workaround is to parse the "get" solution JSON for the cookie and store it for use in the "post", which is inconvenient. I'm not sure whether this is intended behavior. I can't tell whether the README's discussion of session cookies applies only to Cloudflare cookies. I was assuming that all cookies would persist across the session, as they do with a Python requests session.
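For reference, the workaround described above might look roughly like the sketch below. It assumes FlareSolverr is listening on its default local endpoint and that the response JSON carries the cookies under `solution.cookies`. The helper names (`name_value_cookies`, `send`, `get_then_post`) are made up for illustration and are not part of FlareSolverr. It uses only the standard library.

```python
import json
import urllib.request

FLARESOLVERR_URL = "http://localhost:8191/v1"  # assumed default local endpoint


def name_value_cookies(flaresolverr_response: dict) -> list:
    """Reduce the cookies in a FlareSolverr response to name/value pairs."""
    cookies = flaresolverr_response.get("solution", {}).get("cookies", [])
    return [{"name": c["name"], "value": c["value"]} for c in cookies]


def send(payload: dict) -> dict:
    """POST one command to FlareSolverr and return the decoded JSON reply."""
    request = urllib.request.Request(
        FLARESOLVERR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=90) as reply:
        return json.load(reply)


def get_then_post(session, get_url, post_url, post_data, sender=send):
    """GET a page, then POST while manually re-sending the GET's cookies."""
    first = sender({"cmd": "request.get", "session": session, "url": get_url})
    return sender({
        "cmd": "request.post",
        "session": session,
        "url": post_url,
        "postData": post_data,
        # Hypothetical workaround: carry the cookies over by hand.
        "cookies": name_value_cookies(first),
    })
```

The `sender` parameter exists only so the flow can be exercised without a running FlareSolverr instance.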

Logged Error Messages

n/a

Screenshots

No response

Update to v3.3.16

Provide an example of the script you're using.

@ilike2burnthing could you confirm whether all cookies are expected to persist in a single session? I.e., is this a bug or expected behavior? This would help streamline my testing.

I don't see anything in 3.3.15 to 3.3.16 that would have an impact on cookies?

If you're correctly using sessions, yes - https://github.com/FlareSolverr/FlareSolverr#commands

Updating wasn't a suggested fix, just so that you're on the latest version.

> Provide an example of the script you're using.

@ilike2burnthing here are two versions of the script - it's a bit more complicated than normal because I have to load a page to get CSRF tokens, then POST to authenticate, then once I'm authenticated I can finally load the page I want to scrape behind the authwall.

These are both very similar. The only difference in the second one (site_scrape_no_cookies.py) is that I've commented out the part where I manually send cookies to FlareSolverr. When I don't send those cookies, the authentication POST fails.

Works - manually parsing cookies out of the FlareSolverr solution and sending them in subsequent requests:
https://github.com/nopoz/site_scrape/blob/70b0e8c635df039db7765f6c03dc61f6c4c64818/site_scrape.py

Does not work - cookies are not manually sent with the FlareSolverr requests:
https://github.com/nopoz/site_scrape/blob/70b0e8c635df039db7765f6c03dc61f6c4c64818/site_scrape_no_cookies.py

I'll try looking at this again tomorrow with fresh eyes.

Testing with a basic script like below, sessions are working fine for me, whether or not there's a Cloudflare challenge:

import requests

# FlareSolverr API endpoint and headers, shared by every command below
url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

# 1. Create a named session so the later requests share one browser instance
data = {
    "cmd": "sessions.create",
    "session": "test_session"
}
response = requests.post(url, headers=headers, json=data)

# 2. Log in with a POST inside that session
data = {
    "cmd": "request.post",
    "session": "test_session",
    "url": "https://www.example.com/takelogin.php",
    "postData": "username=username&password=password"
}
response = requests.post(url, headers=headers, json=data)

# 3. Fetch a page in the same session, without passing any cookies manually
data = {
    "cmd": "request.get",
    "session": "test_session",
    "url": "https://www.example.com/index.php"
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

Testing your script briefly, it worked for me for sites with and without hidden inputs.

Have you confirmed that the initial login is actually successful?

Here are the logs from FlareSolverr showing the working and non-working methods, if that helps:

  1. Not manually specifying cookies - does not work. There is no reference to cookies in the log line; I'm not sure whether that's normal:
2024-03-24 XX:XX:XX INFO     ReqId 140628650821376 Incoming request => POST /v1 body: {'cmd': 'request.post', 'url': 'https://example.com/login', 'session': 'example.com', 'postData': 'token=123456&username=myusername&password=mypassword&info=infovalue&keeploggedin=1&submit=login', 'maxTimeout': 60000}
  2. Manually parsing out cookie values and sending them formatted with name: <name>, value: <value> - this works and is my current method:
2024-03-24 XX:XX:XX INFO     ReqId 140628570666752 Incoming request => POST /v1 body: {'cmd': 'request.post', 'url': 'https://example.com/login', 'session': 'example.com', 'postData': 'token=123456&username=myusername&password=mypassword&info=infovalue&keeploggedin=1&submit=login', 'maxTimeout': 60000, 'cookies': [{'name': 'cookie_name', 'value': 'cookie_value'}]}
  3. Manually sending every cookie field returned in the solution JSON cookie data - does not work:
2024-03-24 XX:XX:XX INFO     ReqId 140628562274048 Incoming request => POST /v1 body: {'cmd': 'request.post', 'url': 'https://example.com/login', 'session': 'example.com', 'postData': 'token=123456&username=myusername&password=mypassword&info=infovalue&keeploggedin=1&submit=login', 'maxTimeout': 60000, 'cookies': [{'domain': '.example.com', 'expiry': 1713738633, 'httpOnly': False, 'name': 'cookie_name', 'path': '/', 'sameSite': 'Lax', 'secure': True, 'value': 'cookie_value'}]}
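The three logged requests differ only in their `cookies` field, so one way to compare them side by side is to generate all three payloads from the same base request. The sketch below is only an illustration of that idea. `login_payload_variants` is a made-up helper name, and the variant labels are arbitrary.

```python
import copy


def login_payload_variants(base_payload: dict, solution_cookies: list) -> dict:
    """Build the three request bodies seen in the logs from one base payload."""
    # Case 1: no 'cookies' key at all.
    no_cookies = copy.deepcopy(base_payload)
    no_cookies.pop("cookies", None)

    # Case 2: cookies reduced to name/value pairs.
    name_value = copy.deepcopy(no_cookies)
    name_value["cookies"] = [
        {"name": c["name"], "value": c["value"]} for c in solution_cookies
    ]

    # Case 3: cookies passed through exactly as the solution returned them.
    full = copy.deepcopy(no_cookies)
    full["cookies"] = copy.deepcopy(solution_cookies)

    return {"no_cookies": no_cookies, "name_value": name_value, "full": full}
```

Sending each variant in a fresh session and comparing the returned HTML should show which cookie format the login step accepts.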

Does the HTML returned after L165, the POST login, show you as being successfully logged in?

Yes, the html in the response shows a successful login.

If I remove the "cookies" from the POST data payload, the HTML shows a failed login: "Authorization token expired or invalid". Any further page loads past this point fail because I didn't authenticate properly in the POST step.

If I do the same auth POST in a desktop web browser's debug console, I can see that a cookie is sent with the POST data, which matches what my script is doing. This is why I'm assuming that FlareSolverr is somehow missing that step when cookies aren't manually passed in the POST payload. I can't tell much from the debug logging about what happens behind the scenes with cookies by default.

I tried your script again with a few different sites, and it's definitely working for me. When using a session, FlareSolverr doesn't need to pass cookies in its requests, as it's all part of the same browser session.
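One rough way to check this on a given site is to compare the cookie names reported by two consecutive responses from the same session. The sketch below assumes the `solution.cookies` list in each response reflects the cookies the browser holds for that page. That is an assumption, not something confirmed in this thread, and the helper names are hypothetical.

```python
def cookie_names(flaresolverr_response: dict) -> set:
    """Collect the cookie names listed in one FlareSolverr response."""
    cookies = flaresolverr_response.get("solution", {}).get("cookies", [])
    return {cookie["name"] for cookie in cookies}


def dropped_cookies(first_response: dict, second_response: dict) -> list:
    """Names present after the first request but missing after the second."""
    return sorted(cookie_names(first_response) - cookie_names(second_response))
```

An empty list suggests the session kept its cookies between the two requests. A non-empty list names the cookies worth investigating.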

What cookies are you sending to the login page? Just cf_clearance?

> Yes, the html in the response shows a successful login.
>
> If I remove the "cookies" from the POST data payload, the html shows a failure to login, Authorization token expired or invalid.

I'm confused, is login successful or not when using site_scrape_no_cookies.py?

> What cookies are you sending to the login page? Just cf_clearance?

A site cookie is generated when you GET the login page, and it then has to be passed alongside the auth data in order to log in correctly. I'm not sure whether it's Cloudflare-related. I see a few different sites use this type of cookie, usually with a name like <single alpha letter>id. Presumably this prevents automated scripts from just doing a POST to the login page without first loading the CSRF token(s) and cookie contents in a GET?

So you have three factors for logging in:

  1. the hidden CSRF token(s) on the login page need to be passed with the username and password POST data payload
  2. the login page cookie "name: value" needs to be sent as a request cookie with the POST
  3. the standard username and password data needs to be present with the POST data payload
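As a sketch, the three factors could be combined into a single login request as shown below. It assumes the login page HTML is available under `solution.response` in the earlier GET reply, that the CSRF token(s) are hidden `<input>` fields, and that the form fields are named `username` and `password` as in the logged request. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html: str) -> dict:
    """Return the hidden form fields (such as CSRF tokens) found in a page."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_login_request(session, login_url, login_page, username, password):
    """Assemble a request.post payload from a previous GET of the login page."""
    solution = login_page["solution"]
    form = hidden_fields(solution["response"])                   # factor 1
    form.update({"username": username, "password": password})   # factor 3
    return {
        "cmd": "request.post",
        "session": session,
        "url": login_url,
        "postData": urlencode(form),
        # Factor 2: the login page cookie(s), reduced to name/value pairs.
        "cookies": [
            {"name": c["name"], "value": c["value"]}
            for c in solution.get("cookies", [])
        ],
    }
```

The returned dictionary is only the request body. It still has to be sent to the FlareSolverr endpoint as JSON.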

> I'm confused, is login successful or not when using site_scrape_no_cookies.py?

Login is not successful when using site_scrape_no_cookies.py