yahoo / gryffin

Gryffin is a large scale web security scanning platform.



gryffin-standalone: Timeout when rendering js

bararchy opened this issue

Steps to reproduce:

  1. nsqlookupd -verbose=true
  2. nsqd --max-msg-size=2313820682 --lookupd-tcp-address=127.0.0.1:4160
  3. go run cmd/gryffin-distributed/main.go --storage=memory seed http://mysite.com

Output:

{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://mysite.com"}
2015/09/30 13:27:54 INF    1 (127.0.0.1:4150) connecting to nsqd
Seed http://mysite.com injected.
2015/09/30 13:27:54 INF    1 stopping
2015/09/30 13:27:54 INF    1 exiting router
  4. go run cmd/gryffin-distributed/main.go --storage=memory crawl

Output:

2015/09/30 13:29:12 INF    2 [seed/primary] querying nsqlookupd http://127.0.0.1:4161/lookup?topic=seed
2015/09/30 13:29:12 INF    2 [seed/primary] (ano:4150) connecting to nsqd
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://mysite.com"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://mysite.com"}
{"Service":"PhantomjsRenderer.Do","Msg":"[Timeout] Terminating the crawl process.","Method":"GET","Url":"http://mysite.com"}
2015/09/30 13:30:19 INF    2 [seed/primary] querying nsqlookupd http://127.0.0.1:4161/lookup?topic=seed

Am I missing something ?


No. Thanks for sharing this. Gryffin is still in beta, and I am in the process of moving the hardcoded configuration out of it.

In your report, the timeout is controlled by a variable inside the renderer:
https://github.com/yahoo/gryffin/blob/master/cmd/gryffin-standalone/main.go#L57
https://github.com/yahoo/gryffin/blob/master/cmd/gryffin-distributed/main.go#L122

I hardcoded the timeout to 10s in the standalone version and 60s in the distributed version. The PhantomJS renderer works asynchronously: it keeps listening as long as more DOM changes are coming in, and kills itself when either a) the DOM is "ready" and "stable", or b) the timeout is reached.

In your example, it looks like PhantomJS still believed your site was active when the timeout was reached, so Gryffin killed PhantomJS.

The standalone version is generally expected to run in a more resource-constrained environment, so I set a shorter timeout. The value is easy to modify, but the better approach is to move it into configuration, or even to set it dynamically: for example, allow a longer timeout for the first crawl, then build up a traffic profile (round-trip time, number of elements in the page, etc.) and gradually reduce the timeout.

While I would recommend trying out the distributed version, here is what I got when running the standalone version against our test site (webseclab):

go run cmd/gryffin-standalone/main.go "http://52.89.152.13:8080/xss/reflect/full1?in=change_me"
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Fingerprint","Msg":"Computed","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"IsDuplicatedPage","Msg":"Unique Page","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"PhantomjsRenderer.Do.UniqueCrawl","Msg":"domSteady","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Arachni.Scan","Msg":"Run as [arachni --checks xss* --output-only-positives --http-request-concurrency 1 --http-request-timeout 10000 --timeout 00:03:00 --scope-dom-depth-limit 0 --scope-directory-depth-limit 0 --scope-page-limit 1 --audit-with-both-methods --report-save-path /dev/null --snapshot-save-path /dev/null http://52.89.152.13:8080/xss/reflect/full1?in=change_me]","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"SQLMap.Scan","Msg":"Run as [sqlmap --batch --timeout=2 --retries=3 --crawl=0 --disable-coloring -o --text-only -v 0 --level=1 --risk=1 --smart --fresh-queries --purge-output --os=Linux --dbms=MySQL --delay=0.1 --time-sec=1 -u http://52.89.152.13:8080/xss/reflect/full1?in=change_me]","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Get Links","Msg":"Finished","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"SQLMap.Scan","Msg":"SQLMap return true","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Arachni.Findings","Msg":"[~] Affected page:  http://52.89.152.13:8080/xss/reflect/full1?in=change_me%3Csome_dangerous_input_359b6f75f99a226ea86907811ec95ba7/%3E","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}
{"Service":"Arachni.Scan","Msg":"Arachni return true","Method":"GET","Url":"http://52.89.152.13:8080/xss/reflect/full1?in=change_me"}

Please let me know whether fuzzing starts right after crawling. Note that the fuzzers (Arachni and SQLMap) need to be installed before fuzzing will start.

Thanks.

@yukinying Thanks for the comprehensive answer, I'll look into tweaking the timeout values and re-run.

I usually use the distributed version because the standalone one gives me a "Cannot establish tcp connection to log listener." error. I don't know which logging daemon it looks for, so... nothing I can do to help it ;)

OK, it seems that running nc -lk 5000 is good enough as a logging daemon, haha.

@yukinying I did more testing and found an issue that might explain the Timeout thingy.

It seems that Gryffin won't crash or exit when it cannot reach a website, for example:

go run cmd/gryffin-standalone/main.go "http://thisisnotarealsiteright:8834/"
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://thisisnotarealsiteright:8834/"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://thisisnotarealsiteright:8834/"}
{"Service":"Poke","Msg":"Failed","Method":"GET","Url":"http://thisisnotarealsiteright:8834/"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://thisisnotarealsiteright:8834/"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://thisisnotarealsiteright:8834/"}

And it hangs until the timeout is reached. Maybe it doesn't even try to connect? I'm not really sure what is happening here, TBH.

Ok, further testing results.

I ran Wireshark to see whether connections are even made to the target site, and they are, but only once.

The scanner sends this:

GET / HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36
Accept-Encoding: gzip

It gets back a 200 OK with a bunch of data, and then nada: it just idles until the timeout is reached.

HTTP/1.1 200 OK
Content-type: text/html
Content-length: 2092
Connection: keep-alive

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Login Page</title>

<!-- Bootstrap.min v CSS -->
<link href="/css/bootstrap.min.css" rel="stylesheet">

<!-- Bootstrap Core CSS -->
<link href="/css/bootstrap.css" rel="stylesheet">

</head>

<body>
    <br />
        <!-- Begin Page Content -->
        <div id="container" class="container">
            <div class="row">
                <div class="col-md-4 col-md-offset-4">
                    <div class="login-panel panel panel-default">
                        <div class="panel-heading">
                            <h3 class="panel-title">RSAccess Login</h3>
                        </div>                
                            <div class="panel-body">          
                                <form action="/login/create" method="post">
                                    <fieldset>
                                        <div class="form-group">
                                            <label for="loginmsg"></label>
                                            <label for="username">Username:</label>
                                            <input class="form-control" type="text" id="username" name="username">
                                        </div>
                                        <div class="form-group">
                                            <label for="password">Password:</label>
                                            <input class="form-control" type="password" id="password" name="password">
                                        </div>
                                            <input class="btn-primary btn btn-block" type="submit" value="Login">
                                    </felsser>
                                </form>
                            </div>
                    </div>
                </div>
            </div>
        </div>
</body>
</html>

Looking at Wireshark, it seems it keeps sending TCP keep-alives, though...

I am trying to reproduce this case. Are you using PhantomJS v2?

@yukinying Yeah, I'm using Arch Linux, so: phantomjs-2.0.0-4-x86_64

I'm having the same issue as @bararchy.

I've tried running phantomjs with your render.js file on a test site and it works; it returns instantly.

However, when Gryffin runs phantomjs it also passes an extra parameter (a JSON blob with headers and other data), and I suspect this is causing phantomjs to time out.

@bararchy, thanks for providing the details. I think I have located an incorrect ordering of channel messages in phantomjs.go, and I will start fixing it.

@harisec, thanks. I will try to run a few tests to validate that.

Hi, I have deployed the fixes. Please let me know if usability is better now. We will add more usability enhancements soon.

Thank you very much for the update @yukinying.

I've just installed your update. Unfortunately, it doesn't solve my problem. Maybe I didn't install something properly, but a colleague of mine has the same issue.

The issue I keep having is that PhantomjsRenderer.Do always finishes with a timeout. I've tried adjusting the timeout values in the code as you described above, but it didn't solve the issue (it doesn't work even with a 60-second timeout).

If you need additional logs from me please let me know, I'm curious about this project :)

=== Running Gryffin ===
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://192.168.0.5/"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://192.168.0.5/"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://192.168.0.5/"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://192.168.0.5/"}
{"Service":"PhantomjsRenderer.Do","Msg":"[Timeout] Terminating the crawl process.","Method":"GET","Url":"http://192.168.0.5/"}
=== End Running Gryffin ===

P.S. I installed Gryffin following the instructions in this article:
https://blog.igbuend.com/having-a-go-at-gryffin/

I'm running Ubuntu 14.04.3 LTS (x64)

I'm using Arch Linux and I also still have the issue where the timeout is always reached.

Thanks for sharing your experience. I just realized that an executable phantomjs script is not always runnable on every OS; it has been fine on OS X. To mitigate this, Gryffin now calls phantomjs directly. I verified that this generally works on an AWS Linux box. (#18)

@yukinying Hi,

After testing a few times (both go run and a compiled binary), I can say that it is still not working for me with the latest HEAD.
Same issue.
Is there something I can do to help with this issue? Can I somehow produce more log data or debug info?

Also, on what Linux distribution did it work for you?

@harisec, does this work for you now with the latest HEAD?

Hi all,
I am also interested in this project. I was also having the timeout issue, and with the latest HEAD I am facing the following issue (example):

$GOPATH/bin/gryffin-standalone http://zero.webappsecurity.com/
=== Running Gryffin ===
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Fingerprint","Msg":"Computed","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"IsDuplicatedPage","Msg":"Unique Page","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"SQLMap.Scan","Msg":"Run as [sqlmap --batch --timeout=2 --retries=3 --crawl=0 --disable-coloring -o --text-only -v 0 --level=1 --risk=1 --smart --fresh-queries --purge-output --os=Linux --dbms=MySQL --delay=0.1 --time-sec=1 -u http://zero.webappsecurity.com/]","Method":"GET","Url":"http://zero.webappsecurity.com/"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8 pc=0x529b5f]

No, it doesn't work for me either. Sorry about that.

@parereamea ,

Are you using Linux? It seems like your run gets much further than ours; it actually goes beyond the "{"Service":"PhantomjsRenderer.Do","Msg":"Running: " step.

Yes, I am using Ubuntu 14.04.3 LTS

Hm... this is interesting. @parereamea, can you describe how you installed Gryffin? Are you using Go for other projects, or did you install it only for this?

@bararchy, this is the first time I am using Go for a project... The steps I used to install Gryffin are those outlined in the project, with some ideas from https://blog.igbuend.com/having-a-go-at-gryffin/ (same as @harisec), http://narutoinfo.github.io/tag/nsq.html and http://nsq.io/deployment/installing.html
Also, I am using --max-msg-size=2313820682 for nsqd because I hit an error at some point.
I hope this helps...

@parereamea Are you using nsq with "gryffin-standalone" ?

@bararchy I am not sure I understand what you're asking me ...

@parereamea, let me rephrase. You said that you are using nsqd with the --max-msg-size=2313820682 flag.

But in your error example you are using the command $GOPATH/bin/gryffin-standalone http://zero.webappsecurity.com/, which runs gryffin-standalone.

So I'm asking: are you running nsqd when using gryffin-standalone?
nsqd is only used by gryffin-distributed.
Also, do you get the same issue when running gryffin-distributed?

Thanks again for reporting this. I definitely need to provide better documentation. I have been torn between producing a Docker image first or the documentation first. Apologies.

Reopening so as to make sure this appears on the open issue list.

Before I dig into that, I just want to mention that the Go way of updating from source is not trivial. This may or may not solve your issue:

$ go get -v -u github.com/yahoo/gryffin/...
github.com/yahoo/gryffin (download)
github.com/mfonda/simhash (download)
code.google.com/p/go.text (download)
...

I will provide more updates once I re-run the whole installation on a fresh Amazon Ubuntu box.

I changed the title to gryffin-standalone to scope the discussion to the standalone setup.

@parereamea

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8 pc=0x529b5f]

This is very likely related to the missing TCP log listener. The recent patch should have fixed that. If it still happens, please provide a longer stack trace showing the exact file and line that cause the nil pointer dereference.

@parereamea, here is what I got from a fresh setup on an AWS Ubuntu box. It assumes Arachni, SQLMap and PhantomJS (v2) are available in $PATH.

$ gryffin-standalone http://zero.webappsecurity.com/
=== Running Gryffin ===
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Fingerprint","Msg":"Computed","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"IsDuplicatedPage","Msg":"Unique Page","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"SQLMap.Scan","Msg":"Run as [sqlmap --batch --timeout=2 --retries=3 --crawl=0 --disable-coloring -o --text-only -v 0 --level=1 --risk=1 --smart --fresh-queries --purge-output --os=Linux --dbms=MySQL --delay=0.1 --time-sec=1 -u http://zero.webappsecurity.com/]","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Arachni.Scan","Msg":"Run as [arachni --checks xss* --output-only-positives --http-request-concurrency 1 --http-request-timeout 10000 --timeout 00:03:00 --scope-dom-depth-limit 0 --scope-directory-depth-limit 0 --scope-page-limit 1 --audit-with-both-methods --report-save-path /dev/null --snapshot-save-path /dev/null http://zero.webappsecurity.com/]","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"SQLMap.Scan","Msg":"SQLMap return true","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Arachni.Scan","Msg":"Arachni return true","Method":"GET","Url":"http://zero.webappsecurity.com/"}
=== End Running Gryffin ===

An enhancement is needed to get Gryffin (v2) to work fully with zero.webappsecurity.com. I am going to open a bug so I can quickly port the related Python logic from v1 back to v2.

Can you write a list of the steps you took to run this on a fresh Ubuntu installation?

as in

  1. apt-get install go
  2. bla bla bla ...
  3. dependencies etc ...

This may make it easier to trace the issue; maybe we missed a step?

The blog article covered the proper way to install the dependencies. Skipping those, the setup steps are as follows, with zero.webappsecurity.com as an example site to crawl and fuzz:

export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
go get -v -u github.com/yahoo/gryffin/...
gryffin-standalone http://zero.webappsecurity.com/

Instructions for running gryffin-distributed are still to come. I noticed a behavior discrepancy in Go's core JSON encoding library across operating systems, which was a bit of a surprise to me, and I am going to fix that soon.

@yukinying Using the command go get -v -u github.com/yahoo/gryffin/... I finally managed to run Gryffin without an issue!! :)
Does that mean something was 'missing' in the sources?
Anyway, it works for me now.
@harisec, how about you? Can you try running go get -v -u github.com/yahoo/gryffin/... and then run gryffin-standalone from your PATH (i.e. not from the sources)?

@yukinying @bararchy Yes, that definitely solved the problem :) Thanks :)
Pretty weird, because last time I wasn't sure how to update Go, so I deleted all the Go files and ran go get github.com/yahoo/gryffin/... in an empty directory.

@harisec Glad it works :)

I'm closing this issue; I think all we really need is a simple 3-step guide to installing Gryffin :)