wit0k / dw

dw speeds up manual malware hunting tasks like href crawling, automatic file download and vendor submission.


dw [BETA]

Description:

The tool was created to speed up manual malware hunting tasks. A simple example is already covered in “Use cases”, but let’s describe it verbally. Imagine the following situation: you have found an open directory on the Internet which is full of malicious samples (example below).

Open directory

Normally, to download these files you would have to save them manually (or retrieve the page source, pull out the href elements with a regex or similar, and then download them with curl or wget). When there are more than a few files, this becomes unmanageable and a plain waste of time.

The “dw” tool, if instructed to do so, can crawl the site for all available href elements and properly reconstruct their full URLs. The list of retrieved URLs can then be downloaded automatically, or you can stop at this stage and modify the list first. The downloaded files can be compressed (zipped) and eventually submitted automatically to an antivirus vendor.
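
To make the workflow concrete, here is a minimal sketch of the same idea (this is not dw's actual code; requests and BeautifulSoup are used purely for illustration, and the URL in the usage comment is a hypothetical placeholder):

  # Sketch only: crawl an open directory for <a> hrefs, rebuild absolute URLs
  # and download every file they point to.
  import os
  import requests
  from bs4 import BeautifulSoup
  from urllib.parse import urljoin, urlparse

  def crawl_and_download(index_url, out_dir="downloads"):
      os.makedirs(out_dir, exist_ok=True)
      # verify=False: open directories often sit behind broken TLS
      html = requests.get(index_url, timeout=30, verify=False).text
      soup = BeautifulSoup(html, "html.parser")
      for a in soup.find_all("a", href=True):
          url = urljoin(index_url, a["href"])           # reconstruct the full URL
          name = os.path.basename(urlparse(url).path)
          if not name:                                  # skip hrefs without a file name (e.g. parent folder)
              continue
          data = requests.get(url, timeout=60, verify=False).content
          with open(os.path.join(out_dir, name), "wb") as f:
              f.write(data)
          print("downloaded %s -> %s/%s" % (url, out_dir, name))

  # crawl_and_download("http://host.example/opendir/")  # hypothetical open directory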

Features:

Some of the available functionality:

  • Input URL de-obfuscation (hxxp[:]//120.132.17[.]180:66/ becomes http://120.132.17.180:66/; see the sketch after this list)
  • Input URL de-duplication
  • Site crawling for href elements (Two different modes)
  • Bulk file downloads
  • Files de-duplication (by file hash)
  • File compression (.zip, custom files count per archive)
  • Antivirus submission (plugins/%vendor%.py.vd file required)
  • Proxy submission and querying (plugins/%vendor%.py.vd file required)
  • Automatic Pastebin reports
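
The URL de-obfuscation ("refanging") step, for example, boils down to a handful of string substitutions. The following is a minimal sketch of that idea, not dw's exact rule set (the "||.||" form is the alternate defang style mentioned in the 0.3.9 change log entry):

  # Sketch only: refang a defanged URL.
  import re

  def deobfuscate(url):
      url = re.sub(r"^hxxp", "http", url, flags=re.IGNORECASE)   # hxxp(s) -> http(s)
      url = url.replace("[:]", ":").replace("[.]", ".")          # strip defanging brackets
      url = url.replace(" || . || ", ".").replace("||.||", ".")  # alternate defang style
      return url

  print(deobfuscate("hxxp[:]//120.132.17[.]180:66/"))  # http://120.132.17.180:66/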

Example:

 dw.py -i hashes.txt --vt-file-download --vt-file-report
  
  VirusTotal -> File Report:
  947447601a5a505420ea707ac75d7b36, 46/68, Symantec: W32.SillyFDC, Microsoft: Worm:AutoIt/Autorun.AC
  
  VirusTotal -> File Download:
  947447601a5a505420ea707ac75d7b36, downloads/947447601a5a505420ea707ac75d7b36
 dw.py -i urls.txt -gl --submit 
  • Load and deobfuscate URLs from urls.txt
  • Enumerate all <a> HREFs for each loaded URL
  • Download all files pointed to by the HREF destinations
  • Submit all downloaded files to the loaded AV vendors
 dw.py --dedup -z -i downloads/
  • Load all files from the input folder (downloads/) [-i <folder>]
  • Deduplicate the files (by hash)
  • Compress all unique files from the input folder and save them to the archive/ folder [-z] (a sketch of this dedup + zip step follows below)
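
A minimal sketch of that dedup + zip step, assuming MD5 hashing and dw's default limit of 9 files per archive (an illustration only, not dw's implementation):

  # Sketch only: deduplicate files by hash, then pack unique files into
  # zip archives of at most 9 files each.
  import hashlib
  import os
  import zipfile

  def dedup_and_zip(in_dir="downloads", out_dir="archive", per_archive=9):
      os.makedirs(out_dir, exist_ok=True)
      unique = {}                                    # hash -> first path seen
      for name in os.listdir(in_dir):
          path = os.path.join(in_dir, name)
          if os.path.isfile(path):
              with open(path, "rb") as f:
                  unique.setdefault(hashlib.md5(f.read()).hexdigest(), path)
      files = sorted(unique.values())
      for i in range(0, len(files), per_archive):    # split into chunks of 9
          archive = os.path.join(out_dir, "samples_%d.zip" % (i // per_archive))
          with zipfile.ZipFile(archive, "w") as zf:
              for path in files[i:i + per_archive]:
                  zf.write(path, arcname=os.path.basename(path))
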
 dw.py --submit -i downloads/ 
  • Process files from downloads/ [zip them when necessary]
  • Processed/compressed files are saved into archive/ folder
  • Submits files from archive/ folder to all loaded vendors [--submit enables -z automatically]
 dw.py --submit -s downloads/malware.exe 
  • Submits a local file to all loaded AV vendors
 dw.py -v DEBUG --dedup --url-info -s http://soportek[.]cl/DNWbs6/ --submit 
  • Log debug data to dw.log
  • Retrieve the proxy category of the given URL from all loaded proxy vendors and print it
  • Download and submit the content served by the URL to all loaded AV vendors

Command Line:

optional arguments:
  -h, --help            show this help message and exit

Script arguments:

  -i INPUT, --input-file INPUT
                        Load the hashes, URLs or existing files from the input
                        file for further processing
  -s SAMPLE_FILE_OR_FOLDER, --sample SAMPLE_FILE_OR_FOLDER
                        Load given URL, file or folder for further processing
  -d DOWNLOAD_FOLDER, --download-folder DOWNLOAD_FOLDER
                        Specify custom download folder location (Default:
                        downloads/)
  -a ARCHIVE_FOLDER, --archive ARCHIVE_FOLDER
                        Specify custom archive folder location (Default:
                        'archive/')
  -o OUTPUT_DIRECTORY   Copy loaded/deduplicated files into specified output
                        directory (Applicable when -dd is used)
  -dd, --dedup          Deduplicate the input and downloaded files
  -v VERBOSE_LEVEL, --verbose VERBOSE_LEVEL
                        Set the logging level to one of following: INFO,
                        WARNING, ERROR or DEBUG (Default: WARNING)
  --download            Download loaded or crawled URLs
  -z, --zip             Compress all downloaded files, or files from input
                        folder (If not zipped already)
  --no-mime, -nm        Print all retrieved HREFs without a MIME type
  --limit-archive-items MAX_FILE_COUNT_PER_ARCHIVE
                        Sets the limit of files per archive (Default: 9). [0 =
                        Unlimited]

Crawling arguments:

  -gl, --get-links      Retrieve all available links/hrefs from loaded URLs
  -rl, --recursive-hostonly
                        Enable recursive crawling (Applies to -gl), but crawl
                        for hrefs containing the same url host as input url
                        (Sets --recursion-depth 0 and enables -gl)
  -r, --recursive       Enable recursive crawling (Applies to -gl, enables
                        -gl)
  -rd RECURSION_DEPTH, --recursion-depth RECURSION_DEPTH
                        Max recursion depth level for -r option (Default: 20)

Networking arguments:

  --user-agent USER_AGENT
                        User-agent string, which would be used by -gl and
                        --download
  --debug-requests      Sends GET/POST requests via local proxy server
                        127.0.0.1:8080

Submission arguments:

  --submit              Submit files to AV vendors (Enables -z by default)
  --submit-hash         Submit hashes to AV vendors
  --submit-url          Submit loaded URLs to PROXY vendors...
  --in-geoip            Determine GeoIP Location of loaded URLs etc...
  -ui, --url-info       Retrieve URL information from supported vendors for
                        all loaded input URLs.
  -uif, --url-info-force
                        Force url info lookup for every crawled URL (NOT
                        recommended)
  -sc SUBMISSION_COMMENTS, --submission-comments SUBMISSION_COMMENTS
                        Insert submission comments (Default: )
  --proxy-vendors PROXY_VENDOR_NAMES
                        Comma separated list of PROXY vendors used for URL
                        category lookup and submission
  --av-vendors AV_VENDOR_NAMES
                        Comma separated list of AV vendors used for file/hash
                        submission (Default: all)
  --email SUBMITTER_EMAIL
                        Specify the submitter's e-mail address
  --proxy-category NEW_PROXY_CATEGORY
                        Specify new proxy category (If not specified default
                        proxy category will be used)

LabAPI arguments:

  --labapi-file-download
                        Instructs the LabAPI to search for the sample in
                        available API services

Virus Total arguments:

  --vt-file-download    Download file from VirusTotal
  --vt-file-report      Get report about the file from VirusTotal
  --disable-vt-file-report
                        Disables automatic VT File report lookup for file
                        downloads

pastebin arguments:

  --pastebin-api PASTEBIN_API_KEY
                        API dev key for pastebin.com (If not specified, other
                        pastebin params would be ignored)
  -pu, --pastebin-upload
                        Uploads stdout to pastebin and prints the paste's url
  -pv PASTEBIN_TYPE, --pastebin-visibility PASTEBIN_TYPE
                        Set the paste visibility: 0 - Public or 2 - Private
                        (Default: 0)
  -pe PASTEBIN_PASTE_EXPIRATION, --pastebin-expiration PASTEBIN_PASTE_EXPIRATION
                        Set the paste expiration time to one of following:
                        'N': 'Never', '10M': '10 Minutes','1H': '1 Hour','1D':
                        '1 Day','1W': '1 Week','2W': '2 Weeks','1M': '1 Month'
                        ... (Default: 1H)
  -pt PASTEBIN_TITLE, --pastebin-title PASTEBIN_TITLE
                        Paste title

Change log:

Ver. 0.0.8:

  • -r, --recursive: Enable recursive crawling
  • -rd, --recursion-depth: Max recursion depth level (default: 20)
  • Simplified get_hrefs() code
  • get_hrefs uses the requests session object and a mix of HEAD and GET requests to speed up crawling performance
  • get_hrefs set to return unique hrefs only

Ver. 0.0.9:

  • Fixes to download and get_hrefs functions
  • If verbose level is set to DEBUG, print the href every time it's added to links list
  • Documentation update

Ver. 0.1.0:

  • Fixes to get_hrefs function
  • "-rl", "--recursive-hostonly": Crawls websites which have the same URL host as the input URL (Recommended)
  • Small error handling added when showing archive content (didn't work for .jar files)
  • Fix to download function (it was corrupting files)
  • Documentation update

Ver. 0.1.1:

  • Fix to download function (it was corrupting files)
  • Documentation update

Ver. 0.1.2:

  • Fix to download function (Logic improved)

Ver. 0.1.3:

  • Suppressed the warning: urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made

Ver. 0.1.4:

  • Critical fix to get_hrefs (Automatic parent folder and mod_autoindex detection; preventing back loops).

Ver. 0.1.5:

  • Detect and skip links automatically created in open directory listings, like: Name, Last modified, Size, Description

Ver. 0.1.7:

  • Slight logic change to tasks execution
  • New parameters for input and downloaded files deduplication and copying files to an output folder
  • BC proxy lookup support
  • uniq class for handling deduplication
  • Submitter class (so far for proxy lookup only, but soon for VT as well)

Ver. 0.1.7:

  • If DEBUG mode is enabled, print the detected href's MIME type if one was sent directly by the server
  • The download function prints detailed info about each downloaded file, like: file_hash,file_destination,file_mime_type,proxy_category,url

Ver. 0.1.9:

  • Params names and documentation update

Ver. 0.2.0:

  • --url-info, --submit-url, --email <email_address>: used for querying the current proxy category or submitting a new one for loaded URLs
  • --download: enables the download action
  • --skip-download got removed
  • --proxy-category: Specify new proxy category (Default: 'Malicious Sources/Malnets')
  • --proxy-vendors: Comma separated list of PROXY vendors used for URL category lookup and submission

Ver. 0.2.1:

  • Crawling for <a> hrefs should be faster (the tool flags an href as final if the resource has a known file extension)

Ver. 0.2.2:

  • Print deduplicated links
  • Minor fixes

Ver. 0.2.3:

  • Cosmetic changes (working on adding smb support)

Ver. 0.2.4:

  • The session uses the user-agent specified via --user-agent; if none is specified, it uses the one pointed to by the current_user_agent_index variable

Ver. 0.2.5:

  • Cosmetic changes (--submit does not disable the -gl or -rl params)

Ver. 0.2.6:

  • New params for pastebin upload (run -h to find out)
  • Few cosmetic and printing code changes
  • Changed the display of -h param groups
  • No need to specify -i urls.txt (if urls.txt exists)
  • Check_args function adjustment

Ver. 0.2.7:

  • FIX: _urls variable was not properly cleared, hence duplicating output HREFs
  • FIX: If -gl, -rl or -r do not find any hrefs, the input URL is added to the HREFs list

Ver. 0.2.8:

  • pastebin: 1W instead of 1H

  • -i skips entries from the URL input file which start with "#"

  • --url-info prints the proxy category, IP, domain, and URL

  • Code re-design to support URL object (which would hold all information about the URL)

  • Update to the file_extensions dictionary (I still need to add a lot!)

Ver. 0.2.9:

  • Quick fix in url.py->parse_url() ... shall be parsing URLs correctly

Ver. 0.3.0:

  • Update to file_extensions (it now holds 76 extensions of well-known file types)
  • Removed parse_urls from dw (it's now part of url object)
  • Cosmetic code changes to logging

Ver. 0.3.1:

  • Code fix: Again for url.py -> parse_urls (I missed one error last time)

Ver. 0.3.2:

  • Code update to get_hrefs: prevents a case with a continuous loop

Ver. 0.3.3:

  • Fix to Bluecoat proxy category lookup and submission (URL changes, POST data changes etc.)
  • Added an actual check whether a captcha is required; if not, there is no need to download it from the server

Ver. 0.3.4:

  • Cosmetic code change in the function loading URLs from a file (it skips lines which are just '\n')
  • --url-info-force is now respected by --url-info

Ver. 0.3.5:

  • URL parsing improvements
  • Preparing for other enhancements like SMB support and submission tracking (database)

Ver. 0.3.6:

  • Small fix to the download function: better handling of output file names

Ver. 0.3.7:

  • Small fix to the download function: better handling of output file names (the previous fix was cutting off the file extension)
  • -nm: All retrieved HREFs will be printed without <mime_type> (By default <mime_type> is printed)
  • -i: supports a notation with mime type like: http://host[.]domain/path/etc ,(MIME: <mime_type>), so you can just copy the result from -gl, -rl etc.

Ver. 0.3.8:

  • -o: Fix; it now supports both full and relative paths for the output folder
  • -i: "[END]" marker would stop input file processing for both urls and hashes
  • (Beta) --submit-hash: Allows submitting hashes to AV vendor (-i must be specified)
  • Both --submit, --submit-hash would attempt to read a Tracking ID number from submission website

Ver. 0.3.9:

  • Update to URL parsing: " || . || " or "||.||" is converted to "."
  • Fix to _url_endswith() which was incorrectly considering domain TLD as an extension

Ver. 0.4.0:

  • New plugin_manger class brings dynamic plugin support (with "[.vd]" config files)
  • Not all "[.vd]" files are public, since they may contain PII or other sensitive data like serial numbers
  • New param: --av-vendors vendor,vendor (Default: all), which means upload to all loaded vendors by default
  • The --submit and --submit-hash params are handled by the respective plugins

Ver. 0.4.1:

  • TEMPORARY FIX: Code fix to _url_endswith() function. Due to another fix being implemented for TLDs, the function was ignoring the known file extensions list (Crawling should be faster now)

Ver. 0.4.2:

  • CODE CHANGE: get_hrefs function has got new logic and better logging (-rl shall be more accurate now)

Ver. 0.4.3:

  • CODE CHANGE: get_hrefs function: Slight change to handling pages returning text/html mime type

Ver. 0.4.4:

  • CODE CHANGE: get_hrefs function: Slight cosmetic change (better handling of <a> hrefs)

Ver. 0.4.5:

  • MacOS: Added automatic detection of libmagic magic file (if installed by brew)

Ver. 0.4.6:

  • Cosmetic code changes

Ver. 0.4.7:

  • Proxy query and submission moved to plugin: pp_bluecoat (pp_bluecoat.py.vd required)
  • Proxy submission comments are set to the URL if no specific submission comments were provided
  • Submission code change to support an additional case, where the portal detects that the URL was already submitted by you
  • dw code changes to handle new proxy plugin
  • dw code changes to simplify av and proxy plugin usage
  • --url-info is using the proxy plugin
  • Removed submitter.py, replaced by plugins and plugin manager
  • Removed submit_url_category, get_url_info, _update_headers functions from dw.py (handled by plugins now)

Ver. 0.4.8:

  • Update to INSTALL.md
  • Added error handling in load_urls_from_input_file()
  • Cosmetic code changes to the logic of -i and -s options
  • New option -s, which allows specifying a local file, folder or URL for processing
  • -i accepts an input file containing URLs or hashes
  • Documentation update

Ver. 0.4.9:

  • Cosmetic fix to the logic of the -s option when handling URLs

Ver. 0.5.0:

  • Code fix to cache handling for URL_CACHE (--url-info should be faster now)

Ver. 0.5.1:

  • Code fix: -rl, -gl should have better support for final url building

Ver. 0.5.2:

  • Code fix: -rl, -gl

Ver. 0.5.3:

  • The plugin class automatically adjusts self.con according to the --debug-requests value
  • New param (Beta): --vt-download-file; once activated it forces the submission into hash mode and downloads files from VirusTotal by file hash

Ver. 0.5.4:

  • Added a hasher class to facilitate error checking for VT
  • Error checking in the download API

Ver. 0.5.5:

  • New option: --vt-file-report -> For now used to obtain either a full report or a report excerpt (currently not printing anything)
  • --vt-download-file: Name changed to --vt-file-download (All other vt options would be --vt-%file/url/...%-%action% format)
  • Cosmetic code changes

Ver. 0.5.6:

  • Cache adopted by the VT plugin
  • Slight changes in pp_bluecoat plugin to support new functions in cache
  • Added some logging to --vt-file-report, --vt-file-download

Ver. 0.5.7:

  • --vt-file-report and --vt-file-download first look up FILE_CACHE before calling the VT API

Ver. 0.5.8:

  • --in-geoip: Get the GeoIP location data for loaded URLs

Ver. 0.5.9:

  • --url-info: Queries for a VT excerpt for each downloaded file

Ver. 0.6.0:

  • Code fix in hasher (affecting other modules)
  • Code fix in av_symantec.py (handle the situation gracefully when the submitted hash is not publicly available)

Ver. 0.6.1:

  • Cosmetic code changes

Ver. 0.6.2:

  • Cosmetic code changes in get_hrefs (Still need to find some time to completely rebuild it + multiprocessing)

Ver. 0.6.4:

  • Cosmetic code changes in get_hrefs (Still need to find some time to completely rebuild it + multiprocessing)

Ver. 0.6.5:

  • Cosmetic code changes in get_hrefs (Still need to find some time to completely rebuild it + multiprocessing)

Ver. 0.6.6:

  • New plugin: la_labapi.py: Facilitates the connections to my LabAPI service
  • --labapi-file-download: Instructs the LabAPI to search for the sample in available API services
  • Added https://github.com/InQuest/python-iocextract to url class, instead of custom parsing
  • --disable-vt-file-report: Disables automatic VT File report lookup for file downloads (Huge number of files to download etc. would overuse the API limit)
  • Several code logic changes ... so there might be errors
  • Fixed the get_hrefs ... it was ignoring the -rl option a bit... [sorry]
  • Performance improvement to self.crawl_local_host_only

Ver. 0.6.7:

  • iocextract is not perfect; when it fails to parse a URL, the code runs a backup parsing algorithm
  • Updated the regex patterns for domains

Ver. 0.6.9:

  • Fix to -s: it was not loading files from a folder... the issue is fixed now

Ver. 0.6.9:

  • Fix: cosmetic code changes
