Monkey Troop is a Python program for evaluating ARTist modules and ARTist itself. It executes custom evaluations on multiple connected devices, stores reports and computes results. Evaluations can be resumed after the script has been stopped.
It uses a microservice architecture: a new worker process is spawned for each device, and a central reporter process collects the results and writes reports.
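The worker/reporter split can be sketched with Python's multiprocessing module. This is a minimal illustration under stated assumptions, not Monkey Troop's actual code: the function names, the None sentinel and the placeholder "success" outcome are hypothetical, and for brevity the reporter runs in the parent process rather than in a dedicated process.

```python
import multiprocessing as mp
from typing import List, Tuple

STOP = None  # hypothetical sentinel: no more items will follow


def worker_loop(device: str, tasks, results) -> None:
    """Hypothetical per-device worker: process app packages until STOP arrives."""
    while True:
        pkg = tasks.get()
        if pkg is STOP:
            results.put(STOP)  # tell the reporter that this worker is done
            return
        outcome = "success"  # placeholder for installing and exercising the app
        results.put((device, pkg, outcome))


def reporter_loop(results, num_workers: int) -> List[Tuple[str, str, str]]:
    """Hypothetical reporter: collect results until every worker has stopped."""
    collected = []
    finished = 0
    while finished < num_workers:
        item = results.get()
        if item is STOP:
            finished += 1
        else:
            collected.append(item)  # the real tool would write a report here
    return collected


def main(devices: List[str], packages: List[str]) -> None:
    tasks, results = mp.Queue(), mp.Queue()
    for pkg in packages:
        tasks.put(pkg)
    for _ in devices:
        tasks.put(STOP)  # one sentinel per worker, queued after all packages
    workers = [mp.Process(target=worker_loop, args=(dev, tasks, results))
               for dev in devices]
    for w in workers:
        w.start()
    for entry in reporter_loop(results, len(devices)):
        print(entry)
    for w in workers:
        w.join()  # join only after draining results to avoid blocking


if __name__ == "__main__":
    main(["emulator-5554", "emulator-5556"], ["com.example.a", "com.example.b"])
```

Because each worker puts its own results before its STOP marker, the reporter can safely stop once it has seen one STOP per worker.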
This project is part of the ARTist ecosystem. For more information, check out the corresponding section below.
Python 3.5 and possibly additional Python packages are required.
The evaluation can be conducted on an arbitrary number of devices, but at least one needs to be connected.
Monkey Troop relies on the Android developer tools. For example, adb and monkey need to be present in the PATH to find and control devices and to test apps, respectively.
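A minimal sketch of how a Python tool might find ready devices through adb and build a monkey invocation for one app. The helper names are hypothetical; the commands themselves (adb devices, adb -s <serial> shell monkey -p <package> -v <count>) are standard Android tooling, but Monkey Troop's own invocation may use different options.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials that `adb devices` reports in the ready state "device"."""
    serials = []
    for line in output.splitlines():
        parts = line.split("\t")  # header and daemon messages contain no tab
        if len(parts) == 2 and parts[1].strip() == "device":
            serials.append(parts[0])
    return serials


def list_devices() -> List[str]:
    """Query adb (which must be on the PATH) for connected, authorized devices."""
    result = subprocess.run(["adb", "devices"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_adb_devices(result.stdout)


def monkey_command(serial: str, package: str, events: int = 500) -> List[str]:
    """Build an adb command that sends `events` pseudo-random events to one app."""
    return ["adb", "-s", serial, "shell", "monkey", "-p", package, "-v", str(events)]
```

Devices in states such as "unauthorized" or "offline" are skipped, since they cannot be controlled yet.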
After checking out the repository, the structure looks as follows:

- code/: The root folder for all evaluation-related source files.
- lists/: Lists of applications to be tested during the evaluation. As they are simple text files with one package name per line, new lists can be added at will.
- scripts/: Contains scripts that simplify the evaluation process. The main script, evaluate.sh, contains the basic invocation of the Monkey Troop Python tool. However, as different evaluations require different input, there should be a separate script for each evaluation. For example, the trace-logging evaluation uses the evaluate_trace.sh script, which invokes evaluate.sh with additional arguments. While some arguments are common to all evaluations, concrete evaluations may require additional ones.
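Since an app list is just one package name per line, reading one takes only a few lines. This sketch assumes that blank lines and surrounding whitespace should be ignored; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_app_list(text: str) -> List[str]:
    """Return one package name per non-blank line, with whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_app_list(path: str) -> List[str]:
    """Read an app list file, e.g. one from the lists/ folder (hypothetical helper)."""
    return parse_app_list(Path(path).read_text(encoding="utf-8"))
```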
The tool is started through the script corresponding to the evaluation that should be conducted: for evaluation X, invoke ./scripts/evaluate_X.sh. Scripts need to be started from the tool's base directory.
Every time an application has been tested, Monkey Troop writes a full report to out/reports/<pkg>, where <pkg> is the package name of the tested app. As multiple subtasks are executed for each app under test, the report lists success or failure for each of them, along with additional information that might have been obtained during testing.
In addition, the CSV result file in out/results, which shows a condensed view of the evaluation results for all tested apps, is extended (or created if none exists).
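The two outputs could be produced roughly as follows. This is a hypothetical sketch: the CSV columns (package, result), the helper names and the file name inside out/results are assumptions, since the actual layout is defined by Monkey Troop.

```python
import csv
import os
import tempfile
from typing import Dict

FIELDS = ["package", "result"]  # hypothetical column layout


def write_report(out_dir: str, pkg: str, text: str) -> str:
    """Write the full report for one app to <out_dir>/reports/<pkg>."""
    reports = os.path.join(out_dir, "reports")
    os.makedirs(reports, exist_ok=True)
    path = os.path.join(reports, pkg)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def append_result(csv_path: str, row: Dict[str, str]) -> None:
    """Append one row; create the file with a header row if it does not exist yet."""
    new_file = not os.path.exists(csv_path)
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    # Demo in a throwaway directory instead of the real out/ folder.
    with tempfile.TemporaryDirectory() as out:
        write_report(out, "com.example.app", "install: ok\nmonkey: ok\n")
        append_result(os.path.join(out, "results", "sample.csv"),
                      {"package": "com.example.app", "result": "success"})
```

Appending one row per app keeps the CSV usable even if the evaluation is interrupted halfway.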
The evaluation can be cancelled at any time. However, due to its multiprocess architecture, it might take Monkey Troop a few seconds to terminate all workers, since they are given the chance to exit gracefully to avoid data loss. Cancellation is triggered with a keyboard interrupt (Ctrl+C on Linux).
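One common way to implement such a graceful shutdown is sketched below, assuming workers are multiprocessing processes that share a stop event. Because Ctrl+C delivers SIGINT to the whole foreground process group, the hypothetical worker_entry ignores SIGINT so that only the parent decides when to stop; each worker then finishes its current app before exiting. This is an illustration, not Monkey Troop's actual shutdown code.

```python
import multiprocessing as mp
import queue
import signal
from typing import Callable, List


def worker_loop(tasks, stop, handle: Callable[[str], None], timeout: float = 1.0) -> None:
    """Process apps until the queue is empty or `stop` is set (checked between apps)."""
    while not stop.is_set():
        try:
            pkg = tasks.get(timeout=timeout)
        except queue.Empty:
            return  # no work left
        handle(pkg)  # placeholder: test the app and persist its report


def worker_entry(tasks, stop) -> None:
    # Ctrl+C sends SIGINT to the whole process group; let only the parent react.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    worker_loop(tasks, stop, lambda pkg: print("tested", pkg))


def run(packages: List[str], num_workers: int = 2) -> None:
    tasks, stop = mp.Queue(), mp.Event()
    for pkg in packages:
        tasks.put(pkg)
    procs = [mp.Process(target=worker_entry, args=(tasks, stop))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    try:
        for p in procs:
            p.join()
    except KeyboardInterrupt:
        print("Interrupted: waiting for workers to finish their current app ...")
        stop.set()
        for p in procs:
            p.join()


if __name__ == "__main__":
    run(["com.example.a", "com.example.b", "com.example.c"])
```

The delay mentioned above corresponds to the final join: the parent waits until every worker has completed the app it was testing.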
Monkey Troop detects previous executions of an evaluation and asks whether they should be resumed. If so, already tested apps are skipped and existing results are updated. Otherwise, all existing evaluation data is deleted to allow for a fresh run. You can, however, easily archive your results by backing up the out/ folder.
For most evaluations, a lot of boilerplate code already exists, so you do not need to write much code yourself. We will guide you through the whole process by creating a sample evaluation.
Each evaluation resides in its own folder, so we start by creating code/evaluations/sample.
There are two important classes that we need to subclass for our evaluation to work: Evaluator and Worker.
We first create SampleEvaluator. You can either write it from scratch, using the existing TraceLoggingEvaluator to get an impression of what to do and how, or copy it directly and adapt the code.
The following steps are required:

- Set EVAL_ID to 'sample' and return this value in the get_eval_id method.
- Define the subtasks that divide the execution of a task into logical units. Results and logging are provided for each subtask separately to facilitate analysis and debugging. It is recommended to store them in a static member and return them in the get_subtask_ids_ordered method.
- To decide whether a task succeeded, you have to provide interpretations that define whether a subtask is an assumption, a requirement, or something whose result you do not care about:
  - If a required subtask fails, execution stops there and the task is considered a failure. This is typically used for all subtasks that are conducted by the system you are currently evaluating.
  - If an assumption fails, the task is removed from the final analysis. The idea is to use assumptions to model external factors that you are not in control of, so you can, for example, retry under different circumstances or simply omit these entries from the final analysis. For example, an application that cannot be installed on a device because its Manifest is malformed should not be counted as a failure of your system (assuming you did not alter the Manifest).
  - If a do-not-care subtask fails, the failure is simply ignored. This is useful if you, for example, model cleanup as a dedicated subtask, which might come in handy for debugging and analysis.
- Implement init by creating an argument parser to obtain evaluation-specific arguments. The dedicated init method is required because __init__ is already called earlier and without arguments.
- You can take create_task_queue as is for most use cases, or adapt it if you, e.g., want to enforce a certain ordering or provide more information about a task to the workers.
- In the create_device_worker method, return an instance of the SampleWorker you create in the next step.
- get_analyzer needs to return a valid analyzer implementation. If you have no special requirements, return an instance of the default ResultAnalyzer. It will be used in the analyze.py script and also during the evaluation to display the current state.
- You also need to define the source of the applications under test. get_app_repository is expected to return an IAppRepository. If you have all the apk files available, you can simply use FileBackedRepository, which manages access to apk files in a directory. If you want to download the apps on the fly, you can also use the GPlayDownloaderRepository, which makes use of the shipped Google Play Crawler. In this case, however, you need to set up the crawler separately as described below. Of course, you can also create your own repository if you need a custom solution.
- Finally, put an instance of your SampleEvaluator in Evaluations.py with the corresponding id as a key.
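The steps above can be condensed into a rough, self-contained sketch. Evaluator is stubbed out here, and the subtask names, the Interpretation enum, the --events argument, the classify helper and the EVALUATIONS dict are all hypothetical stand-ins. The remaining methods (create_task_queue, create_device_worker, get_analyzer, get_app_repository) are omitted, so compare against TraceLoggingEvaluator for the actual signatures.

```python
import argparse
from enum import Enum
from typing import Dict, List, Optional


class Interpretation(Enum):
    """Hypothetical labels for the three kinds of subtasks described above."""
    REQUIREMENT = "requirement"
    ASSUMPTION = "assumption"
    DONT_CARE = "dont_care"


class Evaluator:
    """Stand-in for Monkey Troop's Evaluator base class (real interface may differ)."""


class SampleEvaluator(Evaluator):
    EVAL_ID = "sample"
    SUBTASKS = ["install", "run_monkey", "uninstall"]  # hypothetical, in order
    INTERPRETATIONS = {
        "install": Interpretation.ASSUMPTION,    # e.g. malformed Manifest: not our fault
        "run_monkey": Interpretation.REQUIREMENT,
        "uninstall": Interpretation.DONT_CARE,   # cleanup modeled as a subtask
    }

    def __init__(self) -> None:
        self.args: Optional[argparse.Namespace] = None

    def get_eval_id(self) -> str:
        return SampleEvaluator.EVAL_ID

    def get_subtask_ids_ordered(self) -> List[str]:
        return SampleEvaluator.SUBTASKS

    def init(self, argv: List[str]) -> None:
        """Parse evaluation-specific arguments (called after __init__)."""
        parser = argparse.ArgumentParser(prog="evaluate_sample")
        parser.add_argument("--events", type=int, default=500,
                            help="hypothetical: number of monkey events per app")
        self.args = parser.parse_args(argv)


def classify(results: Dict[str, bool], interpretations: Dict[str, Interpretation]) -> str:
    """Map per-subtask success flags (in execution order) to an overall verdict."""
    for subtask, ok in results.items():
        if ok:
            continue
        kind = interpretations[subtask]
        if kind is Interpretation.REQUIREMENT:
            return "fail"
        if kind is Interpretation.ASSUMPTION:
            return "excluded"  # dropped from the final analysis
        # DONT_CARE: ignore the failure and keep looking
    return "success"


# Hypothetical registry mirroring the entry added to Evaluations.py.
EVALUATIONS = {SampleEvaluator.EVAL_ID: SampleEvaluator()}
```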
In the second step, we create SampleWorker, which carries the main evaluation logic.
A worker obtains a task from the queue, executes it and collects the results. Therefore, we implement two methods: process, which does the actual execution, and cleanup, which takes care of all cleanup tasks. Keep in mind that an execution can abort at any point, so a full cleanup is not always required.
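A hypothetical SampleWorker sketch showing why cleanup should only undo what actually happened. The Worker stub, the constructor arguments and the step callables are assumptions rather than Monkey Troop's real interface; this sketch stops at the first failing subtask and always calls cleanup, even if a step raises.

```python
from typing import Callable, Dict, List


class Worker:
    """Stand-in for Monkey Troop's Worker base class (real interface may differ)."""


class SampleWorker(Worker):
    def __init__(self, device: str, steps: Dict[str, Callable[[str], bool]]) -> None:
        self.device = device
        self.steps = steps       # hypothetical: subtask id -> action(pkg) -> success
        self.installed = False   # tracks what cleanup actually has to undo
        self.log: List[str] = []

    def process(self, pkg: str) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        try:
            for subtask, action in self.steps.items():
                ok = action(pkg)
                results[subtask] = ok
                if subtask == "install" and ok:
                    self.installed = True
                if not ok:
                    break  # this sketch stops at the first failing subtask
        finally:
            self.cleanup(pkg)  # runs even if an action raised
        return results

    def cleanup(self, pkg: str) -> None:
        # The task may have aborted before installing, so check first.
        if self.installed:
            self.log.append("uninstall " + pkg)
            self.installed = False
```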
An optional third step is the creation of a custom analyzer. While the default analyzer can check for duplicates and display the current evaluation results, a more customized one can provide evaluation-specific functionality. Many helper methods are readily available, but you might want to extend the analyzer's API by adding a new command to get_command_api and linking it to the corresponding method. This API is typically triggered by providing the command name with additional arguments to analyze.py.
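The idea of a command API can be sketched as a dict from command names to bound methods. SampleAnalyzer, its result format and the dispatch helper are hypothetical; how analyze.py actually parses its arguments is not shown here.

```python
from typing import Callable, Dict, List


class SampleAnalyzer:
    """Hypothetical analyzer; a real one would build on the existing analyzer classes."""

    def __init__(self, results: Dict[str, str]) -> None:
        self.results = results  # package name -> "success" / "fail" / "excluded"

    def list_failures(self, args: List[str]) -> str:
        """Evaluation-specific command: all failed packages, sorted."""
        return "\n".join(sorted(p for p, r in self.results.items() if r == "fail"))

    def count(self, args: List[str]) -> str:
        """Count apps whose status equals the first argument."""
        return str(sum(1 for r in self.results.values() if r == args[0]))

    def get_command_api(self) -> Dict[str, Callable[[List[str]], str]]:
        # Command name -> method that receives the remaining arguments.
        return {"failures": self.list_failures, "count": self.count}


def dispatch(analyzer: SampleAnalyzer, argv: List[str]) -> str:
    """Hypothetical stand-in for analyze.py: `<command> [args...]`."""
    if not argv:
        raise SystemExit("usage: <command> [args...]")
    command, args = argv[0], argv[1:]
    api = analyzer.get_command_api()
    if command not in api:
        raise SystemExit("unknown command: " + command)
    return api[command](args)
```

Giving every command the same signature keeps the dispatcher generic: adding a command only means adding one dict entry.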
The main script to start the evaluation will be created as scripts/evaluate_sample.sh. Similar to evaluate_trace.sh, it invokes evaluate.sh with all the required arguments.
For the analyzer, you can copy scripts/evaluate_trace.sh to scripts/evaluate_sample.sh and change the eval id from trace_logging to sample.
The Google Play Crawler allows downloading app apk files directly from the Google Play Store. While Monkey Troop itself was written from scratch, the crawler is a third-party component with a different license (BSD) and a dedicated README.md, obtained from here. In order to use it, e.g., through the GPlayDownloaderRepository, you first need to set it up.
The Google Play API requires valid credentials and registered device(s), so you need to provide this information in a configuration file. First, copy code/repositories/gplay/googleplay_api/googleplay_api/config.py.example to code/repositories/gplay/googleplay_api/googleplay_api/config.py. Then fill in the missing entries with valid credentials. If you are unsure what they should look like, you can check the history of the above-mentioned GitHub project.
ARTist is a flexible open source instrumentation framework for Android apps and Java middleware. It is based on the Android Runtime's (ART) compiler and modifies code during on-device compilation. In contrast to existing instrumentation frameworks, it preserves the application's original signature and operates on the instruction level.
ARTist can be deployed in two different ways: first, as a regular application using our ArtistGui project, which allows for non-invasive app instrumentation on rooted devices, or second, as a system compiler for custom ROMs, where it can additionally instrument the system server (Package Manager Service, Activity Manager Service, ...) and the Android framework classes (boot.oat). It supports Android Marshmallow 6.0 and later.
For detailed tutorials and more in-depth information on the ARTist ecosystem, have a look at our official documentation and join our Gitter chat.
We are about to enter the beta phase, which will bring a lot of changes to the whole ARTist ecosystem, including a dedicated ARTist SDK for simplified Module development, a semantic versioning-inspired release and versioning scheme, an improved and updated version of our online documentation, great new Modules, and a lot more improvements. However, in particular during the transition phase, some information, such as that in the repositories' README.md files and the documentation at https://artist.cispa.saarland, might be slightly out of sync. We apologize for the inconvenience and happily take feedback at Gitter. To keep up with the current progress, keep an eye on the beta milestones of the Project: ARTist repositories and check for new blog posts at https://artist.cispa.saarland.
We hope to create an active community of developers, researchers and users around Project ARTist and hence are happy about contributions and feedback of any kind. There are plenty of ways to get involved and help the project, such as testing and writing Modules, providing feedback on which functionality is key or missing, reporting bugs and other issues, or talking about your experiences in general. The team is actively monitoring Gitter and of course the repositories, and we are happy to get in touch and discuss. We do not have a full-fledged contribution guide yet, but it will follow soon (see beta announcement above).
ARTist is based on a paper called ARTist - The Android Runtime Instrumentation and Security Toolkit, published at the 2nd IEEE European Symposium on Security and Privacy (EuroS&P'17). The full paper is available here. If you are citing ARTist in your research, please use the following bibliography entry:
@inproceedings{artist,
title={ARTist: The Android runtime instrumentation and security toolkit},
author={Backes, Michael and Bugiel, Sven and Schranz, Oliver and von Styp-Rekowsky, Philipp and Weisgerber, Sebastian},
booktitle={2017 IEEE European Symposium on Security and Privacy (EuroS\&P)},
pages={481--495},
year={2017},
organization={IEEE}
}
There is a follow-up paper in which we used ARTist to cut out advertisement libraries from third-party applications, move the library to a dedicated app (its own security principal) and reconnect both using a custom Binder IPC protocol, all while preserving visual fidelity by displaying the remote advertisements as floating views on top of the now ad-cleaned application. The full paper, The ART of App Compartmentalization: Compiler-based Library Privilege Separation on Stock Android, as published at the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS'17), is available here.