Uber Graph Benchmark (UGB)

Getting Started

Check out the repo
Set up a database to benchmark. There is a README file under each binding directory. List out all modules by
```
./gradlew projects
```
Run benchmark on db

# generates and writes to redis db, then reads with subgraph queries
./gradlew execute -PmainArgs="-db com.uber.ugb.db.redis.RedisDB -w -g benchdata/graphs/trips -b benchdata/workloads/workloada -r"

# generates and writes to redis db, then reads with subgraph queries
./gradlew execute -PmainArgs="-db com.uber.ugb.db.cassandra.CassandraDB -w -g benchdata/graphs/trips -b benchdata/workloads/workloada -r"

# this generate vertices and edges and write to noop, used for measuring data gen performance
./gradlew execute -PmainArgs="-db com.uber.ugb.db.NoopDB -g benchdata/graphs/trips -b benchdata/workloads/workloada -w"

# this generate vertices and edges and write as csv to System.out or a file
./gradlew execute -PmainArgs="-db com.uber.ugb.db.CsvOutputDB -g benchdata/graphs/trips -b benchdata/workloads/workloada -w"

Customization

Set environment variables in

benchdata/workloads/env.properties

To add a new DB implementation, consider inherit from

com.uber.ugb.db.KeyValueDB

This stores the adjacency list in one blob.
com.uber.ugb.db.PrefixKeyValueDB

This stores the adjacency list with the same prefix. The edge writes could be faster than KeyValueDB.
com.uber.ugb.db.GremlinDB

This processes gremlin queries directly.

Build

create jar

./gradlew jar

build fat jar for spark

./gradlew build

Run on Spark

Here is one example on how to run spark

#!/usr/bin/env bash

cd ugsb

YARN_CONF_DIR=/etc/hadoop/conf /home/spark-2.1.0/bin/spark-submit \
--class "com.uber.ugb.Benchmark" \
--master yarn \
--deploy-mode client \
--driver-memory 6G \
--executor-memory 6G \
--executor-cores 2 \
--driver-cores 2 \
--num-executors 10 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--driver-class-path '/etc/hive/conf' \
build/libs/ugb-all-0.0.15.jar \
"-db com.uber.ugb.db.cassandra.CassandraDB -w -g benchdata/graphs/trips -b benchdata/workloads/workloada -r -s"

echo $?

About

A framework to benchmark different graph databases, based on generated data from customizable schema, distribution, and size.

Apache License 2.0

Languages

Language:Java 99.8%Language:Shell 0.2%