Cassandra Maude project Codebase overview: ------------------- There are 2 repositories: 1. https://github.com/Sonnbc/modelCheckingNobi This contains YCSB, cluster management and deployment code as well as scripts to run experiments and analyize results. The scripts are designed to run on any computer (either an emulab server or your own local pc). 2. https://github.com/Sonnbc/modelCheckingCassandra This contains modified Cassandra. Currently there are two branches: "timestampBased" and "agnostic" corressponding to TB and TA strategies. Set up cluster ------------------- 0. Variables You need to supply correct values for variables in the commands below. Here is the explanation: N: the number of nodes in your cluster 1. Initialization The initialzation is to be done once even if your nodes are swapped out and in again. a. Symlink You need to be able to compile Cassandra and YCSB without going over disk quota assigned by Emulab. Since compiling Cassandra and YCSB involves downloading a lots of required packages to ~/.m2 folder, you need to first symlink it to a directory with higher quota (for example /proj or /scratch). mkdir /scratch/.m2 ln -s /scratch/.m2 /users/yourid/.m2 b. Environment variables: You need to export the following environment variables to both ~/.bashrc and and ~/.profile: export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64 export CASSANDRA_HOME=/modelCheckingCassandra export CASSANDRA_INCLUDE=$CASSANDRA_HOME/bin/cassandra.in.sh c. Modify account file Modify clusterSetup/account with your Emulab account and experiment's domain: For example, I normally ssh using sonnbc@node-00.riak.confluence.emulab.net then my USER is sonnbc and DOMAIN is riak.confluence.emulab.net 2. Install necessary packages After everytime your nodes are swapped in, you need to install necessary packages for the nodes in your cluster. This can be done by: clusterSetup/install_packages.sh N 3. Set up cluster Run the script to set up the cluster. usage: clusterSetup/cluster.sh -n numnode -b cassandraBranch [timestampBased | agnostic] [-d (deploy) | -u (update) recompile/reset/keep] recompile: recompile cassandra and YCSB and reset cluster reset: don't recompile the code but reset cluster (empty all data) keep: just git pull but do nothing else This will set up Cassandra cluster for your Emulab nodes. In order to do this, you need access to the two github repos above. You might need to change the the variables at top of the file if necessary. These variables include: CASSANDRA_HOME: Where Cassandra binary will be on each node. This needs to be a local directory (non NFS) SYNC_POINT_CASSANDRA: Where Cassandra code will be cloned from git and compiled. This needs to be a directory on the NFS (synced between nodes) SYNC_POINT_YCSB: Where YCSB and scripts will be cloned from git and YCSB is compiled. This needs to be a directory on the NFS. CASSANDRA_DATA: Where Cassandra stores its data. Non-NFS. You typically dont need to change this unless you know what you are doing. CASSANDRA_LOG: Where Cassandra stores its log. Non-NFS. You typically dont need to change this unless you know what you are doing. The -d switch is typically used when you first swap your nodes in. It first deletes everything in your current nodes, then clones the git repos and compiles the code from scratch. Cassandra cluster is deployed after that. The -u switch assumes you already have git repos in your NFS. It will just call "git pull" instead of "git clone". The 3 options are explained in the usage above. Typically you want to use 'keep' when you just change YCSB workload file. Make sure to check the log on the screen to see if YCSB and Cassandra are compiled correctly. For YCSB, you just need the following. The rest might fail to compile but it's not important. [INFO] YCSB Root ......................................... SUCCESS [36.239s] [INFO] Core YCSB ......................................... SUCCESS [26.911s] [INFO] Cassandra DB Binding .............................. SUCCESS [30.420s] 4. Check cluster status Make sure Cassandra is up and running for all nodes. This is done by going to CASSANDRA_HOME from one of the nodes and calling: bin/nodetool status Check if all nodes are in the UP state 5. Set up database If you have chosen to run cluster.sh with options different than "-u keep", you need to reinitialize the database. Go to one of the nodes, invoke Cassandra-cli (CASSANDRA_HOME/bin/cassandra-cli) and issue these commands: CREATE KEYSPACE usertable with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor:3}; use usertable; create column family data with comparator=UTF8Type and default_validation_class=UTF8Type and read_repair_chance=0.0; 6. Use netem to change network delay if necessary (ask Muntasir) Run experiments ------------------- You might need to change YCSB_HOME in YCSBRun.sh if you have changed SYNC_POINT_YCSB in clusterSetup/cluster.sh The workload used is YCSB_HOME/workloads/modelCheckingWorkload. Since the nodes have a copy of the file in their cloned repo, you NEED TO MAKE SURE any changes in this file are reflected at the nodes. Otherwise they will just run YCSB with the old version that they have. So you need to commit your changes and push them to github. Then call clusterSeup/cluster.sh with "-u keep" option to instruct the nodes to pull the file. 1. YCSB Load: ./YCSBRun.sh -l -n N 2. YCSB Test: ./YCSBRun.sh -t -n N 2>&1 | grep INFO | tee temp.log 3. Parse result: python YCSBResultParser temp.log Contact: ------------------- Son - sonnbc@gmail.com