alvinantonius / kani

Cassandra backup/restore solution - alternative to cqlsh COPY FROM/TO command

kani

Cassandra CSV export/import. Better than cqlsh COPY FROM/TO. Requires Java 8.

Tested to work on Cassandra 2.1.15 and 2.2.4

Why?

Because COPY FROM/TO did not handle escaped characters (or any Unicode) correctly. It also produces invalid CSV if a value contains stringified JSON.

Suppose you have the value "ABC\nDEF", where ABC and DEF are separated by a newline character. COPY TO stores it as the literal "ABC\nDEF" in the CSV. However, when COPY FROM reads it back, it ignores the \ character, so the restored value is ABCnDEF.

Another example is a stringified JSON value: if you have "{"list": [1,2,3]}" as a value, COPY TO stores it as "{\"list\": [1,2,3]}", which does not work with CSV readers.
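To make the problem concrete, here is a minimal sketch that uses Python's standard csv module as a stand-in for "a CSV reader" (an assumption; the original does not name a specific reader). It parses the backslash-escaped form described above and compares it with the conventional CSV form, where embedded quotes are doubled.

```python
import csv
import io

value = '{"list": [1,2,3]}'

# The backslash-escaped form described above, as one line of a CSV file.
backslash_escaped = '"{\\"list\\": [1,2,3]}"\n'
broken_row = next(csv.reader(io.StringIO(backslash_escaped)))

# The conventional CSV form doubles embedded quotes instead of using backslashes.
buffer = io.StringIO()
csv.writer(buffer).writerow([value])
doubled = buffer.getvalue()
clean_row = next(csv.reader(io.StringIO(doubled)))

print(broken_row == [value])  # False: the original value is not recovered
print(clean_row == [value])   # True
```

The reader treats the first inner quote as the end of the quoted field, so the commas inside the JSON list are then read as column separators.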

kani's solution is simple: instead of escaping characters or dealing with quotes, it stores all text values in hexadecimal format.
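The idea can be sketched in a few lines of Python. This is an illustration only, not kani's actual code: the helper names to_hex and from_hex are hypothetical, and UTF-8 is assumed as the text encoding. Because a hex string contains only the characters 0-9 and a-f, the CSV never sees a quote, comma, backslash or newline coming from the data.

```python
import csv
import io

def to_hex(text):
    """Encode a text value as a hexadecimal string (UTF-8 assumed for this sketch)."""
    return text.encode("utf-8").hex()

def from_hex(hex_text):
    """Decode a hexadecimal string produced by to_hex back into text."""
    return bytes.fromhex(hex_text).decode("utf-8")

values = ["ABC\nDEF", '{"list": [1,2,3]}', "héllo"]

# Write the hex-encoded values as one CSV row.
buffer = io.StringIO()
csv.writer(buffer).writerow([to_hex(v) for v in values])

# Read the row back and decode each cell.
buffer.seek(0)
restored = [from_hex(cell) for cell in next(csv.reader(buffer))]

print(to_hex("ABC\nDEF"))   # 4142430a444546
print(restored == values)   # True
```

The trade-off is that exported text columns are no longer human-readable and take roughly twice the space of the raw bytes.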

Limitations

No support for Cassandra custom types or the COUNTER type

Usage

Using jar build

Export / import both schema and tables for a specific keyspace

java -Xmx2g -Xms2g -jar kani-standalone.jar -c "config.edn" [-d "data"] (export | import)

Options:

  • -h --help
  • -c --config - EDN config file
  • -d --directory (optional) - directory where export/import will write/read files to/from

Note: Recommended heap size is about 2GB

Export / import schema only

java -Xmx2g -Xms2g -cp kani-standalone.jar eyeota.kani.application.schema -c "config.edn" -f "file.cql" (export | import)

Options:

  • -h --help
  • -c --config - EDN config file
  • -f --file - CQL file to be exported/imported to/from

Export / import specific table

java -Xmx2g -Xms2g -cp kani-standalone.jar eyeota.kani.application.table -c "config.edn" (export <table name> | import <table name> <csv file>)

Options:

  • -h --help
  • -c --config - EDN config file

Comparing CSV files

Useful for testing that import/export works as expected. For example, export the db to a directory data-original, import it back from data-original, then re-export it to data-re-exported. The CSV files in data-original and data-re-exported can then be compared (a plain diff might not work because row ordering might have changed).

java -Xmx6g -Xms6g -cp kani-standalone.jar eyeota.kani.application.compare_csv directory-1 directory-2

OR use the provided shell script

./compare-csv.sh directory-1 directory-2

Note: Comparing CSV files requires much more memory than export / import, so it's a good idea to specify 2-3x more heap space than for export (just in case).

The shell script uses less memory but takes a little longer to complete.
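For readers who want to see what an order-independent comparison looks like, here is a minimal Python sketch. It is not kani's compare implementation: the function names are hypothetical, and it assumes the exported files end in .csv and that two files match when they contain the same rows the same number of times, in any order.

```python
import csv
from collections import Counter
from pathlib import Path

def load_rows(csv_path):
    """Read a CSV file into a multiset of rows, so row order is ignored."""
    with open(csv_path, newline="") as handle:
        return Counter(tuple(row) for row in csv.reader(handle))

def compare_directories(dir_a, dir_b):
    """Return the names of CSV files that differ or exist on only one side."""
    names_a = {p.name for p in Path(dir_a).glob("*.csv")}
    names_b = {p.name for p in Path(dir_b).glob("*.csv")}
    # Files present in only one of the two directories count as mismatches.
    mismatched = list(names_a ^ names_b)
    for name in names_a & names_b:
        if load_rows(Path(dir_a) / name) != load_rows(Path(dir_b) / name):
            mismatched.append(name)
    return sorted(mismatched)
```

Calling compare_directories("data-original", "data-re-exported") returns an empty list when every file matches. This sketch holds every row of both files in memory at once, so it needs far more memory than a streaming export.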

EDN Config files

{:port              9042                      ; Cassandra native port to connect to (default: 9042)
 :hosts             ["127.0.0.1"]             ; Cassandra hosts (can specify multiple hosts)
 :keyspace          "db_keyspace"             ; Keyspace to work on
 :fetch-size        2000                      ; Cassandra fetch size (if not specified, defaults to 5000)
 :null-value        "<null>"                  ; What value should be stored in CSV for null values (default: "<null>")
 :table-fetch-size  {"huge_table" 20}         ; Table-specific fetch size (if the row is huge, might want to reduce the
                                              ; number to avoid timeouts)
 :consistency       :quorum                   ; Cassandra read/write consistency level (default: quorum). Accepted values are:
                                              ; [:all :any :each-quorum :local-one :local-quorum :local-serial :one :quorum
                                              ;  :serial :three :two]
 :excluded-tables   ["do_not_backup_table"]}  ; Tables whose contents are not imported / exported (although their schema is)
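How these settings might interact can be sketched as follows. This is an assumption about precedence, not kani's documented behavior: the sketch supposes that a :table-fetch-size entry overrides :fetch-size for that table, that 5000 is used when neither is set, and that :excluded-tables only affects table contents. The config is mirrored as a plain Python dict, and the function names are hypothetical.

```python
DEFAULT_FETCH_SIZE = 5000  # default stated in the config comments above

# Hypothetical Python mirror of the EDN config shown above.
config = {
    "fetch-size": 2000,
    "table-fetch-size": {"huge_table": 20},
    "excluded-tables": ["do_not_backup_table"],
}

def fetch_size_for(config, table):
    """Pick the fetch size for a table (assumed precedence: per-table, global, default)."""
    per_table = config.get("table-fetch-size", {})
    return per_table.get(table, config.get("fetch-size", DEFAULT_FETCH_SIZE))

def should_copy_contents(config, table):
    """Contents of excluded tables are skipped; their schema is still handled."""
    return table not in config.get("excluded-tables", [])

print(fetch_size_for(config, "huge_table"))   # 20
print(fetch_size_for(config, "other_table"))  # 2000
print(fetch_size_for({}, "other_table"))      # 5000
print(should_copy_contents(config, "do_not_backup_table"))  # False
```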

Using leiningen

By default, running through Leiningen sets the heap size to 2 GB.

Export / import both schema and tables for a specific keyspace

lein run -c "config.edn" [-d "data"] (export | import)

Export / import schema only

lein run -m eyeota.kani.application.schema -c "config.edn" -f "file.cql" (export | import)

Export / import specific table

lein run -m eyeota.kani.application.table -c "config.edn" (export <table-name> | import <table-name> <csv-file>)

Comparing CSV files

lein with-profile +6gheap run -m eyeota.kani.application.compare-csv directory-1 directory-2

Build

Test

lein with-profile +test check - runs both code quality checks and tests

lein quality - runs only code quality checks

lein with-profile +test test - runs tests

Build

lein build - runs tests and copies libraries into target

lein jar - creates jar file

lein uberjar - creates self-contained jar

License

Copyright © 2016 Eyeota

Distributed under the GNU Lesser General Public License v3.0
