Tempto - test framework

This project allows developers to write and execute tests for SQL databases running on Hadoop. Individual test requirements, such as data generation, copying generated data to HDFS, and schema creation, are expressed declaratively and are automatically fulfilled by the framework. Developers can write tests in Java (using a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results.

Prerequisites

You will need the following software to be installed on the machine running the framework:

  • Java >= 1.8
  • Python >= 2.6 (if you use the custom launcher that comes with the framework)

Currently we only support HDFS as a datastore. That means that on your Hadoop cluster you'll need the following:

  • A running Hadoop cluster with WebHDFS
  • We suggest that the cluster support XAttr metadata; having that feature enabled improves test performance slightly.

Basic concepts

  • Requirement - the set of resources a test needs in order to run, e.g. data stored in HDFS, Hive tables, etc.
  • Test case - a test of a single piece of functionality, e.g. a query.
  • Test group - a logical grouping of test cases. For example, one could define a join group, a group by group, a window function group, etc. in order to test different SQL functionality.
  • Test context - object used to store context information specific to a test.
  • Java test - test written in Java, annotated with @Test, following the TestNG convention.
  • File based test - test written by specifying the query to run and the corresponding result using files.

Setup

Note: the machine running the framework and tests does not have to be the same as the machine or set of machines running your SQL on Hadoop database. For example, the framework and tests can be running on a Jenkins slave and the framework will remotely interact with your cluster.

TODO: we should include here information which jar to use as dependency, and where to put file with properties, how to setup maven plugins...

Logging

Tempto uses SLF4J for logging.

Log file per test

If you are using log4j as your SLF4J backend, we provide an appender that allows logging the output of each test and of the suite fulfillment process to separate files. To use it, configure the log4j appender as below:

log4j.rootLogger=INFO, TEST_FRAMEWORK_LOGGING_APPENDER
log4j.appender.TEST_FRAMEWORK_LOGGING_APPENDER=com.teradata.logging.TestFrameworkLoggingAppender
log4j.category.com.teradata.tempto=DEBUG
log4j.category.org.reflections=WARN

With this appender, a new log directory is created within /tmp/tempto_logs for each test suite run. The directory name corresponds to the time when Tempto was started (e.g. /tmp/tempto_logs/2015-04-22_15-23-09). Log messages coming from different tests are logged to separate files.

Example contents of log directory:

com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectAllDatatypesOrc_2015-04-22_15-23-09
com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectAllDatatypesParquetFile_2015-04-22_15-23-09
com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectAllDatatypesRcfile_2015-04-22_15-23-09
com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectAllDatatypesTextFile_2015-04-22_15-23-09
com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectBinaryColumnTextFile_2015-04-22_15-23-09
com.facebook.presto.tests.hive.TestAllDatatypesFromHiveConnector.testSelectVarcharColumnForOrc_2015-04-22_15-23-09
SUITE_2015-04-22_15-23-09

If you want to override the root location of the logs, set the com.teradata.tempto.root.logs.dir system property:

java -Dcom.teradata.tempto.root.logs.dir=/my/root/logs/dir ...

Logging the test id

Tempto sets up a 'test_id' entry in the SLF4J logging context (MDC). It corresponds to the name of the test currently being run and can be used in logging patterns. If you are using log4j as a backend you can use it as below:

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Target=System.out
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.conversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L [%X{test_id}] - %m%n

Example test run

The steps below will run the example tests that come with the framework. They act as a basic smoketest to ensure that you've setup everything properly.

  • Build the framework:
$ cd tempto
$ ./gradlew clean build
BUILD SUCCESSFUL

Total time: 2 mins 47.263 secs
  • Set configuration properties in the following configuration file: tempto/tempto-examples/src/main/resources/tempto-configuration.yaml. The most important settings you'll need to change are the WebHDFS host and the Hive and Presto JDBC URLs. For more details please refer to the Configuration section below.

  • Ensure that WebHDFS, Hive and Presto are running.

  • Run tests using the provided test launcher:

$ cd tempto
$ bin/tempto \
     --tests-classpath tempto-examples/build/libs/tempto-examples-all.jar \
     --tests-package=com.teradata.tempto.examples \
     --exclude-groups quarantine \
     --report-dir /tmp/test-reports
Loading TestNG run, this may take a sec.  Please don't flip tables (╯°□°)╯︵ ┻━┻
...
[2015-04-02 15:21:48] Completed 18 tests
[2015-04-02 15:21:48] 17 SUCCEEDED      /      1 FAILED      /      0 SKIPPED
[2015-04-02 15:21:48] For tests logs see: tempto_logs/2015-04-02_15-15-16
See /tmp/test-reports/index.html for detailed results.
  • The framework will print to your console whether each test passed or failed. A more detailed HTML report is available in the directory passed via --report-dir (/tmp/test-reports/index.html in the example above). Note that one test (com.teradata.tempto.examples.SimpleQueryTest.failingTest) is designed to fail on purpose.

Configuration

The test execution environment is configured via a hierarchical YAML file. The YAML file is loaded from the classpath and must be named tempto-configuration.yaml. If a tempto-configuration-local.yaml file is also present on the classpath, it will be loaded as well and its settings will override those defined in tempto-configuration.yaml.

Configuration file locations can be overridden using the following Java system properties:

  • tempto.configuration - for overriding global configuration file location

  • tempto.configuration.local - for overriding local configuration file location

 java ... -Dtempto.configuration=classpath:my_configuration.yaml \
          -Dtempto.configuration.local=file:/tmp/my_local_configuration.yaml

If you start tests using the helper tempto script, you can use the --tempto-configuration and --tempto-configuration-local options to override the configuration files.

The file contains the following configuration sections:

  • hdfs

This section is used to configure how the framework accesses HDFS. During the fulfillment process, the framework accesses HDFS through the WebHDFS REST API. In your Java tests you may also access HDFS through the HdfsClient interface; a retrieval sketch follows the example configuration. Below is an example hdfs configuration section:

hdfs:                     # HDFS related settings
  username: hdfs          # username to use for accessing HDFS
  webhdfs:
    host: master          # hostname exposing HDFS REST interface
    port: 50070           # port of HDFS REST interface
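
In Java tests, the HdfsClient mentioned above can be retrieved from the test context using the same dependency-lookup pattern described later in the Executing queries section. A minimal sketch, assuming HdfsClient is available as a test context dependency; only the retrieval is shown, consult the HdfsClient interface for the operations it exposes:

    // obtain the framework's HDFS client; it talks to the WebHDFS endpoint
    // configured in the hdfs section above
    HdfsClient hdfsClient = ThreadLocalTestContextHolder.testContext().getDependency(HdfsClient.class);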
  • databases

Currently we support only JDBC-based database connections. Multiple such connections may be defined in this section of the configuration. By default, tests and queries are executed using the connection named "default". You can change "default" to point to whichever JDBC connection you want to query against (see the example below). You will need to define a connection for every database that needs to be accessed during the test run. For example, if you'd like the framework to create tables for you in Hive, you'll have to specify connection parameters for Hive. TODO: what if you've already created all your tables in hive, do you still need to provide connection parameters?

databases:           # database connections
  default:           # default connection to query against
    alias: presto    # points to connection defined below that you'd like to use as the default
 
  hive:              # connection named hive
    jdbc_driver_class: org.apache.hive.jdbc.HiveDriver                                # fully qualified JDBC driver classname
    jdbc_url: jdbc:hive2://master:10000                                               # database url
    jdbc_user: hdfs                                                                   # database user
    jdbc_password: na                                                                 # database password
    jdbc_pooling: false                                                               # (optional) should connection pooling be used (it does not work for Hive due to driver issues)
    jdbc_jar: tempto-hive-jdbc/build/libs/hive-jdbc-fat.jar                   # (optional) jar to be used for obtaining database driver. Should be used in case when we cannot have it in global classpath due to class conflicts. (e.g. hive driver conflicts with presto driver)
    table_manager_type: hive
 
  presto:           # connection named presto
    jdbc_driver_class: com.facebook.presto.jdbc.PrestoDriver
    jdbc_url: jdbc:presto://localhost:8080/hive/default
    jdbc_user: hdfs
    jdbc_password: na

  psql:           # postgresql
    jdbc_driver_class: org.postgresql.Driver
    jdbc_url: jdbc:postgresql://localhost:5432/postgres
    jdbc_user: blah
    jdbc_password: blah
    jdbc_pooling: true
    table_manager_type: jdbc

If we want the framework to provision tables, we need to specify table_manager_type for the database connection. Currently we support two table manager types:

  • hive: manages tables in Hive. Applicable to HDFS-backed Hive database connections.
  • jdbc: manages tables in a standard SQL database accessed over JDBC. Tables are populated using INSERT INTO statements.

A current limitation is that only one table manager of each type can be defined.

  • tests

This section is used to configure various properties used during test execution.

tests:
  hdfs:
    path: /tempto  # where to store test data on HDFS

Java based tests

Example

See com.teradata.tempto.examples.SimpleQueryTest in tempto-examples module.

Requirements

Tests may declare requirements that are fulfilled by the framework during suite/test initialization.

Explicit RequirementsProvider

You can specify requirements for your test through the @Requires annotation. Both test methods and whole classes can be annotated with @Requires. Annotating a class with @Requires is the same as annotating every test method in that class with it. The parameter passed to @Requires must be a class that implements the RequirementsProvider interface. This interface has a single method, getRequirements(), that returns a Requirement instance. You may notice that a more natural way of passing in requirements would be to supply @Requires with an instance, but Java only allows constant annotation arguments.

Here's an example implementation of the RequirementsProvider interface:

private final class SimpleTestRequirements
        implements RequirementsProvider
{

    @Override
    public Requirement getRequirements()
    {
        // ensure TPCH nation table is available
        return new ImmutableHiveTableRequirement(NATION);
    }
}

In this case, SimpleTestRequirements encapsulates the single requirement of an immutable Hive table called nation.

The implementation of RequirementsProvider is then passed as an argument to the @Requires annotation:

    @Test(groups = "query")
    @Requires(SimpleTestRequirements.class)
    public void selectAllFromNation()
    {
        assertThat(query("select * from nation")).hasRowsCount(25);
    }

If multiple @Requires annotations are stacked on top of one another on the same method or class, then the requirements they return are combined.
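
For instance, a class-level @Requires can be combined with a method-level one, and the test method then gets the requirements from both providers. A sketch (AdditionalTestRequirements is a hypothetical second provider, implemented just like SimpleTestRequirements above):

@Requires(SimpleTestRequirements.class)            // applies to every test method in the class
public class StackedRequirementsTest
{
    @Test(groups = "query")
    @Requires(AdditionalTestRequirements.class)    // hypothetical provider; its requirements are
    public void selectAllFromNation()              // combined with the class-level ones
    {
        assertThat(query("select * from nation")).hasRowsCount(25);
    }
}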

Test class being RequirementsProvider

Alternatively, the test class itself can implement RequirementsProvider. The requirements returned by its getRequirements method are then applied to all test methods in the class.

private final class MyTestClass
        implements RequirementsProvider
{

    @Override
    public Requirement getRequirements()
    {
        // ensure TPCH nation table is available
        return new ImmutableHiveTableRequirement(NATION);
    }
    
    @Test
    public void someTestMethod() {
        assertThat(query("select * from nation")).hasRowsCount(25);
    }
}

Requirement Types

This section lists the supported Requirement implementations that you can return from RequirementsProvider#getRequirements().

ImmutableTableRequirement

When this requirement is fulfilled, it will create a table in the underlying database. It is called immutable because the contract with the test developer is that they will not, within the logic of their test, alter the state of the table (drop it, re-create it under a different name, delete data). This is done so that requirements can be recycled between tests. If 10 tests require an immutable table, that table will only be created once and the framework assumes it will be available for all tests.

Table Definitions

ImmutableTableRequirement is parametrized with a TableDefinition. The target database in which the table is created depends on the TableDefinition instance passed as the ImmutableTableRequirement parameter. Currently there is only one implementation, HiveTableDefinition, which allows defining tables in Hive. Using ImmutableTableRequirement with HiveTableDefinition requires that a connection named hive is defined in the configuration YAML.

A HiveTableDefinition includes a name, a schema and a dataSource. HiveTableDefinitionBuilder can be used to create a new definition. You need to provide the table name, the CREATE TABLE DDL template ({0} is substituted with the HDFS file location) and a DataSource.

Certain commonly used tables, such as those in the TPC-H benchmark, are defined as constants and can be found in com.teradata.tempto.fulfillment.table.hive.tpch.TpchTableDefinitions.

TODO: we need to clarify to the user how they create tables.

For example this is how the nation table is built:

    public static final HiveTableDefinition NATION =
        HiveTableDefinition.builder()
                .setName("nation")
                .setCreateTableDDLTemplate("" +
                        "CREATE TABLE nation(" +
                        "   n_nationkey     INT," +
                        "   n_name          STRING," +
                        "   n_regionkey     INT," +
                        "   n_comment       STRING) " +
                        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' " +
                        "LOCATION '{0}'")
                .setDataSource(new TpchDataSource(TpchTable.NATION, 1.0))
                .build();

The easiest way to create an ImmutableTableRequirement is the TableRequirements.immutableTable factory method.
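
A sketch of a provider built around that factory method, assuming a static import of TableRequirements.immutableTable (the import location is an assumption, not taken from the examples above):

private static class NationTableRequirements
        implements RequirementsProvider
{
    @Override
    public Requirement getRequirements()
    {
        // equivalent to the ImmutableHiveTableRequirement example earlier,
        // but requested through the factory method
        return immutableTable(NATION);
    }
}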

MutableTableRequirement

When this requirement is fulfilled, it will create a table in the underlying database. Unlike ImmutableTableRequirement, the framework does not assume the table will be left unmodified. Each test using MutableTableRequirement gets a separate instance of the table, created in the database with a unique name.

To access the name of the table in the database from test code, MutableTablesState must be used. See the following example.

    private static class MutableTableRequirements implements RequirementsProvider
    {
        @Override
        public Requirement getRequirements()
        {
            return mutableTable(NATION, "table", LOADED);
        }
    }

    @Test(groups = "query")
    @Requires(MutableTableRequirements.class)
    public void testWithMutableTable()
    {
        MutableTablesState mutableTablesState = testContext().getDependency(MutableTablesState.class);
        TableInstance tableInstance = mutableTablesState.get("table");
        assertThat(query("select * from " + tableInstance.getNameInDatabase())).hasAnyRows();
    }

One can request that a mutable table be in one of three states (a sketch using the CREATED state follows the list):

  • PREPARED - no table is actually created, but a MutableTablesState entry is created for it and a unique name is generated
  • CREATED - the table is created but not populated with data
  • LOADED - the table is created and populated with data
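
For example, a provider can request the table in the CREATED state and leave populating it to the test itself. A sketch following the pattern of the example above (CreatedNationTableRequirements and the "empty_nation" name are illustrative):

    private static class CreatedNationTableRequirements implements RequirementsProvider
    {
        @Override
        public Requirement getRequirements()
        {
            // the table is created under a unique name, but no data is loaded into it
            return mutableTable(NATION, "empty_nation", CREATED);
        }
    }

    @Test(groups = "query")
    @Requires(CreatedNationTableRequirements.class)
    public void testWithCreatedButEmptyTable()
    {
        MutableTablesState mutableTablesState = testContext().getDependency(MutableTablesState.class);
        TableInstance tableInstance = mutableTablesState.get("empty_nation");
        // the table exists but holds no rows yet; the test is free to populate it
        assertThat(query("select * from " + tableInstance.getNameInDatabase())).hasRowsCount(0);
    }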

Executing queries

Queries are executed via implementations of the QueryExecutor interface. Currently the only implementation is JdbcQueryExecutor. Each database configured in the YAML file has its own query executor with the same name. To retrieve that executor and issue queries against that database, use ThreadLocalTestContextHolder.testContext().getDependency(...) as shown below.

    // execute query against the default database
    QueryResult defaultQueryResult = QueryExecutor.query("SELECT * FROM nation");
    
    // Retrieve QueryExecutor for another, non-default, database
    QueryExecutor prestoQueryExecutor = ThreadLocalTestContextHolder.testContext().getDependency(QueryExecutor.class, "presto");
    QueryResult queryResultPresto = prestoQueryExecutor.query("SELECT * FROM nation");

To use the default QueryExecutor, one can call the static helper method QueryExecutor.query (see the examples above).

Query assertions

The QueryAssert class allows you to perform AssertJ-style assertions on QueryResult objects. For more information on the available types of assertions, check the methods of QueryAssert.

Example assertions:

      @Requires(TpchRequirements.class)
      @Test
      public void testContainsExactlyInOrder()
      {
          assertThat(query("SELECT n.nationkey, n.name, r.name FROM nation n " +
                  "INNER JOIN region r ON n.regionkey = r.regionkey " +
                  "WHERE name like 'A%' AND n.created > ? ORDER BY n.name", LocalDate.parse("2015-01-01")))
                  .hasColumns(INTEGER, VARCHAR, VARCHAR)
                  .containsOnly(
                          row(1, "ALGERIA", "AFRICA"),
                          row(7, "ARGENTINA", "SOUTH AMERICA"));
      }

Convention based file tests

Query tests can be written by providing the framework with a SQL query file and a file with the expected result. These tests are called convention based because of the directory structure assumed by the framework, namely the directory convention.

Moreover, you can define datasets that can be queried in your tests. These dataset files contain the data along with the corresponding DDL. For examples, take a look at the files in the tempto-examples/src/main/resources/sql-tests directory. The directory tree looks like the following:

~/repos/tempto/tempto-examples/src/main/resources$ tree .
.
├── sql-tests
│   ├── datasets
│   │   ├── sample_table.data
│   │   ├── sample_table.data-revision
│   │   └── sample_table.ddl
│   └── testcases
│       ├── generated
│       │   └── nation.generator
│       ├── nation
│       │   ├── after
│       │   ├── allRows.result
│       │   ├── allRows.sql
│       │   └── before
│       ├── sample_table
│       │   ├── allRows.result
│       │   └── allRows.sql
│       └── sample_table_insert
│           └── insert.sql
├── suites.json
└── tempto-configuration.yaml

Data sets

Data sets are stored in the sql-tests/datasets directory. To create an example table, you will need to create three files:

  • TABLE_NAME.ddl - DDL for creating the table.
  • TABLE_NAME.data - file containing the raw data.
  • TABLE_NAME.data-revision - file with a data revision marker. If you change your data, you should also bump this revision marker so the new table data is automatically reloaded.

TABLE_NAME.ddl

Contains a template for the SQL used to create the table. The header specifies the type of table manager that should be used for this table definition; it can be jdbc or hive.

HIVE tables

Example:

-- type: hive
CREATE TABLE %NAME% (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '%LOCATION%'

Template must contain:

  • %NAME% pattern which will be replaced with the table name.
  • %LOCATION% pattern which will be replaced with the HDFS path where the data will be uploaded.

JDBC tables

Example:

-- type: jdbc
CREATE TABLE %NAME% (
  id INT,
  name VARCHAR(100)
)

Template must contain a %NAME% pattern which will be replaced with the table name.

TABLE_NAME.data

Contains table data.

HIVE tables

The HIVE table manager does not analyze the content of the data file; it is simply uploaded to HDFS.
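
For example, for the hive DDL template shown earlier (fields terminated by |), the .data file is just the raw delimited rows that end up in the table's HDFS location:

1|first
2|second
3|third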

JDBC tables

Example:

-- delimiter: |; trimValues: false; types: INTEGER|VARCHAR
3|A|
2|B|
1|C|

Header parameters are:

  • delimiter - data columns delimiter (default: |)
  • trimValues - remove leading and trailing whitespace from data values (default: false)
  • types - column types (required). Supported column types are:
    • CHAR, VARCHAR, LONGVARCHAR, LONGNVARCHAR - character string
    • BOOLEAN - true/false
    • BIT - 0/1
    • TINYINT, SMALLINT, INTEGER, BIGINT - integer value
    • REAL, FLOAT, DOUBLE - floating point value
    • DECIMAL, NUMERIC - decimal point value
    • DATE - date; format: yyyy-[m]m-[d]d
    • TIME, TIME_WITH_TIMEZONE - time; format: hh:mm:ss
    • TIMESTAMP, TIMESTAMP_WITH_TIMEZONE - timestamp; format: yyyy-M-d H:m:s.SSS

TABLE_NAME.data-revision

Currently only the HIVE table manager makes use of this file. It can contain any string, and it must be updated whenever the table contents change so that the data is reloaded.
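
For example, the revision file can simply hold a version string that you bump whenever the data file changes:

v1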

TODO: where should the user create the sql-tests directory? Right now it's under resources in the examples dir; where should they put it?

Tests

Test case files are stored in the sql-tests/testcases directory. Each directory right under the testcases directory is the logical equivalent of a TestNG test class. Each logical test is a pair of files:

  • TEST.sql - the query to be invoked, where the first line of the file can be a SQL comment specifying query execution requirements:
-- database: hive; groups: example_smoketest,group2; tables: nation;
SELECT * FROM nation

This test contains queries that should be executed against the Hive database. Only the results of the last query will be checked against the result file. In addition, the test is part of two separate TestNG groups: example_smoketest and group2.

In the above example the queries will be run against the hive database (see the database key in the first row). The test requires the immutable table nation to be created and loaded before query execution (see the tables key).

  • TEST.result - file with the expected result of the query. The first line can be a SQL comment with query assertion requirements:
-- delimiter: |; ignoreOrder: false; types: INTEGER|VARCHAR|INTEGER|VARCHAR
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
...

Above we set the | character as the delimiter, specify that the order of rows must not be ignored during comparison, and declare the expected column types. You always need to provide types, because when checking the result the framework has to cast each String value to the given type. This is of course terrible for performance, but you're trading performance for convenience (for example, for a test writer who cannot or does not want to write Java).

Both SQL and result files honor comments, which begin with the --- prefix.

It is possible to define both queries and results in a single TEST.sql file. Such a file is divided into sections, each introduced by the --! prefix. The first section contains global properties. The following sections contain the queries and their results. Each section can override the global properties and can additionally have a name. An example of such a file:

-- database: hive; groups: example_smoketest,group2
-- delimiter: |; ignoreOrder: false; types: INTEGER|VARCHAR|INTEGER|VARCHAR
--! name: query_1
SELECT * FROM nation WHERE id=0
--!
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
--! name: query_2
-- groups: additional_group
SELECT * FROM nation WHERE id=1
1|USA|1| foo bar|
--!

You can also add custom before and after scripts for your tests. These are executed before and after each test case. TODO: more info on scripts, what they should be named, what they can contain.

Using tables across databases

It is possible (and useful for testing Presto, for example) to use a table that is created in one database (e.g. hive, psql) while sending the test query to another database (e.g. presto). Take a look at the example below: the query is issued via the presto JDBC connection, while the nation table could be created somewhere else. To determine where nation should be created (i.e. to find the appropriate requirements), the following matching flow is used:

  • If a database is specified explicitly as a prefix to the table name (e.g. psql.nation), then a requirement for the table in that database is generated. Note that the database must have a table_manager whose type matches the table's type, or an error will be thrown.
  • If no database is specified explicitly and there is exactly one database with a table manager whose type equals the table type, that database is picked.
  • As a fallback, the database to which the test query is sent is used; its table manager type is still checked against the table type.

-- database: presto; tables: nation;
SELECT * FROM nation

Here you have an example with an immutable table requirement from database psql.

-- database: presto; tables: psql.nation;
SELECT * FROM nation

Generated tests

TODO (not used right now)

Running tests

Running tests from your IDE

Java based tests can be simply run as TestNG tests.

File convention based tests: TODO

Shell tempto launcher

Tests can be run using the bin/tempto script. This is a wrapper around a command line invocation of the TestNG JVM. For a verbose description of all the execution options supported by the bin/tempto script run:

$ ./bin/tempto --help

Basic parameters

For running tests you have to specify at least the following arguments:

  • classpath - the classpath will be scanned to find tests to be run; it may be a set of jars, directories, or a mix of both.
  • tests-package - defines the Java package containing tests. For Java based tests, only tests residing in this package (or a child package of it) will be executed. Additionally, all convention based tests found on the classpath will be executed.

Example run command would look like this:

$ ./bin/tempto --tests-classpath tempto-examples/build/libs/tempto-examples-all.jar \
                     --tests-package=com.teradata.tempto.examples

In the above example the tests classpath contains a single entry, the tempto-examples-all.jar, and the tests package is set to com.teradata.tempto.examples.

Tests selection

By default all tests found on the classpath are executed, but the user may limit that.

--groups - list of groups to be executed.
--tests - list of tests to be executed. For Java based tests the test name is just the fully qualified method name, e.g. com.teradata.tempto.examples.SimpleQueryTest.selectCountFromNation. For SQL convention based tests the name looks like sql_tests.testcases.sample_table.allRows. Tests whose names end with one of the patterns specified in the --tests parameter will be executed.
--classes - list of fully qualified Java classes to be executed. Applies to Java based tests only.
--exclude-groups - list of test groups which should be excluded from execution.

Debugging

If you want to run tests from the tempto script under a debugger, use the --debug parameter. When this parameter is specified, Tempto suspends execution at the beginning and waits for a debugger on TCP port 5005.

$ bin/tempto \
     --tests-classpath tempto-examples/build/libs/tempto-examples-all.jar \
     --tests-package=com.teradata.tempto.examples \
     --exclude-groups quarantine \
     --report-dir /tmp/test-reports \
     --debug
Loading TestNG run, this may take a sec.  Please don't flip tables (╯°□°)╯︵ ┻━┻
Listening for transport dt_socket at address: 5005

At this point you may use your IDE of choice to connect to tempto VM.

Developers

For every available Requirement there is one possible Fulfiller. Currently that mapping is hard coded. All requirements and their corresponding fulfillers are packed into tempto-core-all.jar. In the future we envision separating requirements and their possible fulfillers into separate jars.

Acknowledgements

A special thanks to the entire Hadapt team for inspiring the architecture of this framework.
