DataKitchen DataOps Documentation

Tests

Automated tests increase confidence in the quality of your recipes' output.

Introduction

Quality Assurance

Automated testing is foundational to DataOps and serves to provide confidence in the quality of Recipe end deliverables. DataKitchen Tests span from simple row counts to sophisticated business logic and may be configured to stop a run in place for investigation, create a warning, or simply log results.

Test Coverage

The lifecycle of a recipe typically includes increasing test coverage over time, often added as insurance after issues are encountered. For example, if a particular data vendor has a history of data quality or delivery timeliness issues, automated testing can help hold them accountable to an SLA. Moreover, automated tests prevent erroneous or missing data from impacting the quality of your end deliverables.

Declaration

Tests may be declared in data sources, data sinks, and in node notebooks. Tests in data sources and sinks evaluate built-in runtime variables, while tests in node notebooks evaluate the results of key execution or custom runtime variables generated in a container node.

{
    "type": "DKDataSource_s3",
    "name": "s3_datasource",
    "config": {{s3config}},
    "set-runtime-vars": {
        "key_count": "count_s3_files"
    }, 
    "wildcard": "*",
    "keys": {
        "example-explicit-key": {
            "file-key": "example.csv",
            "use-only-file-key": true,
            "set-runtime-vars": {
                "row_count": "s3_row_count"
            }
        }
    },
    "tests": {
        "test-count-files-pulled": { 
            "test-variable": "count_s3_files",
            "action": "stop-on-error",
            "test-logic": "count_s3_files > 0", 
            "keep-history": true,
            "description" : "Stops the OrderRun if no files are copied from S3."
        }, 
        "test-row-count":{
            "test-variable": "s3_row_count",
            "action": "stop-on-error",
            "test-logic": "s3_row_count > 100 and s3_row_count < 500",
            "keep-history": true,
            "description" : "Stops the OrderRun if the row count of the file copied via the explicit key is not greater than 100 and less than 500."
        }
    }
}
{
    "name": "container-node",
    "dockerhub-namespace": "{{dockerhubconfig.namespace}}",
    "dockerhub-username": "{{dockerhubconfig.username}}",
    "dockerhub-password": "{{dockerhubconfig.password}}",
    "analytic-container": true,
    "image-repo": "{{dockerhubconfig.imagerepo}}",
    "image-tag": "{{dockerhubconfig.imagetag}}",
    "container-input-file-keys": [
        {
            "key": "s3-datasource.*",
            "filename": "from_s3/*"
        }
    ],
    "container-output-file-keys": [
        {
            "key": "s3-datasink.*",
            "filename": "*.csv"
        }
    ],
    "tests": {
        "test1": {
            "test-variable": "nodename.datasourcename.keyname",
            "applies-to-keys": ["nodename.datasourcename.keyname"],
            "action": "stop-on-error",
            "keep-history": "true",
            "test-logic": "nodename.datasourcename.keyname > 0",
            "description" : "Stops the OrderRun if the result of the specified key in the Data Source is not greater than 100."
        }
    }
}

Order of Execution

Tests themselves are executed according to the order in which they are declared in any given JSON configuration file. Tests declared in any given data source or sink are executed only after all keys of said source/sink have been executed successfully. Tests in node notebooks are executed only after the successful execution of the entire node.
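
As a minimal sketch of this ordering (the test names, variables, and logic are hypothetical), the two tests below run only after every key in their data source has finished, with test-a-check-rows evaluated before test-b-check-nulls because it is declared first:

{
    "tests": {
        "test-a-check-rows": {
            "test-variable": "row_count",
            "action": "stop-on-error",
            "test-logic": "row_count > 0",
            "description": "Declared first, so it is evaluated first."
        },
        "test-b-check-nulls": {
            "test-variable": "null_count",
            "action": "warning",
            "test-logic": "null_count == 0",
            "description": "Declared second, so it is evaluated second."
        }
    }
}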

Viewing Test Results

Order Run Notifications

Test results are included in order run notifications.

Creating Clear Test Names

Users have latitude when it comes to naming tests. Since a goal of DataOps is to provide high test coverage of recipes, it is best to name tests clearly to facilitate rapid debugging when a test fails. Test names should adhere to DataKitchen's object-naming syntax.
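
For example (the names and variables below are hypothetical), the first test name tells a reader at a glance which check failed, while the second forces them to open the test definition:

{
    "tests": {
        "test-vendor-file-count-above-zero": {
            "test-variable": "count_s3_files",
            "action": "stop-on-error",
            "test-logic": "count_s3_files > 0",
            "description": "Clear: the name states exactly what is being checked."
        },
        "test1": {
            "test-variable": "count_s3_files",
            "action": "stop-on-error",
            "test-logic": "count_s3_files > 0",
            "description": "Unclear: the name reveals nothing about the check."
        }
    }
}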

Order run notifications detail granular test results, which are most useful for quickly determining the root cause of an order run failure. In this example the recipe has been configured to pass its notifications via [Slack](https://slack.com/) channels.

Order Run Details

The detailed record for each order run contains summary information regarding the automated tests applied for a given order run.

Gathering Test Information via DataKitchen's Command Line Tool

Users may also gather test result information for an order run using DKCloudCommand's orderrun-info command:

~ $ dk orderrun-info --kitchen [KITCHEN] --recipe [RECIPE] --test
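
For example, assuming a kitchen named dev_kitchen and a recipe named demo_recipe (both hypothetical):

~ $ dk orderrun-info --kitchen dev_kitchen --recipe demo_recipe --test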

Longitudinal Summary

If an order contains multiple order runs, in the case of a recurring scheduled order, for example, then the results from any given test are comparable across multiple order runs for a recipe variation. Charts summarizing these results are available on the order details pages in the DataKitchen UI.

Longitudinal Test Result Charts

Test results for up to the last 20 order runs are displayed.

View the results of a given test across multiple runs of the same order for a recipe variation.

Test Properties

Tests are always configured in recipe JSON files, though their specific configuration depends in part on where they are declared (Data Sources/Sinks vs. Node Notebooks).

| Field | Description | Values |
| --- | --- | --- |
| action | Required. The action taken when a test fails. | log, warning, stop-on-error |
| applies-to-keys | Optional. Used when declaring a test in a Node Notebook to evaluate the result of a specific Key. The test loops over these keys, assigning each value to the test variable. | |
| description | Optional. Used to provide additional detail about the purpose of a test. | |
| keep-history | Optional. Determines whether the test results generated by each OrderRun are stored to create a history used for benchmarking. | true, false |
| historic-calculation | Optional. Sub-field of test-metric (see Legacy Properties). The type of calculation used for a historic metric: running-average holds the average of the recorded values for a given metric; previous-value holds the last recorded value for a given metric. | running-average, previous-value |
| historic-metric | Optional. Sub-field of test-metric (see Legacy Properties). The name of the variable whose recorded history is used in the calculation. | |
| test-logic | Required. Contains a logic statement evaluating the test-variable. | |
| test-variable | The variable that holds the value being tested. When using applies-to-keys to test against the result of Key execution, this can be set to a dummy value. | |
| type | The type of test performed. This is used to evaluate the value of test-variable as a specific datatype. If the value cannot be cast to the specified type, an error is thrown. If omitted, the user is responsible for ensuring the correctness of datatypes used in test-logic. | test-contents-as-date, test-contents-as-float, test-contents-as-integer, test-contents-as-string |
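
For instance, a Node Notebook test might combine applies-to-keys with a dummy test-variable and an explicit type, along the lines of this sketch (the node, data source, and key names are hypothetical):

{
    "tests": {
        "test-etl-source-row-count": {
            "test-variable": "testvar",
            "applies-to-keys": ["etl_node.pg_source.get_rows"],
            "type": "test-contents-as-integer",
            "action": "warning",
            "test-logic": "testvar > 0",
            "keep-history": true,
            "description": "Warns if the key's result, cast to an integer, is not greater than 0."
        }
    }
}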

Legacy Properties

The following properties and syntax are supported, though the syntax described above is highly recommended. Historic calculations continue to use this legacy syntax.

| Field | Description | Values |
| --- | --- | --- |
| test-compare | Sub-field of test-logic. Declares how to compare test-variable to test-metric. | equal-to, greater-than, greater-than-equal-to, less-than, less-than-equal-to, number-of-times-constant, test-parameter, test-order |
| test-metric | Sub-field of test-logic. Parent field of the optional historic-calculation and historic-metric fields when performing a historic comparison test. Declares the value the test-variable will be compared against. Can be a literal (100, "100"), a date expression, or a variable name (runtime or key-associated). | |
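
As a sketch of the legacy syntax (the variable name is hypothetical), a simple comparison of a runtime variable against a literal might look like:

{
    "tests": {
        "test-row-count-legacy": {
            "test-variable": "row_count",
            "action": "stop-on-error",
            "test-logic": {
                "test-variable": "row_count",
                "test-compare": "greater-than",
                "test-metric": 0
            },
            "keep-history": true,
            "description": "Stops the OrderRun if the row count is not greater than 0."
        }
    }
}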

Custom Logic Expressions

Tests are best configured using custom logic expressions that evaluate a runtime variable or result of key execution. Expressions are written and evaluated in Python. Custom logic expressions must resolve to a boolean value, and allow the use of several operators to evaluate the test-variable.

Logic Expression Restrictions on Variable Name Syntax

The use of logic expressions in tests requires that variable names be alphanumeric, with underscores ('_') also allowed. Variable names are case-sensitive.

Operands

  • int literal
  • string literal
  • variable name

Operators

  • and
  • or
  • not
  • xor
  • in
  • +, -, /, *, ==, !=

Logic Expression Examples

rowcount > 0 and rowcount < 1000

expected_value in [1,2,3,4]

status not in [0,-1]

total == 1000

status != 0

status_message == 'Success!'

Historical Comparisons

Tests may be declared such that they are evaluated against the results of tests from prior order runs within the same order (same Kitchen-Recipe-Variation combination).

{
    "keys": {
        "get_benchmark_01_row_count": {
            "target-field": "sql",
            "resource-file": "benchmark/benchmark_01.sql",
            "query-type": "execute_query",
            "set-runtime-vars": {
                "row_count": "benchmark_01_row_count"
            }
        }
    },
    "tests": {
        "warning_if_benchmark_01_row_count_less_than_1": {
            "test-variable": "benchmark_01_row_count",
            "action": "warning",
            "type": "test-contents-as-integer",
            "test-logic": "benchmark_01_row_count < 1",
            "keep-history": true,
            "description": "Throws a warning if the row count is less than 1."
        },
        "warning_if_benchmark_01_row_count_less_than_running_average": {
            "test-variable": "benchmark_01_row_count",
            "action": "warning",
            "type": "test-contents-as-integer",
            "applies-to-keys": ["get_benchmark_01_row_count"],
            "test-logic": {
                "test-variable": "benchmark_01_row_count", 
                "test-compare": "greater-than",
                "test-metric": {
                    "historic-calculation": "running-average", 
                    "historic-metric": "benchmark_01_row_count"
                }
            },
            "keep-history": true,
            "description": "Throws a warning if the row count is not greater than its running average across OrderRuns."
        }
    }
}
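
A similar test could compare against only the immediately preceding OrderRun by swapping the historic-calculation, as in this hypothetical variation on the example above:

{
    "tests": {
        "warning_if_benchmark_01_row_count_less_than_previous": {
            "test-variable": "benchmark_01_row_count",
            "action": "warning",
            "type": "test-contents-as-integer",
            "applies-to-keys": ["get_benchmark_01_row_count"],
            "test-logic": {
                "test-variable": "benchmark_01_row_count",
                "test-compare": "greater-than",
                "test-metric": {
                    "historic-calculation": "previous-value",
                    "historic-metric": "benchmark_01_row_count"
                }
            },
            "keep-history": true,
            "description": "Throws a warning if the row count is not greater than the value recorded by the previous OrderRun."
        }
    }
}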
