DataKitchen DataOps Documentation

Variables & Template Engine

Introduction

Parameterization makes the execution and deployment of data pipelines flexible. Pipelines orchestrated by DataKitchen use variables of various types, scopes, classes, and override priorities. Recipe files are processed and loaded on demand by a template engine based on Jinja2. The template engine acts as a preprocessor for the recipe: it compiles the Jinja templates and resolves the available variables in each recipe file before that file is processed during order run execution.

Naming Conventions

Variables should be named according to DataKitchen's object-naming best practices.

Naming Variables

DataKitchen variable names support alpha-numeric and underscore characters and are case-sensitive. Variable names cannot start with numbers and must not contain hyphens or dashes.

See Supported Naming Conventions for more information.
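
The naming rules above can be checked mechanically. The following is a minimal sketch in Python; the helper name is_valid_variable_name and the regular expression are illustrative assumptions derived from the rules stated here (ASCII letters, digits, and underscores; no leading digit), not part of the DataKitchen platform.

```python
import re

# Assumed pattern: ASCII letters, digits, and underscores only,
# and the first character must not be a digit.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules described above."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["row_count", "_privateRowCount", "2nd_run", "s3-access-key"]:
    print(candidate, is_valid_variable_name(candidate))
```

Because names are case-sensitive, row_count and Row_Count would both pass this check and would refer to two different variables.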

Size Limit

Variables over 1MB in size are not supported.

Types

All variables are converted to Python built-in types based on their Python representation when loaded.

| JSON Type | Python Type |
| --- | --- |
| Object | dict |
| Array | list |
| Integer | int |
| Float | float |
| Boolean | bool |
| String | str |
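
As a rough illustration of this conversion, the sketch below loads a set of hypothetical variable definitions with Python's standard json module and reports the resulting Python type of each. It assumes the platform's loader behaves like json.loads; the variable names are examples only.

```python
import json

# Hypothetical variable definitions, written as they might appear in a JSON file.
raw = """
{
  "dict_variable": {"bucket": "my_bucket"},
  "list_variable": [1, 2, 3, 4],
  "integer_variable": 1,
  "float_variable": 3.14159,
  "boolean_variable": true,
  "string_variable": "sample value"
}
"""

loaded = json.loads(raw)

# Map each variable name to the name of the Python type it was loaded as.
python_types = {name: type(value).__name__ for name, value in loaded.items()}
for name, type_name in python_types.items():
    print(f"{name}: {type_name}")
```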

Dictionary Variables

Variables holding dictionaries are rendered as JSON.

"string_variable": "sample value",

"integer_variable": 1,

"boolean_variable": true,

"float_variable": 3.14159,

"variable_as_list_integers": [1,2,3,4],

"variable_as_list_strings": [
      "stringlist_A",
      "stringlist_B",
      "stringlist_C",
      "stringlist_D"
    ],

"dict_variable": {
    "bucket": "my_bucket",
    "s3-access-key": "***********",
    "s3-secret-key": "***********"
},

"referencingothervariable": "copy of {{string_variable}}"

Syntax

Standard

Variables in recipe files are referenced using the {{ }} Jinja syntax.

Wrap Variable References in Quotes When Defining Them

Variable definitions should always be enclosed in quotes "{{ }}" to set the value as string type.

"key": "{{value}}"

# Spaces are optional
"key" : "{{ value }}"

{
  "variable-list" : {
    "var1" : "value1",
    "var2" : "{{now}}",
    "var3" : "{{date_format(now,'%Y%m%d')}}",
    "var4" : "1",
    "var5" : "2",
    "var6" : "{{ int(var4) + int(var5) }}",
    "var7" : "Datetime: {{now}}"
  }
}
  • Legacy Runtime Variable Syntax: The legacy syntax for custom runtime variables declared in container nodes uses the $ syntax.
$my_custom_runtime_variable

Tests

When setting a variable as the test-variable when declaring a test, reference the variable in quotes without Jinja {{ }} syntax.

"tests": {
        "test-count-files-pulled": { 
            "test-variable": "count_s3_files",
            "action": "stop-on-error",
            "test-logic": "count_s3_files > 0", 
            "keep-history": true
        }, 
        "test-row-count":{
            "test-variable": "s3_row_count",
            "action": "stop-on-error",
            "test-logic": "s3_row_count > 100 and s3_row_count < 500",
            "keep-history": true
        }
    }
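
To illustrate why the test variable is written as a bare name, the sketch below evaluates the two test-logic strings above against a dictionary of runtime values. It is only a model: the helper evaluate_test_logic, the sample values, and the use of Python's eval are assumptions for illustration and say nothing about how the platform actually evaluates tests.

```python
def evaluate_test_logic(test_logic: str, variables: dict) -> bool:
    """Evaluate a comparison expression against a mapping of variable values.

    Assumption: the expression is a simple Python-style comparison. eval() is
    used for illustration only, with builtins removed from its namespace.
    """
    return bool(eval(test_logic, {"__builtins__": {}}, dict(variables)))


# Hypothetical values captured during an order run.
runtime_variables = {"count_s3_files": 12, "s3_row_count": 750}

tests = {
    "test-count-files-pulled": "count_s3_files > 0",
    "test-row-count": "s3_row_count > 100 and s3_row_count < 500",
}

# Names in each expression are looked up directly in runtime_variables,
# which is why no {{ }} wrapping is needed.
results = {
    name: evaluate_test_logic(logic, runtime_variables)
    for name, logic in tests.items()
}
print(results)
```

With these sample values the file-count test passes and the row-count test fails, because 750 falls outside the 100 to 500 range.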

Secrets

Variables are often set to sensitive values which are stored securely as Secrets. When defining variables with secret values a special syntax is used. These definitions are almost always set at the kitchen level in support of deployments, as they are most often associated with infrastructure.

"redshiftusername":"#{vault://postgresql/username}",
"redshiftpassword":"#{vault://postgresql/password}"

Scope

Global Variables

All variables are global by default. Their lifecycle starts as soon as they are created, and they live with the recipe for the duration of its execution. Each assignment to a variable replaces its current value, though previous values are retained. Once declared, variables can be referenced by other files or even by other variables using the standard Jinja2 referencing syntax {{ }}.

Scoped Variables

When a variable is assigned a new value during the execution of the recipe, previous values are retained. It is possible to reference these previous values by scoping a variable reference to the position in the recipe graph where said variable held a specific value. Variables may be scoped to a given node and/or data source or sink with the recipe.node.datasource/sink.variable syntax.

# Reference the latest value set for the variable
{{my_variable}} 

# Reference the last value set for the variable for a specific Node
{{recipe.node1.my_variable}} 

# Reference the last value set for the variable for a specific Data Source
# or Data Sink
{{recipe.node1.datasource1.my_variable}} 
  • Troublesome Object Names: Node and data source and sink names with unsupported characters like dashes should use the following syntax as a workaround.
# Reference the latest value set for the variable
{{my_variable}} 

# Reference the last value set for the variable for a specific Node
{{recipe.[node-1].my_variable}} 

# Reference the last value set for the variable for a specific Data Source
# or Data Sink
{{recipe.[node-1].[datasource-1].my_variable}} 
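
One way to picture scoped references is as a lookup into a per-scope history of assignments. The toy model below is a sketch under stated assumptions: the class VariableHistory is hypothetical, it assumes that a value set in a data source also counts as the last value for the enclosing node, and it does not handle the bracket syntax for names containing dashes.

```python
class VariableHistory:
    """Toy model: remembers the last value assigned to a variable in each scope."""

    def __init__(self):
        self._latest = {}    # variable name -> most recent value overall
        self._by_scope = {}  # (scope path, variable name) -> last value in that scope

    def assign(self, scope, name, value):
        """Record an assignment made at `scope`, e.g. ("node1", "datasource1")."""
        self._latest[name] = value
        # The value is also recorded for every enclosing scope (assumption).
        for depth in range(1, len(scope) + 1):
            self._by_scope[(tuple(scope[:depth]), name)] = value

    def lookup(self, reference):
        """Resolve "my_variable" or "recipe.node1[.datasource1].my_variable"."""
        parts = reference.split(".")
        if parts[0] != "recipe":
            return self._latest[reference]
        *scope, name = parts[1:]
        return self._by_scope[(tuple(scope), name)]


history = VariableHistory()
history.assign(("node1", "datasource1"), "my_variable", "a")
history.assign(("node2",), "my_variable", "b")

print(history.lookup("my_variable"))
print(history.lookup("recipe.node1.my_variable"))
print(history.lookup("recipe.node1.datasource1.my_variable"))
```

The unscoped reference resolves to the most recent assignment, while the scoped references still return the value that was set inside node1.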

Private Variables

It is possible to prevent a variable from being set as global and instead isolate it to the scope where it was defined, including that scope's child scopes. This is accomplished by prefixing the variable name with an underscore (_).

For example, a private variable set in a node can be accessible by its data sources and sinks, but not by other nodes. Moreover, a private variable set in a data source may be accessed by resource files being referenced, but not by the Node or its data sinks.

{
   "set-runtime-vars" : {
   	  "row_count" : "_privateRowCount"
   }
}
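
The visibility rule for private variables can be sketched as a prefix check on scope paths. The helpers is_private and is_visible below are hypothetical, and representing a scope as a tuple such as ("node1", "datasource1") is an assumption made for this illustration.

```python
def is_private(name: str) -> bool:
    """A variable is treated as private when its name starts with an underscore."""
    return name.startswith("_")


def is_visible(name, defined_in, requested_from):
    """Return True if a variable defined in one scope can be read from another.

    Scopes are tuples, e.g. ("node1",) or ("node1", "datasource1").
    """
    if not is_private(name):
        return True  # global variables are visible everywhere
    # Private: visible only in the defining scope and its child scopes.
    return tuple(requested_from[: len(defined_in)]) == tuple(defined_in)


node = ("node1",)
source = ("node1", "datasource1")
sink = ("node1", "datasink1")
other_node = ("node2",)

print(is_visible("_privateRowCount", node, source))        # set in node, read in its data source
print(is_visible("_privateRowCount", node, other_node))    # set in node, read in another node
print(is_visible("_privateRowCount", source, sink))        # set in data source, read in a sink
```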

Page Variables

Page Variables are the exception to the default global variable rule (when scoping syntax is not used). As we will see later, page variables are created within Jinja templates and their lifecycle is limited to the processing of a single file, after which they are discarded.

{% set items = [1,2,3] %}

{% for item in items %}
....
{% endfor %}

Classes

Pre-Defined in Version Control

Variables may be pre-defined in a recipe's variables.json or variations.json files, which are in turn saved into version control.

Pre-Defined for Kitchen Environment

Variables may be pre-defined but outside of recipe version control. These definitions are used to define the virtual kitchen environment and are most often associated with infrastructure and tooling.

Use the Configure Kitchen to define kitchen-level overrides via the web app.


Manage Kitchen Overrides at the Command Line

Use the kitchen-config command.

Order-Submission Variables

Users may opt to inject a variable definition override when cooking an order run. These values override any pre-defined variable values in variables.json, variations.json, and kitchen-level overrides.

Pass Overrides at Time of Order Submission via CLI

Order-submission variables may be submitted with the order-run command.

System Variables

These variables are read-only. Reference them using Jinja {{ }} syntax.

  • Compile Anytime
| Name | Description | Type |
| --- | --- | --- |
| CurrentKitchen | The name of the current kitchen. | String |
| CurrentVariation | The name of the current variation for which the recipe is being compiled. | String |
| now | Represents current datetime in format YYYY-MM-DD HH:mm:SS.SS. Built-in Functions may be applied to parse the returned result. | |
| ParentKitchen | The name of the immediate parent kitchen. For the master kitchen, the parent is always master. | String |
| RecipeName | The name of the current recipe. | String |
| WarningCount | The number of warnings logged during an order run. | Integer |
| WorkDir | The current working directory, also known as the recipe root directory. | String |

  • Compile at Runtime: Many System Variable values are only set once an order run has commenced.
| Name | Description | Type |
| --- | --- | --- |
| Agent | The host name of the Kubernetes agent where the recipe runs. | String |
| CurrentOrderId | The ID of the current order. | String |
| CurrentOrderRunId | The ID of the current order run. | String |
| OrderRunHistoryKitchen | The name of the kitchen marked as history-kitchen, to be used in metrics. | String |
| PreviousOrderRunId | The ID of the previous order run for the recipe variation within the same kitchen. | String |
| PreviousOrderRunIdList | An array containing the previous order run IDs for the recipe variation within the same kitchen. | Array of Strings |
| ResumedOrderRunId | The ID of the original failed order run, compiled at runtime of the resumed order. | String |
| ScheduledOrderRunTime | The time an order run was scheduled to execute per its order's schedule. Used with actual runtime to calculate the delay for a run, which will always be less than the configured Epsilon interval. Users can use this variable in any .json or text file. Python (.py) and shell scripts (.sh) cannot use this variable directly. If using in a non-analytic container, this variable can be passed as a command line: /bin/bash -c "echo ScheduledOrderRunTime > somefile" | Long Integer |

Runtime Variables

When a recipe variation begins its execution, any declared runtime variable values are defined and redefined as they are encountered in recipe processing.

Variables May be Re-defined by a Recipe's Execution

With the exception of the Graph definition and System Variables, all variables may be redefined as a result of a recipe's execution.

Runtime variables consist of two subclasses

File Evaluation Sequence

Recipe variation files are processed in the following sequence.

  1. Load variables.json
  2. Process and resolve variables, merge with overrides
  3. Load recipe-level description.json. Choose the right graph based on its config.
  4. Load Graph.
  5. Send run startup notification, if configured.
  6. For each Node (step) in the graph, process, load, and add to the recipe-node dictionary node-level description.json and notebook.json files.
  7. If the node has data sources, the following occurs for each data source in node (indeterminate order).
    • Process and load user-named config file.
    • Validate and execute all its Keys (substeps) in their defined order.
    • Run all defined Tests in their defined order.
  8. Execute the node notebook.json.
    • For container nodes, process files in /docker-share and put them into the container.
  9. If the node has data sinks, the following occurs for each data sink in node (indeterminate order).
    Note: If a container node fails during processing, the data sinks will not execute.
    • Process and load user-named config file.
    • Validate and execute all its Keys (substeps) in their defined order.
    • Run all defined Tests in their defined order.
  10. Send run success or failure notification, if configured.

Variable Availability Timing

  1. The initial set of variables (variables.json, variations.json overrides, system variables) is available from the very beginning of the recipe execution.
  2. Variables produced in a node are available on its data sources and sinks.
  3. Variables produced in a node data source are available in subsequent node data sources, all node data sinks, and subsequent nodes.
  4. Variables produced in a data source of a container node are available in /docker-share files for that node. They can be referenced in /docker-share/config.json or any other related file that goes into the container.
  5. Variables produced in a data sink are available in subsequent node data sinks and downstream nodes.
  6. Variables produced across the whole recipe are available for order run notifications.
  7. Variables produced in node data sources and sinks are available in node tests, but only as test variables and not at compile time (see Tests documentation for further details).

Override Hierarchy

Variables declared in a recipe may be overridden by alternate definitions of said variable. The override behavior follows a strict hierarchy.

  • Order-time overrides override variable definitions for kitchen-level overrides, variation overrides in variations.json and baseline definitions in variables.json
    • Kitchen-level overrides override variable definitions set in variables.json and variations.json
      • Variation-level overrides in variations.json override values in variables.json
        • Base variables are declared in variables.json

Generated runtime variables may override the variable definitions described above. System variables are read-only and cannot be overridden.

Overriding Dictionary Variables

To override a single key's value within a dictionary variable, you must redefine the entire dictionary as an override. Overriding an individual key is not supported.
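
The hierarchy and the whole-dictionary rule can be modeled as a shallow merge of layers, applied from lowest to highest priority. This is a sketch only: resolve_variables and the sample values are hypothetical, and it assumes an override replaces a dictionary value wholesale rather than merging keys.

```python
def resolve_variables(base, variation=None, kitchen=None, order=None):
    """Merge variable layers from lowest to highest priority."""
    resolved = {}
    for layer in (base, variation or {}, kitchen or {}, order or {}):
        resolved.update(layer)  # shallow update: a dictionary value is replaced whole
    return resolved


base = {"schema": "dev", "s3": {"bucket": "my_bucket", "prefix": "raw"}}  # variables.json
variation = {"schema": "test"}                                          # variations.json
kitchen = {"s3": {"bucket": "kitchen_bucket"}}                          # kitchen-level override
order = {"schema": "adhoc"}                                             # order-time override

resolved = resolve_variables(base, variation, kitchen, order)
print(resolved)
```

In this model the kitchen-level override of s3 drops the prefix key because it did not restate it, which is why the entire dictionary has to be redefined when only one key changes.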

Built-in Functions

Useful built-in functions are available for variable transformation. When using built-in functions, only use the Jinja "{{ }}" syntax once, outside of the function.

# Proper syntax
"add_days_func": "{{add_days(now, 1)}}"

# Improper syntax
"add_days_func": "{{add_days({{now}}, 1)}}"

Transform Datetimes

In the examples below, now is most often passed as the date argument.

| Function Name | Description |
| --- | --- |
| add_days(date,days) | Adds days to a given datetime object; days can be positive or negative. |
| add_weeks(date,weeks) | Adds weeks to a given datetime object; weeks can be positive or negative. |
| add_months(date,months) | Adds months to a given datetime object; months can be positive or negative. |
| add_years(date,years) | Adds years to a given datetime object; years can be positive or negative. |
| date_format(date,format) | Returns a string representation of a given datetime object based on a format string. |
| date_parse(date_string, format) | Parses a string representation of datetime and returns a datetime object based on a format string. |

{
    "now_var": "{{now}}", 
    "add_days_func": "{{add_days(now, 1)}}", 
    "subtract_days_func": "{{add_days(now, -1)}}", 
    "add_weeks_func": "{{add_weeks(now, 1)}}", 
    "subtract_weeks_func": "{{add_weeks(now, -1)}}", 
    "add_months_func": "{{add_months(now, 1)}}", 
    "subtract_months_func": "{{add_months(now, -1)}}", 
    "add_years_func": "{{add_years(now, 1)}}", 
    "subtract_years_func": "{{add_years(now, -1)}}", 
    "weekday_func": "{{date_format(now, '%A')}}", 
    "weekday_short_func": "{{date_format(now, '%a')}}", 
    "day_of_week_func": "{{date_format(now, '%w')}}", 
    "day_of_month_func": "{{date_format(now, '%d')}}", 
    "day_of_year_zero_padded_func": "{{date_format(now, '%j')}}", 
    "week_number_sunday_func": "{{date_format(now, '%U')}}", 
    "week_number_monday_func": "{{date_format(now, '%W')}}", 
    "month_func": "{{date_format(now, '%B')}}", 
    "month_short_func": "{{date_format(now, '%b')}}", 
    "month_padded_func": "{{date_format(now, '%m')}}", 
    "year_func": "{{date_format(now, '%Y')}}", 
    "year_short_func": "{{date_format(now, '%y')}}", 
    "hour_24_zero_padded_func": "{{date_format(now, '%H')}}", 
    "hour_12_zero_padded_func": "{{date_format(now, '%I')}}", 
    "am_pm_func": "{{date_format(now, '%p')}}", 
    "minute_zero_padded_func": "{{date_format(now, '%M')}}", 
    "second_zero_padded_func": "{{date_format(now, '%S')}}", 
    "microsecond_zero_padded_func": "{{date_format(now, '%f')}}", 
    "locale_datetime_func": "{{date_format(now, '%c')}}", 
    "locale_date_func": "{{date_format(now, '%x')}}", 
    "locale_time_func": "{{date_format(now, '%X')}}", 
    "literal_percentage_example": "{{date_format(now, '%% %Y-%m-%d %%')}}", 
    "date_parse_func": "{{str(add_years(date_parse('2019-01-01', '%Y-%m-%d'), 1))}}"
}

# Example compiled output
{
    "now_var": "2019-02-06 01:44:37.696468",
    "add_days_func": "2019-02-07 01:44:37.696468",
    "subtract_days_func": "2019-02-05 01:44:37.696468",
    "add_weeks_func": "2019-02-13 01:44:37.696468",
    "subtract_weeks_func": "2019-01-30 01:44:37.696468",
    "add_months_func": "2019-03-06 01:44:37.696468",
    "subtract_months_func": "2019-01-06 01:44:37.696468",
    "add_years_func": "2020-02-06 01:44:37.696468",
    "subtract_years_func": "2018-02-06 01:44:37.696468",
    "weekday_func": "Wednesday",
    "weekday_short_func": "Wed",
    "day_of_week_func": "3",
    "day_of_month_func": "06",
    "day_of_year_zero_padded_func": "037",
    "week_number_sunday_func": "05",
    "week_number_monday_func": "05",
    "month_func": "February",
    "month_short_func": "Feb",
    "month_padded_func": "02",
    "year_func": "2019",
    "year_short_func": "19",
    "hour_24_zero_padded_func": "01",
    "hour_12_zero_padded_func": "01",
    "am_pm_func": "AM",
    "minute_zero_padded_func": "44",
    "second_zero_padded_func": "37",
    "microsecond_zero_padded_func": "696468",
    "locale_datetime_func": "Wed Feb  6 01:44:37 2019",
    "locale_date_func": "02/06/19",
    "locale_time_func": "01:44:37",
    "literal_percentage_example": "% 2019-02-06 %",
    "date_parse_func": "2020-01-01 00:00:00"
}
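
The format strings in these examples use the same directives as Python's strftime and strptime. The sketch below reproduces a few of the values with the standard datetime module; treating the DataKitchen functions as equivalent to these Python calls is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# The timestamp used in the example output above.
now = datetime(2019, 2, 6, 1, 44, 37, 696468)

# Rough counterparts of add_days(now, 1) and add_weeks(now, -1).
add_days = str(now + timedelta(days=1))
subtract_weeks = str(now - timedelta(weeks=1))

# Rough counterparts of date_format(now, ...).
yyyymmdd = now.strftime("%Y%m%d")
day_of_year = now.strftime("%j")
week_number_sunday = now.strftime("%U")

# Rough counterpart of str(add_years(date_parse('2019-01-01', '%Y-%m-%d'), 1)).
parsed = datetime.strptime("2019-01-01", "%Y-%m-%d")
next_year = str(parsed.replace(year=parsed.year + 1))

print(add_days, subtract_weeks, yyyymmdd, day_of_year, week_number_sunday, next_year)
```

Directives such as %A, %c, and %x depend on the locale in Python, so they are left out of this sketch.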

Convert Types

| Function | Description |
| --- | --- |
| int(val) | Returns an integer from its string representation. |
| float(val) | Returns a float from its string representation. |
| str(val) | Returns a string representation of any value. |
| bool(val) | Returns a boolean from its string representation. |
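
Because variable values defined in quotes are strings, arithmetic on them needs an explicit conversion first. The sketch below uses Python's built-in int, float, and str to show the difference; it assumes the template functions of the same names behave like their Python counterparts, and the sample values are hypothetical.

```python
# Hypothetical string-typed variables, as they would be after being defined in quotes.
values = {"var4": "1", "var5": "2"}

# Without conversion, + joins the two strings.
concatenated = values["var4"] + values["var5"]

# With conversion, + adds the two integers.
summed = int(values["var4"]) + int(values["var5"])

as_float = float("3.14159")
as_text = str(summed)

print(concatenated, summed, as_float, as_text)
```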

Load Files

| Function | Description |
| --- | --- |
| load_csv(csv_file,delimiter=',') | Loads a .csv file from a path csv_file relative to the /resources directory and returns an array. If the file has a single column, an array of strings is returned. If the file has multiple columns, an array of tuples is returned. The optional delimiter parameter specifies the file's column delimiter. |
| load_text(file, escapejson=True,params={}) | Loads a text (e.g. .sql) file with a path relative to the /resources directory. These text files may include Jinja expressions. The optional escapejson parameter, which defaults to true, escapes all newlines, tabs, and special characters so that the text is suitable for a JSON string field. The optional params parameter is a dictionary of additional parameters used to process the file; these parameters override existing runtime variables. |
| load_json(file) | Loads a .json file with a path relative to the /resources directory. Returns a representation of the file contents, which can be a dictionary, an array, a string, etc. |
| enumerate(list) | Returns an iterator of tuples of (index, element) of an array. |
| path_join(...) | Concatenates pieces of a file path, taking care of leading and trailing slashes. |

create table STATES (ID numeric primary key not null, STATE varchar(20) NOT NULL);

{% for id,state in load_csv(WorkDir+'resources/states.csv') %}
insert into STATES (ID, STATE) values ('{{ id }}','{{ state }}');
{% endfor %}

{ 
  "items" : [
{% for i,item in enumerate(list) %}

    {{ ',' if i > 0 else '' }}     {# Emit a comma before every item except the first so the JSON array stays valid #}
    
    {
      "key":"key{{i}}",
      "value":"{{item}}"
    }
{% endfor %}
   ]
}
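
The return shapes described for load_csv (strings for a single column, tuples for several) can be sketched with Python's csv module. The function parse_csv_text is a hypothetical stand-in that parses text instead of reading from /resources, and it assumes the file has no header row.

```python
import csv
import io


def parse_csv_text(text, delimiter=","):
    """Parse CSV text into the shapes described for load_csv.

    Assumptions: no header row handling; empty lines are skipped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [tuple(row) for row in reader if row]
    if rows and all(len(row) == 1 for row in rows):
        return [row[0] for row in rows]  # single column: list of strings
    return rows                          # multiple columns: list of tuples


# Multiple columns unpack naturally in a loop, as in the SQL template above.
states = parse_csv_text("1,Alabama\n2,Alaska\n")
statements = [
    f"insert into STATES (ID, STATE) values ('{state_id}','{state}');"
    for state_id, state in states
]

codes = parse_csv_text("AL\nAK\n")
piped = parse_csv_text("1|Alabama\n", delimiter="|")

print(statements)
print(codes, piped)
```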

Miscellaneous

| Function | Description |
| --- | --- |
| range(stop), range(start,stop,[step]) | Returns a list of integers from 0 to stop, or a list of values from start to stop using a given step. |
| random() | Returns a random float value between 0.0 and 1.0. |
| all(items) | Returns true when all items of a list are true. |
| any(items) | Returns true when any of the items in the list is true. |
| len(array or dict) | Returns the length of an array or dictionary. |
| strjoin(separator, items) | Concatenates the items in the array and returns the concatenated string. Two syntaxes are supported: strjoin(',','singleline') and strjoin(',',['1','2','3']). |
| basename(path) | Returns the last component of a file path, for example asdf for /etc/asdf. |
| dirname(path) | Returns the directory portion of a path, for example /etc for /etc/asdf. |
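
Several of these functions share names with Python built-ins or standard-library helpers. The sketch below shows those Python counterparts; it assumes, without confirmation from this page, that the template functions behave the same way, and it uses str.join as a rough analogue of strjoin with a list argument.

```python
import posixpath

path = "/etc/asdf"
last_component = posixpath.basename(path)    # analogue of basename(path)
parent_directory = posixpath.dirname(path)   # analogue of dirname(path)

joined = ",".join(["1", "2", "3"])           # analogue of strjoin(',', ['1','2','3'])

first_three = list(range(3))                 # Python's range excludes the stop value
stepped = list(range(1, 10, 4))

every_item_true = all([True, False])
some_item_true = any([True, False])
dict_length = len({"bucket": "my_bucket"})

print(last_component, parent_directory, joined)
print(first_three, stepped, every_item_true, some_item_true, dict_length)
```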

Template Basics

DataKitchen's template engine is based on Jinja2 and acts as a preprocessor for all recipe variation files before recipe execution. For any .json configuration file, the output of the template engine must be well-formed JSON to support subsequent file processing.

Compiling Files with Runtime Variables

Compiling of files via the web app or CLI may show a result that is not well-formed .json as variables set at runtime may not be available for compiling at that time.

Template Restrictions

The following files must be well-formed JSON at the beginning of recipe processing, and thus cannot leverage any complex Jinja templating: variables.json and variations.json.

Special Cases

Except for variable referencing, Jinja templating is not supported in variables.json and variations.json.

All files in a recipe do support Jinja variable referencing via the {{ }} syntax, provided that they use valid JSON. See Jinja Variable References for more information.
Supported example

  {
        "data_key": "{{example_string}}"
  }

Failure example

  {
        "data_key": {{example_string}}
  }

The failure example has improperly formed JSON and results in an error.
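
The difference between the two examples can be checked with any JSON parser before the template engine is involved. The sketch below uses Python's json module; the helper is_well_formed_json is hypothetical and only demonstrates that the quoted form parses while the unquoted form does not.

```python
import json


def is_well_formed_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


supported = '{ "data_key": "{{example_string}}" }'
failure = '{ "data_key": {{example_string}} }'

# Inside quotes, the {{ }} reference is just part of a JSON string value.
print(is_well_formed_json(supported))
# Outside quotes, the braces are read as the start of a malformed object.
print(is_well_formed_json(failure))
```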

JSON Comments

Aside from the exceptions noted above, Jinja comments may be added to .json configuration files with the {# #} syntax. These comments are ignored by the web app's form views but are visible in source views. Comments are helpful for ongoing operational management and iteration of recipes.

{
    "key1": "value1",

{# single-line jinja comment #}

{#
 multi-line
 jinja
 comment
 #}

    "key2": "value2"
}
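
Since the template engine runs before the file is parsed, the comments never reach the JSON parser. The sketch below imitates that step with a regular expression; strip_jinja_comments is a simplified stand-in written for this illustration, not the engine's implementation.

```python
import json
import re

# Matches {# ... #} blocks, including ones that span several lines.
JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)


def strip_jinja_comments(text: str) -> str:
    """Remove Jinja comment blocks from `text` (simplified stand-in)."""
    return JINJA_COMMENT.sub("", text)


source = """{
    "key1": "value1",

{# single-line jinja comment #}

{#
 multi-line
 jinja
 comment
 #}

    "key2": "value2"
}"""

config = json.loads(strip_jinja_comments(source))
print(config)
```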

Referencing Variables

Expression blocks are used to insert literals or variables into recipe files using the {{ }} syntax.

# Literals
{{ 'Hello !!' }}
{{ 10.5 }}
{{ True }}

# Variable referencing
{{ somevariable }}

Setting Page Variables

Jinja supports the setting of variables within templates via statement blocks using the {% %} syntax along with set. These variables are ephemeral in that they are scoped at template level and not propagated to other recipe files. Once template variables are set they can be used in the same way runtime variables are used in templates.

{
    {% set tables=['table1','table2','table3'] %}


    {% for table in tables %}
    ....
    {% endfor %}
}

Setting Runtime Variables

In addition to Jinja page-scoped variables, it is possible to export a variable as a runtime variable, making it global for the order run. The variable is set while the page is processed, so it can be used immediately after its declaration.

{
  {# Example 1 #}
  {% export var1 = 'Hello world' %}
  
  {# Example 2: export an existing page variable #}
  {% set var1 = 'Hello world' %}
  {% export var1 %}
  
  {# Example 3 #}
  {% export var1, var2 = 'Hello', 'World' %}
  
  {# Example 4 #}
  {% set var1 = 'Hello world' %}
  {% export var2, var3 = var1.split() %}

}
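
The distinction between set and export can be pictured with two dictionaries: one that lives only while a single file is processed, and one shared across the order run. The sketch below mirrors Example 4; process_page and both dictionaries are hypothetical names used only to model the behavior described here.

```python
def process_page(runtime_variables: dict) -> dict:
    """Simulate processing one file: page variables stay local, exports are shared."""
    # Counterpart of: {% set var1 = 'Hello world' %}
    page_variables = {"var1": "Hello world"}

    # Counterpart of: {% export var2, var3 = var1.split() %}
    var2, var3 = page_variables["var1"].split()
    runtime_variables.update({"var2": var2, "var3": var3})

    # The caller discards the page variables once the file has been processed.
    return page_variables


runtime_variables = {}
process_page(runtime_variables)
print(runtime_variables)
```

After the call, only var2 and var3 remain available to later files in this model; var1 was a page variable and is gone.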

For Loops

Jinja statement blocks ({% %}) may also be used to loop through lists and dictionaries using for and endfor.

# Iteration of a list
{% for user in users %}
...
{% endfor %}

# Iteration of a dictionary
{% for key, value in dictionary.items() %}
...
{% endfor %}

# Iteration of a list of items with index
{% for index, value in enumerate(values) %}
....
{% endfor %}

Special Loop Variable
Inside for loops there is a special variable called loop, which contains things like the loop index.

  • index is the loop index starting from 1
  • index0 is the loop index starting from 0
# Iteration of a list
{% for user in users %}
    {{ ',' if loop.index0 > 0 else ''}}
    {{user}}
{% endfor %}
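
The separator pattern in the loop above, where a comma precedes every item except the first, can be traced in plain Python. The sketch assumes a hypothetical users list and adds the quotes and brackets needed to check that the result is a valid JSON array.

```python
import json

users = ["ana", "bo", "cy"]  # hypothetical list of values

pieces = []
for index0, user in enumerate(users):
    # Same rule as {{ ',' if loop.index0 > 0 else '' }} in the template.
    separator = "," if index0 > 0 else ""
    pieces.append(f'{separator}"{user}"')

rendered = "[" + "".join(pieces) + "]"
print(rendered)

# The rendered text parses back into the original list.
round_tripped = json.loads(rendered)
```

Placing the separator before each item rather than after it avoids a trailing comma, which JSON does not allow.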

Condition Blocks

Jinja templates may also include conditional statements that leverage template or runtime variables. Here, the {% %} syntax is used along with if, elif, else, and endif.

Conditional Node Keys

Use conditional jinja templating to configure keys whose execution is dependent on runtime variables.

{% if a == 0 %}
...
{% endif %}


{% if item.first %}
...
{% elif item.last %}
...
{% endif %}

  • Boolean Syntax: Both JSON and Jinja Boolean literals are supported inside Jinja statements (True, true, False, false).
