DataKitchen DataOps Documentation

A

action (test)

A test action defines what is to be done when a test fails. Test actions can be set to log, warning, or stop-on-error.

action node

A node type that contains a data source but no data sink, and performs an action rather than moving data from a source to a sink. Its /actions directory contains the files that direct the work to be performed.

add_days

Adds a given number of days (positive or negative) to a datetime object.

add_months

Adds a given number of months (positive or negative) to a datetime object.

add_weeks

Adds a given number of weeks (positive or negative) to a datetime object.

add_years

Adds a given number of years (positive or negative) to a datetime object.
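The four date-arithmetic functions above can be approximated with Python's standard datetime module. This is a minimal sketch, not DataKitchen's implementation: the helper names simply mirror the glossary entries, and the month-end clamping in add_months (January 31 plus one month gives the last day of February) is an assumption about edge-case behavior.

```python
import calendar
from datetime import datetime, timedelta


def add_days(dt: datetime, days: int) -> datetime:
    """Shift dt by a (possibly negative) number of days."""
    return dt + timedelta(days=days)


def add_weeks(dt: datetime, weeks: int) -> datetime:
    """Shift dt by a (possibly negative) number of weeks."""
    return dt + timedelta(weeks=weeks)


def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole calendar months.

    Assumption: if the target month is shorter, the day is clamped
    to that month's last day (Jan 31 + 1 month -> Feb 28/29).
    """
    # A zero-based month count lets negative offsets wrap across years.
    total = dt.year * 12 + (dt.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


def add_years(dt: datetime, years: int) -> datetime:
    """Shift dt by whole years (Feb 29 clamps to Feb 28 in non-leap years)."""
    return add_months(dt, 12 * years)


base = datetime(2024, 1, 31, 9, 30)
print(add_days(base, -1))   # 2024-01-30 09:30:00
print(add_weeks(base, 2))   # 2024-02-14 09:30:00
print(add_months(base, 1))  # 2024-02-29 09:30:00
print(add_years(base, -1))  # 2023-01-31 09:30:00
```

Reusing the month logic for years keeps leap-day handling in one place.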

agent

The Mesos slave that runs DataKitchen Recipes. The UI displays the status across all available Agents if more than one is configured. As long as one Agent is available, the status will show as green. Additionally, the Agent status in the UI shows the total available memory and disk space for all Kitchens. The Agent status is refreshed every 30 seconds. An Agent should be provided with sufficient disk space to run the upper limit of expected simultaneous OrderRuns. Note that the Agent will compute the disk space only for the partition where the working directory is set.

analytic container

api

applies-to-keys

Denotes the Keys to which a test will be applied. The test loops over these keys, assigning each key's value to the test variable in turn. This older syntax is best used for tests containing historical calculations.

archive

An optional, automated backup of OrderRun records is available on each Kitchen's configuration page.

B

bool

Parses a string representation of a boolean value.
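A dedicated parser matters because plain truthiness gives the wrong answer: in Python, any non-empty string, including "false", is truthy. The sketch below shows one way such a parser could behave; the function name and the accepted spellings are assumptions, not the documented behavior of bool.

```python
def parse_bool(value: str) -> bool:
    """Parse a string representation of a boolean (hypothetical helper).

    Assumed accepted spellings: true/false, yes/no, 1/0, case-insensitive.
    """
    normalized = value.strip().lower()
    if normalized in {"true", "yes", "1"}:
        return True
    if normalized in {"false", "no", "0"}:
        return False
    raise ValueError(f"not a boolean string: {value!r}")


print(bool("false"))        # True: Python's built-in truthiness
print(parse_bool("False"))  # False: parsed as intended
```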

browser

The DataKitchen UI supports Google Chrome, Mozilla Firefox, and Microsoft Edge.

built-in functions

Support miscellaneous operations on variables.

built-in variables

Read-only variables available for each OrderRun that provide details regarding the run.

C

clean previous run

An option for DKCC's kitchen-merge-preview command that allows the user to wipe any existing local files that remain after a previously aborted kitchen-merge-preview.

compiled-recipe

The default name of the local directory to which compiled versions of Recipes are written. The directory is produced by the recipe-compile command.

concatenate

container id

The Docker Container ID of each container used by a container node is recorded as part of each OrderRun's MongoDB record. This value can be viewed in the UI on the OrderRun details page, under the OrderRun details header (container node -> progress.json -> json -> notebook_progress -> DKContainer -> container-progress -> container-id).

container node

A node type that is encapsulated by a container. Provides the flexibility to build nodes using GUI tools or other tools not currently supported by the default node types.

context

A local configuration for DKCC associated with a specific customer account. A default context configuration is required. Additional contexts may be configured if desired.

context-delete

A DKCC command used to delete a local context configuration.

context-list

A DKCC command that lists all local context configurations.

context-switch

A DKCC command used to create a new context, or switch to another existing context.

cpr

See "clean previous run"

currentkitchen

A system/built-in variable set equal to the name of the current Kitchen.

currentorderid

A system/built-in variable set equal to the ID of the current Order.

currentorderrunid

A system/built-in variable set equal to the ID of the current OrderRun.

currentvariation

A system/built-in variable set equal to the name of the current Variation for which the Recipe is being compiled.

D

datamapper node

A node type that maps data between a data source and a data sink.

dataops

View and sign The DataOps Manifesto.

datasink

DataKitchen's default I/O connector for writing data out of Recipe Nodes.

datasource

DataKitchen's default I/O connector for reading data into Recipe Nodes.

date_format

Returns a string representation of a date time object according to a given format.

date_parse

Returns a date time object according to a given string representation and a format.
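Judging by the '%Y/%m/%d' format string used in the todaywithslashes entry, these functions appear to accept Python strftime/strptime format codes. Assuming that, the pair behaves like the standard library calls below; the wrappers mirror the glossary names and are not DataKitchen's code.

```python
from datetime import datetime


def date_format(dt: datetime, fmt: str) -> str:
    """Render a datetime as text (assumed to wrap datetime.strftime)."""
    return dt.strftime(fmt)


def date_parse(text: str, fmt: str) -> datetime:
    """Build a datetime from text (assumed to wrap datetime.strptime)."""
    return datetime.strptime(text, fmt)


# Parse with one format, re-emit with another.
parsed = date_parse("2024-03-05 14:07", "%Y-%m-%d %H:%M")
print(date_format(parsed, "%Y/%m/%d"))  # 2024/03/05
```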

decrypt-key

Used for the decryption of file-based Data Sources. Its value should point to a Vault Secret that is stored as text, not binary.

decrypt-passphrase

Used for the decryption of file-based Data Sources if and only if the key has a passphrase set. Its value should point to a Vault Secret that is stored as text, not binary.

description.json (Node)

A file contained in each Recipe node that holds the node's "type" and "name" fields.

description.json (Recipe)

One of the 4 core Recipe configuration files. Contains "recipe-name", "nodes-to-use", "edges-to-use", and "recipe-emails" fields.

directed-acyclic-graph (DAG)

A finite directed graph with no directed cycles. DAGs consist of Nodes and Edges.
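Because a Recipe graph has no cycles, its nodes can always be arranged in an order that respects every edge. The sketch below uses Python's graphlib (3.9+) to show this; the node names are hypothetical, and this is not a description of how DataKitchen schedules nodes internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical Recipe graph: each edge (a, b) means node b runs after node a.
edges = [
    ("load_raw", "clean"),
    ("clean", "aggregate"),
    ("clean", "quality_checks"),
    ("aggregate", "publish"),
    ("quality_checks", "publish"),
]

# TopologicalSorter expects a mapping of node -> set of predecessors.
graph: dict[str, set[str]] = {}
for src, dst in edges:
    graph.setdefault(dst, set()).add(src)
    graph.setdefault(src, set())

order = list(TopologicalSorter(graph).static_order())
print(order)  # one valid execution order; parallel branches may swap

# Adding an edge that closes a loop breaks the "acyclic" property.
graph["load_raw"].add("publish")
try:
    list(TopologicalSorter(graph).static_order())
except CycleError:
    print("cycle detected: not a valid DAG")
```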

disk space

Disk space resources allocated for a given OrderRun.

DKCC

The acronym for DKCloudCommand, DataKitchen's command line tool.

dk-cloud-ip

DK's Cloud IP. Found within DKCC's context configuration files and notebook.json for Ingredient Nodes. Typical value is https://cloud.datakitchen.io.

dk-cloud-password

A field found within DKCC's context configuration files and notebook.json for Ingredient Nodes. The value for this field is a DataKitchen user's password, used for the API, DKCC, and UI.

dk-cloud-port

The port used by the DK Cloud platform. Found within DKCC's context configuration files and notebook.json for Ingredient Nodes. The default value is 443.

dk-cloud-username

The username for a DK user. Found within DKCC's context configuration files and notebook.json for Ingredient Nodes. The value for this field is a DataKitchen user's username, used for the API, DKCC, and UI.

DKNode_Action

The node "type" value that denotes an Action Node.

DKNode_Container

The node "type" value that denotes a Container Node.

DKNode_DataMapper

The node "type" value that denotes a DataMapper Node.

DKNode_Ingredient

The node "type" value that denotes an Ingredient Node.

DKNode_NoOp

The node "type" value that denotes a Synchronize Node.

do-nothing node

The legacy name for the Synchronize Node type. This node type does no data work but serves as a placeholder or convergence point. The default recipe template contains two of these nodes. Its "type" is denoted as "DKNode_NoOp."

E

edge

The directed connection between two nodes in a Recipe-Variation graph.

edges-to-use

Used to override the edges used for the graph for a Recipe-Variation. Paired with nodes-to-use.

email-templates

A subdirectory to a Recipe's /resources directory where custom email notification templates for use by that Recipe are stored.

encrypt key

Used for the encryption of S3 and SFTP datasinks. Its value should point to a Vault Secret that is stored as text, not binary.

epsilon

If the scheduler misses the scheduled run time for any reason, it will still run the job if the delay time is within this interval. This is configured as part of the timing and runtime settings within the variations.json configuration file. Follows ISO 8601 syntax.

F

file-compile

A DKCC command that compiles a file with the variable and override values associated with a provided Recipe Variation name.

file-delete

A DKCC command used to delete one or more Recipe files. Delete all files within a directory to delete the directory itself.

file-diff

A DKCC command that leverages a locally-configured tool to display a two-pane window comparing a local version of a file against its remote counterpart.

file-merge

A DKCC command that leverages a locally-configured tool to display a three-pane window comparing the remote copies of a file across two Kitchens. The Source Kitchen version, the Target Kitchen version, and the base version of the file are displayed.

file-mount

The base filepath to be referenced inside a container node.

file-resolve

A DKCC command that marks a conflicted file within a Recipe as resolved, so that a merge can be completed.

file-update

A DKCC command that updates the server copy of one or more existing Recipe files based on local changes, or adds one or more new Recipe files. Requires inclusion of a change message.

float

Parses a string representation of a floating point number.

G

get_container

A container that is used to get data. Inside its /docker-share directory, only the config.json file is processed as a Jinja template.

git-setup

A DKCC command that is used to set up a Git repository for a customer account. Authentication remains via a centralized Git user, but commits are tagged with the appropriate DK username and email address.

global-vault

A customer-level Vault whose Secrets are accessible by all Kitchens. All users may edit its connection settings. It can be disabled or used in conjunction with Kitchen-Vaults. OrderRuns look to the Global-Vault for a Secret only if the value cannot first be found within a connected Kitchen-Vault.

graph.json

One of the 4 core Recipe configuration files. Defines sets of nodes and sets of edges.

H

I

image-tag

A field in a Container Node's notebook.json configuration file that specifies the tag of the Docker image to be pulled when building the container. This field is optional; if it is not populated, the default value is "latest."

ingredient

A Recipe and its outputs, which can be reused by another Recipe without reprocessing the reusable code. As of v1.0.62, Ingredients function properly only if they exist in the Kitchen in which they will be used; when a Child Kitchen is created via the Wizard, the Ingredients must exist in the parent Kitchen so that the child can inherit them. Recipe Ingredients can be hidden in the UI by removing their names from the "recipes" field in Kitchen.json. This hides them from both the Recipes list and the Ingredients list in the UI, but they remain visible in the Recipe list returned by DKCC.

ingredient node

A node type that calls an Ingredient.

ingredient-recipe-name

The name of the Recipe that has been declared as an Ingredient. This field is configured in the notebook.json file of Ingredient Nodes.

ingredient-required-orderrun-results

A configuration dictionary within an Ingredient Node's notebook.json file. Specifies the metadata to be passed from the Ingredient OrderRun to its Parent OrderRun, including the polling interval.

int

Parses a string representation of an integer number.

iso 8601

The international date and time standard whose syntax is used by Mesos for Order scheduling configuration.

J

json

DataKitchen configuration file format.

K

keep-history

A configuration flag used when defining tests. When set to true, keep-history retains the value of a variable across runs to provide for historical comparisons.

key

A substep of work within a Recipe node, such as a SQL query whose final result is a row count. Multiple keys may exist within a node and are executed in the order in which they appear. All keys within a node are processed before any tests, and tests are applied to the output of keys.

kitchen

Kitchens are virtual workspaces, tied to a release environment, where people build, manage, and run data pipelines. Think of them as factories containing assembly lines.

kitchen-delete

A DKCC command that deletes a Kitchen. Deleting a Kitchen will not delete any child Kitchens, but will instead create orphan kitchens.

kitchen-history

A history of all changes to either the definition of the Kitchen environment or the Recipe code and configuration. The history is filterable. Each change provides a detailed diff view of file changes, automated change messages, and optional user messages.

kitchen-level overrides

These overrides supersede both the baseline values in variables.json and the overrides defined in variations.json. Unlike the values they override, kitchen-level overrides sit outside the Recipe content in version control and are thus best used for defining infrastructure, for example a schema name that must compile differently in development and production Kitchens. Because they are defined at the Kitchen level, kitchen-level overrides apply to all Recipes within a Kitchen. They may be defined via Kitchen details in the UI or via the kitchen-config command in DKCloudCommand.
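The precedence described in this entry (variables.json baseline, then variations.json overrides, then kitchen-level overrides) can be pictured as a layered dictionary lookup. The variable names and values below are hypothetical; the sketch illustrates resolution order only, not DataKitchen's compiler.

```python
from collections import ChainMap

# Hypothetical values for one Recipe.
variables_json = {"schema": "analytics", "row_limit": 1000, "env": "baseline"}
variation_overrides = {"row_limit": 50}          # from variations.json
kitchen_overrides = {"schema": "analytics_dev"}  # set on the Kitchen

# ChainMap searches its maps left to right, so the highest-precedence
# layer is listed first.
resolved = ChainMap(kitchen_overrides, variation_overrides, variables_json)

print(resolved["schema"])     # analytics_dev  (kitchen-level override wins)
print(resolved["row_limit"])  # 50             (variation override)
print(resolved["env"])        # baseline       (no override defined)
```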

kitchen-settings.json

A file stored in MongoDB that contains the configuration for the Kitchen Wizard. Note that additional Wizard settings, specifically required variables that appear as text fields in the Wizard, are configured not in the kitchen-settings.json file but in the variations.json file of the Ingredient Recipe.

kitchen staff

Defines the set of users who have access to a Kitchen. Users see all existing Kitchens, but access is blocked to any Kitchen for which they do not have Kitchen staff rights. A user does not need access to the master Kitchen.

kitchen status

Appears in the UI as part of the Kitchen list. Indicates whether a wizard step failed during Kitchen creation. If an error exists, additional details appear.

kitchen-vault

An optionally-inheritable custom Vault connection type in which access to Secrets is limited to connected Kitchens. Management of Kitchen-Vault connection settings is limited to Kitchen Staff. Each Kitchen may be connected to only one Kitchen-Vault at a time. If the Secret paths are identical, Secrets in a Kitchen-Vault override the Secret values in the Global-Vault.
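The lookup order between the two Vault types (connected Kitchen-Vault first, Global-Vault as the fallback) can be sketched as follows. Plain dictionaries stand in for the Vault backends, and the function name and "vault://" secret paths are hypothetical.

```python
from typing import Mapping, Optional


def resolve_secret(
    path: str,
    kitchen_vault: Optional[Mapping[str, str]],
    global_vault: Optional[Mapping[str, str]],
) -> str:
    """Return a Secret, preferring the connected Kitchen-Vault.

    Either vault may be None (not connected, or disabled).
    """
    for vault in (kitchen_vault, global_vault):
        if vault is not None and path in vault:
            return vault[path]
    raise KeyError(f"secret not found: {path}")


kitchen = {"vault://db/password": "dev-secret"}
global_ = {"vault://db/password": "shared-secret", "vault://s3/key": "s3-secret"}

print(resolve_secret("vault://db/password", kitchen, global_))  # dev-secret
print(resolve_secret("vault://s3/key", kitchen, global_))       # s3-secret
```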

kitchen wizard

A UI feature that guides "clickers" through the process of creating, deleting, and merging Kitchens. Wizards can also be configured to perform other tasks like adding Ingredients to Kitchens and creating schemas and clusters.

L

latest_version

A file in {USER_HOME}/.dk that records the currently installed version of DKCC. It is used to prompt the user to upgrade DKCC when applicable.

load_csv

Loads a .csv file and returns its contents as a list of tuples, which can be iterated with Jinja loop expressions. The file is read at compile time, so the path must reference an existing resource file in the Recipe. The path is absolute; the built-in variable WorkDir can be used to help locate the file within the Recipe.
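A rough Python equivalent of what this entry describes: read a CSV resource and return a list of tuples that a template can loop over. The function name matches the glossary, but its signature, the lack of header handling, and the file name are assumptions; the temporary directory only stands in for the Recipe's resources so the sketch runs on its own.

```python
import csv
import os
import tempfile


def load_csv(path: str) -> list[tuple[str, ...]]:
    """Read every row of a CSV file into a list of tuples of strings."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle)]


# Stand-in for a Recipe resource file; in a Recipe the absolute path
# would be built with the help of the WorkDir built-in variable.
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, "tables.csv")  # hypothetical resource name
with open(path, "w", newline="") as handle:
    csv.writer(handle).writerows([("orders", "daily"), ("customers", "weekly")])

# Python counterpart of a Jinja loop such as {% for name, cadence in rows %}.
rows = load_csv(path)
for name, cadence in rows:
    print(f"{name}: {cadence}")
```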

local tool

A local tool configured with DKCC to provide two-pane file diffs and three-pane file merges.

log

OrderRun logs integrate logging data for all tools in your toolchain that are orchestrated as part of any given Recipe Variation.

M

mappings

A field within a DataMapper's notebook.json file. Defines the names and keys for sources and sinks.

master

The default Kitchen which is the parent to all subsequent Kitchen lineages and also a parent unto itself. The master Kitchen cannot be deleted.

max-disk

The setting, in MB, for the upper limit of disk space made available to an Agent for a Recipe to run. This value is configured via a Variation's Mesos Settings configuration. The default setting is 12GB. This is the minimum disk space allowed. Additional disk space may be required depending on the volume of data being handled by the Recipe. Ingredient nodes are treated as wholly separate Recipes and thus require their own dedicated disk space.

max-ram

The setting for the upper limit of RAM for the container running a Recipe; configured via the Variation's Mesos configuration. The default setting is 1GB as of 2018-02-12.

mesos-group

Designates a DataKitchen Agent constraint. When an Agent constraint is applied to a Kitchen via a mesos-group, Orders from that Kitchen will only be picked up and cooked by Agents with a matching mesos-group tag. This can be used to segment Orders across release environments and across cloud providers and on-prem deployments.

md5

A hash format that may be generated when a file is loaded by a data source.

mesos-setting

Also known as Timing & Runtime Settings. Contains the configuration for a specific combination of Order Scheduling and Resource allocation. Found within each Recipe's variations.json within the mesos-setting-list.

mesos-setting-list

A list of configured mesos-settings located in each Recipe's variations.json file.

N

node

Encapsulates a unit of work in a data analytics workflow, which can be code or a GUI tool with configuration. Nodes can be thought of as steps in a data workflow. Nodes contain one or more keys and should contain one or more tests. A node first processes preconditions, then retrieves data, performs tests, and finally processes postconditions.
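The processing order in this entry (preconditions, keys in their listed order, tests only after all keys finish, then postconditions) can be modelled as a small pipeline. Everything here is hypothetical: the callables stand in for real node configuration, and raising on the first failed test mimics only the stop-on-error action.

```python
from typing import Callable

Step = Callable[[dict], None]


def run_node(
    preconditions: list[Step],
    keys: list[tuple[str, Callable[[], object]]],
    tests: list[tuple[str, Callable[[dict], bool]]],
    postconditions: list[Step],
) -> dict:
    """Run one node in the order the glossary describes (illustrative only)."""
    results: dict = {}
    for check in preconditions:
        check(results)
    # Keys execute in the order listed; each output is stored under its key name.
    for name, key in keys:
        results[name] = key()
    # Tests run only after every key has finished.
    for test_name, test in tests:
        if not test(results):
            raise RuntimeError(f"test failed: {test_name}")  # stop-on-error style
    for step in postconditions:
        step(results)
    return results


log: list[str] = []
out = run_node(
    preconditions=[lambda r: log.append("precondition")],
    keys=[("row_count", lambda: 42), ("max_date", lambda: "2024-03-05")],
    tests=[("rows_positive", lambda r: r["row_count"] > 0)],
    postconditions=[lambda r: log.append("postcondition")],
)
print(out, log)
```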

nodes-to-use

Used to override the nodes used for the graph for a Recipe-Variation. Paired with edges-to-use.

notebook.json

A configuration file used for DataMapper, Container, and Ingredient nodes.

now

An object representing the current date and time.

O

order

The submission of a specific Recipe-Variation for execution. Orders may be run on demand or scheduled to commence at some point in the future. Orders possess a unique Order ID and may contain one or more unique OrderRuns.

order id

The unique ID assigned to every Order. Available during runtime via CurrentOrderID.

orderrun

A specific instance of the running of a Recipe-Variation. Multiple OrderRuns will exist for a given Order if that Order is configured to repeat. Each OrderRun has a unique OrderRun ID and a run record.

orderrun-allow-log-test-results

In a Recipe that runs an Ingredient, set this value to true to allow the logs of log-type tests from the OrderRun in the generated Child Kitchen to be read into the Parent Kitchen.

orderrun-allow-warning-test-results

In a Recipe that runs an Ingredient, set this value to true to allow the logs of warning-type tests from the OrderRun in the generated Child Kitchen to be read into the Parent Kitchen.

orderrun-checks-timing

For a Recipe that runs an Ingredient, this value sets the specific times/cadence at which the Recipe checks the status of the auto-generated child Kitchen where the Ingredient's own OrderRun takes place. An example value, found in a notebook.json file, is [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048].
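Read as a list of successive waits, the example value is a doubling back-off. Assuming the numbers are seconds between status checks (the unit is not stated in this entry), the elapsed time at each check is a running sum:

```python
from itertools import accumulate

# Example value from a notebook.json file; the unit (seconds) is an assumption.
checks_timing = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

# Elapsed time at which each status check would occur.
check_times = list(accumulate(checks_timing))
print(check_times[:4])  # [2, 6, 14, 30]
print(check_times[-1])  # 4094 (about 68 minutes of polling in total)
```

Doubling keeps early checks frequent for short Ingredient runs while limiting the number of checks for long ones.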

orderrunhistorykitchen

The name of the Kitchen marked as the history Kitchen, to be used in metrics.

orderrun id

The unique ID assigned to every OrderRun. Available during runtime via CurrentOrderRunID.

orphan kitchen

Deleting a Kitchen does not delete any of its child Kitchens but rather orphans them. Orphan Kitchens can subsequently be deleted or merged "diagonally" into other Kitchens via the "Advanced Merge" feature.

output-filename

The name of the file returned by a container node. Defined per key.

overrides

See variable overrides and recipe overrides

P

parent kitchen

The Kitchen from which a child Kitchen is directly created, and from which it inherits all selected Recipes. Inheritance also applies to Kitchen Staff and Recipe overrides. When merging between a parent and a child, first process the merge from the parent down to the child (parent = source).

playbook

passive

A type of connection configured in an FTP Data Source or Sink.

placeholder-node

See synchronize node

previousorderrunid

The ID of the previous OrderRun.

previousorderrunidlist

An array containing the previous OrderRun IDs for this Recipe/Variation.

progress.json

A file, specific to each OrderRun, that contains details about the ongoing run.

python scripts

Python scripts that are used as part of Recipes are placed inside containers (Container Nodes or Data Sources). Other Python scripts may be used strictly for development work and never run against production servers. These scripts are likely included as /resources files in GHE.

Q

quality assurance

Do DataOps and implement automated tests across your Recipes to catch errors in place and resolve them before they are ever released to Production.

R

recipe

A combination of pipeline assets and DataKitchen configuration files that define a Graph of executable steps. Each Recipe contains four core configuration files, a /resources sub-directory, and additional sub-directories specific to each contained Node. A given Recipe may consist of multiple Variations, each with its own variable override set saved in variations.json.

recipe overrides

See kitchen-level overrides.

recipename

The name of the Recipe for the current OrderRun.

reproducibility

The level of detail included in each OrderRun record makes its processing fully reproducible, thus satisfying regulatory requirements.

resources

A standard Recipe directory that contains files leveraged by the Recipe. By default, /resources contains a README.txt file and an /email-templates directory containing default email templates. Node-specific files can either be stored here or within their respective nodes.

resume

OrderRuns that have failed or have been manually stopped may be resumed, either from the UI or via DKCC's orderrun-resume command. Resumed OrderRuns ignore nodes that were previously completed successfully and will rerun the entirety of failed nodes, even if a failed node did have some keys that completed successfully. Note that a new container is used for a resumed OrderRun.
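The node-level rule for a resumed OrderRun can be sketched as a filter over the previous run's node statuses: completed nodes are skipped, and every other node is rerun in full, however many of its keys had succeeded. The status strings and node names are hypothetical.

```python
def nodes_to_rerun(node_status: dict[str, str]) -> list[str]:
    """Return the nodes a resumed OrderRun would execute (illustrative only)."""
    return [node for node, status in node_status.items() if status != "completed"]


previous_run = {
    "load_raw": "completed",
    "clean": "failed",       # rerun in full, even if some of its keys succeeded
    "aggregate": "not_run",  # downstream of the failure
}
print(nodes_to_rerun(previous_run))  # ['clean', 'aggregate']
```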

row_count

A built-in feature of data sources and sinks whereby the row count of a file is stored in a variable that a downstream test can consume. Row counts are not available for binary files.

run record

Each OrderRun generates a distinct run record that contains a full copy of the Recipe code that was run, a record of the specific infrastructure on which the run occurred, compiled values for all relevant overrides, timing information, and test data. With a run record, an OrderRun becomes fully reproducible.

S

schedule

Recipes may be scheduled to run on a recurring basis. If changes are applied to a Recipe between scheduled OrderRuns, the Recipe need not be rescheduled. See iso 8601 for syntax details.

scheduledorderruntime

The time an OrderRun was scheduled to execute per its Order's schedule. Used with the actual start time to calculate the delay for a run, which will always be less than the configured Epsilon interval. Users can use this variable in any JSON, SQL, or text file. Python files (.py) and shell scripts (.sh) cannot use this variable directly. In a non-analytic container, the variable can instead be passed on the command line: /bin/bash -c "echo ScheduledOrderRunTime > somefile"

schedule delay

The time elapsed between when an OrderRun was scheduled to start and its actual start time. This delay is always less than the configured Epsilon interval; otherwise, the scheduled OrderRun is skipped.
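A sketch of the run-or-skip decision described here. It assumes epsilon is an ISO 8601 duration using only day, hour, minute, and second components (this minimal parser does not handle years, months, or weeks) and treats a delay exactly equal to epsilon as too late, following the "always less than" wording. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Subset of ISO 8601 durations: PnDTnHnMnS, each component optional.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a PnDTnHnMnS duration string into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


def should_run(scheduled: datetime, actual_start: datetime, epsilon: str) -> bool:
    """Run only if the schedule delay is strictly less than epsilon."""
    delay = actual_start - scheduled
    return delay < parse_duration(epsilon)


scheduled = datetime(2024, 3, 5, 2, 0)
print(should_run(scheduled, datetime(2024, 3, 5, 2, 4), "PT5M"))   # True
print(should_run(scheduled, datetime(2024, 3, 5, 2, 10), "PT5M"))  # False
```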

secret

A sensitive value stored securely, via encryption, in the Vault. A filter applied to the system prevents Secrets from being displayed in OrderRun logs. Sometimes this filter also overwrites non-Secret values in logs (but not in compiled files).

serving states

The states provided by the dk orderrun-info --runstatus command response: PLANNED_SERVING, ACTIVE_SERVING, COMPLETED_SERVING, STOPPED_SERVING, SERVING_ERROR, SERVING_RERAN, UNKNOWN

sha

A hash format that may be generated when a file is loaded by a data source.
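For reference, both hash formats named in this glossary (md5 and sha) can be computed with Python's hashlib by streaming a file in chunks. Which SHA variant a data source produces is not stated here, so SHA-256 is assumed; the function name and temporary file are illustrative only.

```python
import hashlib
import tempfile


def file_digests(path: str, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute MD5 and SHA-256 hex digests of a file without loading it whole."""
    md5 = hashlib.md5()
    sha = hashlib.sha256()  # assumption: the "sha" format is SHA-256
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha.update(chunk)
    return {"md5": md5.hexdigest(), "sha": sha.hexdigest()}


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello\n")

print(file_digests(tmp.name))
```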

sink-key

sink-name

source_kitchen

The designated "from" Kitchen for Kitchen Merge Previews and Kitchen Merges.

source-key

The specific Key from a Data Source to be mapped to a specific Key in a Data Sink as part of an explicit mapping in a DataMapper Node's notebook.json file.

source-name

The specific Data Source to be mapped to a specific Data Sink as part of an explicit mapping in a DataMapper Node's notebook.json file.

status

Indicates the state of an Order or OrderRun. Recall that Orders may contain multiple OrderRuns if they are scheduled or if they have been stopped and resumed. Possible Order status values include: Active, Complete, Stopped, Error. Possible OrderRun status values include: "", Active, Completed, Error in.

stop-on-error

A test action that stops an Order-Run when a test fails.

str

Returns a string representation of any object.

synchronize node

A node type that does no data work but serves as a placeholder or convergence point. The default recipe template contains two of these nodes. Its "type" is denoted as "DKNode_NoOp."

T

target_kitchen

The designated "to" Kitchen for Kitchen Merge Previews and Kitchen Merges.

template

Defines a standard Recipe structure that can be leveraged when creating a Recipe via DKCC. Templates currently include those that match Quickstart1 (default), Quickstart2, and Quickstart3.

test

Defined within a Recipe node, a test is applied to a key within said node. Tests are configured with the following fields: "test-variable", "type", "applies-to-keys", "action", "keep-history", "test-logic", "test-compare", and "test-metric." A single key can be used by multiple tests by first creating a test, populating a variable as part of said test, then using that variable in other tests. The test suite holds onto variable values within a given node.

test-compare

Legacy syntax, though still supported. Sub-field to test-logic. Declares how to compare test-variable to test-metric.

test-contents-as-date

Declares the type of the variable being tested as datetime.

test-contents-as-float

Declares the type of the variable being tested as float.

test-contents-as-integer

Declares the type of the variable being tested as integer.

test-contents-as-string

Declares the type of the variable being tested as string.

test-logic

Contains a logic statement evaluating the test-variable.

test-metric

Legacy syntax, though still supported. Sub-field to test-logic. Parent field to optional historic-calculation and historic-metric fields when performing a historic comparison test. Declares the value the test-variable will be compared against. Can be a literal (100, "100"), a date expression, or a variable name (runtime or key-associated).

test-order

Deprecated.

timeout

API session tokens are valid for 4 hours, after which long-running connection sessions must be renewed.

test-variable

The variable that holds the value being tested.

todayval

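A day offset, relative to the current date, used by todaywithslashes and todaywithdashes (Today=0).

todaywithslashes

The current date, offset by todayval days, formatted with slashes: {{date_format(add_days(now,todayval),'%Y/%m/%d')}}

todaywithdashes

The current date, offset by todayval days, formatted with dashes: {{date_format(add_days(now,todayval),'%Y-%m-%d')}}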
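Outside the template engine, the todaywithslashes and todaywithdashes expressions amount to shifting the current date by todayval days and formatting the result. The sketch below fixes "now" so the output is reproducible; the helper name is hypothetical, and treating add_days as a plain timedelta shift is an assumption.

```python
from datetime import datetime, timedelta


def today_strings(now: datetime, todayval: int = 0) -> tuple[str, str]:
    """Mirror todaywithslashes / todaywithdashes for a given day offset."""
    day = now + timedelta(days=todayval)  # assumed equivalent of add_days(now, todayval)
    return day.strftime("%Y/%m/%d"), day.strftime("%Y-%m-%d")


fixed_now = datetime(2024, 3, 1, 8, 0)
print(today_strings(fixed_now))      # ('2024/03/01', '2024-03-01')
print(today_strings(fixed_now, -1))  # ('2024/02/29', '2024-02-29')
```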

type (test)

The type of test performed, which determines the datatype as which the value of test-variable is evaluated. Valid values are test-contents-as-date, test-contents-as-float, test-contents-as-integer, and test-contents-as-string. If the value cannot be cast to the specified type, an error is thrown. If the type is omitted, the user is responsible for ensuring the correctness of the datatypes used in test-logic.
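One way to picture the type field is to map each test-contents-as-* value to a conversion and fail loudly when the test-variable cannot be converted. The converters below, especially the ISO date format assumed for test-contents-as-date, are illustrative assumptions rather than DataKitchen's casting rules.

```python
from datetime import datetime
from typing import Any, Callable, Optional

# Assumed converters; the date format actually accepted is not documented here.
CASTERS: dict[str, Callable[[Any], Any]] = {
    "test-contents-as-date": lambda v: datetime.strptime(str(v), "%Y-%m-%d"),
    "test-contents-as-float": float,
    "test-contents-as-integer": int,
    "test-contents-as-string": str,
}


def cast_test_variable(value: Any, test_type: Optional[str]) -> Any:
    """Cast a test-variable according to its test type; None means no cast."""
    if test_type is None:
        return value  # caller is responsible for datatypes in test-logic
    try:
        return CASTERS[test_type](value)
    except (ValueError, TypeError) as exc:
        raise ValueError(f"cannot cast {value!r} for {test_type}") from exc


row_count = cast_test_variable("1250", "test-contents-as-integer")
print(row_count > 1000)  # True: compared as a number, not as a string
```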

U

user home

The default location where DKCC installs its hidden /.dk folder, which contains a DKCC config file and the latest_version file. Referred to in documentation as {USER_HOME}.

V

validate

The process of confirming that all Recipe files can be properly compiled during an OrderRun. Validation is processed at the Variation level for both DKCC and the UI at the time of any version control changes. With DKCC, users may also validate a Recipe Variation independently of any version control changes.

variables

Values set in the variables.json file for runtime flexibility. Can be overridden.

variables.json

One of the 4 core Recipe configuration files. Stores values for jinja templates, which allows for runtime flexibility. Values are stored as plain text or point to a path within the Vault.

variable overrides

Values set in the variations.json file for runtime flexibility. They override values set in variables.json and can themselves be overridden.

variations.json

One of the 4 core Recipe configuration files. Contains a list of defined Recipe Variations as well as settings for the Environment, Mesos, and Overrides. Also defines the "active-variation."

vault

An encrypted store of Secrets in DynamoDB. See global-vault and kitchen-vault.

W

warning

Warns the user when a test fails but does not stop an Order-Run.

wildcard

Leveraged when getting or putting files with templated naming formats, specifically to cycle through large numbers of files. For example, wildcards may be used to pull files based on the date in their name while ignoring the time portion of the name. Wildcard errors only halt an OrderRun if they reference a non-existent filepath; the absence of files matching a wildcard will not stop an OrderRun.
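The date-based example in this entry can be illustrated with Python's fnmatch: a pattern that fixes the date and uses * for the time portion. The file names and naming convention are hypothetical, and DataKitchen's own wildcard syntax may differ from fnmatch's.

```python
from fnmatch import fnmatch

# Hypothetical files following a <name>_<YYYYMMDD>_<HHMMSS>.csv convention.
files = [
    "sales_20240305_010000.csv",
    "sales_20240305_133012.csv",
    "sales_20240306_010000.csv",
]

# Fix the date, ignore the time portion.
pattern = "sales_20240305_*.csv"
matches = [name for name in files if fnmatch(name, pattern)]
print(matches)  # ['sales_20240305_010000.csv', 'sales_20240305_133012.csv']

# Per this entry, finding no matching files is not itself an error.
print([name for name in files if fnmatch(name, "sales_20240401_*.csv")])  # []
```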

wizard-status

A field in kitchen.json that summarizes the Order Runs processed by the cooking of Ingredients via the Kitchen Wizard.

workdir

The current working directory, also known as the Recipe root directory.

X

Y

Z

zip
