DataKitchen DataOps Documentation

Nodes

The steps executed by a recipe graph.

The steps performed in a recipe are organized in a graph composed of nodes.

Node Types

Five Node Types

There are five major node types designed for distinct purposes.

Node | Description
NoOp | A node without keys. Can run without configured data sources or sinks. Often used as a convergence node for graphs with parallel nodes upstream.
DataMapper | Maps data between data sources and data sinks. Keys can be mapped manually or via wildcards.
Action | Runs data sources to connect to infrastructure for use cases where data sinks are not needed. For example, connecting to a database to perform administrative operations. The /data-sources directory is named /actions for action nodes.
Container | Runs a Docker container based on parameterizable image names and tags. Scripts and GUI tools can be embedded and run within these nodes. DataKitchen provides a number of container images with useful features that may be leveraged directly or customized.
Ingredient | Runs a recipe variation that has been declared as an Ingredient. A distinct order run is created for the ingredient node, which is run in an auto-generated child kitchen.

Node Type Properties

Each node type is identified by the type value in its description.json file.

Node | Type | Notebook? | Data Sources? | Data Sinks?
NoOp | DKNode_NoOp |  | - | -
DataMapper | DKNode_DataMapper |  | Required | Required
Action | DKNode_Action |  | Required | -
Container | DKNode_Container |  | Required | Optional
Ingredient | DKNode_Ingredient |  | - | -

Node Components

Description.json

Mandatory for all node types. Sets the node type and provides a description field used to describe a node's purpose.

{
   "type" : "[NODE TYPE]",
   "description": "[IT IS BEST PRACTICE TO PROVIDE A DETAILED DESCRIPTION]"
}
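As a rough illustration of the two fields above, the following Python sketch checks that a description.json document names one of the five type values listed under Node Type Properties and carries a non-empty description. The helper name `check_description` and the sample description text are assumptions made for this example; this is not a DataKitchen tool or API.

```python
import json

# The five type values listed under "Node Type Properties".
KNOWN_NODE_TYPES = {
    "DKNode_NoOp",
    "DKNode_DataMapper",
    "DKNode_Action",
    "DKNode_Container",
    "DKNode_Ingredient",
}


def check_description(text):
    """Return a list of problems found in a description.json document.

    Hypothetical helper for illustration only.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    node_type = doc.get("type")
    if not isinstance(node_type, str) or node_type not in KNOWN_NODE_TYPES:
        problems.append(f"unknown node type: {node_type!r}")
    description = doc.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description is missing or empty")
    return problems


sample = '{"type": "DKNode_NoOp", "description": "Joins parallel branches."}'
print(check_description(sample))  # prints []
```

An empty list means both fields passed the checks; each string in a non-empty list names one problem.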

Notebook.json

Example notebook.json for a DataMapper node:

{
    "metadata": {
        "name": "mapper"
    },
    "wildcard-will-automatically-create-mappings": [
        {
            "data-source": "sftp_source",
            "data-sink": "s3_sink"
        }
    ],
    "mappings": {}
}

Example notebook.json for a Container node:

{
    "name": "container-node-s3",
    "dockerhub-namespace": "{{dockerhubconfig.namespace}}",
    "dockerhub-username": "{{dockerhubconfig.username}}",
    "dockerhub-password": "{{dockerhubconfig.password}}",
    "analytic-container": true,
    "image-repo": "{{dockerhubconfig.imagerepo}}",
    "image-tag": "{{dockerhubconfig.imagetag}}",
    "container-input-file-keys": [
        {
            "key": "s3-datasource.*",
            "filename": "from_s3/*"
        },
        {
            "key": "s3-datasource.key1",
            "filename": "from_s3/s3-input-file-3x3.csv"
        }
    ],
    "container-output-file-keys": [
        {
            "key": "s3-datasink.*",
            "filename": "*.csv"
        }
    ]
}

Example notebook.json for an Ingredient node:

{
  "dk-cloud-ip": "{{https://cloud.datakitchen.io}}",
  "dk-cloud-password": "{{<PASSWORD>}}",
  "dk-cloud-port": "{{443}}",
  "dk-cloud-username": "{{<USERNAME>}}",
  "DKDOC": "this is node definition for a node that calls a recipe ingredient",
  "ingredient-name": "",
  "ingredient-recipe-name": "",
  "ingredient-description": "This is an example of an Ingredient node's notebook.json configuration.",
  "required-ingredient-variables": [
      "{{pre-defined-variable-format}}", 
      "$runtime-variable-format"
  ],
  "ingredient-required-orderrun-results": {
    "orderrun-poll-interval": 10,
    "orderrun-timeout": 60,
    "orderrun-kitchen": "master",
    "orderrun-allow-log-test-results": true,
    "orderrun-allow-warning-test-results": true,
    "orderrun-allow-failure-test-results": false
  }
}
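
The container-node-s3 example above uses wildcard entries such as s3-datasource.* alongside an explicit key. How DataKitchen resolves these wildcards is not described on this page, so the Python sketch below only illustrates the general idea, using shell-style matching from the standard fnmatch module as a stand-in. The function name and the list of available keys are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, available_keys):
    """Return the keys matching a shell-style pattern, in their given order.

    Assumption: shell-style matching approximates the wildcard behavior;
    the real matching rules may differ.
    """
    return [key for key in available_keys if fnmatchcase(key, pattern)]


# Hypothetical key names, written as "<source name>.<key name>".
available = ["s3-datasource.key1", "s3-datasource.key2", "sftp_source.key1"]

print(expand_wildcard("s3-datasource.*", available))
print(expand_wildcard("s3-datasource.key1", available))
```

The first call selects both s3-datasource keys, while the second selects only the explicitly named key.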


Docker-Share

The /docker-share directory is only applicable to container nodes. This directory holds the config.json file and any other files used by the container, such as scripts.

Config.json

The config.json file is only applicable to container nodes. It holds the configuration used to import packages, inject parameters into the container, and run scripts within the container.

Data Sources

Infrastructure connections that get data for use by a recipe node. A node may have multiple data sources or none at all. Data sources within a given node are processed in indeterminate order.

Before the data work performed by a node can occur, data must often be gathered. After the data work by the node is performed, the resulting output must often be placed in some location. These requirements are configured via data sources and data sinks.

Data Sinks

Infrastructure connections that push data from a recipe node. A node may have multiple data sinks or none at all. Data sinks within a given node are processed in indeterminate order.

Keys

The sub-steps executed within a given node. Keys are specific to a given node. Keys are executed in the order they are defined within a given recipe configuration file. Keys may be defined across multiple files within a node.
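
To make the ordering rule concrete, here is a small Python sketch that reads a configuration fragment and lists its keys in definition order. The "keys" wrapper object and the key names are assumptions invented for this example, not a documented schema. The sketch relies only on the fact that Python's json module preserves the member order of the source text.

```python
import json

# Hypothetical configuration fragment: the "keys" wrapper and the key names
# are assumptions made for this sketch.
CONFIG_TEXT = """
{
    "keys": {
        "load_customers": {"note": "first"},
        "load_orders": {"note": "second"},
        "build_report": {"note": "third"}
    }
}
"""


def keys_in_definition_order(text):
    """Return key names in the order they appear in the JSON text."""
    config = json.loads(text)
    # json.loads builds dicts that keep the member order of the source text.
    return list(config.get("keys", {}))


for position, name in enumerate(keys_in_definition_order(CONFIG_TEXT), start=1):
    print(position, name)
```

Because keys may be spread across multiple files, a sketch like this only describes the order within one file.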

Resource Files

Nodes can call files located in the recipe-level /resources directory.

Tests

Automated tests against recipe variables may be defined in multiple files within a node. Tests are executed in the order they are defined within a given node file.

Node File Structure

  • /NODE_NAME directory
    • description.json
    • notebook.json
      • May contain keys and/or tests
    • /data_sources directory
      • (/actions for action nodes)
      • May contain keys and/or tests
    • /data_sinks directory
      • May contain keys and/or tests
    • /docker-share
      • Only applicable for container nodes
      • Always contains a config.json file
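
The layout rules above can be sketched as a small Python check. The function `check_node_layout` is a hypothetical helper written for this example, and it covers only the rules stated on this page: description.json is mandatory, /docker-share applies only to container nodes and always contains config.json, and action nodes use /actions in place of /data_sources.

```python
import json
import tempfile
from pathlib import Path


def check_node_layout(node_dir):
    """Return a list of layout problems for one node directory.

    Hypothetical helper; it mirrors the file structure listed above.
    """
    node_dir = Path(node_dir)
    description = node_dir / "description.json"
    if not description.is_file():
        return ["missing description.json"]

    node_type = json.loads(description.read_text()).get("type")
    problems = []
    share = node_dir / "docker-share"
    if share.is_dir():
        if node_type != "DKNode_Container":
            problems.append("docker-share only applies to container nodes")
        elif not (share / "config.json").is_file():
            problems.append("docker-share is missing config.json")
    if node_type == "DKNode_Action" and (node_dir / "data_sources").is_dir():
        problems.append("action nodes use actions instead of data_sources")
    return problems


# Build a throwaway container node to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    node = Path(tmp) / "my_container_node"
    (node / "docker-share").mkdir(parents=True)
    (node / "description.json").write_text(
        json.dumps({"type": "DKNode_Container", "description": "demo"})
    )
    print(check_node_layout(node))  # config.json has not been created yet
    (node / "docker-share" / "config.json").write_text("{}")
    print(check_node_layout(node))
```

The first call reports the missing config.json; the second returns an empty list once the file exists.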


