DataKitchen DataOps Documentation

Container Nodes

Nodes that run scripts and GUI tools.

Introduction

Container nodes support the use of essentially any tool in the analytics toolchain, including scripts (Python, Java, Shell, etc.) and GUI tools like Jupyter notebooks and Tableau. Any container image available on Docker Hub may be leveraged in support of a tool in your toolchain.

DataKitchen Provides Many Standard I/O Connectors

For a list of DataKitchen-supported standard I/O connectors that do not require the use of Container Nodes, see the Data Sources & Sinks documentation.

Container Images

Containers are built from images. DataKitchen provides a number of base images that you can use directly or as a base to which you may add tools and/or proprietary libraries. These images are located on Docker Hub.

DataKitchen Supports Alternative Container Image Hosts

Configure a connection to an alternative to Docker Hub by editing the Docker Registry URL field of a Container Node's notebook.json file. This is the dockerhub-url field in the raw JSON.
An example value is as follows:

http://containerhosting.website.com:5000/
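In context, the relevant notebook.json fields might look like the following sketch; the registry URL and image values are illustrative:

```json
{
    "dockerhub-url"       : "http://containerhosting.website.com:5000/",
    "dockerhub-namespace" : "datakitchen",
    "image-repo"          : "ac_python3_public_container",
    "image-tag"           : "latest"
}
```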

Container Node Source Images

The container used for a Container Node can be built directly from DataKitchen's analytic container or a closely related derivative. Alternatively, users may leverage their own custom-built images for Container Nodes. In these cases, the node's notebook.json should include the following configuration:

"analytic-container": false,
"command-line": "[OPTIONAL COMMAND]"

Here, the "[OPTIONAL COMMAND]" may be something like:

"/bin/bash -c \"echo 'Hello world' > output.txt\""

The following images are currently available:

  • datakitchen/ac-base
    • Python 2.7 analytic container base image running Ubuntu 14.
    • Supports lists of parameters passed into container via config.json.
    • Supports running .py scripts located in a node's /docker-share directory.
    • The AC logger supports a subset of methods from the Python 2 logger.
      • Supports printing to logs via LOGGER.info().
      • Supports LOGGER.setLevel(); warning is the default level
        • Requires import logging in script to change default level
          • Change to DEBUG: LOGGER.setLevel(logging.DEBUG)
        • Alternate logging levels supported
          • LOGGER.debug()
          • LOGGER.warning()
          • LOGGER.error()
          • LOGGER.critical()
  • datakitchen/ac-base3
    • Python 3.4 analytic container base image running Ubuntu 14.
    • Supports lists of parameters passed into container via config.json.
    • Supports running .py scripts located in a node's /docker-share directory.
    • The AC logger supports a subset of methods from the Python 3 logger.
      • Supports printing to logs via LOGGER.info().
      • Supports LOGGER.setLevel(); warning is the default level
        • Requires import logging in script to change default level
          • Change to DEBUG: LOGGER.setLevel(logging.DEBUG)
        • Alternate logging levels supported
          • LOGGER.debug()
          • LOGGER.warning()
          • LOGGER.error()
          • LOGGER.critical()
  • datakitchen/ac_python3_public_container
    • Python 3.7 analytic container base image running Debian 9.6.
    • Supports passing parameters into the container via config.json as either a list or a dictionary, with a dictionary being the best practice.
    • Supports running .py scripts located in a node's /docker-share directory.
    • The AC logger supports a subset of methods from the Python 3 logger.
      • Supports printing to logs via LOGGER.info()
      • Supports LOGGER.setLevel(); warning is the default level
        • Requires import logging in script to change default level
          • Change to DEBUG: LOGGER.setLevel(logging.DEBUG)
        • Alternate logging levels supported
          • LOGGER.debug()
          • LOGGER.warning()
          • LOGGER.error()
          • LOGGER.critical()
  • datakitchen/jasper_container
    • Analytic container for Internet-of-Things (IoT) applications.
    • Generates a report based on provided data.
    • Report settings and input/output file settings are available.
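The logger bullets above can be exercised with a short script. Inside an analytic container the LOGGER object is provided for you; the getLogger() setup below is only a stand-in so this sketch runs on its own:

```python
import logging

# Stand-in for the LOGGER that the analytic container injects at runtime;
# inside a real container node you would not create it yourself.
logging.basicConfig()
LOGGER = logging.getLogger("ac_logger")
LOGGER.setLevel(logging.WARNING)  # warning is the default level

LOGGER.info("suppressed at the default warning level")

# Changing the default level requires `import logging` in the script.
LOGGER.setLevel(logging.DEBUG)
LOGGER.debug("debug message")
LOGGER.info("info message")
LOGGER.warning("warning message")
LOGGER.error("error message")
LOGGER.critical("critical message")
```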

Container base images are referenced via a Container Node's notebook.json file.

Description.json

Like all nodes, Container nodes require a node-level description.json file:

{
    "type": "DKNode_Container",
    "description": "[YOUR DESCRIPTION HERE]"
}

Notebook.json

Container Nodes also require a node-level notebook.json file. This is where the configuration of the Container itself is located:

{
    "image-repo"                 : "ac_process_info_container",
    "image-tag"                  : "latest",
    "dockerhub-namespace"        : "datakitchen",
    "dockerhub-username"         : "{{dockerhub.username}}",
    "dockerhub-password"         : "{{dockerhub.password}}",
    "analytic-container"         : true,
    "container-input-file-keys"  : [
      {
        "key"       : "inputfiles.some-input",
    	  "filename"  : "some_input_file.csv"
      }
    ],
    "container-output-file-keys" : [
      {
        "key"       : "outputfiles.some-output",
        "filename"  : "results.xml"
      }
    ],
    "assign-variables": [
      {
        "name"      : "rowcount",
        "file"      : "rowcount.txt"
      },
      {
        "name"      : "successcount",
        "file"      : "successcount.txt"
      },
      {
        "name"      : "failurecount",
        "file"      : "failurecount.txt"
      }
    ],
    "inside-container-file-mount"            : "/dk/ContainerWorkingDirectory",
    "inside-container-file-directory"        : "docker-share",
    "container-input-configuration-file-path": "the-node-name/docker-share",
    "container-input-configuration-file-name": "config.json",
    "container-output-log-file"              : "ac_logger.log",
    "container-output-progress-file"         : "progress.json",
  	"delete-container-when-complete"				 : false
}

Docker-Share

Every Container Node contains a /docker-share directory where the scripts the node runs are stored.

Config.json

Every Container Node contains a config.json file within its /docker-share directory.
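A minimal config.json follows the shape shown in the examples at the end of this page; the script, parameter, and variable names below are placeholders:

```json
{
    "dependencies": [],
    "keys": {
        "run-script": {
            "script": "my_script.py",
            "parameters": {
                "some_parameter": "some_value"
            },
            "export": [
                "some_result"
            ]
        }
    }
}
```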

Properties

Each property is listed below with its description and whether it is required or optional.

analytic-container

When true, the image is expected to be one of the base containers provided by DataKitchen, or derived from one of them.

optional, default true

assign-variables

A list of associations between files inside the container and variables. Once the container finishes executing, each variable is loaded with the contents of its associated file, which should exist inside the container at that point.
Variables can later be used in tests.

optional

command-line

The command to be executed by the container.

Only valid when analytic-container is false.

container-input-configuration-file-name

The name of the configuration file for the container.
Ignored when analytic-container is true.

optional, default config.json

container-input-configuration-file-path

The path, relative to the recipe root directory, to the recipe-side folder holding the files exchanged with the container.
Ignored when analytic-container is true.

optional, default [[node-name]]/docker-share

container-input-file-keys

A list of mappings between data source keys and files to be placed inside the container once created.

optional

container-output-file-keys

A list of mappings between files inside the container and data sink keys. These files are retrieved from the container once it finishes executing and are sent to the data sinks.

optional

container-output-log-file

The name of the log file being generated by the container.
Ignored when analytic-container is true.

optional, default ac_logger.log

container-output-progress-file

The name of the progress json file being generated by the container.
Ignored when analytic-container is true.

optional, default progress.json

delete-container-when-complete

Determines whether the container is deleted after its processing has completed and files have been extracted. Default value is false.

optional

dockerhub-namespace

Docker image namespace

required

dockerhub-password

Docker registry service password

required

dockerhub-url

The URL of the Docker registry from which images are pulled.

optional

dockerhub-username

Docker registry service user name

required

image-repo

The name of the docker image.

required

image-tag

The docker image tag denoting the image version to be pulled. The default value is "latest".

optional

inside-container-file-directory

The name of the folder that will be used to exchange information between the container and the node. It's relative to inside-container-file-mount.
When analytic-container is false, this folder must be placed in the working directory.

optional, default docker-share

inside-container-file-mount

The working directory inside the container.
Ignored when analytic-container is true.

optional; the default is given by the container image itself.
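Pulling the non-analytic-container properties together, a notebook.json for a custom image might look like the following sketch; the image, namespace, and command values are illustrative:

```json
{
    "image-repo"          : "my-custom-image",
    "image-tag"           : "latest",
    "dockerhub-namespace" : "mycompany",
    "dockerhub-username"  : "{{dockerhub.username}}",
    "dockerhub-password"  : "{{dockerhub.password}}",
    "analytic-container"  : false,
    "command-line"        : "/bin/bash -c \"echo 'Hello world' > output.txt\"",
    "inside-container-file-mount"     : "/dk/ContainerWorkingDirectory",
    "inside-container-file-directory" : "docker-share"
}
```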

Container Node Input File Keys

To inject files from a data source into the container, use the following configuration:

container-input-file-keys: [
  {
    "key"       : "inputfiles.some-input",
    "filename"  : "some_input_file.csv"
  }
]

The key field references a key in a data source; its format is:

{
    "key": "[ data source name ].[ key name ]"
}

The filename is relative to the folder defined by the inside-container-file-directory field, with docker-share being the default folder name.

Configuring an Arbitrary List of Input Files Using Wildcards

container-input-file-keys: [
  {
    "key"       : "inputfiles.*",
    "filename"  : "*.csv"
  }
]

The * in the key field is matched against each key in the data source, and the * in filename is replaced by the matched key name. For example, a data source exposing keys file1 and file2 would yield files file1.csv and file2.csv inside the container. See the Data Sources section for more details on retrieving multiple files using wildcards in data sources.

Container Output File Keys

Exporting files from inside the container to a given data sink works similarly to input file keys:

"container-output-file-keys" : [
  {
    "key"       : "outputfiles.some-output",
    "filename"  : "results.xml"
  }
]

Configuring an Arbitrary List of Output Files Using Wildcards

In order to export multiple files or files without a specific name, we use wildcards.

"container-output-file-keys" : [
  {
    "filename"  : "*.xml"
    "key"       : "store-results.*",
  }
]

Runtime Output Variables

It is possible to feed runtime variables with the contents of files produced by the container.
These variables are available for tests and for further use in subsequent nodes in the graph.
Optionally, a file can be decoded as JSON; by default it is read as plain text.

"assign-variables" : {
    "name": "variablename",
    "file": "output.json",
    "decode-json": true
}
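Inside the container, a script feeds these variables by writing the named files into the shared directory. The sketch below is illustrative; the directory, file, and value names are assumptions matching the defaults described above:

```python
import json
import os

# Shared directory (inside-container-file-directory defaults to docker-share).
share_dir = "docker-share"
os.makedirs(share_dir, exist_ok=True)

# Hypothetical results from the container's processing step.
results = {"rows": 120, "succeeded": 118, "failed": 2}

# Plain-text file: the associated variable receives this text verbatim.
with open(os.path.join(share_dir, "rowcount.txt"), "w") as f:
    f.write(str(results["rows"]))

# JSON file: with "decode-json": true, the variable receives the parsed
# structure instead of the raw text.
with open(os.path.join(share_dir, "output.json"), "w") as f:
    json.dump(results, f)
```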

Examples

Example 1

description.json:

{
    "type": "DKNode_Container",
    "description": "This container runs python3, with secrets from global and kitchen vault."
}

notebook.json:

{
    "dockerhub-username": "#{vault://dockerhub/username}",
    "dockerhub-namespace": "datakitchen",
    "dockerhub-password": "#{vault://dockerhub/password}",
    "image-repo": "ac_python3_public_container",
    "metadata": {
        "name": "python3_container"
    },
    "analytic-container": true,
    "tests": {
        "test_global_key": {
            "test-logic": "result_global == 'global_val'",
            "action": "stop-on-error",
            "type": "test-contents-as-string",
            "test-variable": "result_global",
            "keep-history": false
        },
        "test_kitchen_key": {
            "test-logic": "result_kitchen == 'kitchen_value'",
            "action": "stop-on-error",
            "type": "test-contents-as-string",
            "test-variable": "result_kitchen",
            "keep-history": false
        }
    }
}

config.json:

{
    "dependencies": [],
    "keys": {
        "run-script": {
            "script": "test.py",
            "parameters": {
                "global_secret": "#{vault://global/key}",
                "kitchen_secret": "#{vault://kitchen/key}"
            },
            "export": [
                "result_global",
                "result_kitchen"
            ]
        }
    }
}

test.py:

import os

if __name__ == '__main__':
	global result_global, result_kitchen

	LOGGER.info("global secret: " + global_secret)
	LOGGER.info("kitchen secret: " + kitchen_secret)

	result_global = global_secret
	result_kitchen = kitchen_secret


Example 2

description.json:

{
    "type": "DKNode_Container",
    "description": "This container runs the ac_python3_container2 image."
}

notebook.json:

{
    "dockerhub-username": "#{vault://dockerhub/username}",
    "dockerhub-namespace": "datakitchen",
    "dockerhub-password": "#{vault://dockerhub/password}",
    "image-repo": "ac_python3_container2",
    "metadata": {
        "name": "python3_container"
    },
    "analytic-container": true,
    "tests": {
        "test-filecount": {
            "test-logic": {
                "test-compare": "equal-to",
                "test-metric": 10
            },
            "action": "stop-on-error",
            "type": "test-contents-as-integer",
            "test-variable": "result",
            "keep-history": false
        },
        "test-float": {
            "test-logic": {
                "test-compare": "equal-to",
                "test-metric": 1.234
            },
            "action": "stop-on-error",
            "type": "test-contents-as-float",
            "test-variable": "float_val",
            "keep-history": false
        }
    },
    "assign-variables": [
        {
            "name": "float_val",
            "file": "float.txt"
        }
    ]
}

config.json:

{
    "dependencies": [],
    "keys": {
        "run-script": {
            "script": "test.py",
            "parameters": {
                "globalvar1": "value1",
                "dockerhub_username": "{{dockerhub_username}}"
            },
            "export": [
                "result"
            ]
        }
    }
}

float.txt:

1.234

records.csv:

row1
row2
row3
row4
row5
row6
row7
row8
row9
row10

test.py:

import os

if __name__ == '__main__':
	global result

	LOGGER.info('Value of globalvar1:' + globalvar1)
	LOGGER.info("(should work) dockerhub username: " + dockerhub_username)
	LOGGER.info("(should not work) dockerhub password: " + "{{dockerhub_password}}")


	with open('docker-share/records.csv') as f:
		result = len(f.readlines())

	
