DataKitchen DataOps Documentation

DKCloudCommand


DataOps users who prefer to operate at the command line rather than the DataKitchen web app GUI can do so via DKCloudCommand, the command line tool for interacting with the DataKitchen API.

DataKitchen recommends that you first review the DataOps Web App topics as companions to this demonstration of DKCloudCommand. The GUI resources introduce concepts like kitchens, recipes, variations, orders, and ingredients. In these command line topics, we'll cover those concepts in less detail and focus on building a recipe from scratch.

Requirements

Python

DKCloudCommand requires Python version 3.6 or newer. DataKitchen recommends Python 3.8. A suitable version of Python can be installed within a virtual environment, as described under Virtual Environments below.
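To confirm that an interpreter satisfies the 3.6 minimum before installing anything, a short check along the following lines may help. This is a minimal sketch, not part of DKCloudCommand; the names `MINIMUM_PYTHON` and `meets_minimum` are hypothetical and chosen only for illustration.

```python
import sys

# Minimum version stated in this guide.
MINIMUM_PYTHON = (3, 6)


def meets_minimum(version_info, minimum=MINIMUM_PYTHON):
    """Return True when version_info is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report on the interpreter that is running this script.
    current = "{}.{}.{}".format(*sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print("Python " + current + " meets the 3.6+ requirement.")
    else:
        print("Python " + current + " is too old; 3.6 or newer is required.")
```

Run the script with the same interpreter you plan to use for DKCloudCommand, since several Python versions can coexist on one machine.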

pip

DKCloudCommand is best installed via pip3, which is already installed if you are using Python 3 (version 3.6 or newer).
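When several Python versions are installed, it is easy to invoke a pip that belongs to a different interpreter. One way to avoid that is to call pip as a module of the interpreter you intend to use. The sketch below builds such a command. It assumes the package is published on PyPI under the name `DKCloudCommand`, and the helper names `build_install_command` and `install` are hypothetical.

```python
import subprocess
import sys

# Assumption: the package is published on PyPI under this name.
PACKAGE_NAME = "DKCloudCommand"


def build_install_command(package=PACKAGE_NAME, upgrade=False):
    """Build a pip command bound to the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(package)
    return command


def install(package=PACKAGE_NAME, upgrade=False):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(build_install_command(package, upgrade), check=True)


if __name__ == "__main__":
    # Only print the command here; call install() to actually run it.
    print(" ".join(build_install_command()))
```

Calling `install()` from inside an activated virtual environment keeps the package and its dependencies out of the system-wide Python installation.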

Virtual Environments

We recommend that all users install DKCloudCommand within a virtual environment. Two great options, conda and virtualenv, are described in detail below. You may choose to use either.

conda

First, install Miniconda, a lightweight version of Anaconda, on your local machine via the appropriate installer.

Next, execute the series of commands below to set up a conda virtual environment named DataKitchen where you will install DKCloudCommand.

# Create a conda virtual environment named DataKitchen that uses Python 3.7
~ $ conda create -n DataKitchen python=3.7

# Activate the newly-created conda virtual environment (Windows)
~ $ activate DataKitchen
(DataKitchen) ~ $

# Activate the newly-created conda virtual environment (macOS or Linux)
~ $ source activate DataKitchen
(DataKitchen) ~ $

# When finished using the conda virtual environment, deactivate it
(DataKitchen) ~ $ source deactivate
~ $

virtualenv

Execute the following series of commands to install virtualenv and create a virtual environment named DataKitchen where you will install DKCloudCommand.

# Install virtualenv
~ $ pip install virtualenv

# Create and enter a directory to house local virtual environments
~ $ mkdir virtualenvs
~ $ cd virtualenvs

# Create a virtual environment named DataKitchen that uses Python 3.7
~ $ virtualenv DataKitchen --python=python3.7

# Activate the newly-created DataKitchen virtual environment
~ $ source DataKitchen/bin/activate
(DataKitchen) ~ $

# When finished using the virtual environment, deactivate it
(DataKitchen) ~ $ deactivate
~ $
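Whichever of the two options you chose, it can be useful to confirm that an environment is active before installing DKCloudCommand. The sketch below inspects the environment variables that the activation scripts normally export: `CONDA_DEFAULT_ENV` for conda and `VIRTUAL_ENV` for virtualenv. The function name `active_environment` is hypothetical, and the check relies on those variables being set, which is an assumption about your shell session.

```python
import os


def active_environment(environ=None):
    """Return a short label for the active environment, or None."""
    environ = os.environ if environ is None else environ

    # conda exports CONDA_DEFAULT_ENV when an environment is activated;
    # "base" is conda's default environment, not a dedicated one.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return "conda:" + conda_env

    # virtualenv activation scripts export VIRTUAL_ENV with the env path.
    virtual_env = environ.get("VIRTUAL_ENV")
    if virtual_env:
        return "virtualenv:" + os.path.basename(virtual_env)

    return None


if __name__ == "__main__":
    label = active_environment()
    if label is None:
        print("No virtual environment appears to be active.")
    else:
        print("Active environment: " + label)
```

If the script reports that no environment is active, activate the DataKitchen environment as shown above and run it again.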

Docker Containers

DKCloudCommand can also be run inside a custom container.

Python Package Requirements

Python package requirements will be installed or updated when installing DKCloudCommand.

Recommended Tools

A number of high-quality open-source tools are available to assist with building and editing data analytic pipelines. DataKitchen's flexibility as a platform allows it to work with any tools in your existing toolchain. In this guide, we'll make use of the open-source tools highlighted below:

Integrated Development Environment (IDE)

PyCharm Community Edition (Multi-Platform)

File-Transfer Application

Transmit (macOS)

Cyberduck (Windows)

Database Query Tool

DBeaver (Multi-Platform)

DBeaver is open source, can be downloaded without network-admin privileges, and automatically downloads and manages required database drivers.

SQL Workbench (Multi-Platform)

SQL Workbench is open source.

Visual File Diff & Merge Tools

Configuring Visual File Diff & Merge Tools with DKCloudCommand

Configuration for file diff and merge tools is documented in the next section.

PyCharm (Multi-Platform)
See above.

Meld (Multi-Platform)
