2. Kubeflow Concepts
Component
A component is made up of component contents and a component wrapper. A component is delivered to Kubeflow through its wrapper, and the delivered component executes the defined contents and produces artifacts.
Component Contents
Component contents are made up of three parts:
- Environment
- Python code w/ Config
- Generated artifacts
Let's explore each part with an example. Here is Python code that loads data, trains an SVC (Support Vector Classifier) model, and saves the trained model.
# Environment: the packages the code depends on.
import dill
import pandas as pd
from sklearn.svm import SVC

# Python code w/ Config: the config values (train_data_path, train_target_path,
# kernel) are supplied from outside and used to perform the training.
train_data = pd.read_csv(train_data_path)
train_target = pd.read_csv(train_target_path)

clf = SVC(kernel=kernel)
clf.fit(train_data, train_target)

# Generated artifacts: the trained model is saved to model_path.
with open(model_path, mode="wb") as file_writer:
    dill.dump(clf, file_writer)
The above Python code can be divided into component contents as follows.
Environment is the part of the Python code where the packages used in the code are imported.
Next, Python code w/ Config is the part where the given config is used to actually perform the training.
Finally, there is the part that saves the generated artifacts.
Component Wrapper
A component wrapper delivers the necessary config to the component contents and executes them.
In Kubeflow, a component wrapper is defined as a function, such as train_svc_from_csv below.
When a component wrapper wraps the contents, it looks like the following:
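For illustration, here is a minimal sketch of the contents above wrapped as the function train_svc_from_csv, assuming the config values (train_data_path, train_target_path, model_path, kernel) are passed in as parameters:

def train_svc_from_csv(
    train_data_path: str,
    train_target_path: str,
    model_path: str,
    kernel: str,
):
    # Environment: imports live inside the wrapper so the component is self-contained.
    import dill
    import pandas as pd
    from sklearn.svm import SVC

    # Python code w/ Config: the config arrives as function arguments.
    train_data = pd.read_csv(train_data_path)
    train_target = pd.read_csv(train_target_path)

    clf = SVC(kernel=kernel)
    clf.fit(train_data, train_target)

    # Generated artifacts: the trained model is written to model_path.
    with open(model_path, mode="wb") as file_writer:
        dill.dump(clf, file_writer)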
Artifacts
In the explanation above, it was mentioned that a component creates artifacts. An artifact refers to any file generated by a component, such as evaluation results, logs, and so on. Among these, the following types are the ones we are most interested in:
- Model
- Data
- Metric
- etc
Model
We defined the model as follows:
A model includes the Python code, the trained weights and network architecture, and the environment needed to run it.
Data
Data includes preprocessed features, model predictions, etc.
Metric
Metrics are divided into two categories (sketched briefly after this list): dynamic metrics and static metrics.
- Dynamic metrics refer to values that continuously change during the training process, such as train loss per epoch.
- Static metrics refer to evaluation metrics, such as accuracy, that are calculated after the training is completed.
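For illustration only (plain scikit-learn, not a Kubeflow API, on made-up toy data), a dynamic metric is recorded repeatedly while training runs, while a static metric is computed once afterwards:

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Toy data, made up purely for this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(8,), random_state=0)

# Dynamic metric: recorded while training is still in progress,
# here the training loss after each epoch (one partial_fit call).
train_loss_per_epoch = []
for epoch in range(10):
    clf.partial_fit(X, y, classes=[0, 1])
    train_loss_per_epoch.append({"epoch": epoch, "train_loss": clf.loss_})

# Static metric: a single value computed after training has finished,
# here the accuracy on the same toy data.
static_metrics = {"accuracy": accuracy_score(y, clf.predict(X))}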
Pipeline
A pipeline consists of a collection of components and the order in which they are executed. The order forms a directed acyclic graph (DAG), which can include simple conditional statements.
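As a sketch of this idea using the Kubeflow Pipelines (KFP v1) SDK, the component functions, their logic, and the pipeline name below are made up for illustration:

from kfp import dsl
from kfp.components import create_component_from_func

def load_data(data_path: str) -> str:
    # Hypothetical component contents: pretend to load data and pass the path on.
    return data_path

def train_model(data_path: str, kernel: str) -> str:
    # Hypothetical component contents: pretend to train a model.
    return f"model trained with kernel={kernel} on {data_path}"

# Wrap the component contents so KFP can run them as components.
load_data_op = create_component_from_func(load_data)
train_model_op = create_component_from_func(train_model)

@dsl.pipeline(name="example-pipeline")
def example_pipeline(data_path: str, kernel: str):
    # The execution order forms a DAG: train_model runs after load_data
    # because it consumes load_data's output. Simple branches can be
    # added with dsl.Condition if needed.
    load_task = load_data_op(data_path)
    train_model_op(load_task.output, kernel)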
Pipeline Config
As mentioned earlier, components require config to be executed. The pipeline config contains the configs for all the components in the pipeline.
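For example, reusing the hypothetical example_pipeline sketched above, its pipeline config is simply the set of values its components need (the names and values here are made up):

# Hypothetical pipeline config: one entry per config value that the
# pipeline's components need in order to run.
pipeline_config = {
    "data_path": "train.csv",
    "kernel": "rbf",
}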
Run
To execute a pipeline, the pipeline config specific to that pipeline is required. In Kubeflow, an executed pipeline is called a "Run."
When a pipeline is executed, each component generates artifacts. Kubeflow Pipelines assigns a unique ID to each Run, and all artifacts generated during the Run are stored.
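As a sketch, assuming the KFP v1 SDK and reusing the hypothetical example_pipeline and pipeline_config from the sketches above (the host URL is a placeholder):

import kfp

# Connect to a Kubeflow Pipelines endpoint (placeholder URL).
client = kfp.Client(host="http://localhost:8080")

# Submitting the pipeline together with its pipeline config creates a Run.
# Kubeflow Pipelines assigns the Run a unique ID and stores the artifacts
# generated while it executes.
run = client.create_run_from_pipeline_func(
    example_pipeline,
    arguments=pipeline_config,
)
print(run.run_id)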
Now, let's learn how to write components and pipelines.