17th March, 2021
8 Min read
Kubeflow is an open-source project built on top of Kubernetes. It contains a set of tools and frameworks for developing, deploying and managing Machine Learning models, workflows and services in a portable and scalable manner. Kubeflow aims to create an industry standard for end-to-end management of Machine Learning infrastructure.
Using Machine Learning in an organization is not a straightforward process. It is very different from the generic services a company deploys in its production environment. The life cycle of an ML service is diverse and contains various stages. If these stages are not carefully thought through during development, scalability issues can surface later, when the service is running in production. This is where Kubeflow comes to the rescue: it aims to be an industry standard that considers ML service deployment from the initial phase of development through to the final deployment at scale.
The main purpose of an ML model is to be served in production and to generate value for the business. However, ML models go through a multi-stage process to reach this point.
This multi-stage process is what Kubeflow simplifies and standardizes, as running and maintaining it is a challenge even for the most experienced Data Scientists and Machine Learning Engineers.
Kubeflow can be used to manage a company's entire Machine Learning workflow at scale while maintaining the same level of quality. Because it runs on Kubernetes, Kubeflow users get all the capabilities of Kubernetes, including its scalability.
Any ML workflow requires a large amount of experimentation and research work. This includes testing various models, comparing them, tuning hyperparameters and validating the results. Kubeflow provides Jupyter Notebooks, various ML frameworks and end-to-end pipelines built around CUJs (Critical User Journeys), all of which speed up development.
Kubeflow is supported by all major cloud providers. It provides a standardized environment that abstracts the underlying configuration, so that researchers and developers can focus on development while their ML workflows run on cloud resources, laptops and on-prem servers alike.
In the development phase, hyperparameter optimization is often a critical task, and results can be skewed by very minor variations. Manual hyperparameter tuning is tedious and time-consuming. Kubeflow provides Katib, a tool that tunes hyperparameters in an automated way. This automation can reduce development time considerably.
The principles on which Kubeflow is built are:
A Machine Learning service or model often varies with the use case and the data provided to it. Composability means the ability to choose what is right for your project. ML model generation is a multi-stage process in which we need to carefully choose the stages our project requires. Kubeflow handles version switching and the interplay between frameworks and libraries by treating each as an independent system and then letting us easily build a pipeline across these systems.
Portability in Kubeflow means that it provides an abstraction layer between your system and the ML project. The ML project can therefore run anywhere Kubeflow runs, whether that is a laptop, a training rig or the cloud. Kubeflow handles all the platform-specific configuration, so we only need to worry about our ML models and not the underlying setup.
Scalability is the ability to increase and decrease resource consumption according to the project's requirements or the request load it needs to handle. Because Kubeflow is built on top of Kubernetes, it is in an ideal position to manage the resources it needs. Switching between computing resources, sharing them between multiple teams and allocating them by region are part of Kubeflow's foundation thanks to Kubernetes.
The components that collectively make up Kubeflow are:
Kubeflow provides a central dashboard that helps you keep track of all the pipelines, services and other resources deployed via Kubeflow.
Jupyter Notebooks are one of the most used tools in Data Science and Machine Learning. You can quickly spin up a notebook and begin your research and development, without the extra details you would need to handle in an IDE. Notebooks contain cells in which code is run interactively, which makes them great for visualization and research work.
Kubeflow supports various state-of-the-art Machine Learning frameworks such as TensorFlow, PyTorch, MXNet, MPI and Chainer, which are widely used in the ML industry.
Kubeflow comes with built-in ML pipelines for end-to-end orchestration of ML workflows. The pipelines are reusable and easy to set up for experimentation and development.
Serving the ML model as a service in production is the end goal of Machine Learning research work in a company. Kubeflow comes with a wide range of serving tools for your ML models, such as TensorFlow Serving, NVIDIA Triton, Seldon Serving and KFServing.
Kubeflow contains a facility for storing metadata about your ML workflows, which helps you maintain and manage them. The metadata covers execution configuration, models, datasets and deployment artifacts for Kubernetes.
Feature Storage refers to the production deployment side of an ML service and is often the part that most Machine Learning teams find challenging. It covers the training and inference stages of Machine Learning models for production, and it handles a range of issues that arise at these stages.
To address these issues, Kubeflow uses Feast, an open-source feature store that helps teams working on Machine Learning systems define, manage, discover, validate and serve features to ML models during training and inference.
The process of setting up Kubeflow is explained below.
Before moving on to the Kubeflow setup, basic knowledge of Kubernetes and Kustomize is required. Kubernetes is the underlying container orchestration service on which Kubeflow is built, and Kustomize is a template-free way to customize application configuration.
Reference Link: Kubernetes Basics
Reference Link: Kustomize
Note: your Kubernetes cluster must meet the minimum system requirements for deploying Kubeflow. The reference link for the minimum system requirements is given below:
Reference Link: Minimum System Requirements
You can use an existing Kubernetes cluster or follow the process below to quickly create a cluster using minikube. Make sure you have the Kubernetes helper tools, such as kubectl, installed.
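Before going further, it can help to confirm that the command-line tools used in this walkthrough are actually on the PATH. The sketch below is one way to do that in a POSIX shell. The function name `missing_tools` and the tool list in the commented example are illustrative assumptions, not part of Kubernetes or Kubeflow.

```shell
# Print the name of every given tool that cannot be found on the PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example (uncomment to run). The tool list is an assumption based on the
# commands that appear later in this article.
# missing_tools kubectl minikube curl yq
```

An empty result means every listed tool was found; any printed name needs to be installed first.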
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
Note: For other Linux distros and operating systems refer to the link below:
Reference Link: minikube download
minikube start
kubectl version
kubectl cluster-info
Other kubectl commands can be used to get more detailed information about the cluster, including its nodes, deployments and services. Refer to the link below for these commands.
Reference Link: kubectl Cheat Sheet
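A plain `minikube start` uses minikube's default resource sizes, which may fall short of what Kubeflow needs. The sketch below starts the cluster with explicit resources and then waits for the node to report Ready. The function name `start_cluster` is hypothetical, and the default figures (4 CPUs, 12 GB of memory, 50 GB of disk) are assumed placeholders that should be checked against the minimum system requirements linked earlier.

```shell
# Start minikube with explicit resources, then block until every node is Ready.
# Arguments: CPUs, memory, disk size. Defaults are assumed placeholder values.
start_cluster() {
    cpus=${1:-4}
    memory=${2:-12g}
    disk=${3:-50g}
    minikube start --cpus "$cpus" --memory "$memory" --disk-size "$disk" || return 1
    kubectl wait --for=condition=Ready nodes --all --timeout=180s
}

# Example (uncomment to run):
# start_cluster 4 12g 50g
```

Waiting on the node condition avoids running later steps against a cluster that is still starting up.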
Kubeflow Operator helps deploy, monitor and manage the Kubeflow lifecycle. It is built using the Operator Framework, an open-source toolkit to build, test and package operators and manage their lifecycle. The Kubeflow Operator uses KfDef as its custom resource and kfctl as the underlying tool for running the operator. It can be installed from operatorhub.io.
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.17.0/install.sh | bash -s v0.17.0
kubectl create -f https://operatorhub.io/install/kubeflow.yaml
kubectl get csv -n operators
kubectl get pod -n operators
NAME READY STATUS RESTARTS AGE
kubeflow-operator-55876578df-25mq5 1/1 Running 0 17h
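Instead of re-running `kubectl get pod` until the operator shows Running, a script can block until the deployment is available. This is a minimal sketch that assumes the deployment is named `kubeflow-operator` and lives in the `operators` namespace, as in the output above. The function name `wait_for_operator` and the default timeout are illustrative.

```shell
# Block until the Kubeflow Operator deployment reports the Available condition.
# Arguments: namespace (default "operators"), timeout (default "300s").
wait_for_operator() {
    namespace=${1:-operators}
    timeout=${2:-300s}
    kubectl wait --for=condition=Available deployment/kubeflow-operator \
        -n "$namespace" --timeout="$timeout"
}

# Example (uncomment to run):
# wait_for_operator operators 300s
```

`kubectl wait` exits with a non-zero status on timeout, so the function can gate later steps in a script.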
The metadata.name field must be set in the KfDef manifest, whether it is downloaded from the Kubeflow manifests repo or written from scratch. The following example shows how to prepare the KfDef manifest:
# download a default KfDef configuration from remote repo
export KFDEF_URL=https://raw.githubusercontent.com/kubeflow/manifests/v1.1-branch/kfdef/kfctl_ibm.yaml
export KFDEF=$(echo "${KFDEF_URL}" | rev | cut -d/ -f1 | rev)
curl -L ${KFDEF_URL} > ${KFDEF}
# add metadata.name field
# Note: yq can be installed from https://github.com/mikefarah/yq
export KUBEFLOW_DEPLOYMENT_NAME=kubeflow
yq w ${KFDEF} 'metadata.name' ${KUBEFLOW_DEPLOYMENT_NAME} > ${KFDEF}.tmp && mv ${KFDEF}.tmp ${KFDEF}
# create the namespace for Kubeflow deployment
KUBEFLOW_NAMESPACE=kubeflow
kubectl create ns ${KUBEFLOW_NAMESPACE}
# create the KfDef custom resource
kubectl create -f ${KFDEF} -n ${KUBEFLOW_NAMESPACE}
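Two steps in the script above can be written differently, and the sketch below shows one possible variant. The file name can be taken from the URL with shell parameter expansion instead of the `rev | cut | rev` pipeline. The `yq w` subcommand belongs to yq version 3; the second helper assumes yq version 4, where the same edit is an `eval` expression. Both function names are hypothetical.

```shell
# Derive the local file name from a manifest URL.
# "${url##*/}" removes the longest prefix that ends in "/".
kfdef_filename() {
    url=$1
    printf '%s\n' "${url##*/}"
}

# Set metadata.name in a KfDef file in place.
# Assumption: yq v4 is installed (the "yq w" form in the article is yq v3).
set_kfdef_name() {
    file=$1
    name=$2
    NAME="$name" yq eval --inplace '.metadata.name = strenv(NAME)' "$file"
}

# Example (uncomment to run):
# KFDEF=$(kfdef_filename "${KFDEF_URL}")
# set_kfdef_name "${KFDEF}" kubeflow
```

Passing the name through the environment with `strenv` keeps the value out of the yq expression, so no extra quoting is needed.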
kubectl logs deployment/kubeflow-operator -n operators -f
kubectl get pod -n ${KUBEFLOW_NAMESPACE}
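The pod listing above has to be read by eye. For scripting, one option is to list only the pods that are not yet in the Running or Succeeded phase, using a field selector. This is a sketch under the assumption that Kubeflow was deployed into the `kubeflow` namespace as shown above, and the function name `pending_pods` is hypothetical. Note that a Running phase does not guarantee that every container has passed its readiness check.

```shell
# List pods in the namespace whose phase is neither Running nor Succeeded.
# Prints nothing once all pods have started (or completed, for job pods).
pending_pods() {
    ns=${1:-kubeflow}
    kubectl get pods -n "$ns" \
        --field-selector=status.phase!=Running,status.phase!=Succeeded \
        -o name
}

# Example (uncomment to run): poll every 10 seconds until nothing is pending.
# while [ -n "$(pending_pods kubeflow)" ]; do sleep 10; done
```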
As mentioned above, Kubeflow is supported by all major cloud providers. Although the underlying process is quite similar, reference docs are available for installation on each of them.