Kubeflow for ML - Chapter 3



Chapter 3: Kubeflow Design: Beyond the Basics

👀 This post excerpts and summarizes the book Kubeflow for Machine Learning, adding supplementary material where needed.

Introduction

This chapter walks through Kubeflow's components. Figure 3-1 below shows the Kubeflow architecture.

Figure 3-1. Kubeflow architecture

Getting Around the Central Dashboard

Kubeflow's main interface is the central dashboard. From this page, users can reach most Kubeflow components.

Figure 3-2. The central dashboard

Notebooks (JupyterHub)

Kubeflow provides JupyterHub as its notebook environment. JupyterHub is a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. To reach it, click Notebooks in the left-hand menu. When creating a new server, you can configure the Docker image, server resources, and so on.

Notebook settings
JupyterHub in Kubeflow

So that data scientists can perform cluster-related tasks without leaving the notebook environment, Kubeflow includes kubectl in its notebook images. Running !kubectl get pod -A in any cell of a Jupyter notebook therefore lists the Pods currently in the Kubernetes cluster (subject to the permissions granted to the notebook's service account).

Jupyter notebook Pods run under a special service account named default-editor. This account holds permissions on a range of Kubernetes resources, including Pods, Deployments, Services, Jobs, TFJobs, and PyTorchJobs. You can bind the account to a custom role to restrict or extend the notebook server's permissions.
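
As a rough sketch of binding default-editor to a custom role, the snippet below writes a Role and a RoleBinding to a file. The namespace `my-profile`, the role name `notebook-pod-reader`, and the file name are hypothetical, and the read-only Pod rules are only an illustration. Kubernetes RBAC is additive, so a binding like this extends permissions; restricting them means changing the bindings that already exist.

```shell
# Hypothetical names: namespace "my-profile", role "notebook-pod-reader".
cat > notebook-role.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: notebook-pod-reader
  namespace: my-profile
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-editor-pod-reader
  namespace: my-profile
subjects:
  - kind: ServiceAccount
    name: default-editor
    namespace: my-profile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: notebook-pod-reader
EOF

# Against a real cluster you would then run:
#   kubectl apply -f notebook-role.yaml
#   kubectl auth can-i list pods -n my-profile \
#     --as=system:serviceaccount:my-profile:default-editor
echo "wrote notebook-role.yaml"
```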

Training Operators

Because training in a production environment is hard to do with JupyterHub alone, Kubeflow provides training components for the following frameworks.

  • Chainer
  • MPI
  • Apache MXNet
  • PyTorch
  • TensorFlow

Kubeflow manages distributed training jobs with application-specific controllers called operators. Operators extend the Kubernetes API to create, manage, and manipulate the state of resources. Beyond that, operators let you automate important deployment concerns such as scalability, observability, and failover. Pipelines can also use them to chain their execution with that of other components in the system.
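
To make the operator idea concrete, here is a minimal sketch of a TFJob custom resource, the kind of object the TensorFlow operator watches and turns into worker Pods. The job name, namespace, and image are hypothetical placeholders; the snippet only writes the manifest to a file, and the kubectl commands are left as comments.

```shell
# Hypothetical image and names; the TFJob kind is provided by the training operator.
cat > tfjob.yaml <<'EOF'
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-distributed
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow   # the operator expects this container name
              image: example.com/mnist-train:latest
EOF

# Against a real cluster:
#   kubectl apply -f tfjob.yaml
#   kubectl get tfjobs -n kubeflow
echo "wrote tfjob.yaml"
```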

Kubeflow Pipelines

Kubeflow Pipelines orchestrates the execution of machine learning applications. It is implemented on top of Argo Workflows, so Kubeflow installs the Argo components as well. At a high level, running a pipeline involves the following components.

  • Python SDK
  • DSL Compiler
  • Pipeline Service
  • Kubernetes resources
    • The Pipeline Service calls the Kubernetes API to create the Kubernetes CRDs (Custom Resource Definitions) needed to run the pipeline.
  • Orchestration controllers
    • The orchestration controllers run the containers needed to complete the pipeline execution specified by the CRDs. The containers run inside Kubernetes Pods on virtual machines.
  • Artifact storage
    • Metadata
      • Experiments, jobs, runs, metrics, and so on. This metadata is stored in MySQL.
    • Artifacts
      • Pipeline packages, views, and large-scale metrics such as time series. Kubeflow Pipelines stores these artifacts in an artifact store such as a MinIO server, GCS, or Amazon S3.
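
Since Kubeflow Pipelines is built on Argo Workflows, a pipeline ultimately runs as an Argo Workflow resource. The snippet below is a hand-written sketch of such a resource with two dependent steps, not actual DSL compiler output; the step names, the `echo` template, and the image are assumptions made for illustration.

```shell
# Quoted 'EOF' keeps the shell from touching the {{...}} Argo placeholders.
cat > two-step-workflow.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: two-step-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: preprocess
            template: echo
            arguments:
              parameters: [{name: msg, value: "preprocess"}]
          - name: train
            template: echo
            dependencies: [preprocess]   # runs only after preprocess succeeds
            arguments:
              parameters: [{name: msg, value: "train"}]
    - name: echo
      inputs:
        parameters:
          - name: msg
      container:
        image: alpine:3.18
        command: [echo, "{{inputs.parameters.msg}}"]
EOF

# generateName requires "create" rather than "apply":
#   kubectl create -n kubeflow -f two-step-workflow.yaml
echo "wrote two-step-workflow.yaml"
```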

Hyperparameter Tuning

Kubeflow supports hyperparameter tuning on a Kubernetes cluster through components such as Katib. Katib is a hyperparameter tuning framework that supports Bayesian optimization, among other search algorithms, and can tune models written in TensorFlow, MXNet, PyTorch, and others. Katib is built around the following four main concepts.

  • Experiment
    • A single optimization run over a feasible space. The objective function $f(x)$ is assumed not to change during an experiment.
  • Trial
    • A list of parameter values. When a Trial finishes, it yields one evaluation of the objective function $f(x)$.
  • Job
  • Suggestion
    • The algorithm that constructs parameter sets. Katib currently supports Random, Grid, Hyperband, and Bayesian Optimization.
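
The four concepts map fairly directly onto a Katib Experiment manifest. The sketch below assumes the `kubeflow.org/v1beta1` API (field names differ in older Katib releases) and uses a hypothetical training image, script, and metric name, so treat it as an outline rather than a tested configuration.

```shell
# Quoted 'EOF' keeps the shell from expanding ${trialParameters.lr}.
cat > katib-experiment.yaml <<'EOF'
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-lr-search
  namespace: kubeflow
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random          # the Suggestion algorithm
  maxTrialCount: 12
  parallelTrialCount: 3
  parameters:                      # the feasible space
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.05"
  trialTemplate:                   # each Trial is evaluated by one Job
    primaryContainerName: training
    trialParameters:
      - name: lr
        reference: lr
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: training
                image: example.com/train:latest
                command: ["python", "train.py", "--lr=${trialParameters.lr}"]
EOF

# Against a real cluster:
#   kubectl apply -f katib-experiment.yaml
#   kubectl get experiments,trials -n kubeflow
echo "wrote katib-experiment.yaml"
```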

Model Inference

Kubeflow makes it possible to deploy ML models at scale in production environments. It provides model serving frameworks such as TFServing, Seldon Serving, PyTorch Serving, and TensorRT, and also KFServing, which generalizes the model inference concerns of autoscaling, networking, health checking, and server configuration. The overall implementation is based on Istio and Knative Serving.

Model serving is inherently tricky, so scaling up and down quickly matters. Knative Serving supports continuous model updates by automatically routing new requests to the latest model deployment. This requires keeping unused models around for later rollbacks while minimizing the resources they consume. Because Knative is cloud native, it benefits from the underlying infrastructure stack and provides the logging, tracing, and monitoring capabilities available in Kubernetes. KFServing also uses Knative Eventing to optionally support pluggable event sources.

A KFServing deployment acts as a kind of orchestrator that ties together the following components.

  • Preprocessor
    • Optional component that transforms input data into the form the model server expects.
  • Predictor
    • Mandatory component that performs the actual model serving.
  • Postprocessor
    • Optional component that transforms the serving result into the form expected as output.
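
A minimal sketch of how these pieces appear in an InferenceService manifest, assuming the KFServing `v1alpha2` API. In that API, pre- and postprocessing are both handled by an optional transformer, while the predictor is required. The service name, transformer image, and storage URI are hypothetical.

```shell
cat > inference-service.yaml <<'EOF'
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: my-model
  namespace: kubeflow
spec:
  default:
    transformer:                 # optional: pre/postprocessing
      custom:
        container:
          name: user-container
          image: example.com/my-transformer:latest
    predictor:                   # mandatory: serves the model
      tensorflow:
        storageUri: gs://example-bucket/models/my-model
EOF

# Against a real cluster:
#   kubectl apply -f inference-service.yaml
#   kubectl get inferenceservices -n kubeflow
echo "wrote inference-service.yaml"
```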

Metadata

Metadata management, which tracks and captures information about how models are created, is one of Kubeflow's important components. The following information can be registered with the metadata component.

  • The data sources used to create a model
  • The artifacts generated by each component or step of a pipeline
  • The executions of each component or step
  • The pipeline and its associated lineage information

In this way, ML Metadata tracks the inputs and outputs of every component and step in an ML workflow.

Figure 3-3. Metadata diagram

Support Components

MinIO

The foundation of the pipeline architecture is shared storage. These days each cloud provider offers its own storage service, which can create dependency (lock-in) problems. To minimize such problems, Kubeflow ships with MinIO, a high-performance distributed object store designed for large-scale private cloud infrastructure. MinIO is not limited to private clouds; it can also act as a consistent gateway to public APIs.

MinIO can be configured in several ways. The default in Kubeflow is single-container mode, which can be switched to a distributed setup through configuration. MinIO also offers flexible gateway options, enabling cloud-independent implementations without limits on scale.

To access MinIO, set up port forwarding with the following command and then open https://localhost:9000. (If the HTTPS redirect does not work, use http://localhost:9000 instead.) The default username and password are minio/minio123.

kubectl port-forward --address=0.0.0.0 -n kubeflow svc/minio-service 9000:9000
Figure 3-4. MinIO dashboard

You can also install the MinIO CLI (mc) to perform various MinIO operations from your workstation. In my case Linuxbrew was already installed, so installation was simple.

brew install minio/stable/mc

Once installation is complete, configure the MinIO client to connect to Kubeflow's MinIO.

mc config host add minio http://localhost:9000 minio minio123

Now, to create a new bucket, run the following command.

mc mb minio/kf-book-examples
Create a bucket in MinIO
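
With a bucket in place, a small helper can copy a local file into it and list the result. This is a sketch that assumes the `minio` alias and the port-forward from the commands above are still active; the function name `push_to_minio` and the example file name are hypothetical.

```shell
# Upload a local file into a bucket and list the bucket's contents.
# Defaults to the kf-book-examples bucket created above.
push_to_minio() {
  file="$1"
  bucket="${2:-kf-book-examples}"
  mc cp "$file" "minio/${bucket}/" && mc ls "minio/${bucket}"
}

# Usage (needs a reachable MinIO server):
#   push_to_minio ./model.tar.gz
```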

Istio

Another component that supports Kubeflow is Istio. Istio is a service mesh that provides the following features.

  • Service discovery
    • Services are often created dynamically (for example by autoscaling) or deployed in containers, so their IP addresses change frequently. Service discovery is how a client locates a service when it calls it. [source]
  • Load balancing
  • Failure recovery
  • Metrics
  • Monitoring
  • Rate limiting
  • Access control
  • End-to-end authentication

Istio is logically split into a data plane and a control plane.

  • Data plane: the layer that carries traffic; it is governed by the control plane.
  • Control plane: the layer that manages and configures the data plane.
Figure 3-5. Istio architecture

Istio's main components are as follows.

  • Envoy
    • Istio's data plane is built on the Envoy proxy, which provides features such as failure handling, dynamic service discovery, and load balancing. Envoy comes with the following built-in features.
      • Dynamic service discovery
      • Load balancing
      • TLS termination
      • HTTP/2 and gRPC proxies
      • Circuit breakers
      • Health checks
      • Staged rollouts with percent-based traffic splitting
      • Fault injection
      • Rich metrics
  • Mixer
    • Enforces access control and usage policies across the service mesh and collects telemetry data from the Envoy proxies and other services.
  • Pilot
    • Provides service discovery for the Envoy sidecars, along with traffic management capabilities for intelligent routing and resiliency.
  • Galley
    • The component that validates, ingests, processes, and distributes Istio's configuration. It converts Kubernetes YAML files into a form Istio understands.
  • Citadel
    • Enables strong service-to-service and end-user authentication.
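
The percent-based traffic splitting mentioned under Envoy is configured through an Istio VirtualService. Below is a minimal sketch that sends 90% of requests to one version of a service and 10% to another. The host name and the subsets `v1` and `v2` are hypothetical, and the subsets would need a matching DestinationRule that is not shown here.

```shell
cat > virtualservice.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-rollout
spec:
  hosts:
    - model.example.svc.cluster.local
  http:
    - route:
        - destination:
            host: model.example.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: model.example.svc.cluster.local
            subset: v2
          weight: 10
EOF

# Against a real cluster:
#   kubectl apply -f virtualservice.yaml
echo "wrote virtualservice.yaml"
```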

Knative

Figure 3-6. Knative architecture

The most important element of Knative is Knative Serving, which supports deploying and serving serverless applications. Knative Serving is implemented as the following set of Kubernetes CRDs.

  • Service
  • Route
  • Configuration
  • Revision
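
A short sketch of how the four resources relate: you normally write only a Service, and Knative creates the Configuration and Route for it, with each change to `spec.template` producing a new Revision. The service name, image, and environment variable below are hypothetical.

```shell
cat > knative-service.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:                      # every change here yields a new Revision
    spec:
      containers:
        - image: example.com/hello:latest
          env:
            - name: TARGET
              value: "Kubeflow"
EOF

# Against a real cluster ("ksvc" avoids a clash with core Services):
#   kubectl apply -f knative-service.yaml
#   kubectl get ksvc,revisions -n default
echo "wrote knative-service.yaml"
```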

Apache Spark

Starting with Kubeflow 1.0, a Spark operator for running Spark jobs is built in. In addition, integration with Google's Dataproc and Amazon's Elastic MapReduce (EMR) is supported.
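
With the Spark operator, a Spark job is described as a SparkApplication resource. The sketch below assumes the `sparkoperator.k8s.io/v1beta2` API; the image, the jar path, the Spark version, and the `spark` service account are placeholders that depend on your installation.

```shell
cat > spark-pi.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: kubeflow
spec:
  type: Scala
  mode: cluster
  image: example.com/spark:3.1.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
EOF

# Against a real cluster:
#   kubectl apply -f spark-pi.yaml
#   kubectl get sparkapplications -n kubeflow
echo "wrote spark-pi.yaml"
```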

Kubeflow Multiuser Isolation

Recent versions of Kubeflow introduce multiuser isolation, which lets different teams and users share the same pool of resources. With it, users can isolate and protect their own resources without accidentally viewing or changing one another's. As of version 1.0, Kubeflow's Jupyter notebook service is the first application to fully support multiuser isolation.
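
In practice, this isolation is built around Profile resources, each of which owns a namespace for one user or team. The following is a minimal sketch assuming the `kubeflow.org/v1beta1` Profile API; the profile name and owner email are hypothetical.

```shell
cat > profile.yaml <<'EOF'
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: team-a                   # also used as the namespace name
spec:
  owner:
    kind: User
    name: alice@example.com
EOF

# Against a real cluster (Profiles are cluster-scoped):
#   kubectl apply -f profile.yaml
#   kubectl get profiles
echo "wrote profile.yaml"
```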



This post is licensed under CC BY 4.0 by the author.