
Kubeflow for ML - Chapter 1





Chapter 1: Kubeflow, What It Is and Who It Is For

👀 This post excerpts and summarizes the book Kubeflow for Machine Learning, with extra material added where needed.

Kubeflow

Kubeflow enables data scientists to productionize the models they have trained, and enables data engineers to make those models scalable and reliable. It does this by making a range of tools that run on Kubernetes, open source tools in particular, available to use.

Model Development Life Cycle

The Model Development Life Cycle (MDLC) refers to the flow, or process, that runs between model training and model inference. Figure 1-1 illustrates the continuous interaction that takes place between training and inference.

Figure 1-1. Model development life cycle

Where Does Kubeflow Fit In?

Think of Kubeflow as a collection of cloud native tools covering every stage of the MDLC, which also lets you use those tools together seamlessly. Its most important feature is that it lets users build the individual components of the MDLC into a single, integrated end-to-end pipeline. To do this, Kubeflow relies on Kubernetes for containerization and scalability, and for the portability and repeatability of pipelines.

The MDLC here includes the following stages.

  • Data Exploration
  • Feature Preparation
  • Model Training/Tuning
  • Model Serving
  • Model Testing
  • Model Versioning

Why Containerize? Why Kubernetes?

Difference between virtual machines and containers

The isolated environment that containers provide makes each machine learning stage portable and reproducible. Put simply, containerizing reduces the "It worked on my machine, so why doesn't it work here?" kind of situation. As a result, the pipelines we build are not tied to a particular cloud: they can run on Google Cloud or on Amazon AWS alike.

For getting comfortable with container concepts, I think nothing beats subicura's Kubernetes guide. Please refer to that post for the details.

Kubeflow's Design and Core Components

As mentioned above, Kubeflow lets ML practitioners assemble and customize their own stack the way they want. Kubeflow is also designed to simplify the process of developing and deploying ML systems through composability, portability, and scalability.

  • Composability
    • Kubeflow's core components are tools that ML practitioners already use frequently. Kubeflow helps you use these tools independently at each stage of ML, or compose them into an end-to-end pipeline.
  • Portability
    • Thanks to its container-based design and the advantages of Kubernetes and a cloud native architecture, Kubeflow does not require users to adopt a particular development environment. Users can experiment and prototype in their own environment, then deploy to production with little effort.
  • Scalability
    • Because it uses Kubernetes, Kubeflow can scale the system dynamically according to the demands on the cluster.
    • Scalability is especially helpful in environments where the volume of data keeps growing.

For reference, the white paper Practitioners guide to MLOps, published by Google in May 2021, defines the core elements of MLOps as follows.

Core MLOps Technical Capabilities

The following are Kubeflow's components.

  • Data Exploration with Notebooks
    • Data exploration is the starting point of the MDLC and is usually done with Jupyter Notebook.
  • Data/Feature Preparation
    • This is one of the most important parts of building an ML model and covers extracting, transforming, and loading data effectively.
    • Kubeflow supports Apache Spark and TensorFlow Transform. Apache Spark is well suited to working with large-scale data, while TensorFlow Transform makes inference easier through its integration with TensorFlow Serving.
  • Training
    • Kubeflow supports the following frameworks.
      • TensorFlow
      • PyTorch
      • Apache MXNet
      • XGBoost
      • Chainer
      • Caffe2
      • Message passing interface (MPI)
  • Hyperparameter Tuning
    • This is handled by Katib, a Kubernetes-native project for AutoML. It supports hyperparameter tuning, early stopping, and Neural Architecture Search (NAS).
  • Model Validation
  • Inference/Prediction
    • Kubeflow provides a multi-framework serving component in KFServing, and additionally supports existing serving tools such as TensorFlow Serving, Seldon Core, and BentoML.
  • Pipelines
    • Each of the components above corresponds to a stage of the MDLC. Kubeflow treats the MDLC as an ML pipeline and implements it as a graph in which each node is a stage of the ML workflow.
    • Kubeflow Pipelines is the component that lets users compose reusable workflows easily.
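
To make the "pipeline as a graph" idea concrete, here is a minimal sketch in plain Python. It does not use the Kubeflow Pipelines SDK; the step names, the `PIPELINE` dictionary, and the `run` helper are all hypothetical and chosen only for illustration. It shows how a set of stages with declared dependencies can be ordered and executed so that each stage runs only after the stages it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names loosely based on the MDLC stages above.
# Each key maps to the set of steps that must finish before it runs.
PIPELINE = {
    "data_exploration": set(),
    "feature_preparation": {"data_exploration"},
    "model_training": {"feature_preparation"},
    "model_serving": {"model_training"},
}


def execution_order(graph):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(graph).static_order())


def run(graph, steps):
    """Run each step function in dependency order.

    Every step receives a dict holding the results of its direct
    dependencies, so data flows along the edges of the graph.
    """
    results = {}
    for name in execution_order(graph):
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](inputs)
    return results


if __name__ == "__main__":
    # Placeholder step bodies that only report that they ran.
    steps = {name: (lambda inputs, n=name: f"{n} done") for name in PIPELINE}
    for name, output in run(PIPELINE, steps).items():
        print(name, "->", output)
```

In a real pipeline each node would run as its own container rather than as a Python function in one process, but the dependency graph plays the same role.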
Figure 1-3. A Kubeflow pipeline
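
Returning to the Hyperparameter Tuning component listed above, the following is a rough sketch of the kind of search such a tool automates: random search over one hyperparameter, with early stopping for trials whose loss has stopped improving. This is not Katib's API or its actual algorithms. The toy loss curve, the function names, and the `patience` and `min_delta` parameters are assumptions made up for this example.

```python
import random


def train_curve(learning_rate, epochs):
    """Toy stand-in for training: yields one validation loss per epoch.

    The formula is invented for illustration; its best rate is 0.1.
    """
    for epoch in range(1, epochs + 1):
        yield (learning_rate - 0.1) ** 2 + 0.5 ** epoch


def run_trial(learning_rate, epochs=20, patience=3, min_delta=1e-3):
    """Train one configuration with early stopping.

    Stops once the loss has failed to improve on the best value by at
    least min_delta for `patience` epochs in a row.
    Returns (best_loss, epochs_run).
    """
    best, stale, epochs_run = float("inf"), 0, 0
    for loss in train_curve(learning_rate, epochs):
        epochs_run += 1
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, epochs_run


def random_search(trials=10, seed=0):
    """Sample learning rates at random; return (best_loss, best_rate)."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.001, 0.5) for _ in range(trials)]
    return min((run_trial(lr)[0], lr) for lr in candidates)


if __name__ == "__main__":
    best_loss, best_rate = random_search()
    print(f"best learning rate: {best_rate:.4f} (loss {best_loss:.5f})")
```

A system like Katib runs each trial as a separate job on the cluster, which is where the Kubernetes-based scalability described earlier comes in.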


This post is licensed under CC BY 4.0 by the author.