
N-Beats (2019)





📄 Oreshkin, Boris N., et al. “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting.” arXiv preprint arXiv:1905.10437 (2019).

Introduction

The M4 Competition, a time-series forecasting competition, was held in 2018. One striking outcome was how pure ML models fared: of the 60 participating teams, only six submitted pure ML models, and the best of them ranked 23rd. For reference, the winning model was ES-RNN (Exponential Smoothing Recurrent Neural Network), a hybrid of statistical and ML methods.

๋ณธ ๋…ผ๋ฌธ์€ ์ด๋Ÿฐ ์ƒํ™ฉ์—์„œ ํ•ด๋‹น ๋Œ€ํšŒ์— ์ œ์ถœ๋œ ๋ชจ๋ธ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ์ข‹์€ ์ˆœ์ˆ˜ ML ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์ธ N-Beats๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•˜๊ณ  ์žˆ๋Š” ๋ชจ๋ธ์˜ ์žฅ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • Deep Neural Architecture
    • A pure DL model that outperforms statistical approaches on several established datasets (M3, M4, TOURISM, etc.).
  • Interpretable deep learning model for time series
    • The model can be given an interpretable architecture, in a way similar to traditional decomposition techniques such as the seasonality-trend-level approach.

๋ณธ ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด๊ธฐ ์ „์— ๋ช‡ ๊ฐ€์ง€ ํ‘œ๊ธฐ๋ฒ•(notation)์„ ์งš๊ณ  ๋„˜์–ด๊ฐ€๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด์‚ฐ์  ์‹œ๊ฐ„์— ๋Œ€ํ•œ ๋‹จ๋ณ€๋Ÿ‰ ์˜ˆ์ธก ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ ๋‹ค์Œ์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

  • Forecast horizon of length $H$
    • $\mathbf{y} = [y_{T+1}, \cdots, y_{T+H}] \in \mathbb{R}^H$
  • History of length $T$
    • $[y_1, \cdots, y_T] \in \mathbb{R}^T$
  • Lookback window of length $t \leq T$
    • $\mathbf{x} = [y_{T-t+1}, \cdots, y_T] \in \mathbb{R}^t$
  • $\hat{\mathbf{y}}$, the forecast of $\mathbf{y}$
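As a minimal sketch of this notation, the snippet below slices a series into the lookback window $\mathbf{x}$ and the target $\mathbf{y}$. It assumes the series is held in a 0-indexed NumPy array, so $y_i$ lives at `series[i - 1]`; the helper name `split_window` is hypothetical.

```python
import numpy as np

def split_window(series: np.ndarray, T: int, t: int, H: int):
    """Hypothetical helper: return (x, y) for forecast origin T.

    x = [y_{T-t+1}, ..., y_T]    lookback window, length t
    y = [y_{T+1}, ..., y_{T+H}]  forecast target, length H
    The math is 1-indexed; NumPy is 0-indexed, so y_i is series[i - 1].
    """
    if not 1 <= t <= T:
        raise ValueError("lookback length must satisfy 1 <= t <= T")
    if T + H > len(series):
        raise ValueError("series is too short for this horizon")
    x = series[T - t:T]
    y = series[T:T + H]
    return x, y

series = np.arange(1, 21, dtype=float)  # y_1 = 1.0, ..., y_20 = 20.0
x, y = split_window(series, T=12, t=6, H=4)
print(x.tolist())  # [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
print(y.tolist())  # [13.0, 14.0, 15.0, 16.0]
```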

N-Beats

N-Beats์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์„ค๊ณ„ํ•  ๋•Œ ๋‹ค์Œ์˜ ํฌ์ธํŠธ๋ฅผ ์ค‘์š”ํ•˜๊ฒŒ ์—ฌ๊ฒผ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

  1. The base architecture should be simple and generic, yet highly expressive.
  2. The architecture should not rely on time-series-specific feature engineering or input scaling.
  3. The architecture should be extensible so that humans can interpret its outputs.

The N-Beats architecture, which embodies these principles, is shown in the diagram below.

N-Beats architecture

Basic Block

The detailed architecture of a basic block.

A basic block has the form shown in the image above. Several basic blocks are stacked to form one stack; for generality, we describe the $\ell$-th block.

The $\ell$-th block receives an input vector $\mathbf{x}_\ell$. For $\ell = 1$, the very first block, $\mathbf{x}_\ell$ is simply the model's input, i.e. the lookback window $\mathbf{x}$. If the forecast horizon has length $H$, the length of this first input vector is usually set between $2H$ and $7H$; in other words, the model looks back over two to seven times as many timesteps as it forecasts.
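For a concrete feel of that rule of thumb, the short sketch below lists the candidate lookback lengths $kH$ for $k = 2, \dots, 7$. The function name and the example horizon $H = 6$ are illustrative assumptions.

```python
def lookback_candidates(H: int, k_min: int = 2, k_max: int = 7) -> list[int]:
    """Lookback lengths k * H for k in [k_min, k_max] (the 2H..7H rule of thumb)."""
    return [k * H for k in range(k_min, k_max + 1)]

print(lookback_candidates(6))  # [12, 18, 24, 30, 36, 42]
```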

Every other block ($\ell > 1$) takes the residual output of the previous block as its input. Each block produces two output vectors, $\hat{\mathbf{x}}_\ell$ and $\hat{\mathbf{y}}_\ell$:

  • $\hat{\mathbf{x}}_\ell$: backcast
    • The block's estimate of its own input vector $\mathbf{x}_\ell$.
  • $\hat{\mathbf{y}}_\ell$: forecast
    • The block's prediction over the actual forecast horizon.

With this setup, the basic block passes its input through four FC layers and then splits into two branches, which produce the expansion coefficients $\theta_\ell^b$ for the backcast and $\theta_\ell^f$ for the forecast. In equations:

  • $\mathbf{h}_{\ell, 1} = \text{FC}_{\ell, 1}(\mathbf{x}_\ell)$
  • $\mathbf{h}_{\ell, 2} = \text{FC}_{\ell, 2}(\mathbf{h}_{\ell, 1})$
  • $\mathbf{h}_{\ell, 3} = \text{FC}_{\ell, 3}(\mathbf{h}_{\ell, 2})$
  • $\mathbf{h}_{\ell, 4} = \text{FC}_{\ell, 4}(\mathbf{h}_{\ell, 3})$
  • $\theta_\ell^b = \text{Linear}^b_\ell (\mathbf{h}_{\ell, 4})$
  • $\theta_\ell^f = \text{Linear}^f_\ell (\mathbf{h}_{\ell, 4})$

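Here each FC is a fully connected layer followed by a ReLU activation, while Linear is a plain linear projection without an activation. For the first layer, for example: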

\[\mathbf{h}_{\ell, 1} = \text{ReLU}(\mathbf{W}_{\ell, 1} \mathbf{x}_\ell + \mathbf{b}_{\ell, 1})\]
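A minimal NumPy sketch of one block's trunk, following the equations above. The hidden width (64), the random initialisation, and the names `fc`, `init_block` and `block_thetas` are illustrative assumptions, and nothing is trained; the point is only the data flow from $\mathbf{x}_\ell$ to $\theta_\ell^b$ and $\theta_\ell^f$.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, W, b):
    """One FC layer as described above: ReLU(W x + b)."""
    return np.maximum(W @ x + b, 0.0)

def init_block(in_dim, hidden, dim_theta_b, dim_theta_f):
    """Random weights for four FC layers plus two linear heads (hypothetical init)."""
    dims = [in_dim] + [hidden] * 4
    fcs = [(rng.normal(scale=0.1, size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
           for i in range(4)]
    W_b = rng.normal(scale=0.1, size=(dim_theta_b, hidden))  # Linear^b: no activation, no bias here
    W_f = rng.normal(scale=0.1, size=(dim_theta_f, hidden))  # Linear^f: no activation, no bias here
    return fcs, W_b, W_f

def block_thetas(x, params):
    """Compute h_{l,1..4} with four FC layers, then theta^b and theta^f with linear heads."""
    fcs, W_b, W_f = params
    h = x
    for W, b in fcs:
        h = fc(h, W, b)
    return W_b @ h, W_f @ h

params = init_block(in_dim=24, hidden=64, dim_theta_b=8, dim_theta_f=8)
theta_b, theta_f = block_thetas(rng.normal(size=24), params)
print(theta_b.shape, theta_f.shape)  # (8,) (8,)
```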

Finally, the basis layers $g_\ell^b$ and $g_\ell^f$ map these coefficients to the block outputs:

\[\hat{\mathbf{y}}_\ell = g_\ell^f(\theta_\ell^f) = \sum_{i=1}^{\dim(\theta_\ell^f)} \theta_{\ell, i}^f \mathbf{v}_i^f, \qquad \hat{\mathbf{x}}_\ell = g_\ell^b(\theta_\ell^b) = \sum_{i=1}^{\dim(\theta_\ell^b)} \theta_{\ell, i}^b \mathbf{v}_i^b\]

Here $\mathbf{v}_i^f$ and $\mathbf{v}_i^b$ are the forecast and backcast basis vectors. The basis layers can either be learnable parameters or be fixed to a specific functional form.
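To make the sum concrete: if the basis vectors $\mathbf{v}_i^f$ are stacked as the columns of a matrix, $g_\ell^f$ becomes a single matrix-vector product. The sketch below uses random basis vectors as a stand-in (an assumption; they could equally be learned or fixed) and checks that both forms agree.

```python
import numpy as np

rng = np.random.default_rng(1)
H, dim_theta = 6, 4

V_f = rng.normal(size=(H, dim_theta))  # column i is the basis vector v_i^f (length H)
theta_f = rng.normal(size=dim_theta)   # expansion coefficients produced by the block

# g^f(theta^f) written as an explicit sum over basis vectors ...
y_sum = sum(theta_f[i] * V_f[:, i] for i in range(dim_theta))
# ... and as one matrix-vector product.
y_mat = V_f @ theta_f

print(np.allclose(y_sum, y_mat))  # True
```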

Doubly Residual Stacking

Doubly residual stacking

์ผ๋ฐ˜์ ์ธ residual connection์€ ์ž…๋ ฅ๊ฐ’์„ ๋ช‡ ๊ฐœ์˜ ๋ ˆ์ด์–ด๋ฅผ ๊ฑด๋„ˆ ๋›ฐ์–ด ๋”ํ•˜๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๋” ๊นŠ์€ ๊ตฌ์กฐ๋ฅผ ์ž˜ ํ•™์Šตํ•˜๋Š” ์ด์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ํ•ด์„ ๊ฐ€๋Šฅํ•œ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ๋„์›€์ด ๋˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ €์ž๋Š” ๊ธฐ์กด ๊ฐ’์„ ๋”ํ•˜๋Š” ๋Œ€์‹  ๋นผ๋Š” ๋ฐฉ์‹์„ ์ฑ„์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. Backcast์—์„œ ๋ธ”๋ก์˜ ์ž…๋ ฅ ๋ฒกํ„ฐ์™€ ํ˜„์žฌ ๋ธ”๋ก์˜ backcast๋ฅผ ๋บ€ residual์„ ๋‹ค์Œ ๋ธ”๋ก์œผ๋กœ ๋„˜๊ฒจ์ฃผ๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.

\[\mathbf{x}_\ell = \mathbf{x}_{\ell-1} - \hat{\mathbf{x}}_{\ell -1}\]

On the forecast path there is no residual connection; the partial forecasts of all blocks are simply summed.

\[\hat{\mathbf{y}} = \sum_\ell \hat{\mathbf{y}}_\ell\]
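The two update rules can be sketched as a loop over blocks. To keep the example self-contained, each block is a hypothetical stand-in: any callable returning a `(backcast, forecast)` pair. In N-Beats it would be the FC trunk plus basis layers; here two toy blocks (a mean level and a fitted line) are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

Block = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]

def doubly_residual(x: np.ndarray, blocks: Sequence[Block]):
    """Backcast path: x_l = x_{l-1} - xhat_{l-1}. Forecast path: yhat = sum_l yhat_l."""
    residual, forecast = x, None
    for block in blocks:
        backcast, block_forecast = block(residual)
        residual = residual - backcast  # the next block only sees what is left
        forecast = block_forecast if forecast is None else forecast + block_forecast
    return residual, forecast

H = 3

def level_block(r):
    """Toy block: explain the mean level and forecast it flat."""
    m = r.mean()
    return np.full_like(r, m), np.full(H, m)

def slope_block(r):
    """Toy block: fit a line to the residual and extrapolate it H steps."""
    n = len(r)
    slope, intercept = np.polyfit(np.arange(n), r, 1)
    return slope * np.arange(n) + intercept, slope * np.arange(n, n + H) + intercept

x = np.array([1.0, 2.0, 3.0, 4.0])
residual, forecast = doubly_residual(x, [level_block, slope_block])
print(np.round(forecast, 6).tolist())  # [5.0, 6.0, 7.0]
print(np.allclose(residual, 0.0))      # True
```

The first block removes the level, so the second block only has to model the remaining slope; their forecasts add up to the linear extrapolation of the input.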

This doubly residual structure has the following effects.

  1. By removing the part of the input signal, $\hat{\mathbf{x}}_{\ell-1}$, that it can already approximate, the previous block makes the next block's forecasting task easier.
  2. The residual connections on the backcast path help gradients flow, which eases backpropagation.
  3. The summation connections on the forecast path enable a hierarchical decomposition of the forecast.
    • Combined with the deliberate structure imposed through $g_\ell^b$ and $g_\ell^f$, this hierarchical decomposition is what makes the model interpretable.

Blocks connected this way form one stack. The stack residual is passed on to the next stack, and the stack forecasts of all stacks are summed to obtain the global forecast. Training uses MSE as the loss function.
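At the stack level the same pattern repeats. In the sketch below each stack is a hypothetical callable mapping its input to `(stack_residual, stack_forecast)`, the global forecast is the sum of stack forecasts, and the MSE against the target is computed at the end. The two toy stacks are assumptions chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def run_stacks(x, stacks):
    """Chain stacks: each stack's residual feeds the next; stack forecasts are summed."""
    residual, global_forecast = x, 0.0
    for stack in stacks:
        residual, stack_forecast = stack(residual)
        global_forecast = global_forecast + stack_forecast
    return global_forecast

def mse(y_true, y_pred):
    """Mean squared error between the target and the global forecast."""
    return float(np.mean((y_true - y_pred) ** 2))

H = 2

def level_stack(r):
    """Toy stack: remove the mean level, forecast it flat."""
    return r - r.mean(), np.full(H, r.mean())

def last_value_stack(r):
    """Toy stack: remove and repeat the last residual value."""
    return r - r[-1], np.full(H, r[-1])

x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([5.0, 6.0])
y_hat = run_stacks(x, [level_stack, last_value_stack])
print(y_hat.tolist(), mse(y_true, y_hat))  # [4.0, 4.0] 2.5
```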

Interpretability

N-Beats comes in two architectures, depending on how $g_\ell^b$ and $g_\ell^f$ are chosen. The generic architecture covered so far does not rely on any time-series-specific knowledge. The interpretable architecture described next, in contrast, adds inductive biases for the sake of interpretability, and this is where knowledge about time series enters the model.

์ผ๋ฐ˜์ ์ธ ์•„ํ‚คํ…์ฒ˜๋Š” $g_\ell^b$์™€ $g_\ell^f$๋ฅผ ์ด์ „ ๋ ˆ์ด์–ด ์•„์›ƒํ’‹์˜ linear projection์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

\[\hat{\mathbf{y}}_\ell = \mathbf{V}_\ell^f \theta_\ell^f + \mathbf{b}_\ell^f \qquad \hat{\mathbf{x}}_\ell = \mathbf{V}_\ell^b \theta_\ell^b + \mathbf{b}_\ell^b\]

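Here $\mathbf{V}_\ell^f$ has dimensions $H \times \dim(\theta_\ell^f)$, and $\mathbf{V}_\ell^b$ correspondingly has dimensions $t \times \dim(\theta_\ell^b)$, where $t$ is the lookback length. Since nothing constrains these learned matrices, their columns need not correspond to any recognizable pattern.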
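The shapes are the main thing to keep straight here. The sketch below builds generic basis layers with free matrices and biases; the sizes are hypothetical, and the random values stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
t_len, H = 12, 4       # hypothetical lookback length t and horizon H
dim_b, dim_f = 16, 16  # hypothetical sizes of theta^b and theta^f

# Generic basis layers: free, learnable V and b (random here, trained in practice).
V_b, b_b = rng.normal(size=(t_len, dim_b)), np.zeros(t_len)
V_f, b_f = rng.normal(size=(H, dim_f)), np.zeros(H)

theta_b = rng.normal(size=dim_b)
theta_f = rng.normal(size=dim_f)

x_hat = V_b @ theta_b + b_b  # backcast: same length t as the block input
y_hat = V_f @ theta_f + b_f  # forecast: length H
print(x_hat.shape, y_hat.shape)  # (12,) (4,)
```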

Depending on how $g_\ell^b$ and $g_\ell^f$ are set, the interpretable architecture is built from two kinds of models: a trend model and a seasonality model.

Trend Model

์ถ”์„ธ์˜ ์ผ๋ฐ˜์ ์ธ ํŠน์„ฑ์ด๋ผ๊ณ  ํ•˜๋ฉด ๋‹จ์กฐ์ฆ๊ฐ€ ๋˜๋Š” ๋‹จ์กฐ๊ฐ์†Œํ•˜๋Š” ํ˜•ํƒœ๋ฅผ ๊ฐ–๊ฑฐ๋‚˜ ์ฒœ์ฒœํžˆ ๋ณ€ํ™”ํ•˜๋Š” ํ˜•ํƒœ๋ฅผ ๊ฐ–๋Š”๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฐ ํŠน์„ฑ์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด์„œ ์ €์ž๋Š” ์ž‘์€ ์ฐจ์ˆ˜์˜ ๋‹คํ•ญํ•จ์ˆ˜ ํ˜•ํƒœ๋ฅผ ์ฐจ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.

\[\hat{\mathbf{y}}_{s, \ell} = \sum^p_{i=0} \theta_{s, \ell, i}^f \mathbf{t}^i \quad \text{where } \mathbf{t} = [0, 1, 2, \cdots, H-2, H-1]^T/H\]

ํ–‰๋ ฌ์‹์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

\[\hat{\mathbf{y}}_{s, \ell}^{tr} = \mathbf{T}\theta_{s, \ell}^f \quad \text{where } \mathbf{T} = [\mathbf{1}, \mathbf{t}, \cdots, \mathbf{t}^p]\]

As the figure above illustrates, $g^b$ and $g^f$ are fixed to specific matrices: each row corresponds to a backcast or forecast time step, and each column to one power of $\mathbf{t}$, from $\mathbf{t}^0$ up to $\mathbf{t}^p$. With a sufficiently small $p$ (the paper suggests values such as 2 or 3), $\hat{\mathbf{y}}_{s, \ell}^{tr}$ can only vary slowly and therefore follows the trend.
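A minimal sketch of the forecast-side trend basis $\mathbf{T}$, assuming $\mathbf{t}$ exactly as defined above. The function name and the example coefficients ($1 + 2\mathbf{t}$) are illustrative; in the model the coefficients would come from the block's FC trunk.

```python
import numpy as np

def trend_basis(H: int, p: int) -> np.ndarray:
    """T = [1, t, t^2, ..., t^p] with t = [0, 1, ..., H-1] / H; shape (H, p + 1)."""
    t = np.arange(H) / H
    return np.stack([t ** i for i in range(p + 1)], axis=1)

T_basis = trend_basis(H=4, p=2)
theta_f = np.array([1.0, 2.0, 0.0])  # hypothetical coefficients: 1 + 2t
print(T_basis.shape)                 # (4, 3)
print((T_basis @ theta_f).tolist())  # [1.0, 1.5, 2.0, 2.5]
```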

Seasonality Model

๊ณ„์ ˆ์„ฑ์€ ๊ทœ์น™์ ์ด๊ณ  ์ฃผ๊ธฐ์ ์ด๋ฉฐ ๋ฐ˜๋ณต์ ์ธ ๋ณ€๋™์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฐ ํŠน์„ฑ์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ์ฃผ๊ธฐ ํ•จ์ˆ˜๋ฅผ ์ฐจ์šฉํ–ˆ๋Š”๋ฐ์š”. ๊ฐ€์žฅ ์ ์ ˆํ•œ ์„ ํƒ์€ ์—ฌ๋Ÿฌ๋ชจ๋กœ ํ‘ธ๋ฆฌ์— ๊ธ‰์ˆ˜์ž…๋‹ˆ๋‹ค.

\[\hat{\mathbf{y}}_{s, \ell} = \sum^{\lfloor H/2 - 1 \rfloor}_{i=0} \left( \theta_{s, \ell, i}^f \cos(2\pi i \mathbf{t}) + \theta^f_{s, \ell, i+\lfloor H/2 \rfloor} \sin(2\pi i \mathbf{t}) \right)\]

ํ–‰๋ ฌ์‹์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

\[\begin{aligned} &\hat{\mathbf{y}}_{s, \ell}^{seas} = \mathbf{S}\theta_{s, \ell}^f \\ & \quad \text{where } \mathbf{S} = [\mathbf{1}, \cos(2\pi\mathbf{t}), \cdots, \cos(2\pi\lfloor H/2-1 \rfloor \mathbf{t}), \sin(2\pi\mathbf{t}), \cdots, \sin(2\pi \lfloor H/2 -1 \rfloor \mathbf{t})] \end{aligned}\]
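A minimal sketch of the forecast-side matrix $\mathbf{S}$, assuming the same $\mathbf{t}$ as in the trend model. The $i = 0$ cosine term gives the constant column $\mathbf{1}$, and the $i = 0$ sine term is identically zero, so it is left out, which is why $\mathbf{S}$ has $2K + 1$ columns with $K = \lfloor H/2 - 1 \rfloor$. The function name and the single-harmonic example coefficients are illustrative assumptions.

```python
import numpy as np

def seasonality_basis(H: int) -> np.ndarray:
    """S = [1, cos(2*pi*t), ..., cos(2*pi*K*t), sin(2*pi*t), ..., sin(2*pi*K*t)]
    with t = [0, 1, ..., H-1] / H and K = floor(H/2 - 1); shape (H, 2K + 1)."""
    if H < 2:
        raise ValueError("need H >= 2 for at least the constant column")
    t = np.arange(H) / H
    K = H // 2 - 1  # equals floor(H/2 - 1) for integer H >= 2
    cos_cols = [np.cos(2 * np.pi * i * t) for i in range(K + 1)]     # i = 0 is the constant column
    sin_cols = [np.sin(2 * np.pi * i * t) for i in range(1, K + 1)]  # sin(0) = 0 is omitted
    return np.stack(cos_cols + sin_cols, axis=1)

S = seasonality_basis(8)
print(S.shape)  # (8, 7)

theta_f = np.zeros(S.shape[1])
theta_f[1] = 1.0        # hypothetical: all weight on cos(2*pi*t)
y_seas = S @ theta_f    # one full cosine cycle over the horizon
```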

Finally, the trend model and the seasonality model are combined as in the figure below: trend blocks make up one stack and seasonality blocks another, with the trend stack placed first. Each block uses the same residual connections as in the generic architecture.
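Why this is interpretable can be seen in a deliberately simplified sketch: because each stack's output lives in a fixed basis, the global forecast splits into a trend part and a seasonal part that can be read off separately. The bases here are cut down (trend with $p = 2$, a single seasonal harmonic), and the coefficients are hypothetical stand-ins for what each stack's FC trunk would output; the backcast residual flow is omitted for brevity.

```python
import numpy as np

H = 8
t = np.arange(H) / H

# Simplified fixed bases (assumption): trend with p = 2, seasonality with one harmonic.
T_basis = np.stack([t ** i for i in range(3)], axis=1)                       # (H, 3)
S_basis = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)  # (H, 2)

# Hypothetical coefficients standing in for each stack's FC trunk output.
theta_trend = np.array([10.0, 4.0, 0.0])
theta_seas = np.array([1.0, 0.0])

trend_part = T_basis @ theta_trend   # trend stack forecast: 10 + 4t
season_part = S_basis @ theta_seas   # seasonality stack forecast: cos(2*pi*t)
forecast = trend_part + season_part  # global forecast = sum of stack forecasts

# Each component can be inspected on its own.
print(np.allclose(forecast - season_part, 10 + 4 * t))  # True
```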

Implementation

An implementation of N-Beats is available in the following repository. It includes experiment code for several datasets along with a guide for reproducing the paper's results. Note that while the generic architecture is easy to implement, the interpretable architecture is trickier both to implement and to interpret.



This post is licensed under CC BY 4.0 by the author.