© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cyrus Vahid - Principal Architect – AWS Deep Learning
Amazon Web Services
Multivariate Time Series
Autoregressive Models
• Hyndman[1] defines autoregressive models as:
’’ In an autoregression model, we forecast the variable of
interest using a linear combination of past values of the
variable. The term autoregression indicates that it is a
regression of the variable against itself.’’
• AR(p) model:
$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$
Autoregressive Models
$y_t = 18 - 0.8\,y_{t-1} + e_t \qquad\qquad y_t = 8 + 1.3\,y_{t-1} - 0.7\,y_{t-2} + e_t$
• Autoregressive models are remarkably flexible at handling a wide range of
different time series patterns.
ref: Hyndman [1]
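As a quick illustration, the two example processes above can be simulated in a few lines of NumPy. The coefficients come from the equations on this slide; the noise scale, series length, and starting values are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(n=200, c=18.0, phi1=-0.8, sigma=1.0):
    """AR(1): y_t = 18 - 0.8 * y_{t-1} + e_t"""
    y = np.zeros(n)
    y[0] = c / (1 - phi1)                 # start near the process mean (10 here)
    for t in range(1, n):
        y[t] = c + phi1 * y[t - 1] + rng.normal(scale=sigma)
    return y

def simulate_ar2(n=200, c=8.0, phi1=1.3, phi2=-0.7, sigma=1.0):
    """AR(2): y_t = 8 + 1.3 * y_{t-1} - 0.7 * y_{t-2} + e_t"""
    y = np.zeros(n)
    y[:2] = c / (1 - phi1 - phi2)         # start near the process mean (20 here)
    for t in range(2, n):
        y[t] = c + phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal(scale=sigma)
    return y

ar1_series, ar2_series = simulate_ar1(), simulate_ar2()
```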
Challenges faced by existing models
• Most methods are designed to forecast individual series or small groups of series. A new set of problems has emerged:
• Forecasting a large number of individual or grouped time series.
• Learning a single global model while handling the widely different scales of time series that are otherwise related.
• Many older models cannot account for environmental (exogenous) inputs.
• The cold-start problem for new items that must be included in the forecast.
Goal
• The ability to learn and generalize from similar series lets us fit more complex models without overfitting.
DeepAR
Solution
• DeepAR is a forecasting model based on autoregressive RNNs, which learns a single global model from the historical data of all time series in the dataset.[2]
DeepAR Advantages
• Minimal manual feature engineering.
• Ability to provide forecasts for series with little or no history.
• Ability to incorporate a wide range of likelihood models.
• Provides consistent estimates for subgroups.
DeepAR Model
• Goal: Given the observed values of a series $i$ over the first $t_0$ time steps, estimate the probability distribution of the following steps up to $T$; more formally, the goal is to model the conditional distribution
$P\left(z_{i,t_0:T} \mid z_{i,1:t_0-1},\, x_{i,1:T}\right)$
• This distribution is parameterized by the output of an autoregressive RNN:
$Q_\Theta\left(z_{i,t_0:T} \mid z_{i,1:t_0-1},\, x_{i,1:T}\right) = \prod_{t=t_0}^{T} Q_\Theta\left(z_{i,t} \mid z_{i,1:t-1},\, x_{i,1:T}\right) = \prod_{t=t_0}^{T} \ell\left(z_{i,t} \mid \theta(\mathbf{h}_{i,t}, \Theta)\right)$
$\mathbf{h}_{i,t} = h\left(\mathbf{h}_{i,t-1},\, z_{i,t-1},\, x_{i,t},\, \Theta\right)$
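A minimal sketch of how this factorization is used at prediction time, with a toy recurrent cell and a Gaussian likelihood head. The hidden size, the untrained weights, and the helper names below are illustrative stand-ins, not the SageMaker DeepAR implementation: observed values are fed through the network over the conditioning range, then sample paths are drawn ancestrally over the prediction range.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                                        # hidden size (assumed)
Wh = rng.normal(scale=0.1, size=(H, H))       # toy, untrained weights
Wz = rng.normal(scale=0.1, size=H)
Wx = rng.normal(scale=0.1, size=H)
w_mu = rng.normal(scale=0.1, size=H)
w_sigma = rng.normal(scale=0.1, size=H)

def rnn_cell(h, z_prev, x):
    # stands in for h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, x_{i,t}, Theta)
    return np.tanh(Wh @ h + Wz * z_prev + Wx * x)

def gaussian_head(h):
    # theta(h): mean is linear in h, softplus keeps the std. dev. positive
    return w_mu @ h, np.log1p(np.exp(w_sigma @ h))

def sample_path(z_observed, x_future):
    h, z_prev = np.zeros(H), 0.0
    for z in z_observed:                      # conditioning range: feed observed values
        h = rnn_cell(h, z_prev, 0.0)          # covariates omitted here for brevity
        z_prev = z
    draws = []
    for x in x_future:                        # prediction range: feed back the model's own draws
        h = rnn_cell(h, z_prev, x)
        mu, sigma = gaussian_head(h)
        z_prev = rng.normal(mu, sigma)
        draws.append(z_prev)
    return np.array(draws)

paths = np.stack([sample_path([1.0, 1.2, 0.9], [0.0] * 5) for _ in range(200)])
p10, p50, p90 = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)   # forecast quantiles
```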
DeepAR Architecture
• DeepAR is an encoder-decoder architecture: the encoder consumes a number of input time steps, and the decoder takes the encoder output together with covariates and predicts the number of steps given by the horizon.
Likelihood Model – Gaussian
• Gaussian likelihood for real-valued data:
$\ell_G(z \mid \mu, \sigma) = \left(2\pi\sigma^2\right)^{-\frac{1}{2}} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$
$\mu(\mathbf{h}_{i,t}) = \mathbf{w}_\mu^{T}\mathbf{h}_{i,t} + b_\mu$ (plain network output)
$\sigma(\mathbf{h}_{i,t}) = \log\left(1 + e^{\mathbf{w}_\sigma^{T}\mathbf{h}_{i,t} + b_\sigma}\right)$ (softplus activation, so that $\sigma > 0$)
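A small sketch of this parameterization; the weight and bias names are placeholders. The mean is a plain affine function of the hidden state, while softplus keeps the standard deviation positive.

```python
import numpy as np

def softplus(a):
    return np.log1p(np.exp(a))

def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    mu = w_mu @ h + b_mu                      # network output, unconstrained
    sigma = softplus(w_sigma @ h + b_sigma)   # softplus activation, always > 0
    return mu, sigma

def gaussian_neg_log_lik(z, mu, sigma):
    # -log l_G(z | mu, sigma), minimized during training
    return 0.5 * np.log(2 * np.pi * sigma ** 2) + (z - mu) ** 2 / (2 * sigma ** 2)
```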
Likelihood Model – Negative Binomial
• Negative-binomial likelihood for positive count data. The
Negative Binomial distribution is the distribution that
underlies the stochasticity in over-dispersed count data.[3]
$\ell_{NB}(z \mid \mu, \alpha) = \frac{\Gamma\left(z + \frac{1}{\alpha}\right)}{\Gamma(z + 1)\,\Gamma\left(\frac{1}{\alpha}\right)} \left(\frac{1}{1 + \alpha\mu}\right)^{\frac{1}{\alpha}} \left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^{z}$
$\mu(\mathbf{h}_{i,t}) = \log\left(1 + e^{\mathbf{w}_\mu^{T}\mathbf{h}_{i,t} + b_\mu}\right) \qquad \alpha(\mathbf{h}_{i,t}) = \log\left(1 + e^{\mathbf{w}_\alpha^{T}\mathbf{h}_{i,t} + b_\alpha}\right)$
• $\mu$ and $\alpha$ are both outputs of a dense layer with softplus activation.
• $\alpha$ scales the variance relative to the mean.
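A small sketch of the same likelihood in code, using SciPy's log-gamma function for numerical stability; the weight names are placeholders and both heads go through softplus exactly as above.

```python
import numpy as np
from scipy.special import gammaln

def softplus(a):
    return np.log1p(np.exp(a))

def neg_binom_params(h, w_mu, b_mu, w_alpha, b_alpha):
    return softplus(w_mu @ h + b_mu), softplus(w_alpha @ h + b_alpha)

def neg_binom_log_lik(z, mu, alpha):
    # log l_NB(z | mu, alpha) in the (mu, alpha) parameterization above, with r = 1 / alpha
    r = 1.0 / alpha
    return (gammaln(z + r) - gammaln(z + 1) - gammaln(r)
            + r * np.log(1.0 / (1.0 + alpha * mu))
            + z * np.log(alpha * mu / (1.0 + alpha * mu)))
```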
Scaling
• Non-linearity in the network results in a loss of scale information.
• Solution:
• Divide the autoregressive inputs by an item-dependent scale factor.
• Multiply the scale-dependent likelihood parameters by the same factor.
• $\nu_i = 1 + \frac{1}{t_0} \sum_{t=1}^{t_0} z_{i,t}$
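A minimal sketch of the scaling step: the factor is computed from each item's conditioning range, the network sees the rescaled inputs, and the emitted likelihood parameters are scaled back up on the way out.

```python
import numpy as np

def scale_factor(z_context):
    # v_i = 1 + mean of the observed values in the conditioning range
    return 1.0 + np.mean(z_context)

z_context = np.array([0.0, 3.0, 5.0, 4.0])
v_i = scale_factor(z_context)        # 4.0 for this toy series
z_scaled = z_context / v_i           # what the autoregressive network actually sees
# on the output side, e.g. for the Gaussian head: mu -> v_i * mu, sigma -> v_i * sigma
```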
Comparison
Code
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/deepar_electricity/DeepAR-Electricity.ipynb
LSTNet
Challenge
• Autoregressive models may fail to capture a mixture of long- and short-term patterns.
Solution – LSTNet[4]
• The Long- and Short-term Time-series Network (LSTNet) is designed to capture a mix of long- and short-term patterns in multivariate time-series data.
Concept
• A CNN discovers local dependency patterns.
• RNNs capture long-term dependencies.
• An autoregressive model handles scale.
Problem Formulation
• Given $Y = \{y_1, y_2, \dots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension, the aim is to predict $y_{T+h}$, where $h$ is the horizon.
• Similarly, given $Y = \{y_1, y_2, \dots, y_{T+1}\}$, we want to predict $y_{T+1+h}$.
• The input matrix is denoted $X = [y_1, y_2, \dots, y_T] \in \mathbb{R}^{n \times T}$.
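A small sketch of this data layout with assumed sizes (n = 8 series, T = 168 steps, horizon h = 24):

```python
import numpy as np

n, T, h = 8, 168, 24
series = np.random.rand(T + h, n)    # rows are the observations y_1, ..., y_{T+h}
X = series[:T].T                     # input matrix X = [y_1, ..., y_T], shape (n, T)
target = series[T + h - 1]           # y_{T+h} (row T+h-1 with 0-based indexing), shape (n,)
```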
Architecture
Convolutional Component
• Extract short-term patterns in the time dimension as well
as local dependencies between variables.
• Multiple filters of width $\omega$ and height $n$ (the number of variables).
• $h_k = \mathrm{ReLU}(W_k * X + b_k)$
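A minimal NumPy sketch of one such filter sweeping the input matrix along the time axis; the filter width, the sizes, and the lack of padding are illustrative choices.

```python
import numpy as np

def conv_filter(X, W_k, b_k):
    # X: n x T input matrix; W_k: n x omega filter covering all variables at once
    n, T = X.shape
    _, omega = W_k.shape
    out = np.empty(T - omega + 1)
    for t in range(T - omega + 1):
        out[t] = max(np.sum(W_k * X[:, t:t + omega]) + b_k, 0.0)   # ReLU
    return out                        # h_k, one activation per window position

X = np.random.rand(4, 24)             # 4 variables, 24 time steps (assumed)
h_k = conv_filter(X, W_k=np.random.rand(4, 6), b_k=0.1)
```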
Recurrent Component
• The output of the Conv layer is simultaneously fed to
Recurrent and Recurrent-skip layers (next slide).
• The RNN component is a GRU layer with ReLU activation.*
$r_t = \sigma\left(x_t W_{xr} + h_{t-1} W_{hr} + b_r\right)$
$u_t = \sigma\left(x_t W_{xu} + h_{t-1} W_{hu} + b_u\right)$
$c_t = \mathrm{ReLU}\left(x_t W_{xc} + r_t \odot (h_{t-1} W_{cr}) + b_c\right)$
$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$
* The paper's reference implementation uses tanh, but the authors claim that ReLU performs better.
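A minimal sketch of one step of this cell with toy, untrained weights; the only change from a standard GRU is the ReLU in the candidate state.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 8                                       # input and hidden sizes (assumed)
W = {k: rng.normal(scale=0.1, size=(d, m) if k.startswith("x") else (m, m))
     for k in ("xr", "hr", "xu", "hu", "xc", "cr")}
b = {k: np.zeros(m) for k in ("r", "u", "c")}

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_relu_step(x_t, h_prev):
    r = sigmoid(x_t @ W["xr"] + h_prev @ W["hr"] + b["r"])                # reset gate
    u = sigmoid(x_t @ W["xu"] + h_prev @ W["hu"] + b["u"])                # update gate
    c = np.maximum(x_t @ W["xc"] + r * (h_prev @ W["cr"]) + b["c"], 0.0)  # ReLU candidate
    return (1 - u) * h_prev + u * c

h = np.zeros(m)
for x_t in rng.normal(size=(10, d)):               # run the cell over a toy sequence
    h = gru_relu_step(x_t, h)
```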
Recurrent-skip Component
• The recurrent-skip component is a recurrent layer that captures lagged long-term dependencies at the appropriate lag; for instance, hourly electricity consumption has a natural lag of $p = 24$ time steps.
$r_t = \sigma\left(x_t W_{xr} + h_{t-p} W_{hr} + b_r\right)$
$u_t = \sigma\left(x_t W_{xu} + h_{t-p} W_{hu} + b_u\right)$
$c_t = \mathrm{ReLU}\left(x_t W_{xc} + r_t \odot (h_{t-p} W_{cr}) + b_c\right)$
$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
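A minimal sketch of the skip variant: the cell equations are unchanged, but every recurrent term reads the hidden state from p steps back, which a fixed-length deque can supply. The value p = 24 and all sizes are assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
d, m, p = 5, 8, 24                                 # input size, hidden size, skip length
W = {k: rng.normal(scale=0.1, size=(d, m) if k.startswith("x") else (m, m))
     for k in ("xr", "hr", "xu", "hu", "xc", "cr")}
b = {k: np.zeros(m) for k in ("r", "u", "c")}
history = deque([np.zeros(m)] * p, maxlen=p)       # holds h_{t-p}, ..., h_{t-1}
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def skip_step(x_t):
    h_skip = history[0]                            # h_{t-p}
    r = sigmoid(x_t @ W["xr"] + h_skip @ W["hr"] + b["r"])
    u = sigmoid(x_t @ W["xu"] + h_skip @ W["hu"] + b["u"])
    c = np.maximum(x_t @ W["xc"] + r * (h_skip @ W["cr"]) + b["c"], 0.0)
    h_t = (1 - u) * h_skip + u * c
    history.append(h_t)                            # becomes h_{t-p} again p steps later
    return h_t

for x_t in rng.normal(size=(48, d)):               # two "days" of hourly toy inputs
    h_skip_out = skip_step(x_t)
```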
Combining Recurrent and Recurrent-skip Outputs
• A dense layer combines the outputs of the recurrent and recurrent-skip layers.
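A small sketch of that combination, assuming the p skip hidden states are flattened and concatenated with the last recurrent hidden state before a single dense projection.

```python
import numpy as np

m, p, n_out = 8, 24, 4                      # hidden size, skip length, output size (assumed)
h_R = np.random.rand(m)                     # last hidden state of the recurrent component
h_S = np.random.rand(p * m)                 # skip component states, flattened
W_dense = np.random.rand(n_out, m + p * m)
b_dense = np.zeros(n_out)
h_D = W_dense @ np.concatenate([h_R, h_S]) + b_dense   # combined neural output h_t^D
```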
Temporal Attention Layer
• For non-seasonal data, a fixed skip step $p$ is not useful.
• In such cases an attention mechanism is used, which learns a weighted combination of the hidden representations at each window position of the input matrix.
$\alpha_t = \mathrm{AttnScore}\left(H_t^R,\, h_{t-1}^R\right), \quad \alpha_t \in \mathbb{R}^q$ (attention weights)
$H_t^R = \left[h_{t-q}^R, \dots, h_{t-1}^R\right]$ (hidden states stacked column-wise)
$c_t = H_t \alpha_t$ (context vector)
$h_t^D = W\left[c_t;\, h_{t-1}^R\right] + b$ (the output combines the context vector and the last window's hidden representation)
Autoregressive Component
• The autoregressive component (ARC) overcomes the loss of scale caused by the DNN's non-linearity.
• The ARC is a linear AR model.
Final Output
• The final output is obtained by integrating the AR and DNN outputs:
$\hat{Y}_t = h_t^D + h_t^L$
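A small sketch of the linear AR component and the final sum; the AR window length and all sizes are assumptions.

```python
import numpy as np

def ar_component(X, w_ar, b_ar):
    # X: n x T input matrix; the AR part regresses on the last len(w_ar) values of each series
    q_ar = len(w_ar)
    return X[:, -q_ar:] @ w_ar + b_ar          # h_t^L, one linear forecast per series

n, T, q_ar = 4, 24, 7
X = np.random.rand(n, T)
h_L = ar_component(X, w_ar=np.random.rand(q_ar), b_ar=0.0)
h_D = np.random.rand(n)                        # stand-in for the neural (CNN + RNN) output
Y_hat = h_D + h_L                              # final prediction Y_t
```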
Objective Function
• The paper suggests using either an L1 or an L2 (squared error) loss function.
$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$ (Frobenius norm)
$h$: horizon
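A small sketch of the two objectives over a batch of training time stamps; the shapes are assumptions.

```python
import numpy as np

def l2_loss(Y_true, Y_pred):
    # sum over training stamps of the squared (Frobenius-style) error
    return np.sum((Y_true - Y_pred) ** 2)

def l1_loss(Y_true, Y_pred):
    # sum of absolute errors, more robust to outliers
    return np.sum(np.abs(Y_true - Y_pred))

Y_true = np.random.rand(32, 4)     # 32 training stamps, 4 series
Y_pred = np.random.rand(32, 4)
print(l2_loss(Y_true, Y_pred), l1_loss(Y_true, Y_pred))
```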
Metrics
• Root Relative Squared Error (RSE): lower is better.
• Empirical Correlation Coefficient (CORR): higher is better.
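A small sketch of both metrics as they are conventionally computed for this benchmark: RSE normalizes the squared error by the spread of the test targets, and CORR averages the per-variable correlation between predictions and targets.

```python
import numpy as np

def rse(Y_true, Y_pred):
    return (np.sqrt(np.sum((Y_true - Y_pred) ** 2))
            / np.sqrt(np.sum((Y_true - Y_true.mean()) ** 2)))

def corr(Y_true, Y_pred):
    num = ((Y_true - Y_true.mean(0)) * (Y_pred - Y_pred.mean(0))).sum(0)
    den = np.sqrt(((Y_true - Y_true.mean(0)) ** 2).sum(0)
                  * ((Y_pred - Y_pred.mean(0)) ** 2).sum(0))
    return np.mean(num / den)                  # averaged over the n variables
```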
Comparison
Code
https://github.com/safrooze/LSTNet-Gluon
References
1. Rob J. Hyndman, George Athanasopoulos. Forecasting: Principles and Practice. https://www.otexts.org/fpp/8/3
2. Valentin Flunkert, David Salinas, Jan Gasthaus. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. https://arxiv.org/abs/1704.04110
3. http://sherrytowers.com/2014/07/11/negative-binomial-likelihood/
4. Guokun Lai et al. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. https://arxiv.org/pdf/1703.07015.pdf