A self-adaptive service can maintain its QoS requirements in the presence of dynamic environment changes. To develop a self-adaptive service, service engineers have to create self-adaptation logic encoding when the service should execute which adaptation actions. However, developing self-adaptation logic may be difficult due to design time uncertainty; e.g., anticipating all potential environment changes at design time is in most cases infeasible. Online reinforcement learning addresses design time uncertainty by learning suitable adaptation actions through interactions with the environment at runtime. To learn more about its environment, reinforcement learning has to select actions that were not selected before, which is known as exploration. How exploration happens has an impact on the performance of the learning process. We focus on two problems related to how a service's adaptation actions are explored: (1) Existing solutions randomly explore adaptation actions and thus may exhibit slow learning if there are many possible adaptation actions to choose from. (2) Existing solutions are unaware of service evolution, and thus may explore new adaptation actions introduced during such evolution rather late. We propose novel exploration strategies that use feature models (from software product line engineering) to guide exploration in the presence of many adaptation actions and in the presence of service evolution. Experimental results for a self-adaptive cloud management service indicate an average speed-up of the learning process of 58.8% in the presence of many adaptation actions, and of 61.3% in the presence of service evolution. The improved learning performance in turn led to an average QoS improvement of 7.8% and 23.7%, respectively.
1. Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger1, Clément Quinton2, Zoltan Mann1, Luciano Baresi3, Klaus Pohl1
1 paluno, University of Duisburg-Essen
2 University of Lille, Inria
3 Politecnico di Milano
18th Int'l Conference on Service-Oriented Computing (ICSOC 2020)
Published as:
A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, "Feature model-guided online reinforcement learning for self-adaptive services," in 18th Int'l Conference on Service-Oriented Computing (ICSOC 2020), Dubai, UAE, December 14-17, 2020, ser. LNCS, vol. 12571, E. Kafeza, B. Benatallah, F. Martinelli, H. Hacid, A. Bouguettaya, and H. Motahari, Eds. Springer, 2020.
https://doi.org/10.1007/978-3-030-65310-1_20
3. Fundamentals
Self-adaptive service
• Modifies itself at runtime to maintain QoS in the presence of dynamic environment changes
Example: Self-adaptive online store
1. Monitor: Sudden increase in workload
2. Analyze: User-perceived latency too high
3. Plan: Deactivate optional "recommendation" feature
4. Execute: Replace "recommendations" with static banner
[Figure: MAPE-K reference model (based on [Kephart & Chess, 2003]): Monitor, Analyze, Plan, and Execute activities over shared Knowledge; the self-adaptation logic adapts the system logic, which interacts with the environment]
4. Engineering Challenges for Self-Adaptation
[Figure: MAPE-K self-adaptation loop, as on the previous slide]
“Design time uncertainty”
[Weyns et al. 2013, D’Ippolito et al. 2014]
• Infeasible to anticipate all future environment situations (e.g., QoS of dynamically bound services)
• Difficult to precisely determine the impact of adaptation actions on QoS (e.g., exact QoS impact when adding a VM)
• Simplifying assumptions (e.g., too much effort to explicitly codify all details as knowledge)
How to develop the self-adaptation logic?
5. Emerging Approach: Online Reinforcement Learning (RL)
RL fundamentals
[Figure: RL agent-environment loop: in state st, the agent's action selection picks action at; the environment returns reward rt+1 and next state st+1, which drive the policy update]
• Learn a suitable action selection policy via the agent's interactions with the environment
• Agent receives a reward for executing an action (here: adaptation action)
• Reward expresses how suitable the action was (here: QoS satisfaction)
• Updating the policy from the reward signal = learning
• Goal of RL: optimize cumulative rewards
(based on [Sutton & Barto, 2018])
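To make this concrete, here is a minimal sketch of tabular Q-learning with ε-greedy action selection, the baseline exploration strategy discussed later; the state and action representations are hypothetical placeholders, not the paper's implementation.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

Q = defaultdict(float)   # Q-table: (state, action) -> estimated cumulative reward

def select_action(state, actions):
    # ε-greedy: with probability ε explore a random adaptation action,
    # otherwise exploit the action with the highest learned Q-value
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update_policy(state, action, reward, next_state, actions):
    # Standard Q-learning update: move the estimate towards the observed
    # reward plus the discounted value of the best next action
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])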
6. Online RL for Self-Adaptive Services
Combining MAPE-K and RL [Palm et al., 2020]
[Figure: RL integrated into the MAPE-K loop: the Analyze and Plan activities are realized by RL action selection over a learned policy p, which takes the role of the Knowledge; Monitor delivers state st, Execute applies adaptation action at, and reward rt+1 drives the policy update to pt+1]
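As an illustration of this integration, a sketch of the resulting control loop, reusing select_action and update_policy from the earlier sketch; the service/environment API (monitor, reward, execute, adaptation_actions) is a hypothetical placeholder, not the interface from [Palm et al., 2020].

def mape_k_rl_loop(service, environment):
    # Monitor delivers the current state; Analyze and Plan are replaced
    # by RL action selection over the learned policy (the Knowledge)
    state = environment.monitor()
    while True:
        actions = service.adaptation_actions()
        action = select_action(state, actions)      # Analyze + Plan
        service.execute(action)                     # Execute
        next_state = environment.monitor()          # Monitor
        reward = environment.reward(next_state)     # e.g., QoS satisfaction
        update_policy(state, action, reward, next_state, actions)  # learn
        state = next_state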
7. Problem Statement
Exploitation-exploration dilemma of RL [Sutton & Barto, 2018]
• Exploit existing knowledge vs explore new knowledge
• How adaptation actions are explored impacts learning performance
Limitations of State of the Art in RL
for self-adaptive services (see Sec 6 in paper)
• (1) Random exploration (ε-greedy)
• Slow learning if the set of adaptation actions is large; e.g., 8 services with 2 concrete services each = 256 combinations
• (2) Evolution-unaware exploration
• New adaptation actions are explored with low probability and thus late
9. Feature Models for Encoding Adaptations
[Figure: feature model of a web application: root Web Application with sub-feature Data Logging (alternative sub-features Min, Medium, Max) and optional sub-feature Content Discovery (sub-features Search, Recommendation); legend distinguishes mandatory, optional, alternative, and activated features. Adaptation example: when the number of concurrent users reaches 1000, the activated features are changed (Recommendation; Max vs. Medium data logging)]
• Feature model expresses system configurations in compact form
• A concrete system configuration is expressed as a feature combination
• An adaptation is expressed as a runtime reconfiguration
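To illustrate this encoding, a small sketch (hypothetical names; a real setting would use a product-line tool or a SAT-based configurator) of representing the feature model and enumerating its valid configurations, from which adaptation actions, i.e., runtime reconfigurations, are drawn.

from itertools import product

# Simplified model of the example: Data Logging is mandatory with exactly
# one alternative active; Content Discovery sub-features are optional.
LOGGING_ALTERNATIVES = ["Min", "Medium", "Max"]
DISCOVERY_OPTIONS = ["Search", "Recommendation"]

def valid_configurations():
    # Enumerate all feature combinations permitted by the model
    configs = []
    for logging in LOGGING_ALTERNATIVES:
        for mask in product([False, True], repeat=len(DISCOVERY_OPTIONS)):
            selected = [f for f, on in zip(DISCOVERY_OPTIONS, mask) if on]
            configs.append(frozenset(["DataLogging", logging,
                                      "ContentDiscovery", *selected]))
    return configs

# An adaptation action is then a runtime reconfiguration: switching from
# the currently activated configuration to another valid one.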
10. FM-guided Exploration
[Figure: the same web-application feature model, annotated with the exploration strategies listed below]
Exploration strategies (listed from random to more systematic):
• Random (as used in ε-greedy)
• Incremental Strategy (Inc): start with a randomly selected leaf feature; explore a configuration containing it first, then move on to its sibling
• Feature Degree Strategy (Deg): start with the leaf feature that has the highest feature degree (FD = number of configurations containing f; e.g., FD = 5 vs. FD = 4); explore a configuration containing it first
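Following the description above and in the speaker notes at the end of this document, a sketch of the incremental strategy; model is a hypothetical feature-tree API offering siblings(f) and parent(f), and unexplored is the set of not-yet-explored configurations.

def next_incremental(feature, unexplored, model):
    # Prefer an unexplored configuration containing the current leaf
    # feature; otherwise try its siblings; otherwise climb to the parent,
    # until a configuration is found or the root is reached
    while feature is not None:
        for candidate_feature in [feature, *model.siblings(feature)]:
            matches = [c for c in unexplored if candidate_feature in c]
            if matches:
                return matches[0]
        feature = model.parent(feature)
    return None   # everything explored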
11. Evolution-aware Exploration
Determine the set-theoretic difference between the FM before and after an evolution step
Removed configurations: delete from policy (knowledge)
• {DataLogging, Medium, ContentDiscovery, Search}
• …
Added configurations: explore them first
• Due to the added feature:
• {DataLogging, Optimized, ContentDiscovery, Search}
• {DataLogging, Optimized, ContentDiscovery, Recommendation}
• Due to the removed constraint:
• {DataLogging, Min, ContentDiscovery, Recommendation}
[Figure: the web-application feature model after an evolution step: a new alternative sub-feature "Optimized" added under Data Logging]
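A sketch of the corresponding bookkeeping, assuming configurations are represented as frozensets of features (as above) and the policy maps configurations to learned values:

def apply_evolution_step(policy, configs_before, configs_after):
    # Set-theoretic difference between the configuration spaces of the
    # feature model before and after the evolution step
    removed = set(configs_before) - set(configs_after)
    added = set(configs_after) - set(configs_before)
    for config in removed:
        policy.pop(config, None)   # delete obsolete knowledge
    return added                   # explore these configurations first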
13. Experiment Setup
CloudRM – Self-adaptive Cloud Resource Management Service
• Feature model: see below
• Real-world workload trace: 10,000 tasks, 29 days
• "Simulated" evolution of the adaptation space
[Figure: CloudRM feature model: alternative placement algorithms "Simple", "Maxsize", "Consolidation-Friendly", and "Multiple", parameterized by task group size k (2, 3, …, 20), relative size (0.25, 0.3, …, 1 and 0.5, 0.6, …, 0.9), PM/VM selection policies (FF, BF, WF; max, min), and selection metrics (len, max, min, imb); annotations mark which parts belong to the initial model and which are added by evolution steps #1, #2, and #3]
14. Experiment Setup
Parametrization of RL
• Integration of FM-guided strategies into Q-Learning
• Reward function: based on energy consumption (e) and number of VM migrations (m)
• Best hyperparameter configuration for ε-greedy also used for FM-guided learning
Assessing learning performance
• 100 repetitions due to the stochastic nature of learning
• Metrics: reward metrics [Taylor & Stone, 2009], plus actual energy consumption and VM migrations
[Figure: reward metrics illustrated on a learning curve (reward over time steps): jumpstart, asymptotic performance, time to threshold (here: 90% of asymptotic), and total reward = area under the curve]
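For illustration, a sketch of computing these metrics from a recorded learning curve; the tail window used to estimate asymptotic performance is an assumption for this sketch, not a value from the paper.

import numpy as np

def learning_metrics(rewards, threshold_frac=0.9, tail=100):
    rewards = np.asarray(rewards, dtype=float)
    jumpstart = rewards[0]                # initial performance
    asymptotic = rewards[-tail:].mean()   # performance after convergence
    total_reward = rewards.sum()          # area under the learning curve
    # first time step at which the curve reaches the threshold
    # (default: 90% of asymptotic performance)
    above = np.nonzero(rewards >= threshold_frac * asymptotic)[0]
    time_to_threshold = int(above[0]) if above.size else None
    return jumpstart, asymptotic, total_reward, time_to_threshold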
15. (1) Large Adaptation Space
Average improvements compared to ε-greedy exploration:
• Asymptotic performance: 0%
• Time to threshold: 48.6%
• Jumpstart: 1.3%
• Total reward: 58.8%
• Energy savings: 0.1%
• Reduced VM migrations: 7.8%
16. (2) Evolution of Adaptation Space
Average improvements compared to ε-greedy exploration:
• Asymptotic performance: 0.4%
• Time to threshold: 51.0%
• Jumpstart: 5.1%
• Total reward: 61.3%
• Energy savings: 0.1%
• Reduced VM migrations: 23.7%
18. Conclusion and Outlook
Exploiting structural knowledge from design time (feature models) to guide online learning for self-adaptive services.
Future enhancements
• Experiments with additional systems
• Comparison for other exploration strategies and RL algorithms
• Considering changes of existing features (on top of additions and removals)
• Methodology for defining suitable feature models at design time
Thank You!
Research leading to these results has received funding from the EU's H2020 research and innovation programme under grant agreements no. 780351 (https://enact-project.eu/) and no. 871525 (https://fogprotect.eu/).
19. References
See paper.
Additional ones:
[Weyns et al. 2013] D. Weyns, N. Bencomo, R. Calinescu, J. Cámara, C. Ghezzi, V. Grassi, L. Grunske, P. Inverardi, J.-M. Jézéquel, S. Malek, R. Mirandola, M. Mori, G. Tamburrelli: Perpetual Assurances for Self-Adaptive Systems. Software Engineering for Self-Adaptive Systems 2013: 31-63
[Kephart & Chess, 2003] J. O. Kephart, D. M. Chess: The vision of autonomic computing. IEEE Computer 36(1), 41-50 (2003)
Speaker notes
Welcome to our presentation on …
My name is… and I am happy to present this joint work with …
The strategy exploits the semantics typically encoded in feature models: non-leaf features are usually abstract features, which delegate their realization to their sub-features; sub-features thus may offer different realizations of their abstract parent feature.
If no configuration containing f or a sibling feature of f is found, the strategy moves on to the parent feature of f; this is repeated until a configuration is found (line 13) or the root feature is reached (line 22).