A self-adaptive service can maintain its QoS requirements in the presence of dynamic environment changes. To develop a self-adaptive service, service engineers have to create self-adaptation logic encoding when the service should execute which adaptation actions. However, developing self-adaptation logic may be difficult due to design time uncertainty; e.g., anticipating all potential environment changes at design time is in most cases infeasible. Online reinforcement learning addresses design time uncertainty by learning suitable adaptation actions through interactions with the environment at runtime. To learn more about its environment, reinforcement learning has to select actions that were not selected before, which is known as exploration. How exploration happens has an impact on the performance of the learning process. We focus on two problems related to how a service's adaptation actions are explored: (1) Existing solutions randomly explore adaptation actions and thus may exhibit slow learning if there are many possible adaptation actions to choose from. (2) Existing solutions are unaware of service evolution, and thus may explore new adaptation actions introduced during such evolution rather late. We propose novel exploration strategies that use feature models (from software product line engineering) to guide exploration in the presence of many adaptation actions and in the presence of service evolution. Experimental results for a self-adaptive cloud management service indicate an average speed-up of the learning process of 58.8% in the presence of many adaptation actions, and of 61.3% in the presence of service evolution. The improved learning performance in turn led to an average QoS improvement of 7.8% and 23.7%, respectively.
1. Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger1, Clément Quinton2, Zoltan Mann1, Luciano Baresi3, Klaus Pohl1
1 paluno, University of Duisburg-Essen
2 University of Lille, Inria
3 Politecnico di Milano
18th Int'l Conference on Service-Oriented Computing (ICSOC 2020)
Published as:
A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, "Feature model-guided online reinforcement learning for self-adaptive services," in 18th Int'l Conference on Service-Oriented Computing (ICSOC 2020), Dubai, UAE, December 14-17, 2020, ser. LNCS, vol. 12571, E. Kafeza, B. Benatallah, F. Martinelli, H. Hacid, A. Bouguettaya, and H. Motahari, Eds. Springer, 2020.
https://doi.org/10.1007/978-3-030-65310-1_20
3. Fundamentals
Self-adaptive service
• Modifies itself at runtime to maintain QoS in the presence of dynamic environment changes
Example: Self-adaptive online store
1. Monitor: Sudden increase in workload
2. Analyze: User-perceived latency too high
3. Plan: Deactivate optional "recommendation" feature
4. Execute: Replace "recommendations" with static banner
[Figure: MAPE-K reference model (based on [Kephart & Chess, 2003]): Monitor, Analyze, Plan, and Execute activities over shared Knowledge; the self-adaptation logic adapts the system logic, which interacts with the environment]
4. Engineering Challenges for Self-Adaptation
[Figure: MAPE-K self-adaptation loop, as on the previous slide]
“Design time uncertainty”
[Weyns et al. 2013, D’Ippolito et al. 2014]
• Infeasible to anticipate all future environment situations (e.g., QoS of dynamically bound services)
• Difficult to precisely determine the impact of adaptation actions on QoS (e.g., exact QoS impact when adding a VM)
• Simplifying assumptions (e.g., too much effort to explicitly codify all details as knowledge)
How to develop the self-adaptation logic?
5. Emerging Approach: Online Reinforcement Learning (RL)
RL fundamentals
[Figure: RL agent-environment loop: in state st, the agent's action selection picks action at; the environment returns reward rt+1 and next state st+1, which drive the policy update]
• Learn a suitable action selection policy via the agent's interactions with the environment
• Agent receives a reward for executing an action (here: adaptation action)
• Reward expresses how suitable the action was (here: QoS satisfaction)
• Updating the policy from the reward signal = learning
• Goal of RL: optimize cumulative rewards
(based on [Sutton & Barto, 2018])
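To make this concrete, here is a minimal sketch of tabular Q-learning with ε-greedy action selection, the baseline exploration strategy discussed later; the state and action representations are hypothetical placeholders, not the paper's implementation.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

Q = defaultdict(float)   # Q-table: (state, action) -> estimated cumulative reward

def select_action(state, actions):
    # ε-greedy: with probability ε explore a random adaptation action,
    # otherwise exploit the action with the highest learned Q-value
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update_policy(state, action, reward, next_state, actions):
    # Standard Q-learning update: move the estimate towards the observed
    # reward plus the discounted value of the best next action
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])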
6. Online RL for Self-Adaptive Services
Combining MAPE-K and RL [Palm et al., 2020]
[Figure: RL integrated into the MAPE-K loop: the Analyze and Plan activities are realized by RL action selection over a learned policy p, which takes the role of the Knowledge; Monitor delivers state st, Execute applies adaptation action at, and reward rt+1 drives the policy update to pt+1]
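As an illustration of this integration, a sketch of the resulting control loop, reusing select_action and update_policy from the earlier sketch; the service/environment API (monitor, reward, execute, adaptation_actions) is a hypothetical placeholder, not the interface from [Palm et al., 2020].

def mape_k_rl_loop(service, environment):
    # Monitor delivers the current state; Analyze and Plan are replaced
    # by RL action selection over the learned policy (the Knowledge)
    state = environment.monitor()
    while True:
        actions = service.adaptation_actions()
        action = select_action(state, actions)      # Analyze + Plan
        service.execute(action)                     # Execute
        next_state = environment.monitor()          # Monitor
        reward = environment.reward(next_state)     # e.g., QoS satisfaction
        update_policy(state, action, reward, next_state, actions)  # learn
        state = next_state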
7. Problem Statement
Exploitation-exploration dilemma of RL [Sutton & Barto, 2018]
• Exploit existing knowledge vs explore new knowledge
• How adaptation actions are explored impacts learning performance
Limitations of State of the Art in RL
for self-adaptive services (see Sec 6 in paper)
• (1) Random exploration (ε-greedy)
• Slow learning if the set of adaptation actions is large; e.g., 8 services with 2 concrete services each = 256 combinations
• (2) Evolution-unaware exploration
• New adaptation actions are explored with low probability and thus late
9. Feature Models for Encoding Adaptations
[Figure: feature model of a web application: root Web Application with sub-feature Data Logging (alternative sub-features Min, Medium, Max) and optional sub-feature Content Discovery (sub-features Search, Recommendation); legend distinguishes mandatory, optional, alternative, and activated features. Adaptation example: when the number of concurrent users reaches 1000, the activated features are changed (Recommendation; Max vs. Medium data logging)]
• Feature model expresses system configurations in compact form
• A concrete system configuration is expressed as a feature combination
• An adaptation is expressed as a runtime reconfiguration
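To illustrate this encoding, a small sketch (hypothetical names; a real setting would use a product-line tool or a SAT-based configurator) of representing the feature model and enumerating its valid configurations, from which adaptation actions, i.e., runtime reconfigurations, are drawn.

from itertools import product

# Simplified model of the example: Data Logging is mandatory with exactly
# one alternative active; Content Discovery sub-features are optional.
LOGGING_ALTERNATIVES = ["Min", "Medium", "Max"]
DISCOVERY_OPTIONS = ["Search", "Recommendation"]

def valid_configurations():
    # Enumerate all feature combinations permitted by the model
    configs = []
    for logging in LOGGING_ALTERNATIVES:
        for mask in product([False, True], repeat=len(DISCOVERY_OPTIONS)):
            selected = [f for f, on in zip(DISCOVERY_OPTIONS, mask) if on]
            configs.append(frozenset(["DataLogging", logging,
                                      "ContentDiscovery", *selected]))
    return configs

# An adaptation action is then a runtime reconfiguration: switching from
# the currently activated configuration to another valid one.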
10. FM-guided Exploration
[Figure: the same web-application feature model, annotated with the exploration strategies listed below]
Exploration strategies (listed from random to more systematic):
• Random (as used in ε-greedy)
• Incremental Strategy (Inc): start with a randomly selected leaf feature; explore a configuration containing it first, then move on to its sibling
• Feature Degree Strategy (Deg): start with the leaf feature that has the highest feature degree (FD = number of configurations containing f; e.g., FD = 5 vs. FD = 4); explore a configuration containing it first
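Following the description above and in the speaker notes at the end of this document, a sketch of the incremental strategy; model is a hypothetical feature-tree API offering siblings(f) and parent(f), and unexplored is the set of not-yet-explored configurations.

def next_incremental(feature, unexplored, model):
    # Prefer an unexplored configuration containing the current leaf
    # feature; otherwise try its siblings; otherwise climb to the parent,
    # until a configuration is found or the root is reached
    while feature is not None:
        for candidate_feature in [feature, *model.siblings(feature)]:
            matches = [c for c in unexplored if candidate_feature in c]
            if matches:
                return matches[0]
        feature = model.parent(feature)
    return None   # everything explored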
11. Evolution-aware Exploration
Determine the set-theoretic difference between the FM before and after an evolution step
Removed configurations: delete from policy (knowledge)
• {DataLogging, Medium, ContentDiscovery, Search}
• …
Added configurations: explore them first
• Due to the added feature:
• {DataLogging, Optimized, ContentDiscovery, Search}
• {DataLogging, Optimized, ContentDiscovery, Recommendation}
• Due to the removed constraint:
• {DataLogging, Min, ContentDiscovery, Recommendation}
[Figure: the web-application feature model after an evolution step: a new alternative sub-feature "Optimized" added under Data Logging]
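A sketch of the corresponding bookkeeping, assuming configurations are represented as frozensets of features (as above) and the policy maps configurations to learned values:

def apply_evolution_step(policy, configs_before, configs_after):
    # Set-theoretic difference between the configuration spaces of the
    # feature model before and after the evolution step
    removed = set(configs_before) - set(configs_after)
    added = set(configs_after) - set(configs_before)
    for config in removed:
        policy.pop(config, None)   # delete obsolete knowledge
    return added                   # explore these configurations first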
13. Experiment Setup
CloudRM – Self-adaptive Cloud Resource Management Service
• Feature model: see below
• Real-world workload trace: 10,000 tasks, 29 days
• "Simulated" evolution of the adaptation space
[Figure: CloudRM feature model: alternative placement algorithms "Simple", "Maxsize", "Consolidation-Friendly", and "Multiple", parameterized by task group size k (2, 3, …, 20), relative size (0.25, 0.3, …, 1 and 0.5, 0.6, …, 0.9), PM/VM selection policies (FF, BF, WF; max, min), and selection metrics (len, max, min, imb); annotations mark which parts belong to the initial model and which are added by evolution steps #1, #2, and #3]
14. Experiment Setup
Parametrization of RL
• Integration of FM-guided strategies into Q-Learning
• Reward function: based on energy consumption (e) and number of VM migrations (m)
• Best hyperparameter configuration for ε-greedy also used for FM-guided learning
Assessing learning performance
• 100 repetitions due to the stochastic nature of learning
• Metrics: reward metrics [Taylor & Stone, 2009], plus actual energy consumption and VM migrations
[Figure: reward metrics illustrated on a learning curve (reward over time steps): jumpstart, asymptotic performance, time to threshold (here: 90% of asymptotic), and total reward = area under the curve]
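For illustration, a sketch of computing these metrics from a recorded learning curve; the tail window used to estimate asymptotic performance is an assumption for this sketch, not a value from the paper.

import numpy as np

def learning_metrics(rewards, threshold_frac=0.9, tail=100):
    rewards = np.asarray(rewards, dtype=float)
    jumpstart = rewards[0]                # initial performance
    asymptotic = rewards[-tail:].mean()   # performance after convergence
    total_reward = rewards.sum()          # area under the learning curve
    # first time step at which the curve reaches the threshold
    # (default: 90% of asymptotic performance)
    above = np.nonzero(rewards >= threshold_frac * asymptotic)[0]
    time_to_threshold = int(above[0]) if above.size else None
    return jumpstart, asymptotic, total_reward, time_to_threshold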
15. (1) Large Adaptation Space
Average improvements compared to ε-greedy exploration:
• Asymptotic performance: 0%
• Time to threshold: 48.6%
• Jumpstart: 1.3%
• Total reward: 58.8%
• Energy savings: 0.1%
• Reduced VM migrations: 7.8%
16. (2) Evolution of Adaptation Space
Average improvements compared to ε-greedy exploration:
• Asymptotic performance: 0.4%
• Time to threshold: 51.0%
• Jumpstart: 5.1%
• Total reward: 61.3%
• Energy savings: 0.1%
• Reduced VM migrations: 23.7%
18. Conclusion and Outlook
Exploiting structural knowledge from design time (feature models) to guide online learning for self-adaptive services.
Future enhancements
• Experiments with additional systems
• Comparison for other exploration strategies and RL algorithms
• Considering changes of existing features (on top of additions and removals)
• Methodology for defining suitable feature models at design time
Thank You!
Research leading to these results has received funding from the EU's H2020 research and innovation programme under grant agreements no. 780351 (https://enact-project.eu/) and no. 871525 (https://fogprotect.eu/).
19. References
See paper.
Additional ones:
[Weyns et al. 2013] D. Weyns, N. Bencomo, R. Calinescu, J. Cámara, C. Ghezzi, V. Grassi, L. Grunske, P. Inverardi, J.-M. Jézéquel, S. Malek, R. Mirandola, M. Mori, G. Tamburrelli: Perpetual Assurances for Self-Adaptive Systems. Software Engineering for Self-Adaptive Systems 2013: 31-63
[Kephart & Chess, 2003] J. O. Kephart, D. M. Chess: The vision of autonomic computing. IEEE Computer 36(1), 41-50 (2003)
Speaker notes
Welcome to our presentation on …
My name is… and I am happy to present this joint work with …
The strategy exploits the semantics typically encoded in feature models: non-leaf features are usually abstract features, which delegate their realization to their sub-features; sub-features thus may offer different realizations of their abstract parent feature.
If no configuration containing f or a sibling feature of f is found, the strategy moves on to the parent feature of f; this is repeated until a configuration is found (line 13) or the root feature is reached (line 22).