© 2019, Amazon Web Services, Inc. or its Affiliates.
My Nguyen – Solutions Architect – Amazon Web Services Vietnam
AWS’s philosophy on designing MLOps platform
Dec 2020
Agenda
• What is MLOps?
  • DevOps vs MLOps
  • DevOps practices inheritance
  • Machine learning development lifecycle
• Unique driving factors to MLOps
  • Personas
  • Unique challenges faced by ML workloads
• MLOps practices on Amazon SageMaker
  • Complete separation of steps (and their environments)
  • Versioning & tracking
  • Pipeline automation
  • Continuous improvement
• Demo
• QnA
What is MLOps?
Operationalizing machine learning workloads
DevOps vs MLOps
Notes: Technology is just a piece of the overall picture
DevOps practices inheritance
• Communication & collaboration
• Continuous integration
• Continuous delivery/deployment
• Microservices design
• Infrastructure-as-code & configuration-as-code
• Continuous monitoring & logging
Machine learning development lifecycle
Unique driving factors to MLOps
Personas
• Business stakeholder
• Data scientist
• Domain expert
• Data engineer
• Security engineer
• Machine learning/DevOps engineer
• Software engineer
All with different skillsets & priorities
Unique challenges
• Data:
• The need to utilize production data in development activities
• Dependencies on data pipelines
• Longer experiment lifecycles
• Output of model artifacts:
• Independent lifecycles between model and integrated applications/systems
• Monitoring & tracking of experiments and models
• Unique metrics for performance evaluation
MLOps practices on Amazon SageMaker
Complete separation of steps
• Data processing
• Explore & Build
• Train & Validate
• Deploy
• Monitor
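One way to picture the separation of steps is a set of independent functions that hand each other nothing but an artifact location. The sketch below is illustrative only: the `Step` class, the environment labels, the bucket name, and the placeholder step bodies are all assumptions made for this example, not part of the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    name: str
    environment: str            # hypothetical label for where the step runs
    run: Callable[[str], str]   # input artifact URI -> output artifact URI


def run_pipeline(steps: List[Step], source_uri: str) -> Dict[str, str]:
    """Run steps in order; the only thing passed between steps is a URI."""
    produced: Dict[str, str] = {}
    uri = source_uri
    for step in steps:
        uri = step.run(uri)
        produced[step.name] = uri
    return produced


# Placeholder step bodies: a real step would read from and write to storage.
STEPS = [
    Step("process", "processing-container",
         lambda uri: uri.replace("/raw/", "/processed/")),
    Step("train", "training-container",
         lambda uri: uri.replace("/processed/", "/models/") + "/model.tar.gz"),
    Step("deploy", "inference-container",
         lambda uri: "endpoint-for:" + uri),
]

ARTIFACTS = run_pipeline(STEPS, "s3://example-bucket/raw/2020-12")
```

Because each step depends only on the URI it receives, any step can be re-run, replaced, or moved to a different environment without touching the others.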
Versioning & tracking of every step
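As a rough illustration of what tracking every step can mean in practice, the sketch below derives one deterministic ID from the three things that define a training run: data, code, and hyperparameters. The `fingerprint` function and the sample inputs are assumptions made for this example; the slide does not prescribe a mechanism.

```python
import hashlib
import json


def fingerprint(data: bytes, code: bytes, hyperparameters: dict) -> str:
    """Return a short, deterministic ID for one combination of inputs."""
    digest = hashlib.sha256()
    params = json.dumps(hyperparameters, sort_keys=True).encode("utf-8")
    for part in (data, code, params):
        # Hash each part on its own first so part boundaries stay unambiguous.
        digest.update(hashlib.sha256(part).digest())
    return digest.hexdigest()[:12]


DATA = b"x,y\n1,2\n"
CODE = b"print('train')"

RUN_A = fingerprint(DATA, CODE, {"lr": 0.1, "epochs": 3})
RUN_B = fingerprint(DATA, CODE, {"epochs": 3, "lr": 0.1})  # same values, other order
RUN_C = fingerprint(DATA, CODE, {"lr": 0.2, "epochs": 3})  # one value changed
```

`RUN_A` and `RUN_B` are equal because the hyperparameters are serialized with sorted keys, while `RUN_C` differs because one value changed.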
Pipeline automation
• Metaflow
• Apache Airflow
• AWS Step Functions
• Kubeflow
• Flyte
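Taking AWS Step Functions as one of the orchestrators listed, a pipeline can be written as an Amazon States Language definition. The sketch below builds such a definition as a plain Python dictionary. The state names, the `task` helper, and the input field names (`$.JobName` and so on) are assumptions for this example, and the `Parameters` are deliberately incomplete: a deployable state machine needs the full request for each call.

```python
import json
from typing import List, Optional


def task(name: str, resource: str, parameters: dict,
         next_state: Optional[str] = None) -> dict:
    """Build one Task state; the result is kept beside the input, not over it."""
    state = {
        "Type": "Task",
        "Resource": resource,
        "Parameters": parameters,          # placeholder: required fields omitted
        "ResultPath": f"$.{name}Result",
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


DEFINITION = {
    "Comment": "Train a model, then deploy it (illustrative sketch)",
    "StartAt": "Train",
    "States": {
        "Train": task(
            "Train", "arn:aws:states:::sagemaker:createTrainingJob.sync",
            {"TrainingJobName.$": "$.JobName"}, next_state="CreateModel"),
        "CreateModel": task(
            "CreateModel", "arn:aws:states:::sagemaker:createModel",
            {"ModelName.$": "$.ModelName"}, next_state="CreateEndpointConfig"),
        "CreateEndpointConfig": task(
            "CreateEndpointConfig", "arn:aws:states:::sagemaker:createEndpointConfig",
            {"EndpointConfigName.$": "$.ModelName"}, next_state="CreateEndpoint"),
        "CreateEndpoint": task(
            "CreateEndpoint", "arn:aws:states:::sagemaker:createEndpoint",
            {"EndpointName.$": "$.ModelName",
             "EndpointConfigName.$": "$.ModelName"}),
    },
}


def state_order(definition: dict) -> List[str]:
    """Follow the Next pointers from StartAt to the terminal state."""
    order, current = [], definition["StartAt"]
    while current is not None:
        order.append(current)
        current = definition["States"][current].get("Next")
    return order


DEFINITION_JSON = json.dumps(DEFINITION, indent=2)
```

The `.sync` suffix on the training task makes the state wait for the job to finish before moving on, which is what lets the later states assume a model artifact exists.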
SageMaker workflow
The notebook: An entry-point / studio / IDE
Diagram: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3)
SageMaker workflow
Prepare data and script; find or build container image(s)
Diagram adds: Training Data; Custom Code; Training Image; Framework Code
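The "script" part of this step might look roughly like the sketch below: a training entry point that reads its input and output locations from the environment instead of hard-coding them. It assumes a SageMaker framework container, which exports `SM_CHANNEL_TRAIN` (for a channel named `train`) and `SM_MODEL_DIR`. The local fallback paths, the `model.json` file name, and the trivial "model" (a mean) are assumptions made for this example.

```python
import argparse
import json
import os


def train(train_dir: str, model_dir: str) -> str:
    """Fit a trivial 'model' (the mean of all numbers found) and save it."""
    values = []
    for name in sorted(os.listdir(train_dir)):
        with open(os.path.join(train_dir, name)) as handle:
            values.extend(float(line) for line in handle if line.strip())
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, "model.json")
    with open(model_path, "w") as handle:
        json.dump({"mean": sum(values) / len(values)}, handle)
    return model_path


def main(argv=None) -> str:
    parser = argparse.ArgumentParser()
    # Inside the container these variables point at the downloaded data and
    # at the directory whose contents are packed into model.tar.gz.
    # The second argument to get() is a local fallback chosen for this sketch.
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "model"))
    args = parser.parse_args(argv)
    return train(args.train, args.model_dir)

# A real script would end with: if __name__ == "__main__": main()
```

Keeping the paths configurable is what lets the same script run unchanged in a local notebook and inside a training job.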
SageMaker workflow
Run a training job to create a model artifact
Diagram adds: Training Job; model.tar.gz
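A training job ties together the pieces named on the slide: an image from ECR, data in S3, and an S3 location for the resulting `model.tar.gz`. The sketch below only assembles the request body for the SageMaker `CreateTrainingJob` API and does not call AWS. The account ID, bucket, role, image URI, instance type, and job name are hypothetical placeholders.

```python
def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         train_s3_uri: str, output_s3_uri: str) -> dict:
    """Assemble a CreateTrainingJob request with one input channel."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # the training image in ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,         # the training data in S3
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",    # placeholder instance choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def model_artifact_uri(request: dict) -> str:
    """Where the job writes model.tar.gz, given its output path and name."""
    base = request["OutputDataConfig"]["S3OutputPath"].rstrip("/")
    return f"{base}/{request['TrainingJobName']}/output/model.tar.gz"


REQUEST = training_job_request(
    job_name="demo-2020-12-01",
    image_uri="123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/demo-train:latest",
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",
    train_s3_uri="s3://example-bucket/processed/",
    output_s3_uri="s3://example-bucket/models/",
)
ARTIFACT = model_artifact_uri(REQUEST)

# With credentials configured, the request could be submitted with:
#   boto3.client("sagemaker").create_training_job(**REQUEST)
```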
SageMaker workflow
Deploy the model to a real-time inference endpoint
Diagram adds: Inference Endpoint; Inference Image; Inference Requests
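Deploying to a real-time endpoint takes three API calls in sequence: `CreateModel`, `CreateEndpointConfig`, and `CreateEndpoint`. The sketch below assembles the three request bodies without calling AWS. The naming scheme (`<name>-config`, `<name>-endpoint`), the instance type, and all ARNs and URIs are hypothetical placeholders.

```python
def endpoint_requests(name: str, image_uri: str, model_data_url: str,
                      role_arn: str, instance_type: str = "ml.m5.large",
                      instance_count: int = 1):
    """Return the CreateModel, CreateEndpointConfig, CreateEndpoint requests."""
    create_model = {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,              # the inference image in ECR
            "ModelDataUrl": model_data_url,  # the model.tar.gz in S3
        },
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    create_endpoint = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": f"{name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint


MODEL, CONFIG, ENDPOINT = endpoint_requests(
    name="demo-model",
    image_uri="123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/demo-infer:latest",
    model_data_url="s3://example-bucket/models/demo-2020-12-01/output/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",
)

# With credentials configured, the calls would be made in this order:
#   client = boto3.client("sagemaker")
#   client.create_model(**MODEL)
#   client.create_endpoint_config(**CONFIG)
#   client.create_endpoint(**ENDPOINT)
```

Separating the model from the endpoint configuration is what allows the model behind an endpoint to change independently of the applications that call it.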
SageMaker workflow
(…Or run a batch transform job)
Diagram adds: Transform Job; Input Data; Results
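For offline scoring, the same model can be run over a whole S3 prefix instead of being hosted. The sketch below assembles a `CreateTransformJob` request body without calling AWS. It assumes a model named `demo-model` already exists and that the input is line-delimited CSV; the bucket, job name, and instance type are hypothetical placeholders.

```python
def transform_job_request(job_name: str, model_name: str,
                          input_s3_uri: str, output_s3_uri: str) -> dict:
    """Assemble a CreateTransformJob request for line-delimited CSV input."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,      # the input data in S3
            }},
            "ContentType": "text/csv",
            "SplitType": "Line",            # send the file record by record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,  # where the results are written
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": "ml.m5.large",  # placeholder instance choice
            "InstanceCount": 1,
        },
    }


TRANSFORM = transform_job_request(
    job_name="demo-batch-2020-12-01",
    model_name="demo-model",
    input_s3_uri="s3://example-bucket/batch-input/",
    output_s3_uri="s3://example-bucket/batch-output/",
)

# With credentials configured, the request could be submitted with:
#   boto3.client("sagemaker").create_transform_job(**TRANSFORM)
```

Unlike an endpoint, the instances exist only for the duration of the job, so there is nothing to tear down afterwards.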
SageMaker workflow
Diagram: Data Scientists; Notebook: Explore and Interact; Training Data; Custom Code; Training Image; Framework Code; Training Job; model.tar.gz; Inference Image; Endpoint / Transformer; Inference Requests; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3)
Continuous improvement
• SageMaker Hosting Services
• SageMaker Batch Transform
• SageMaker Notebooks
• SageMaker Autopilot
• SageMaker Experiments
• SageMaker Ground Truth
• SageMaker Processing
• SageMaker Model Monitor
• Amazon Augmented AI
• SageMaker Training
• SageMaker Debugger
• SageMaker Hyperparameter Tuning
SageMaker Studio, the First Fully Integrated Development Environment For Machine Learning
Demo
Transformation from local notebook to SageMaker workflow
The bigger picture
QnA
References:
https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
https://github.com/aws-samples/aws-stepfunctions-byoc-mlops-using-data-science-sdk
https://github.com/apac-ml-tfc/sagemaker-workshop-101
Thank you!
My Nguyen - https://www.linkedin.com/in/mynguyen6512/

Weitere ähnliche Inhalte

Was ist angesagt?

Accelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAccelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAmazon Web Services
 
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
ABN AMRO DevSecOps Journey
ABN AMRO DevSecOps JourneyABN AMRO DevSecOps Journey
ABN AMRO DevSecOps JourneyDerek E. Weeks
 
Using Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsUsing Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsAmazon Web Services
 
Introduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & DatabricksIntroduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & DatabricksCCG
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud PlatformOpsta
 
Dos and Don'ts of DevSecOps
Dos and Don'ts of DevSecOpsDos and Don'ts of DevSecOps
Dos and Don'ts of DevSecOpsPriyanka Aash
 
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Amazon Web Services
 
App Modernization Pitch Deck.pptx
App Modernization Pitch Deck.pptxApp Modernization Pitch Deck.pptx
App Modernization Pitch Deck.pptxMONISH407209
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute EngineColin Su
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsNilesh Gule
 
AIOps - The next 5 years
AIOps - The next 5 yearsAIOps - The next 5 years
AIOps - The next 5 yearsMoogsoft
 
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)Julien SIMON
 

Was ist angesagt? (20)

Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
Accelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAccelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAP
 
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
 
ABN AMRO DevSecOps Journey
ABN AMRO DevSecOps JourneyABN AMRO DevSecOps Journey
ABN AMRO DevSecOps Journey
 
Using Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsUsing Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML Models
 
Cloud Migration: A How-To Guide
Cloud Migration: A How-To GuideCloud Migration: A How-To Guide
Cloud Migration: A How-To Guide
 
Introduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & DatabricksIntroduction to Machine Learning with Azure & Databricks
Introduction to Machine Learning with Azure & Databricks
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
Dos and Don'ts of DevSecOps
Dos and Don'ts of DevSecOpsDos and Don'ts of DevSecOps
Dos and Don'ts of DevSecOps
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
Search Your DynamoDB Data with Amazon Elasticsearch Service (ANT302) - AWS re...
 
App Modernization Pitch Deck.pptx
App Modernization Pitch Deck.pptxApp Modernization Pitch Deck.pptx
App Modernization Pitch Deck.pptx
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute Engine
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
AIOps - The next 5 years
AIOps - The next 5 yearsAIOps - The next 5 years
AIOps - The next 5 years
 
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 

Ähnlich wie Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform

AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsCobus Bernard
 
Become a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesBecome a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesAmazon Web Services
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Julien SIMON
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshopJulien SIMON
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotRandall Hunt
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Amazon Web Services
 
Integrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourIntegrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourVMware Tanzu
 
Modern Applications Development on AWS
Modern Applications Development on AWSModern Applications Development on AWS
Modern Applications Development on AWSBoaz Ziniman
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerAmazon Web Services
 
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Amazon Web Services
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...Jonathan Dion
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Amazon Web Services
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfAmazon Web Services
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesAmazon Web Services
 
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Amazon Web Services
 
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...Amazon Web Services Korea
 
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveHow_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveAmazon Web Services
 

Ähnlich wie Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform (20)

AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applications
 
Become a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesBecome a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS Services
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter Bot
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
 
Integrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourIntegrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an Hour
 
Modern Applications Development on AWS
Modern Applications Development on AWSModern Applications Development on AWS
Modern Applications Development on AWS
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMaker
 
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdf
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best Practices
 
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
 
CI/CD for Modern Applications
CI/CD for Modern ApplicationsCI/CD for Modern Applications
CI/CD for Modern Applications
 
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
 
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveHow_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
 
Amazon SageMaker
Amazon SageMakerAmazon SageMaker
Amazon SageMaker
 

Mehr von Grokking VN

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking VN
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking VN
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking VN
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking VN
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...Grokking VN
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compilerGrokking VN
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoringGrokking VN
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking VN
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...Grokking VN
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking VN
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
SOLID & Design Patterns
SOLID & Design PatternsSOLID & Design Patterns
SOLID & Design PatternsGrokking VN
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking VN
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking VN
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking VN
 

Mehr von Grokking VN (20)

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles Thinking
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystified
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applications
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compiler
 
Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform

  • 1. My Nguyen – Solutions Architect – Amazon Web Services Vietnam. AWS’s philosophy on designing MLOps platform. Dec 2020
  • 2. Agenda • What is MLOps? • DevOps vs MLOps • DevOps practices inheritance • Machine learning development lifecycle • Unique driving factors to MLOps • Personas • Unique challenges faced by ML workload • MLOps practices on Amazon SageMaker • Complete separation of steps (and their environments) • Versioning & tracking • Pipeline automation • Continuous improvement • Demo • QnA
  • 3. What is MLOps? Operationalizing machine learning workloads
  • 4. DevOps vs MLOps
  • 5. Notes: Technology is just a piece of the overall picture
  • 6. DevOps practices inheritance • Communication & collaboration • Continuous integration • Continuous delivery/deployment • Microservices design • Infrastructure-as-code & configuration-as-code • Continuous monitoring & logging
  • 7. Machine learning development lifecycle
  • 8. Unique driving factors to MLOps
  • 9. Personas • Business stakeholder • Data scientist • Domain expert • Data engineer • Security engineer • Machine learning/DevOps engineer • Software engineer. All with different skillsets & priorities
  • 10. Unique challenges • Data: • The need to utilize production data in development activities • Dependencies on data pipelines • Longer experiment lifecycles • Output of model artifacts: • Independent lifecycles between model and integrated applications/systems • Monitoring & tracking of experiments and models • Unique metrics for performance evaluation
  • 11. MLOps practices on Amazon SageMaker
  • 12. Complete separation of steps: Data processing; Explore & Build; Train & Validate; Deploy; Monitor
  • 13. Versioning & tracking of every step
  • 14. Pipeline automation: Metaflow; Apache Airflow; AWS Step Functions; Kubeflow; Flyte
  • 15. SageMaker workflow. The notebook: An entry-point / studio / IDE. [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3)]
  • 16. SageMaker workflow. Prepare data and script; find or build container image(s). [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3); Training Data; Custom Code; Training Image; Framework Code]
  • 17. SageMaker workflow. Run a training job to create a model artifact. [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3); Training Job; Custom; Framework; Data; model.tar.gz; Training Data; Custom Code; Training Image; Framework Code]
  • 18. SageMaker workflow. Deploy the model to a real-time inference endpoint. [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3); Inference Endpoint; Custom; Framework; Model; Inference Image; model.tar.gz; Training Data; Training Image; Framework Code; Custom Code; Inference Requests]
  • 19. SageMaker workflow. (…Or run a batch transform job). [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3); Transform Job; Custom; Framework; Model; Inference Image; model.tar.gz; Training Image; Framework Code; Custom Code; Input Data; Results]
  • 20. SageMaker workflow. [Diagram labels: Data Scientists; Notebook: Explore and Interact; SageMaker Container Runtime; Elastic Container Registry (ECR); Simple Storage Service (S3); Training Job; Endpoint / Transformer; Custom; Framework; Model; Data; Inference Image; model.tar.gz; Training Data; Custom Code; Training Image; Framework Code; Inference Requests]
  • 21. Continuous improvement: SageMaker Hosting Services; SageMaker Batch Transform; SageMaker Notebooks; SageMaker Autopilot; SageMaker Experiments; SageMaker GroundTruth; SageMaker Processing; SageMaker Model Monitor; Amazon Augmented AI; SageMaker Training; SageMaker Debugger; SageMaker Hyperparameter Tuning. SageMaker Studio, the First Fully Integrated Development Environment For Machine Learning
  • 22. Demo: Transformation from local notebook to SageMaker workflow
  • 23. The bigger picture
  • 24. QnA. References: https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf https://github.com/aws-samples/aws-stepfunctions-byoc-mlops-using-data-science-sdk https://github.com/apac-ml-tfc/sagemaker-workshop-101
  • 25. Thank you! My Nguyen - https://www.linkedin.com/in/mynguyen6512/
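
Slides 16 to 18 describe writing a small custom script, running it as a training job that produces a model artifact, and then serving that artifact. The sketch below shows roughly what such a script could look like, using only the Python standard library. It assumes the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables and the `model_fn` / `input_fn` / `predict_fn` / `output_fn` handler names used by the SageMaker framework containers; the exact handler signatures differ between containers and versions, so treat these as illustrative. The "model" (a column mean stored in `model.pkl`) and the JSON request shape are hypothetical stand-ins, not anything from the talk.

```python
import argparse
import json
import os
import pickle


def train(train_dir: str, model_dir: str) -> str:
    """Fit a toy model (the mean of all values in the CSV files) and save it."""
    values = []
    for name in sorted(os.listdir(train_dir)):
        if name.endswith(".csv"):
            with open(os.path.join(train_dir, name)) as f:
                values.extend(float(line) for line in f if line.strip())
    if not values:
        raise ValueError(f"no training rows found in {train_dir}")
    model = {"mean": sum(values) / len(values)}
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.pkl")  # hypothetical artifact name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def model_fn(model_dir: str) -> dict:
    """Load the artifact from disk into memory (inference side)."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body: str, content_type: str = "application/json") -> list:
    """Deserialize a request; the {"instances": [...]} shape is an assumption."""
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type}")
    return json.loads(request_body)["instances"]


def predict_fn(data: list, model: dict) -> list:
    """Toy prediction: distance of each value from the training mean."""
    return [x - model["mean"] for x in data]


def output_fn(prediction: list, accept: str = "application/json") -> str:
    """Serialize the response."""
    return json.dumps({"predictions": prediction})


def main(argv=None) -> str:
    """Training entry point: paths default to the container's environment."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    args = parser.parse_args(argv)
    return train(args.train, args.model_dir)
```

When used as a real entry point, `main()` would be called under an `if __name__ == "__main__":` guard, and the file would typically be handed to a framework estimator in the SageMaker Python SDK. Locally, the same functions can be exercised against plain folders, which is the point the slides make about keeping custom code simple.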

Editor's notes

  1. Built on top of an existing foundation; still younger and less mature
  2. Also pipeline-as-code & policy-as-code
  3. Different skillsets & priorities
  4. Also pipeline-as-code & policy-as-code
  5. Code version control; shared environments and IDEs (Jupyter Notebook/Lab); infrastructure as code; self-service environments; SaaS
  6. Most importantly: training & processing. Separation of source, environments, etc. Security; experiment lifecycles; pricing; efficiency
  7. Reproducibility is hard. End-to-end traceability. Dashboard ->
  8. Netflix built Metaflow; Lyft built Flyte; Kubeflow and Apache Airflow are the other options shown. Important factor: skill set & enforcement.
     Metaflow
     • Built by Netflix; Netflix is a huge customer of AWS
     • In production since 2018
     • Made open source by Netflix & AWS in 2019
     • What is it? Basic concepts of Metaflow
     • Deploying to AWS is easy
     Flyte
     • A K8s-native distributed workflow orchestrator used at Lyft for data science, pricing, fraud detection, locations, ETA and more
     • Enables highly concurrent, scalable workflows for ML and data processing
     • Core concepts of Flyte: task, DAG, workflows, control flow specification
     • The actual task can be in any language: tasks are executed as containers
     • Provisions necessary resources dynamically, executes tasks as Docker containers, and de-provisions resources when tasks are complete to control costs
     • Supports execution across 100s of machines, e.g. production model training
     Kubeflow and Airflow are fairly popular.
     Airflow
     • Amazon SageMaker with Apache Airflow 1.10.1: if you use Airflow, you can use SageMaker Workflow in Apache Airflow
     • More details from https://sagemaker.readthedocs.io/en/stable/using_workflow.html
     Kubeflow and Kubernetes
     Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker addresses this requirement by letting Kubernetes users train and deploy models in SageMaker using SageMaker-Kubeflow operators and pipelines. With operators and pipelines, Kubernetes users can access fully managed SageMaker ML tools and engines natively from Kubeflow. This eliminates the need to manually manage and optimize ML infrastructure in Kubernetes while still preserving control of overall orchestration through Kubernetes. Using SageMaker operators and pipelines for Kubernetes, you can get the benefits of a fully managed service for machine learning in Kubernetes, without migrating workloads.
     • If you use Kubernetes, you can use SageMaker Operators for Kubernetes
     • You can install the SageMaker Operator for Kubernetes using the provided Helm chart
     • Once the operator is installed, K8s users can natively invoke SageMaker features like model training, hyperparameter tuning and batch transform jobs
     • They can also set up model serving using SageMaker Model Hosting Services
     • https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_operators_for_kubernetes.html#what-is-an-operator
     • https://eksworkshop.com/advanced/420_kubeflow/pipelines/
     AWS Step Functions
     • We see customers build serverless ML workflows using AWS Step Functions
     • Open source: Step Functions Data Science SDK for SageMaker
     • Create workflows to pre-process data and train/deploy models using SageMaker
     • Data pre-processing can be done using AWS Glue
     • SageMaker functionality like model training, HPO and endpoint creation is accessible
     • Use the SDK to create and visualize the workflows
     • Scale workflows without having to worry about infrastructure
     • https://aws.amazon.com/about-aws/whats-new/2019/11/introducing-aws-step-functions-data-science-sdk-amazon-sagemaker/
     Many good tools exist. You can run any of the tools we saw earlier on AWS. Remember: tools are meant to make your life easier. Don’t get fixated on the tools. Work backwards from the problem you are trying to solve. So think about your existing software engineering workflows and tools. Ask yourself which tools will best augment what you already have, and which tools your people are most comfortable with. The AWS approach is: use the tools that work for you.
  9. It is easy to think of SageMaker as just a notebook. The key thing to remember is that the notebook UI we see a lot in the demos is just a part of the SageMaker platform, and an optional part at that! The notebook is the front-end environment in which we’ll experiment with our data and code. Keep that instance a low-cost resource. Value of separation… When we’re ready to try and train or deploy a model, we’ll be spinning up separate, dedicated infrastructure in the SageMaker container runtime, which means we have lots of flexibility to choose resources cost-effectively and only pay for what we need. All managed. The orchestration that SageMaker gives us to make this happen is closely integrated with these other two services: the images defining our containers will need to be stored in Amazon ECR (there’s not currently an integration for external registries like DockerHub, but if you have a particular technology in mind our service team would appreciate the feedback!) …and the preferred storage platform for not just our input data but also model artifacts and other stuff generated in the workflow will be Amazon S3. Why? <The generic S3 pitch – it’s got everything you need for a data lake> Most integrated service, arguably most mature, tiers, security models, high durability. Recapping: 4 things. …So let’s look at how that end-to-end process works.
  10. To start with I have:
      • The data that I want to train on (prepared and loaded to S3): pre-processed already, in the notebook, but there is also the option for other services like Glue or Processing Jobs to …
      • The training script I’d like to run (e.g. defining neural network shape and fitting routine), on the notebook instance where I’m working: minimum code
      • One of the pre-prepared SageMaker framework container images somewhere in Amazon ECR (maybe TensorFlow, PyTorch, or MXNet): repeatable, controlled, reproducible
  11. So what’s happening when we start a training job by calling “estimator.fit()” in those examples from before? We’re going to start seeing a lot of arrows here, so the thing to remember is that all of the arrows are things *SageMaker is doing for you*, not things you need to do yourself!
      First, assuming you provide a custom code script (or folder of code), the SageMaker SDK is going to zip that up and upload it to a new location in S3. So even if you forget to check your working version in to git, you won’t lose track of the version that worked well in the middle of your experiments: the results are going to be traceable to the code that created them.
      Next, SageMaker is going to spin up whatever infrastructure you asked for in the fit() request, and pull down the Docker image to run on it.
      SageMaker will also start downloading your source data from S3 into the container. No messing about with S3 API calls in your script: your code can read it from a folder, just as if you were running locally. Env params…
      As the container fires up, the framework application does a load of helpful prep, but one thing is particularly important: it installs any additional inline dependencies specified for your custom code, then starts it up and passes in the parameters of the training job.
      Your code runs, prints status to the console, and saves the trained model to disk just like you normally would… But SageMaker takes care of zipping and uploading that final model to S3, and also other output mechanisms like sending the logs to CloudWatch and collecting metrics. Pay only for …
      So the benefit we’ve gained here is that our custom code can be quite simple: load a CSV from file, make a random forest, save it to file, etc. We can even specify additional dependencies via a requirements.txt file… and SageMaker plus the framework container will orchestrate these overhead tasks to give us this nice lineage-traceable workflow with all of the cool features we talked about earlier, with no extra code complexity required on our part.
  12. When it’s time to deploy that model to an inference endpoint, we simply reference:
      • Our model artifact tarball from S3
      • An inference container (which might be the same one as for training, or might be a different image because the dependencies could be differently optimized for run-time)
      • And maybe some custom code again: this time just defining some helper functions that we might want to customize from the built-in inference flow, such as how to de/serialize requests and responses, or how the model file(s) need to be loaded from disk into memory if the process is different from standard. How it’s optimized
      As in training, SageMaker will handle the creation of infrastructure and loading of these components for us. If we used the ‘estimator’ pattern from the high-level SageMaker SDK, all we need to call is a single estimator.deploy(…) function to make it happen. Again, the intent here is that any custom code needed can be small: just providing a few optional functions for serialization, model loading, etc., rather than writing and having to maintain a model server, integrations with TorchServe or TensorFlow Serving, etc. Custom input format (JSON)…
  13. Not today, but… In SageMaker, batch transform jobs function pretty much identically to real time inference endpoints from a user code point of view: The batch transform engine handles reading your source data from S3, feeding it through your model, storing the results back to S3, and shutting down the resources again as soon as the job is done. Pay only for…
  14. Mechanism: what is easiest for different personas? Skillset dependency and learning curve. …So that’s our overview picture for framework containers: you write pretty minimal code, just as you usually would for experimenting in your notebook. But instead of running that code locally, which can make things like infrastructure optimization, experiment tracking, and inference deployment tricky, SageMaker provides some nice streamlined, high-level APIs to trigger containerized training and inference jobs (or deploy endpoints) on separate infrastructure. At the fundamental level, the system is super flexible because you can make fully custom container images and model artifact tarballs… But the framework container images together with the SageMaker SDK library (for your notebook) enable this higher-level, container-plus-custom-code workflow. Same as the morning, just a different drawing. Solves problems with experimenting, tracking, etc.
  15. Also lessons learnt & best practices
  16. The Repeatable stage is generally focused on applying automation as the number of machine learning workloads running in production increases. In general, at this stage many of the activities in building, training and deploying machine learning models are automated. The introduction of automation reduces manual hand-offs between teams and reduces the operational overhead of previously manual/ad-hoc tasks. The ability to orchestrate machine learning workflows into automated machine learning pipelines also depends on having a data strategy and automated data processing tasks.
      • Queue Management: Ability to manage, schedule, and prioritize tasks
      • Resource Management: Access to horizontally scalable compute that can scale based on workflow task requirements
      • Workflow Operators: Error handling, retry and conditional logic functions
      • Workflow Logs: Centralized logs and configuration parameters for execution and task level logs
      The Reliable stage builds on the automation from the Repeatable stage but aims to ensure automation is balanced with practices aimed to increase quality, enable end-to-end traceability, increase reliability through automatic rollbacks, increase visibility into development and operational health, and ensure repeatability. In general, at this stage MLOps practices of Infrastructure-as-Code/Configuration-as-Code, Continuous Integration, Continuous Delivery/Deployment, and Continuous Monitoring are introduced.
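
The workflow capabilities listed in note 16 (error handling, retry, conditional logic, and a central log) can be illustrated with a very small in-process runner. This is a toy sketch in plain Python, not the API of Step Functions, Airflow, Kubeflow, Metaflow or Flyte; `Step`, `run_workflow`, `make_flaky_train` and `deploy` are hypothetical names introduced only for illustration, and the accuracy value is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, float]


@dataclass
class Step:
    name: str
    func: Callable[[Context], None]
    retries: int = 0  # extra attempts allowed after the first failure
    condition: Optional[Callable[[Context], bool]] = None  # run only if True


def run_workflow(steps: List[Step], context: Context) -> List[str]:
    """Run steps in order and return a centralized, task-level log."""
    log: List[str] = []
    for step in steps:
        # Conditional logic: skip the step when its condition is not met.
        if step.condition is not None and not step.condition(context):
            log.append(f"{step.name}: skipped")
            continue
        for attempt in range(1, step.retries + 2):
            try:
                step.func(context)
            except Exception as exc:
                log.append(f"{step.name}: attempt {attempt} failed: {exc}")
                if attempt > step.retries:
                    # Error handling: stop the workflow once retries run out.
                    log.append(f"{step.name}: giving up")
                    return log
            else:
                log.append(f"{step.name}: ok")
                break
    return log


def make_flaky_train(failures: int) -> Callable[[Context], None]:
    """Build a training step that fails a fixed number of times first."""
    state = {"left": failures}

    def train(ctx: Context) -> None:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient capacity error")
        ctx["accuracy"] = 0.91  # made-up metric for the example

    return train


def deploy(ctx: Context) -> None:
    ctx["deployed"] = 1.0
```

A typical use would be a training step with a couple of retries followed by a deploy step whose `condition` checks the recorded accuracy against a threshold. Managed orchestrators provide the same ideas declaratively, along with the queue and resource management that this sketch leaves out.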