1
From idea to production in a day
Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly
Florian Roscheck
PyCon DE & PyData Berlin 2024
2
3
4
How do we use it to build + test quickly?
What is our tech stack?
What are we building?
5
Hi, I’m Florian!
Florian Roscheck, Sr. Data Scientist
• Sr. Data Scientist at Henkel
• Instructor for Apache Spark with 7k+ students
• Vice President NumFOCUS Affiliated Project Selection Committee
• Active on LinkedIn
6
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
• Enough features to be usable
• Ability to collect user feedback
7
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
BUILD, MEASURE, LEARN
• To learn about users quickly, we want to implement a build-measure-learn loop
• To make users happier over time, we aim to create a data flywheel
8
Ready?
9
Ready?
• Data not in place
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration
10
BUILD, MEASURE, LEARN
• Data not in place
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration
GET DATA BEFOREHAND
11
HOW TO BUILD IN ONE DAY?
A Time-Saving Stack
Obstacles:
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration
Stack:
• Azure ML Notebooks
• Automated ML on Azure
• Streamlit
• Azure Application Insights + Streamlit
12
Let’s build!
13
Example: Trash Recognizer App
• Customer: Waste management company
• Need: Want to evaluate computer vision solutions for recognizing trash
• Idea: Waste management professionals manually evaluate performance through app with feedback functionality
14
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
16
Training Data: TACO Trash Image Dataset
• TACO: Trash Annotations in Context
• Dataset of 1.5k images with 4.7k+ annotations
• Annotations for 60 categories, incl. backgrounds
• Open source
Proença, P. F., & Simões, P. (2020). TACO: Trash Annotations in Context for Litter Detection. arXiv preprint arXiv:2003.06975.
Source: tacodataset.org
17
Source: tacodataset.org
18
What is Azure ML?
• Cloud-based ML platform by Microsoft
• Run ad-hoc analyses with Jupyter Notebooks
• Run and track machine learning experiments through tight integration with MLflow
• Version data and models
• Build complex and reproducible modeling pipelines
• Deploy models as APIs
Screenshot of Azure ML web app
19
Basics of Getting Data
Data Asset: TACO-annotations
• Azure ML-managed
• Like a mask for files
• Sharable
• Version controlled
• Interactively explorable
Azure ML Notebook: 0_prepare_dataset.ipynb
• Like Jupyter Notebook
• Managed environment
• Sharable
• Runs on compute in workspace
Diagram elements: TACO GitHub repo, Azure ML Workspace, Azure blob storage
ENABLERS: Reproducible environment, Collaboration-ready
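A minimal sketch of what a dataset preparation step could look like. The slides do not show the contents of 0_prepare_dataset.ipynb, so this is an illustration, not the talk's code. It assumes the TACO annotations are in COCO format (absolute-pixel `[x, y, width, height]` boxes) and that the target is a JSONL layout with normalized `topX`/`topY`/`bottomX`/`bottomY` corners, as used by Azure AutoML for object detection. The function name `coco_to_jsonl_lines` and the URL prefix are hypothetical, and optional fields such as `image_details` are omitted. Check the Azure ML documentation for the full schema.

```python
import json


def coco_to_jsonl_lines(coco, image_url_prefix):
    """Convert a COCO-style dict into one JSON line per image.

    Assumption: boxes are absolute [x, y, width, height]; the output
    stores corner coordinates normalized by image width and height.
    """
    categories = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by the image they belong to.
    annotations_by_image = {}
    for ann in coco["annotations"]:
        annotations_by_image.setdefault(ann["image_id"], []).append(ann)

    lines = []
    for img in coco["images"]:
        width, height = img["width"], img["height"]
        labels = []
        for ann in annotations_by_image.get(img["id"], []):
            x, y, w, h = ann["bbox"]
            labels.append({
                "label": categories[ann["category_id"]],
                "topX": x / width,
                "topY": y / height,
                "bottomX": (x + w) / width,
                "bottomY": (y + h) / height,
                "isCrowd": int(ann.get("iscrowd", 0)),
            })
        record = {
            "image_url": image_url_prefix + img["file_name"],
            "label": labels,
        }
        lines.append(json.dumps(record))
    return lines


# Tiny made-up example (not real TACO data).
example = {
    "images": [{"id": 1, "file_name": "img1.jpg", "width": 100, "height": 200}],
    "categories": [{"id": 5, "name": "Bottle"}],
    "annotations": [
        {"image_id": 1, "category_id": 5, "bbox": [10, 20, 30, 40], "iscrowd": 0}
    ],
}
jsonl_lines = coco_to_jsonl_lines(example, "https://example.invalid/images/")
```

Each returned string is one line of the annotation file. Writing them to disk and registering the result as a data asset would happen afterwards in the workspace.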
20
Movie
21
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
22
Automated Machine Learning
23
!
24
Automated Machine Learning on Azure ML
• Automated ML: Try different models and hyperparameters that are automatically selected
• We have very little time for modeling!
• Depending on data and problem type, automated machine learning can provide a reasonable starting point for modeling with a high return on time investment
• Azure ML offers automated ML pipelines for several common tasks, incl. classification, regression, forecasting, NLP, and computer vision
ENABLERS: Modeling time saver
25
Setting Up AutoML Through Code
1 Create compute cluster
2 Define training job
3 Submit job to compute cluster
Diagram elements: Azure ML Notebooks (1_training.ipynb), Azure ML Workspace, TACO-annotations, TACO-training
26
Compute Cluster Creation Tips & Tricks
Save Money
• Shut down unused instances: 120 seconds to auto shutdown
• Pick auto-evictable machine (own case: 80% cheaper)
Pick a Fitting Compute
• 4 experiments can run in parallel
• Tesla T4 GPU w/ 16 GB memory, 56 GB RAM, 8 vCPUs, but many options available!
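The tips above could translate into cluster settings roughly like this. It is a sketch under assumptions, not the talk's code: it uses the Azure ML Python SDK v2 (`azure-ai-ml`), the cluster name `gpu-cluster` is hypothetical, and `Standard_NC8as_T4_v3` is my guess at a VM size matching the slide's specs (one T4, 8 vCPUs, 56 GB RAM). The SDK is imported lazily so the settings can be inspected without it installed.

```python
def cluster_settings():
    """Settings mirroring the slide: T4 node, up to 4 nodes, 120 s idle shutdown."""
    return {
        "name": "gpu-cluster",                # hypothetical cluster name
        "size": "Standard_NC8as_T4_v3",       # assumed SKU, not named in the slides
        "min_instances": 0,                   # scale to zero when nothing runs
        "max_instances": 4,                   # lets 4 experiments run in parallel
        "idle_time_before_scale_down": 120,   # seconds until idle nodes shut down
        "tier": "low_priority",               # evictable, cheaper machines
    }


def create_cluster(ml_client, settings):
    """Create or update the cluster; `ml_client` is an azure.ai.ml.MLClient."""
    from azure.ai.ml.entities import AmlCompute  # third-party, imported lazily

    compute = AmlCompute(**settings)
    return ml_client.compute.begin_create_or_update(compute).result()
```

Low-priority nodes can be evicted mid-run, so this trade-off suits short experiments better than long, uninterruptible ones.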
27
Increasing Efficiency for Automated ML on Azure
• Set ML parameters based on your data science knowledge
  - Train/test/validation split, cross-validation, etc.
  - Hyperparameter selection strategy
  - Restrict hyperparameter search space
• Set job limits
  - Max no. of trials
  - Max runtime per trial or of all trials
  - Termination based on score
3 Hours Later
29
Our annotations, linked to the job
MLflow model!
30
Models ordered by performance
Azure AutoML experimented with a single model type
31
Training Results
Movie
32
How to Dig Deeper
• Metrics are comprehensive and look great, but what are we looking at?
• More details in the logs (section of std_log.txt file in Outputs + logs tab)
• Tip: Read Azure ML documentation!
• You still need data science knowledge to understand what Azure ML is doing here.
33
We have a model!
34
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
35
The Power of ONNX for Model Packaging
• Great: Azure ML packaged the model in MLflow format
• The Issue: Tight MLflow model dependencies restrict the platforms where the model can be used
  - 204 (!) pinned dependencies, incl. 31 Azure-specific packages
  - Experienced issues installing some (azureml-dataprep-native) on macOS (M1)
• The Solution: Take the ONNX model file (byproduct of Azure AutoML training) and use it with a single dependency: onnxruntime
  - ONNX (Open Neural Network Exchange): Open standard for deep learning models, makes models work across frameworks
  - ONNX Runtime: Cross-platform, open source ML model accelerator
You can now use your AutoML-trained model outside of Azure!
ENABLERS: Flexible model deployment
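Using the ONNX file with onnxruntime as the only dependency can be sketched as follows. The helper names `load_session` and `run_model` are hypothetical. The expected input shape, preprocessing, and output layout depend on the exported model and are not given in the slides, so `batch` is assumed to be an already preprocessed array.

```python
def load_session(model_path):
    """Open an ONNX model file on CPU; needs the onnxruntime package."""
    import onnxruntime as ort  # third-party, imported lazily

    return ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])


def run_model(session, batch):
    """Feed one preprocessed batch to the model's first input.

    Returns the list of all model outputs, in the order the model defines.
    """
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: batch})
```

Passing `None` as the first argument of `run` asks for every output. Inspecting `session.get_inputs()` and `session.get_outputs()` is a reasonable first step to learn what a specific exported model expects.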
36
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
37
Building an App with Streamlit
• Streamlit is an open-source app framework for creating data-based web apps in Python
• My experience with Streamlit:
  - The Good: Very easy and fast to code and use, apps look great and work. Wow!
  - The Good-to-Know: Complex workflows with state management are harder to program; apps may be perceived as slow by users in comparison to “professional” web apps
• Streamlit is perfect for getting a user-facing app off the ground and testing your data-based product ideas!
Streamlit logo, see streamlit.io
38
Streamlit App Example
39
Streamlit App Blueprint
What the user sees:
• Trash Recognizer
• [Model + data description]
• Upload image(s)
• Detected Trash
  - 2 items for yellow trash can
  - 1 item for blue trash can
• No trash detected.
• Detected Trash
  - 1 item for other trash can
What the app does:
1 Load ONNX model
2 Preprocess images
3 Run model inference
4 Postprocess images
5 Display results
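The blueprint could be wired up in Streamlit roughly as follows. This is a sketch, not the app from the talk: `detect` is a hypothetical callable standing in for the model steps (load, preprocess, infer, postprocess) that returns one trash-can name per detected item, and the wording of the result lines is modeled on the slide.

```python
from collections import Counter


def summarize(bins):
    """Turn one trash-can name per detected item into display lines."""
    if not bins:
        return ["No trash detected."]
    lines = []
    # most_common() sorts by count, keeping first-seen order for ties.
    for bin_name, count in Counter(bins).most_common():
        noun = "item" if count == 1 else "items"
        lines.append(f"{count} {noun} for {bin_name} trash can")
    return lines


def main(detect):
    """Render the page; `detect(image_bytes)` is a hypothetical model wrapper."""
    import streamlit as st  # third-party, imported lazily

    st.title("Trash Recognizer")
    uploads = st.file_uploader(
        "Upload image(s)",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True,
    )
    for upload in uploads or []:
        st.image(upload)
        st.subheader("Detected Trash")
        for line in summarize(detect(upload.getvalue())):
            st.write("- " + line)
```

In an `app.py`, calling `main(my_detector)` at the bottom of the file and starting it with `streamlit run app.py` would show the page. Keeping `summarize` free of Streamlit calls makes the display logic testable on its own.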
40
ENABLERS: Easy-to-use interface
Movie
41
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
42
Thanks, Data Science Engineering Team!
43
Deployment Pipeline
• Henkel Data Science Engineering team developed a pipeline for one-click deployment of data science infrastructure, incl. Streamlit apps, on secure Azure cloud infrastructure
• Open sourced via the article series “Kickstarting Data Science Projects in Azure DevOps” by Roberto Alonso
• Parts 1 and 2 already available on the Henkel Data & Analytics Blog: medium.com/henkel-data-and-analytics
44
45
Our Plan (BUILD, MEASURE, LEARN)
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback
46
Collecting Feedback
• streamlit-feedback: Open-source feedback plugin for Streamlit
• Python logging: Use AzureLogHandler through opencensus logging extension
• Azure Application Insights: Collect logs from application, query with Kusto language
• Azure Dashboards: Interactive live dashboards on Azure for application metrics
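The logging part of this chain might look like the sketch below. It assumes the `opencensus-ext-azure` package for `AzureLogHandler` and assumes that the feedback widget returns a dict with `score` and optional `text` keys. The helper names, the logger name, and the dimension keys (`image`, `prediction`, `comment`) are hypothetical.

```python
import logging


def feedback_dimensions(feedback, image_name, prediction):
    """Flatten one feedback dict into key/value pairs for logging."""
    return {
        "image": image_name,
        "prediction": prediction,
        "score": feedback.get("score"),
        "comment": feedback.get("text") or "",
    }


def log_feedback(logger, dimensions):
    """Emit one record; the extra dict travels along as `custom_dimensions`."""
    logger.info("user_feedback", extra={"custom_dimensions": dimensions})


def azure_logger(connection_string):
    """Logger that forwards records to Application Insights."""
    from opencensus.ext.azure.log_exporter import AzureLogHandler  # lazy import

    logger = logging.getLogger("trash_recognizer")
    logger.setLevel(logging.INFO)
    logger.addHandler(AzureLogHandler(connection_string=connection_string))
    return logger
```

In the app, the `feedback` dict would come from the streamlit-feedback widget after a prediction is shown. Records logged this way should then be queryable with Kusto in Application Insights, with the dimensions available as custom properties for dashboards.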
47
Movie
48
ENABLERS: Easy and fast measurement
49
BUILD, MEASURE, LEARN
• Reproducible environment
• Collaboration-ready
• Modeling time saver
• Flexible model deployment
• Easy and fast measurement
• Easy-to-use interface
• Learning culture
50
Code, Slides, Details: github.com/flrs/build_and_test_ml_quickly
Thanks
PyData team + sponsors; Henkel, incl. Henkel Data Science CoE team; open-source contributors for TACO, ONNX, ONNX Runtime, Streamlit, streamlit-feedback; Streamlit for reaching out before talk
Learn More
• Build-Measure-Learn Loop: The Lean Startup | Methodology
• Data Flywheel: Data Flywheel: Scaling a world-class data strategy
• Dataset: Tacodataset.org
• Automated Machine Learning on Azure: What is automated ML?
• ONNX: ONNX Runtime, ONNX File Format
• Streamlit: Get started with Streamlit, streamlit-feedback
• Azure Tricks for Data Science: Henkel Data & Analytics Blog
• Logging to Azure from Python: Monitor Python applications
• Azure Dashboards: Dashboards of Azure Log Analytics data
• A similar project: Instance Segmentation with Azure Machine Learning
Photo credits, in order of appearance: Greg Rakozy, Canva Studio, Sewupari Studio, Massimo Botturi, Charlotte Coneybeer, Desola Landre Ologun, Studio Saiz, Claudio Schwarz, NASA, Alena Darmel, Anna Shvetz, Vadim B, The Lucky Neko, Visual Tag Mx; User icon from “Redefining Women” icon collection by Iconathon
51
What are your questions?
Let’s connect!
Florian Roscheck, Sr. Data Scientist
linkedin.com/in/florianroscheck
github.com/flrs

From idea to production in a day – Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly

  • 1. 1 From idea to production in a day Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly Florian Roscheck PyCon DE & PyData Berlin 2024
  • 2. 2
  • 3. 3
  • 4. 4 How do we use it to build + test quickly? What is our tech stack? What are we building?
  • 5. 5 Hi, I’m Florian! Sr. Data Scientist Florian Roscheck • Sr. Data Scientist at Henkel • Instructor for Apache Spark with 7k+ students • Vice President NumFOCUS Affiliated Project Selection Committee • Active on LinkedIn
  • 6. 6 WHAT TO BUILD IN ONE DAY? A Minimum Viable Product • Enough features to be usable • Ability to collect user feedback
  • 7. 7 WHAT TO BUILD IN ONE DAY? A Minimum Viable Product BUILD M E A - S U R E LEARN To learn about users quickly, we want to implement build-measure-learn loop To make users happier over time, we aim to create data flywheel
  • 9. 9 Ready? Data not in place Environment issues Lost in modeling Inappropriate user interface No feedback about use Difficult collaboration
  • 10. 10 BUILD M E A - S U R E LEARN Data not in place Environment issues Lost in modeling Inappropriate user interface No feedback about use Difficult collaboration GET DATA BEFOREHAND
  • 11. 11 HOW TO BUILD IN ONE DAY? A Time-Saving Stack Environment issues Lost in modeling Inappropriate user interface No feedback about use Difficult collaboration Azure ML Notebooks Automated ML on Azure Streamlit Azure Application Insights + Streamlit
  • 13. 13 Example: Trash Recognizer App bottle • Customer: Waste management company • Need: Want to evaluate computer vision solutions for recognizing trash • Idea: Waste management professionals manually evaluate performance through app with feedback functionality
  • 14. 14 BUILD M E A - S U R E LEARN 1 Get Data 2 Train Model 3 Build App 4 Deploy App with Model 5 Collect Feedback Our Plan
  • 15. 15 BUILD M E A - S U R E LEARN 1 Get Data 2 Train Model 3 Build App 4 Deploy App with Model 5 Collect Feedback Our Plan
  • 16. 16 Training Data: TACO Trash Image Dataset • TACO: Trash Annotations in Context • Dataset of 1.5k images with 4.7k+ annotations • Annotations for 60 categories, incl. backgrounds • Open source Proença, P . F., & Simões, P . (2020). TACO: Trash Annotations in Context for Litter Detection. arXiv Preprint arXiv:2003.06975. tacodataset.org Source: tacodataset.org
  • 18. 18 What is Azure ML? • Cloud-based ML platform by Microsoft • Run ad-hoc analyses with Jupyter Notebooks • Run and track machine learning experiments through tight integration with MLFlow • Version data and models • Build complex and reproducible modeling pipelines • Deploy models as API Screenshot of Azure ML web app
  • 19. 19 Basics of Getting Data Data Asset TACO-annotations • Azure ML-managed • Like a mask for files • Sharable • Version controlled • Interactively explorable TACO GitHub repo ! Azure ML Workspace Azure blob storage Azure ML Notebook 0_prepare_dataset.ipynb • Like Jupyter Notebook • Managed environment • Sharable • Runs on compute in workspace Reproducible environment Collaboration- ready ENABLERS
  • 21. 21 BUILD M E A - S U R E LEARN 1 Get Data 2 Train Model 3 Build App 4 Deploy App with Model 5 Collect Feedback Our Plan
  • 23. 23 !
  • 24. 24 Automated Machine Learning on Azure ML • Automated ML: Try different models and hyperparameters that are automatically selected • We have very little time for modeling! • Depending on data and problem type, automated machine learning can provide a reasonable starting point for modeling with a high return on time investment • Azure ML offers automated ML pipelines for several common tasks, incl. classification, regression, forecasting, NLP , and computer vision Modeling time saver ENABLERS
  • 25. 25 Setting Up AutoML Through Code Create compute cluster 1 Define training job 2 Submit job to compute cluster 3 Azure ML Notebooks 1_training.ipynb Azure ML Workspace TACO-annotations TACO-training
  • 26. 26 Compute Cluster Creation Tips & Tricks Save Money Shut down unused instances 120 seconds to auto shutdown Pick auto-evictable machine (own case: 80% cheaper) 4 experiments can run in parallel Tesla T4 GPU w/ 16 GB memory, 56 GB RAM, 8 vCPUs, but many options available! Pick a Fitting Compute
  • 27. 27 Increasing Efficiency for Automated ML on Azure • Set ML parameters based on your data science knowledge • Train/test/validation split, cross-validation, etc. • Hyperparameter selection strategy • Restrict hyperparameter search space • Set job limits • Max no. of trials • Max runtime per trial or of all trials • Termination based on score
  • 29. 29 Our annotations, linked to the job MLFlow model!
  • 30. 30 Models ordered by performance Azure AutoML experimented with a single model type
  • 32. 32 How to Dig Deeper • Metrics are comprehensive and look great – but what are we looking at? • More details in the logs: • Tip: Read Azure ML documentation! • You still need data science knowledge to understand what Azure ML is doing here. Section of std_log.txt file in Outputs + logs tab
  • 34. 34 BUILD M E A - S U R E LEARN 1 Get Data 2 Train Model 3 Build App 4 Deploy App with Model 5 Collect Feedback Our Plan
  • 35. The Power of ONNX for Model Packaging (enabler: flexible model deployment)
      • Great: Azure ML packaged the model in MLflow format
      • The Issue: Tight MLflow model dependencies restrict the platforms where the model can be used
          • 204 (!) pinned dependencies, incl. 31 Azure-specific packages
          • Experienced issues installing some (azureml-dataprep-native) on macOS (M1)
      • The Solution: Take the ONNX model file (a byproduct of Azure AutoML training) and run it with a single dependency: onnxruntime
          • ONNX (Open Neural Network Exchange): Open standard for deep learning models, makes models work across frameworks
          • ONNX Runtime: Cross-platform, open-source ML model accelerator
      • You can now use your AutoML-trained model outside of Azure!
  • 37. Building an App with Streamlit (Streamlit logo, see streamlit.io)
      • Streamlit is an open-source app framework for creating data-based web apps in Python
      • My experience with Streamlit:
          • The Good: Very easy and fast to code and use, apps look great and work. Wow!
          • The Good-to-Know: Complex workflows with state management are harder to program; apps may be perceived as slow by users in comparison to “professional” web apps
      • Streamlit is perfect for getting a user-facing app off the ground and testing your data-based product ideas!
  • 39. Streamlit App Blueprint
      • What the user sees (Trash Recognizer)
          • Upload image(s)
          • Per-image results, for example: “Detected Trash: 2 items for yellow trash can, 1 item for blue trash can”, “No trash detected.”, “Detected Trash: 1 item for other trash can”
          • [Model + data description]
      • What the app does
          • Step 1: Load ONNX model
          • Step 2: Preprocess images
          • Step 3: Run model inference
          • Step 4: Postprocess images
          • Step 5: Display results
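The last two steps of the blueprint (postprocess, then display) can be sketched without any model at all. The following is a minimal illustration, not the talk's actual code: the label-to-bin mapping, the `min_score` threshold, and the function name `summarize_detections` are all assumptions made for the example. It turns a list of `(label, score)` detections into the kind of summary lines shown on the slide.

```python
from collections import Counter

# Hypothetical mapping from detected class to trash can color (assumption).
BIN_FOR_LABEL = {"bottle": "yellow", "can": "yellow", "paper": "blue"}


def summarize_detections(detections, min_score=0.5):
    """Turn (label, score) pairs into user-facing summary lines.

    Detections below min_score are dropped; unknown labels go to "other".
    """
    counts = Counter(
        BIN_FOR_LABEL.get(label, "other")
        for label, score in detections
        if score >= min_score
    )
    if not counts:
        return ["No trash detected."]
    lines = ["Detected Trash"]
    for bin_name, n in sorted(counts.items()):
        noun = "item" if n == 1 else "items"
        lines.append(f"- {n} {noun} for {bin_name} trash can")
    return lines


example = summarize_detections(
    [("bottle", 0.9), ("can", 0.8), ("paper", 0.7), ("cup", 0.2)]
)
print("\n".join(example))
```

In a Streamlit app, lines like these could be passed to a display call after inference; keeping the summary logic in a plain function makes it easy to test without starting the app.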
  • 43. Deployment Pipeline
      • Henkel Data Science Engineering team developed a pipeline for one-click deployment of data science infrastructure, incl. Streamlit apps, on secure Azure cloud infrastructure
      • Open sourced via article series “Kickstarting Data Science Projects in Azure DevOps” by Roberto Alonso
      • Part 1 and 2 already available on Henkel Data & Analytics Blog: medium.com/henkel-data-and-analytics
  • 46. Collecting Feedback
      • streamlit-feedback: Open-source feedback plugin for Streamlit
      • Python logging: Use AzureLogHandler through opencensus logging extension
      • Azure Application Insights: Collect logs from application, query with Kusto language
      • Azure Dashboards: Interactive live dashboards on Azure for application metrics
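The logging part of this chain can be sketched with the standard library alone. In the sketch below, an in-memory handler stands in for the Azure handler so the example runs anywhere; the logger name, the `log_feedback` helper, and the field names are hypothetical. The `custom_dimensions` key inside `extra` follows the convention that the opencensus AzureLogHandler uses for structured properties.

```python
import logging


class ListHandler(logging.Handler):
    """In-memory stand-in for a remote log handler (for illustration only)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("trash_recognizer.feedback")  # hypothetical name
logger.setLevel(logging.INFO)
handler = ListHandler()
# In a deployed app, this is where an AzureLogHandler (from the
# opencensus Azure logging extension) would be attached instead.
logger.addHandler(handler)


def log_feedback(image_id, predicted_label, thumbs_up):
    """Log one piece of user feedback as a structured record."""
    logger.info(
        "user_feedback",
        extra={
            "custom_dimensions": {
                "image_id": image_id,
                "predicted_label": predicted_label,
                "thumbs_up": thumbs_up,
            }
        },
    )


log_feedback("img-001", "bottle", True)
last = handler.records[-1]
print(last.getMessage(), last.custom_dimensions)
```

Logging feedback as structured fields rather than free text is what makes it straightforward to filter and aggregate later in a log query or dashboard.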
  • 50. Code, Slides, Details: github.com/flrs/build_and_test_ml_quickly
      • Thanks: PyData team + sponsors; Henkel, incl. Henkel Data Science CoE team; open-source contributors for TACO, ONNX, ONNX Runtime, Streamlit, streamlit-feedback; Streamlit for reaching out before talk
      • Learn More
          • Build-Measure-Learn Loop: The Lean Startup | Methodology
          • Data Flywheel: Data Flywheel: Scaling a world-class data strategy
          • Dataset: Tacodataset.org
          • Automated Machine Learning on Azure: What is automated ML?
          • ONNX: ONNX Runtime, ONNX File Format
          • Streamlit: Get started with Streamlit, streamlit-feedback
          • Azure Tricks for Data Science: Henkel Data & Analytics Blog
          • Logging to Azure from Python: Monitor Python applications
          • Azure Dashboards: Dashboards of Azure Log Analytics data
          • A similar project: Instance Segmentation with Azure Machine Learning
      • Photo credits, in order of appearance: Greg Rakozy, Canva Studio, Sewupari Studio, Massimo Botturi, Charlotte Coneybeer, Desola Landre Ologun, Studio Saiz, Claudio Schwarz, NASA, Alena Darmel, Anna Shvetz, Vadim B, The Lucky Neko, Visual Tag Mx; User icon from “Redefining Women” icon collection by Iconathon
  • 51. What are your questions? Let’s connect! Florian Roscheck, Sr. Data Scientist: linkedin.com/in/florianroscheck, github.com/flrs