CWIN17 Frankfurt / Cloudera
- 1. 1© Cloudera, Inc. All rights reserved.
Connected Services
Stefan Lipp/Jochen Faltermeier
CWIN 2017 - Frankfurt
- 2.
Cloudera at-a-glance
Customer success: Large enterprises fueling growth
48% customer growth (last 4 years)
140%+ net expansion (Global 8000 customers)
Expansion driven by data and new use cases
Open partner network: Best of breed solutions
3000+ partners
Vast ecosystem of solution & service providers
First to market: Open source innovation
Founded 2008
1600+ Clouderans
Global team doing business in 28 countries
Big data innovators from Google, Yahoo and Oracle
- 3.
The data-driven enterprise
Explosion of data and devices (IoT): 30B connected devices, 440x more data
Transformation of IT infrastructure: open source, cloud, machine learning
$200B total market [1]
[1] IDC Worldwide Big Data and Business Analytics Market Through 2020
- 4.
We believe
data can make what is impossible
today, possible tomorrow
- 5.
We empower people to transform complex data into clear and actionable insights
DRIVE CUSTOMER INSIGHTS
CONNECT PRODUCTS & SERVICES (IoT)
PROTECT BUSINESS
- 6.
We deliver the modern platform for machine learning and analytics optimized for the cloud
RUNS ANYWHERE: Cloud, Multi-cloud, On-premises
SCALABLE: Elastic, Cost-effective, Lower TCO
ENTERPRISE GRADE: Secure, Performant, Compliant
- 7.
Cloudera powering data-driven customers
DRIVE CUSTOMER INSIGHTS: Delivering greater value through improved customer understanding
CONNECT PRODUCTS & SERVICES (IoT): Powering predictive analytics to increase performance and reduce fleet downtime
PROTECT BUSINESS: Creating new revenue streams with an advanced anti-fraud solution
- 8.
Introduction
Navistar is a leading manufacturer of commercial trucks, buses, defense vehicles and
engines. Since 1831, our history has been interwoven with some of the most defining
moments in world history. Whether it was America's westward expansion or WWII, we
were there, pushing the limits of what's possible and driving history forward. But that
doesn't mean we're stuck in the past. We're determined to keep delivering smart, sustainable
technologies - because we believe that innovation defines America's future, too.
- 9.
The Data Challenge & Pre-Hadoop Challenge
In late 2013, Navistar launched OnCommand™ Connection, part of the OnCommand™ family of fleet management services from Navistar.
OnCommand™ Connection takes data feeds from telematics service providers and combines them with meteorological, geographical, engineering, vehicle usage, traffic, historical warranty, service and part inventory data to provide:
• Real-time vehicle performance data, streamlined within a single portal
• Service advisories and scheduling before problems occur
• Optimized service plans and part delivery to the nearest dealer when problems do occur
We now actively monitor more than 300,000 vehicles and are adding to that total daily.
- 10.
Using Predictive Maintenance to Improve
Performance and Reduce Fleet Downtime
• OnCommand Connection is collecting
telematics and geolocation data across
the fleet
• Reduced maintenance costs from $0.12-$0.15
per mile to $0.03 per mile
• Centralizing data from 13 systems with
varying frequency and semantic
definitions
• Real-time visibility of approximately 300,000
trucks in order to improve uptime and vehicle
performance
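The per-mile figures above imply a relative saving that is easy to work out. The following is a minimal sketch of that arithmetic only; the function name is hypothetical and the constants are simply the values quoted on the slide.

```python
def relative_saving(before: float, after: float) -> float:
    """Return the fractional cost reduction when going from `before` to `after`."""
    return (before - after) / before

# Per-mile maintenance costs quoted on the slide (USD).
COST_BEFORE_LOW, COST_BEFORE_HIGH = 0.12, 0.15
COST_AFTER = 0.03

low = relative_saving(COST_BEFORE_LOW, COST_AFTER)
high = relative_saving(COST_BEFORE_HIGH, COST_AFTER)
print(f"Reduction: {low:.0%} to {high:.0%} per mile")
```

Depending on which end of the original range applies, the reduction is roughly three quarters to four fifths of the former per-mile cost.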
MANUFACTURING
» SERVICE IMPROVEMENT
» PREDICTIVE ANALYTICS
» PROCESS IMPROVEMENT
- 11.
Benefits & Impact
Quantifying Hadoop’s impact:
By having literally all of our data in one place, we can perform analytics on an ad-hoc
basis. Historically, simple questions required months to answer as we built out subject
areas and transformed data.
Our “Publish” cluster brings certified data to the consumer.
We have reduced not only hard-dollar spending on proprietary hardware and expensive disk
solutions, but also soft-dollar costs through faster delivery of answers.
We can evaluate “what if” scenarios without the risk of impacting production processes.
We can evaluate billions of rows of data and deliver answers in hours, not weeks.
- 12.
Data/Software > Analytics > Automation > AI is eating the world
“the innovation food chain” Marc Andreessen
Navistar IR Deck – H1 2017
− Connected services to reduce
maintenance cost and improve
vehicle uptime
− Advanced driver assistance
systems and platooning to
improve fuel efficiency
and safety
− Automated record-keeping to
enhance driver productivity
- 13.
#1 Telematics provider with 130 billion miles
of driving data collected from black boxes in
connected cars
Challenge:
• Drive analytics on 12 million miles of
driving data collected every hour
Solution:
• Telematics solution based on Cloudera to
process data from black boxes
• Analytics around driving behavior, risks,
location, braking patterns, contextual
elements and crash information
• Provide Usage Based Insurance services
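One of the behaviors named above, braking patterns, can be illustrated with a small example. This is a sketch under stated assumptions, not the provider's method: the sample format, the function name and the 3.0 m/s² threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical threshold: decelerations stronger than this count as harsh braking.
HARSH_BRAKE_MS2 = 3.0  # m/s^2, an assumed value for illustration

def harsh_brake_events(samples: List[Tuple[float, float]]) -> List[float]:
    """Return timestamps at which deceleration exceeds the threshold.

    `samples` is a time-ordered list of (timestamp_seconds, speed_kmh).
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        decel = ((v0 - v1) / 3.6) / dt  # km/h -> m/s, positive when slowing down
        if decel > HARSH_BRAKE_MS2:
            events.append(t1)
    return events

# One sharp drop from 49 to 30 km/h within a second.
trip = [(0, 50.0), (1, 49.0), (2, 30.0), (3, 29.0), (4, 29.0)]
print(harsh_brake_events(trip))
```

A usage-based insurance score could, for example, count such events per distance driven; how events are weighted is a product decision that the slide does not describe.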
TELEMATICS
» CONNECTED VEHICLES
» INSURANCE TELEMATICS
» PREDICTIVE ANALYTICS
Connected Car Telematics for Insurance
CASE STUDY
DATA-DRIVEN PROCESS
IoT & Connected Products
- 15.
The IoT Ecosystem & Architecture
Connected Things
• Analytics at the edge
• For immediate response
IoT Gateway
• Edge-Processing
• Edge-Analytics
IoT Data Storage, Processing & Analytics (Data Center, Cloud)
Centralized IoT Analytics
• Time Series Data, Trends
• Machine Learning
• Context Enrichment
• Deeper business insights
Distributed Data Processing & Analytics
• Cloud & On-Premise
Enterprise Data Sources
Combining sensor data with contextual data is the
key to value creation from IoT
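To make the point about contextual data concrete, here is a minimal sketch of context enrichment. The vehicle IDs, field names and temperature limits are invented for illustration; in practice the context would come from an enterprise data source.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class SensorReading:
    vehicle_id: str
    coolant_temp_c: float

# Hypothetical contextual data, e.g. from an enterprise master-data system.
VEHICLE_CONTEXT: Dict[str, dict] = {
    "truck-001": {"model": "A", "max_coolant_temp_c": 105.0},
    "truck-002": {"model": "B", "max_coolant_temp_c": 95.0},
}

def enrich(reading: SensorReading) -> Optional[dict]:
    """Join a raw reading with its vehicle context; None if context is unknown."""
    ctx = VEHICLE_CONTEXT.get(reading.vehicle_id)
    if ctx is None:
        return None
    return {
        "vehicle_id": reading.vehicle_id,
        "model": ctx["model"],
        "coolant_temp_c": reading.coolant_temp_c,
        "over_limit": reading.coolant_temp_c > ctx["max_coolant_temp_c"],
    }

# The same raw value is acceptable for one model and an alert for another.
print(enrich(SensorReading("truck-001", 100.0))["over_limit"])
print(enrich(SensorReading("truck-002", 100.0))["over_limit"])
```

The raw reading alone says little; only the join with per-model limits turns it into something actionable.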
- 17.
The Cloudera Platform for IoT – Data Mgmt. Value Chain
Value chain: Data Sources > Data Ingest > Data Storage & Processing > Serving, Analytics & Machine Learning
Data sources: Connected Things / Data Sources, Structured Data Sources
ENTERPRISE DATA HUB
• Apache Kafka: Stream or batch ingestion of IoT data
• Apache Sqoop: Ingestion of data from relational sources
• Apache Hadoop: Storage (HDFS) & deep batch processing
• Apache Kudu: Storage & serving for fast changing data
• Apache HBase: NoSQL data store for real time applications
• Apache Spark: Stream & iterative processing, ML
• Apache Impala: MPP SQL for fast analytics
• Cloudera Search: Real time search
Security, Scalability & Easy Management
Deployment Flexibility: Datacenter, Cloud
- 18.
Cloudera for IoT – Key Innovations / Differentiators
Kudu: Real-Time Analytics
Ideal for real-time analytics on IoT and time series data. Simplifies Lambda architectures for running real-time analytics on streaming data.
Shared Data Experience (SDX)
Preserve business flexibility and data portability and minimize cloud lock-in by running in any one of the three major public cloud providers or in private cloud.
Data Science Workbench
Collaborative hub for enterprise data science and an integrated development environment for running Python, R, & Scala with support for Spark.
- 19.
Kudu – Fast Analytics on Fast Data
Real-time use cases that fall between HDFS and HBase were difficult to manage
[Figure: chart with axes Pace of Analysis and Pace of Data; labels: Unchanging, Append-Only, Fast Changing, Frequent Updates, Real-Time, Arbitrary Storage (Active Archive), Complex Hybrid Architectures, Analytic Gap]
• HDFS: Fast scans, analytics and processing of stored data
• HBase: Fast on-line updates & data serving
• Analytic gap: Fast analytics on fast-changing or frequently-updated data
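The access pattern behind this slide, in-place updates by key combined with analytic scans over the current state, can be sketched without a cluster. The example below uses Python's built-in sqlite3 purely as a stand-in; it is not Kudu or Impala, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a mutable, scannable store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicle_state ("
    " vehicle_id TEXT PRIMARY KEY,"
    " engine_temp_c REAL,"
    " updated_at INTEGER)"
)

def upsert(vehicle_id: str, engine_temp_c: float, updated_at: int) -> None:
    """Insert a new row or overwrite the existing row for this vehicle."""
    conn.execute(
        "INSERT OR REPLACE INTO vehicle_state VALUES (?, ?, ?)",
        (vehicle_id, engine_temp_c, updated_at),
    )

# A stream of updates; later readings replace earlier ones for the same key.
for row in [("truck-001", 88.0, 1), ("truck-002", 91.0, 1), ("truck-001", 97.0, 2)]:
    upsert(*row)

# Analytic query over the current state, without a separate batch layer.
count, avg_temp = conn.execute(
    "SELECT COUNT(*), AVG(engine_temp_c) FROM vehicle_state"
).fetchone()
print(count, avg_temp)
```

With append-only storage, the same result would need a deduplication step or a second serving system, which is the kind of hybrid setup the slide calls difficult to manage.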
- 20.
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
[Figure: platform stack]
CORE SERVICES: DATA ENGINEERING, OPERATIONAL DATABASE, ANALYTIC DATABASE, DATA SCIENCE
EXTENSIBLE SERVICES
SHARED DATA EXPERIENCE: DATA CATALOG, INGEST & REPLICATION, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
SHARED STORAGE: S3 | ADLS | HDFS | KUDU
- 21.
• Unified security – protects sensitive data with consistent controls,
even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all
relevant data and increases compliance
• Easy workload management – increases user productivity and boosts
job predictability
• Flexible ingest and replication – aggregates a single copy of all data,
provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business
context of data for new applications and partner solutions
Open platform services
Built for multi-function analytics | Optimized for cloud
SHARED DATA EXPERIENCE
- 22.
Support the complete data science workflow
From data to exploration to action
Stages: Data Engineering, Data Science, Deployment
Steps: Acquisition, Curation, Processing, Data Governance, Data Wrangling, Visualization and Analysis, Model Training & Testing, Batch Scoring, Online Scoring, Serving, Reports, Dashboards
Dev: Collaboration, Version Control
Ops: Deployment, Scheduling, Orchestration
Shared: Data, Operations, Governance, Security, Metadata
- 23.
Accelerates data science from
development to production with:
● Secure self-service data access
● On-demand compute
● Support for Python, R, and Scala
● Project dependency isolation for
multiple library versions
● Workflow automation, version
control, collaboration and sharing
Cloudera Data Science Workbench
Self-service data science for the enterprise
- 24.
A modern data science architecture
● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
○ Host Kerberos-authenticated Python/R/Scala runtimes
○ Interact with Spark via YARN client mode (driver runs in container, workers on CDH)
● Single-cluster only (for now)
[Figure: CDSW master and engine containers on gateway nodes, alongside CDH nodes (Hive, HDFS, ...) managed by Cloudera Manager]
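YARN client mode, as described above, keeps the Spark driver in the local process and requests executors from the cluster. The following is a minimal PySpark sketch of such a session configuration; the executor settings and the `build_session` helper are illustrative assumptions, and actually creating the session requires PySpark plus a configured Hadoop/YARN environment.

```python
# Settings for YARN client mode: the driver stays in the local process
# (here, the session container) and executors are requested from YARN.
SPARK_CONF = {
    "spark.master": "yarn",
    "spark.submit.deployMode": "client",
    "spark.executor.instances": "4",  # illustrative value
    "spark.executor.memory": "2g",    # illustrative value
}

def build_session(app_name: str, conf: dict):
    """Create a SparkSession from `conf`; needs PySpark and a reachable cluster."""
    # Imported lazily so the sketch can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

print(SPARK_CONF["spark.master"], SPARK_CONF["spark.submit.deployMode"])
```

On a gateway node, calling `build_session("exploration", SPARK_CONF)` would start the driver locally; whether a given platform sets these values for the user automatically is not stated on the slide.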
- 25.
Accelerated deep learning on-demand with GPUs
Multi-tenant GPU support on-premises or cloud
“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud
[Figure: Data Science Workbench (GPU, CPU) for single-node training; CDH (CPU) for distributed training, scoring]
- 26.
Open Ecosystem Black Box
An open ecosystem for agility and innovation
- 27.
Run anywhere. Deploy any way.
Simple Unified Enterprise
Proven at scale
Trusted security
Hybrid or multi cloud
Platform-as-a-Service
Simplifies operations
Works with your tools
- 28.
Real-time Analytics or Operational Analytics?
My definition:
“Apply logic and mathematics in real time on data to improve operations”
Model Analyze Repeat
# Aggregate relational, NoSQL, structured & unstructured data
# Accelerate data science from exploration to production using R, Python, Spark and more
# Deploy pipelines and models on-premise or in the cloud.
Seeking Abnormal Behavior
# Serve real-time data at scale for real-time decision making
# Stream processing & analytics on changing operational data
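As one possible reading of "seeking abnormal behavior" on streaming data, the sketch below flags values that deviate strongly from a rolling window of recent readings. The window size, the z-score limit and the sample values are assumptions chosen for illustration, not parameters from the talk.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple

def anomalies(
    stream: Iterable[float], window: int = 5, z_limit: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for values far from the mean of the preceding window."""
    recent: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip flat windows, where the z-score would divide by zero.
            if sigma > 0 and abs(value - mu) / sigma > z_limit:
                yield i, value
        recent.append(value)

readings = [70.0, 71.0, 69.0, 70.0, 72.0, 71.0, 95.0, 70.0, 71.0]
print(list(anomalies(readings)))
```

Because it is a generator that only keeps the last few values, the same function works on an unbounded stream. Note that a flagged value still enters the window and temporarily widens the tolerance; a production detector would likely handle that differently.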
- 29.
Is it even worth it?
HW > Data/Software > Analytics > Automation > AI/ML: the technology food chain from “Digital or Dead”