1© Cloudera, Inc. All rights reserved.
Simplifying Real-Time Architectures for IoT using Apache Kudu
Vijay Raja | Solutions Marketing Lead, IoT
Ryan Lippert | Product Marketing, Operational DB
IoT – Key Drivers & Objectives
Drive Internal Efficiencies
• Predictive maintenance
• Real-time monitoring
• Ops optimization
• Reduced equipment downtime
Key questions: How can I lower downtime? How can I drive efficiencies?
Improve Product & Customer Experience
• Product usage analytics
• Personalized products & offerings
• Improved product development
Key questions: Who are my customers? How are they using my products?
New Services & Business Models
• New usage-based business models
• New service offerings, e.g. OnCommand Connection
• Remote monitoring
Key questions: How do we implement a usage-based model? How can I launch new revenue streams?
• 2 PB of data / car / year
• 1–2 TB of data / day
• 1–5 TB of data / day
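To put the three figures on a common scale, the per-car yearly volume can be converted to a daily rate. The sketch below is plain arithmetic, assuming decimal units (1 PB = 1,000 TB) and a constant rate over a 365-day year; neither assumption comes from the slide.

```python
# Put the per-year figure on the same per-day scale as the other two.
# Assumptions (not from the slide): decimal SI units, constant daily rate.
TB_PER_PB = 1_000
DAYS_PER_YEAR = 365


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Return the average daily volume in TB for a yearly volume in PB."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


car_tb_per_day = pb_per_year_to_tb_per_day(2)
print(f"2 PB/car/year is roughly {car_tb_per_day:.2f} TB/car/day")  # 5.48
```

Under these assumptions, a single car's yearly figure averages out to slightly more than the upper end of the 1–5 TB/day range.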
IoT Data Characteristics: The Foundation of Hadoop's Potential
IoT data comes from many different sources:
• Massive volumes of intermittent data streams
• Generated from a variety of data sources
• Predominantly time-series
• Can arrive in streams (real-time) or in batches
• Diverse data structures and schemas
• Some of it may be perishable
Combining sensor data with contextual data is the key to creating value from IoT.
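A minimal sketch of what combining sensor data with contextual data can look like, assuming a hypothetical per-device lookup table (`DEVICE_CONTEXT`) holding a site, a model and a temperature limit. The names and values are illustrative only; the point is that the same raw reading means different things once it is joined with its context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    device_id: str
    ts: int               # epoch seconds
    temperature_c: float


# Hypothetical contextual (master) data, keyed by device_id.
DEVICE_CONTEXT = {
    "pump-17": {"site": "Plant A", "model": "P-200", "max_temp_c": 80.0},
    "pump-42": {"site": "Plant B", "model": "P-300", "max_temp_c": 95.0},
}


def enrich(reading: SensorReading, context: dict) -> dict:
    """Join a raw reading with its device context; unknown devices get None fields."""
    ctx = context.get(reading.device_id, {})
    max_temp = ctx.get("max_temp_c")
    return {
        "device_id": reading.device_id,
        "ts": reading.ts,
        "temperature_c": reading.temperature_c,
        "site": ctx.get("site"),
        "model": ctx.get("model"),
        # Whether a reading is a problem depends on the device's own limit.
        "over_limit": max_temp is not None and reading.temperature_c > max_temp,
    }


readings = [
    SensorReading("pump-17", 1_700_000_000, 85.5),
    SensorReading("pump-42", 1_700_000_000, 85.5),
    SensorReading("pump-99", 1_700_000_000, 60.0),
]
enriched = [enrich(r, DEVICE_CONTEXT) for r in readings]
for row in enriched:
    print(row["device_id"], row["site"], row["over_limit"])
```

The identical 85.5 °C reading is over the limit for one pump and within it for the other, which is only visible after enrichment.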
Polling Question - 1
Where is your organization in your IoT journey?
A. Not sure where to start
B. Currently exploring use cases
C. Implementing our first IoT use case
D. Already deployed first IoT use case
E. Multiple IoT use cases in production
(Single Choice)
The IoT Ecosystem & Architecture
[Diagram: Sensors/Things → IoT Gateway → data center or cloud, combined with enterprise data sources]
• Sensors/Things: analytics at the edge, for immediate response
• IoT Gateway: data routing, edge processing, edge storage
• IoT data storage, processing & analytics: distributed data processing and analytics, in the cloud and on-premises
• Centralized IoT data analytics: time series data and trends, machine learning, context enrichment, deeper business insights
• IoT analytics, drawing on enterprise data sources
What Happens at the Edge and What Happens in the Cloud?
Edge Analytics
• Analytics that need to be acted upon immediately
• Low-latency requirements, e.g. hazard detection, collision avoidance
Cloud Analytics
• Human response times
• Context enrichment
• Time series analysis
• Comparative/trend analysis
• Machine learning
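The edge/cloud split can be expressed as a simple routing rule. This is a hypothetical sketch: the event types in `EDGE_EVENT_TYPES` and the 200 ms cloud round trip are assumptions chosen for illustration, not figures from the slides.

```python
from enum import Enum


class Tier(Enum):
    EDGE = "edge"
    CLOUD = "cloud"


# Hypothetical values, for illustration only.
EDGE_EVENT_TYPES = {"hazard_detected", "collision_imminent"}
CLOUD_ROUND_TRIP_MS = 200.0  # assumed gateway-to-cloud round trip


def route(event_type: str, latency_budget_ms: float) -> Tier:
    """Choose where an event is analyzed.

    Safety-critical events, or any event whose response budget is shorter
    than a cloud round trip, are handled at the edge. Everything else goes
    to centralized analytics for enrichment, trend analysis and ML.
    """
    if event_type in EDGE_EVENT_TYPES or latency_budget_ms < CLOUD_ROUND_TRIP_MS:
        return Tier.EDGE
    return Tier.CLOUD


print(route("hazard_detected", 5_000).value)  # edge
print(route("lane_departure", 50).value)      # edge
print(route("trip_summary", 60_000).value)    # cloud
```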
Cloudera Enterprise – Hadoop as a Data Platform for IoT
Data sources: sensor/IoT data (via an IoT gateway), internal systems, external sources
Deployment: data center or cloud
Core capabilities: data storage, data processing, machine learning, real-time analytics
Consumers: BI solutions, real-time apps, search, Data Science Workbench, SQL, machine learning
Platform stack:
• Operations: Cloudera Manager, Cloudera Director
• Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
• Integrate: batch (Sqoop), real-time (Kafka, Flume)
• Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), SDK, partners
• Unified services: resource management (YARN), security (Sentry, RecordService)
• Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase)
IoT: Lots of Buzz, but what is the core concept?
And critically, what do we need from our infrastructure?
IoT promises prediction and optimization, but often delivers monitoring.
The right solution allows you to analyze data and serve information in time to change business outcomes.
That means the right solution is built on real-time analytics.
IoT: Driven by Data
Polling Question - 2
What area of the real-time data chain does your organization need the
most help with?
A. Data ingest
B. Data processing
C. Data serving
D. All of the above
(Single Choice)
Traditional Hadoop Databases Leave a Gap
Use cases that fall between HDFS and HBase were difficult to manage
[Diagram: pace of data (unchanging/append-only to fast-changing/frequent updates) against pace of analysis (up to real-time)]
• HDFS: fast scans, analytics and processing of stored data; arbitrary storage (active archive)
• HBase: fast online updates and data serving
• Analytic gap: fast analytics on fast-changing or frequently-updated data, which required complex hybrid architectures
The Trouble with Lambda
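• Batch layer (Hadoop): new data lands in the data lake (HDFS); views are precomputed by batch recompute
• Speed layer (Storm/Spark): stream or micro-batch processing produces "real-time" incremental views
• Serving layer (HBase, Impala): merges the batch and incremental views and serves the data to the application
Drawbacks:
• Code must be kept in sync across the batch and speed layers
• Restatement (correcting data that has already been processed) is difficult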
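The two Lambda drawbacks can be seen in a toy in-memory model. The functions below are hypothetical stand-ins for the batch, speed and serving layers (no Hadoop, Storm/Spark or HBase code is involved), contrasted with a single store that accepts updates by key, which is the role the following slides give to Kudu.

```python
from collections import defaultdict

# Toy events: (device_id, timestamp, value). Illustrative only.


def batch_view(events):
    """Batch layer: full recompute of per-device totals over the data lake."""
    totals = defaultdict(float)
    for device, _ts, value in events:
        totals[device] += value
    return dict(totals)


def speed_view(events):
    """Speed layer: totals for events since the last batch run. In a real
    Lambda deployment this logic lives in a second codebase that must be
    kept in sync with the batch layer."""
    totals = defaultdict(float)
    for device, _ts, value in events:
        totals[device] += value
    return dict(totals)


def serve(device, batch, speed):
    """Serving layer: merge both views at query time."""
    return batch.get(device, 0.0) + speed.get(device, 0.0)


class UpdateableStore:
    """Single store keyed by (device_id, timestamp): one write path, one query path."""

    def __init__(self):
        self._rows = {}

    def upsert(self, device, ts, value):
        self._rows[(device, ts)] = value

    def total(self, device):
        return sum(v for (d, _ts), v in self._rows.items() if d == device)


history = [("car-1", 1, 10.0), ("car-1", 2, 5.0)]
recent = [("car-1", 3, 2.0)]

batch = batch_view(history)
lambda_total = serve("car-1", batch, speed_view(recent))

store = UpdateableStore()
for device, ts, value in history + recent:
    store.upsert(device, ts, value)

# Restatement: the reading at ts=1 is corrected from 10.0 to 8.0.
store.upsert("car-1", 1, 8.0)                            # visible to the next query
stale_total = serve("car-1", batch, speed_view(recent))  # old batch view still served

print(lambda_total, stale_total, store.total("car-1"))  # 17.0 17.0 15.0
```

In the Lambda version the correction only appears after `batch_view` is rerun over corrected history; in the single-store version, the upsert is the correction.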
Updateable Analytic Storage
Simple real-time analytics and updates with Apache Kudu
Kudu: Storage for fast analytics on fast data
• Simplified architecture for building real-time analytic applications
• Designed for next-generation hardware, for faster analytic performance across frameworks
• Native Hadoop storage engine
Flexibility to use the right tool for each use case, in one platform:
• Kudu + Impala: the only analytic database for Hadoop
• Simple real-time applications with Kudu + Spark
Use cases
• Time series data
• Machine data analytics
• Online reporting
Platform stack:
• Integrate: structured (Sqoop), unstructured (Kafka, Flume)
• Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), other (Kite)
• Unified services: resource management (YARN), security (Sentry, RecordService)
• Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase), object (Cloud)
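Updateable analytic storage comes down to row operations on a primary key combined with efficient scans over key ranges. The class below simulates those semantics in memory under that assumption; it is not the Kudu client API, and `KeyedTable` and its methods are hypothetical names. Keys are `(device_id, timestamp)` tuples, so a scan over one device's key range returns its readings in time order.

```python
import bisect


class KeyedTable:
    """Toy in-memory table: row operations on a primary key plus range scans
    in key order. A simulation for illustration, not the Kudu client API;
    real Kudu semantics (e.g. for partial-row upserts) may differ."""

    def __init__(self):
        self._keys = []  # primary keys, kept sorted
        self._rows = {}  # primary key -> row

    def insert(self, key, row):
        if key in self._rows:
            raise KeyError(f"duplicate primary key: {key!r}")
        bisect.insort(self._keys, key)
        self._rows[key] = dict(row)

    def update(self, key, changes):
        if key not in self._rows:
            raise KeyError(f"no row with primary key: {key!r}")
        self._rows[key].update(changes)

    def upsert(self, key, row):
        """Insert the row, or replace it if the key already exists."""
        if key in self._rows:
            self._rows[key] = dict(row)
        else:
            self.insert(key, row)

    def delete(self, key):
        if key not in self._rows:
            raise KeyError(f"no row with primary key: {key!r}")
        del self._rows[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, lower, upper):
        """Return (key, row) pairs with lower <= key < upper, in key order."""
        start = bisect.bisect_left(self._keys, lower)
        end = bisect.bisect_left(self._keys, upper)
        return [(k, self._rows[k]) for k in self._keys[start:end]]


t = KeyedTable()
t.insert(("car-1", 100), {"speed": 50})
t.insert(("car-1", 160), {"speed": 55})
t.insert(("car-2", 100), {"speed": 30})
t.upsert(("car-1", 160), {"speed": 57})  # late correction
t.update(("car-2", 100), {"speed": 31})
print(t.scan(("car-1", 0), ("car-1", 10**9)))
```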
Kudu: Fast Analytics on Fast-Changing Data
New storage engine enables new Hadoop use cases
[Diagram: pace of data (unchanging/append-only to fast-changing/frequent updates) against pace of analysis (up to real-time), with Kudu filling the gap]
• HDFS: fast scans, analytics and processing of stored data; arbitrary storage (active archive)
• HBase: fast online updates and data serving
• Kudu fills the analytic gap: fast analytics on fast-changing or frequently-updated data
Modern analytic applications often require complex data flows and difficult integration work to move data between HBase and HDFS.
Better Together
Kudu Benefits from Integration with the Apache Ecosystem
Spark – Stream Processing for Kudu
• Open standard for real-time stream processing
• Effective for automating decision processes and machine
learning
• Use Cases include: Time Series Data & Machine Data
Analytics
Impala – High-Performance BI & SQL for Kudu
• Open standard for interactive SQL queries
• Powers analytic database workloads with flexibility, scale, and
open architecture
• Use Cases include: Online Reporting
Why Kudu, Why Cloudera?
A simultaneous combination of sequential and random reads and writes
Time Series Data
Can you insert time series data in real time? How long does it take to prepare it for analysis? Can you get results and act fast enough to change outcomes?
Machine Data Analytics
Can you handle large volumes of machine-generated data? Do you have the tools to identify problems or threats? Can your system do machine learning?
Kudu Increases the Value of Time Series Data
Use case: Time series
• Workload: inserts, updates, scans, lookups
• Examples: streaming market data; IoT; fraud detection and prevention; risk monitoring; connected cars
Time series data is most valuable if you can analyze it to change outcomes in real time.
Kudu simultaneously enables:
• Time series data inserted/updated as it arrives
• Analytic scans to find trends on fresh time series data
• Lookups to quickly visit the point in time where an event occurred
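The three capabilities listed (insert/update on arrival, trend scans on fresh data, point-in-time lookups) can be sketched on a single in-memory series kept sorted by timestamp. This illustrates the access patterns only, assuming one series of numeric readings; `TimeSeries` is a hypothetical class, not a Kudu interface.

```python
import bisect
from statistics import mean


class TimeSeries:
    """Toy single series kept sorted by timestamp, so late arrivals,
    corrections, trend scans and point lookups all see the same fresh data.
    Illustrative only; not Kudu's API."""

    def __init__(self):
        self._ts = []
        self._values = []

    def put(self, ts, value):
        """Insert a reading, or overwrite it if that timestamp already exists."""
        i = bisect.bisect_left(self._ts, ts)
        if i < len(self._ts) and self._ts[i] == ts:
            self._values[i] = value   # update, e.g. a corrected reading
        else:
            self._ts.insert(i, ts)    # insert in order, even if it arrived late
            self._values.insert(i, value)

    def window_mean(self, start, end):
        """Analytic scan: mean of readings with start <= ts < end, or None."""
        i = bisect.bisect_left(self._ts, start)
        j = bisect.bisect_left(self._ts, end)
        return mean(self._values[i:j]) if j > i else None

    def as_of(self, ts):
        """Point lookup: the most recent (ts, value) at or before ts, or None."""
        i = bisect.bisect_right(self._ts, ts)
        return (self._ts[i - 1], self._values[i - 1]) if i else None


s = TimeSeries()
s.put(0, 10.0)
s.put(60, 12.0)
s.put(180, 20.0)
s.put(120, 14.0)  # late arrival
s.put(60, 11.0)   # correction of an earlier reading
print(s.window_mean(0, 120), s.as_of(150))  # 10.5 (120, 14.0)
```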
Kudu Keeps Your Business Operational
Use case: Machine data analytics
• Workload: inserts, scans, lookups
• Examples: network threat detection; network health monitoring; application performance monitoring
Kudu can help spot problems before they happen: real-time data inserts, combined with the ability to analyze trends, identify potential problems.
Kudu identifies trouble through:
• Scale-out storage for long histories, yielding better historic trend analysis
• Fast inserts that keep the network view up to date
• Fast scans that identify and flag undesired states for remedy
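One simple way to flag undesired states is to compare freshly inserted readings against a long-run historical baseline. The sketch below uses a z-score threshold; the technique, the threshold of 3 and the sample response times are assumptions for illustration, not the method described in the slides.

```python
from statistics import mean, pstdev


def flag_anomalies(history, recent, z_threshold=3.0):
    """Flag recent readings that sit far from the historical baseline.

    history: long-run readings (hypothetical response times in ms) as baseline.
    recent:  freshly inserted readings to check.
    Returns (index, value) pairs whose |z-score| exceeds z_threshold.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat baseline: any deviation at all is unusual.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [(i, v) for i, v in enumerate(recent) if abs(v - mu) / sigma > z_threshold]


history = [100, 102, 98, 101, 99, 100, 103, 97]
recent = [101, 104, 130, 95]
print(flag_anomalies(history, recent))  # [(2, 130)]
```

A longer retained history gives a more stable baseline, which is the link to the trend-analysis point above.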
Operational DB: Real-Time Architecture
Driving the Model Through Machine Learning
1. Event occurs (IoT data sources, other data sources)
2. Messaging (Kafka)
3. Stream processing (Spark Streaming)
4. Land in relational store
5. Apply ML libraries (Spark MLlib)
Other components shown: Spark, Genesis, and the IoT Analytics application (Individual Session; Full Model/Learning)
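The five numbered steps can be mimicked end to end in a single process. In this hypothetical sketch a `queue.Queue` stands in for Kafka, a generator of micro-batches for Spark Streaming, a dict keyed by `(vin, ts)` for the relational store, and a per-VIN average for the MLlib step; none of it is Kafka, Spark or Kudu code.

```python
import queue
from collections import defaultdict


def publish(events, topic):
    """Steps 1-2: events occur and are published to a message queue."""
    for event in events:
        topic.put(event)


def micro_batches(topic, batch_size):
    """Step 3: consume the queue in micro-batches, dropping malformed records."""
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(topic.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return
        yield [e for e in batch if all(k in e for k in ("vin", "ts", "speed"))]


def land(store, batch):
    """Step 4: land cleaned records in a toy store keyed by (vin, ts)."""
    for e in batch:
        store[(e["vin"], e["ts"])] = e["speed"]


def apply_model(store):
    """Step 5: stand-in for an ML job; here simply the mean speed per VIN."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (vin, _ts), speed in store.items():
        sums[vin] += speed
        counts[vin] += 1
    return {vin: sums[vin] / counts[vin] for vin in sums}


events = [
    {"vin": "V1", "ts": 1, "speed": 50.0},
    {"vin": "V1", "ts": 2, "speed": 70.0},
    {"vin": "V2", "ts": 1, "speed": 30.0},
    {"ts": 3},  # malformed: no VIN or speed
]
topic = queue.Queue()
publish(events, topic)

store = {}
for batch in micro_batches(topic, batch_size=2):
    land(store, batch)

model = apply_model(store)
print(model)  # {'V1': 60.0, 'V2': 30.0}
```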
Operational DB: Real-Time Architecture
MLlib & K-Means: Defining Microsegments via Machine Learning
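[Figure: four panels (1 to 4) plotting Height against Weight, showing customers grouped by K-means into size microsegments: S, M, L, XL; then XS, S, M, L, plus a "Near Custom" segment and an unassigned "?" point]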
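K-means itself is short enough to sketch. The code below is Lloyd's algorithm on hypothetical (height, weight) pairs, written from scratch rather than with Spark MLlib. It uses a deterministic farthest-point seeding so that the toy result is reproducible; library implementations use their own initialization. Clusters are named S, M and L by centroid height.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]   # assignment step
        new_centroids = []
        for i in range(k):                                  # update step
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[i])          # keep empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (height_cm, weight_kg) measurements.
customers = [(150, 45), (152, 48), (155, 50),
             (168, 65), (170, 68), (172, 70),
             (185, 90), (188, 95), (190, 98)]
centroids, labels = kmeans(customers, k=3)

# Name the segments S, M, L in order of centroid height.
order = sorted(range(3), key=lambda i: centroids[i][0])
size_of = {cluster: size for cluster, size in zip(order, ["S", "M", "L"])}
segments = [size_of[label] for label in labels]
print(segments)
```

Raising `k` splits the population into finer segments, which is the move from S/M/L/XL towards XS and "near custom" in the figure.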
Operational DB: Real-Time Architecture
Driving Prediction and Optimization
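[Diagram: IoT data sources and other data sources feed Kafka → Spark Streaming → Spark, which queries Kudu; Spark MLlib maintains the full model (learning), with a Genesis component alongside Spark; results return to the IoT Analytics application (individual session). Numbered steps: 1 Data Processed, 2 Request Processed/Kudu Queried, 3 Results Returned, 4 Results Processed, 5 Processed Data Returned]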
Operational DB: Real-Time Architecture
Driving Prediction and Optimization
Step 1: Data Processed
Apache Spark processes the data from the event (car sensors, manufacturing, wearables, etc.), which may involve keeping a running list of the last X events.
Step 2: Request Processed/Kudu Queried
A Spark application uses the data gathered in step 1 to query Kudu in a predefined manner, looking for similar patterns defined via machine learning.
Step 3: Kudu Results Returned
Kudu returns the results of the query in step 2 to Spark, which determines what needs to be returned to the application.
Step 4: Results Processed
Spark combines the results from Kudu with the information stored from the current event to determine the next step to feed back to the application.
Step 5: Processed Data Returned
The best possible outcome, as determined by the machine, is prescribed and served to the application.
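Steps 1 to 5 can be sketched for a single session. In this hypothetical example, `PATTERNS` plays the role of the machine-learned patterns stored in Kudu (here, centroids over mean speed and mean hard-brake count, each mapped to an action), and a nearest-centroid lookup stands in for the Kudu query; the feature choice, values and actions are all assumptions.

```python
import math
from collections import deque

# Hypothetical learned patterns: (mean speed km/h, mean hard brakes) centroids,
# assumed to come from an offline ML job and be stored in a lookup table.
PATTERNS = {
    "smooth_cruise": ((100.0, 0.0), "no action"),
    "harsh_braking": ((60.0, 4.0), "suggest increasing following distance"),
    "stop_and_go": ((20.0, 2.0), "suggest an alternative route"),
}


class Session:
    def __init__(self, window=5):
        # Step 1: keep a running list of the last `window` events.
        self.recent = deque(maxlen=window)

    def on_event(self, speed_kmh, hard_brakes):
        self.recent.append((speed_kmh, hard_brakes))
        features = self._features()
        pattern = self._closest_pattern(features)  # Steps 2-3: query, get results
        return pattern, PATTERNS[pattern][1]       # Steps 4-5: decide, return

    def _features(self):
        n = len(self.recent)
        return (sum(s for s, _ in self.recent) / n,
                sum(b for _, b in self.recent) / n)

    @staticmethod
    def _closest_pattern(features):
        return min(PATTERNS, key=lambda name: math.dist(features, PATTERNS[name][0]))


session = Session(window=3)
first = session.on_event(100, 0)
session.on_event(50, 5)
last = session.on_event(40, 6)
print(first)  # ('smooth_cruise', 'no action')
print(last)   # ('harsh_braking', 'suggest increasing following distance')
```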
Operational DB: IoT Use Case
Prediction and Optimization
[Diagram: sensor data from IoT data sources and other data sources flows through Kafka and Spark Streaming to Spark and Kudu, where Spark MLlib maintains the full model; results serve the application's individual session]
• Data request sent for stream processing
• Data cleaned, ordered and processed, then delivered to Kudu for modelling
Automated processes based on machine learning enable prediction and optimization at a new level.
(Illustrative: models will likely have more than two dimensions.)
Key IoT Use Cases
Using Predictive Maintenance to Improve
Performance and Reduce Fleet Downtime
• Real-time visibility of 300,000+ trucks to improve uptime and vehicle performance
• OnCommand Connection collects telematics and geolocation data across the fleet
• Reduced maintenance costs from $0.12-$0.15 per mile to $0.03 per mile
• Centralizing data from 13 systems with varying frequencies and semantic definitions
TRANSPORTATION » PREDICTIVE MAINTENANCE » IMPROVED SERVICE » DATA-DRIVEN PRODUCTS
CASE STUDY: DATA-DRIVEN PRODUCTS
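The per-mile figures translate into a relative saving as follows. The arithmetic is done in whole cents so that the results are exact.

```python
# Per-mile maintenance costs from the case study, in cents.
BEFORE_CENTS = (12, 15)  # $0.12-$0.15 per mile
AFTER_CENTS = 3          # $0.03 per mile


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) * 100 / before


savings_cents = [b - AFTER_CENTS for b in BEFORE_CENTS]
reductions = [percent_reduction(b, AFTER_CENTS) for b in BEFORE_CENTS]
print(savings_cents, reductions)  # [9, 12] [75.0, 80.0]
```

That is, a saving of 9 to 12 cents per mile, or a 75% to 80% reduction in per-mile maintenance cost.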
Predictive Maintenance on industrial-grade turbines for hydro power stations
Predictive Maintenance - Turbines
Challenge:
• Gather, store and analyze noise levels from turbines for anomaly detection
Solution:
• Cloudera platform used to gather and analyze acoustic data/audio files coming from the turbines in real time
• Diagnostic solution used to monitor the health of turbines and predict failures in advance
PREDICTIVE MAINTENANCE » INDUSTRIAL IoT » LOWERED DOWNTIME » LOWERED COSTS
CASE STUDY: DATA-DRIVEN PROCESS, DATA-DRIVEN PRODUCTS
#1 Telematics provider, with 130 billion miles of driving data collected from black boxes in connected cars
Connected Car Telematics for Insurance
Challenge:
• Drive analytics on 12 million miles of driving data collected every hour
Solution:
• Telematics solution based on Cloudera to process data from black boxes
• Analytics on driving behavior, risks, location, braking patterns, contextual elements and crash information
TELEMATICS » CONNECTED VEHICLES » INSURANCE TELEMATICS » PREDICTIVE ANALYTICS
CASE STUDY: DATA-DRIVEN PROCESS, DATA-DRIVEN PRODUCTS
Powering a Variety of IoT Use Cases…
• Connected vehicles
• Usage-based insurance
• Industrial IoT
• Predictive maintenance
• Smart cities/ports
• Oil & gas
• Aerospace & aviation
• Smart healthcare
Connected Car Demo
Connected Car – Demo Architecture
Flow: Connected Car Simulator → MQTT-Kafka Bridge → Data Ingest & Pipeline (StreamSets Data Collector) → Cloudera Enterprise Data Hub → BI & Visualization
Streaming data:
• Time
• VIN
• Location
• Mileage
• Speed
• Acceleration
• Brakes applied?
• Turn signal on?
• Lane departed?
• Collision detected?
• Hazard detected?
Cloudera Enterprise Data Hub stack:
• Operations: Cloudera Manager, Cloudera Director
• Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
• Integrate: batch (Sqoop), real-time (Kafka, Flume)
• Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), SDK, partners
• Unified services: resource management (YARN), security (Sentry, RecordService)
• Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase)
Connected Car – Demo Architecture
Flow: Connected Car Simulator → MQTT-Kafka Bridge → Data Ingest & Pipeline (StreamSets Data Collector) → Cloudera Enterprise Data Hub → BI & Visualization
Streaming data: the same fields as on the previous slide (time, VIN, location, mileage, acceleration, speed, brakes applied, turn signal on, lane departed, collision detected, hazard detected)
Inside the Enterprise Data Hub:
• Pub-sub messaging system
• Real-time processing engine
• Data storage layer (paths marked #1 and #2)
• Search
• Interactive SQL engine
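Each simulated car message carries the streaming fields listed for the demo. The sketch below parses one message into a typed record and applies an alert rule; the JSON layout, the field names, the split of Location into latitude/longitude and the alert rule are all hypothetical, and units are not specified in the slides.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CarEvent:
    time: str
    vin: str
    lat: float
    lon: float
    mileage: float
    speed: float
    acceleration: float
    brakes_applied: bool
    turn_signal_on: bool
    lane_departed: bool
    collision_detected: bool
    hazard_detected: bool


def parse_event(payload: bytes) -> CarEvent:
    """Parse one JSON message (hypothetical field names) into a typed record."""
    d = json.loads(payload)
    return CarEvent(
        time=str(d["time"]),
        vin=str(d["vin"]),
        lat=float(d["location"]["lat"]),
        lon=float(d["location"]["lon"]),
        mileage=float(d["mileage"]),
        speed=float(d["speed"]),
        acceleration=float(d["acceleration"]),
        brakes_applied=bool(d["brakes_applied"]),
        turn_signal_on=bool(d["turn_signal_on"]),
        lane_departed=bool(d["lane_departed"]),
        collision_detected=bool(d["collision_detected"]),
        hazard_detected=bool(d["hazard_detected"]),
    )


def needs_alert(e: CarEvent) -> bool:
    """Illustrative rule: collision, hazard, or lane departure without a turn signal."""
    return e.collision_detected or e.hazard_detected or (e.lane_departed and not e.turn_signal_on)


sample = json.dumps({
    "time": "2000-01-01T00:00:00Z", "vin": "TESTVIN0000000001",
    "location": {"lat": 37.39, "lon": -122.08}, "mileage": 12345.6,
    "speed": 88.0, "acceleration": -0.4, "brakes_applied": False,
    "turn_signal_on": False, "lane_departed": True,
    "collision_detected": False, "hazard_detected": False,
}).encode()
event = parse_event(sample)
print(event.vin, needs_alert(event))  # TESTVIN0000000001 True
```

Typing each record at ingest means malformed messages fail early (with a `KeyError` or `ValueError`) instead of reaching storage and the SQL layer.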
Thank You

Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 

Kürzlich hochgeladen (20)

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 

Apache Kudu for Real-Time IoT Architectures

  • 1. Simplifying Real-Time Architectures for IoT using Apache Kudu
    Vijay Raja | Solutions Marketing Lead, IoT
    Ryan Lippert | Product Marketing, Operational DB
    © Cloudera, Inc. All rights reserved.
  • 2. IoT – Key Drivers & Objectives
    Drive Internal Efficiencies
      • Predictive maintenance
      • Real-time monitoring
      • Operations optimization
      • Reduced equipment downtime
    Improve Product & Customer Experience
      • Product usage analytics
      • Personalized products & offerings
      • Improved product development
    New Services & Business Models
      • New usage-based business models
      • New service offerings, e.g. OnCommand Connection
      • Remote monitoring
    Key questions: Who are my customers? How are they using my products? How can I lower downtime? How can I drive efficiencies? How do we implement a usage-based model? How can I launch new revenue streams?
  • 3. Example IoT data volumes:
      • 2 PB of data per car per year
      • 1–2 TB of data per day
      • 1–5 TB of data per day
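To put these volumes in terms of sustained ingest capacity, here is a small arithmetic sketch. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate over the day; real IoT traffic is bursty, so peak rates would be higher, and replication overhead is ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_rate_mb_per_s(tb_per_day: float) -> float:
    """Average sustained ingest rate in MB/s for a daily volume in TB.

    Assumes decimal units and uniform arrival over 24 hours (both assumptions).
    """
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    for tb in (1, 2, 5):
        print(f"{tb} TB/day is about {avg_rate_mb_per_s(tb):.1f} MB/s")
    # 2 PB per car per year = 2,000 TB spread over 365 days
    print(f"2 PB/year is about {avg_rate_mb_per_s(2000 / 365):.1f} MB/s per car")
```

Under these assumptions, 1–5 TB/day works out to roughly 12–58 MB/s of continuous writes, and 2 PB per car per year to about 63 MB/s per car.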
  • 4. IoT Data Characteristics: The Foundation of Hadoop's Potential
    IoT data comes from a variety of sources:
      • Massive volumes of intermittent data streams
      • Generated by many different kinds of data sources
      • Predominantly time series
      • Can arrive as real-time streams or in batches
      • Diverse data structures and schemas
      • Some of it may be perishable
    Combining sensor data with contextual data is the key to creating value from IoT.
  • 5. Polling Question 1 (single choice)
    Where is your organization in its IoT journey?
      A. Not sure where to start
      B. Currently exploring use cases
      C. Implementing our first IoT use case
      D. Already deployed our first IoT use case
      E. Multiple IoT use cases in production
  • 6. The IoT Ecosystem & Architecture
    [Diagram: Sensors/Things → IoT Gateway → Data Center/Cloud, combined with Enterprise Data Sources]
      • Sensors/Things: analytics at the edge, for immediate response
      • IoT Gateway: data routing, edge processing, edge storage
      • IoT data storage, processing & analytics: distributed data processing & analytics, in the cloud and on-premises
      • Centralized IoT data analytics: time series data and trends, machine learning, context enrichment, deeper business insights
  • 7. What Happens at the Edge and What Happens in the Cloud?
    Edge analytics:
      • Analytics that must be acted upon immediately
      • Low-latency requirements, e.g. hazard detection and collision avoidance
      • Human response times
    Cloud analytics:
      • Context enrichment
      • Time series analysis
      • Comparative/trend analysis
      • Machine learning
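The edge/cloud split above can be sketched as a gateway loop that acts locally on urgent readings and queues every reading for cloud analytics. This is a minimal illustration, not a Cloudera component: the `Reading` fields, action names, and `process_at_edge` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    # Hypothetical subset of the fields a vehicle might report
    vin: str
    collision_detected: bool
    hazard_detected: bool


def process_at_edge(readings: List[Reading],
                    cloud_buffer: List[Reading]) -> List[Tuple[str, str]]:
    """Return immediate edge actions; append every reading to the cloud buffer."""
    actions: List[Tuple[str, str]] = []
    for r in readings:
        # Low-latency decisions are made locally, with no network round trip.
        if r.collision_detected:
            actions.append((r.vin, "trigger_emergency_response"))
        elif r.hazard_detected:
            actions.append((r.vin, "alert_driver"))
        # All readings still go to the cloud for enrichment, trends, and ML.
        cloud_buffer.append(r)
    return actions
```

The design choice is that the edge handles only decisions that cannot wait for a round trip, while the cloud sees the full history needed for context enrichment and model training.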
  • 8. Cloudera Enterprise: Hadoop as a Data Platform for IoT
    Sources: sensors/IoT data sources (via an IoT gateway), internal systems, external sources
    Deployment: data center or cloud
    Platform capabilities: data storage, data processing, machine learning, real-time analytics
    Consumers: BI solutions, real-time apps, search, Data Science Workbench, SQL, machine learning
    Platform stack:
      • Operations: Cloudera Manager, Cloudera Director
      • Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
      • Integrate: batch (Sqoop), real-time (Kafka, Flume)
      • Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase)
      • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), SDK
      • Unified services: resource management (YARN), security (Sentry, RecordService)
      • Partners
  • 9. IoT: Lots of Buzz, but What Is the Core Concept?
    And critically, what do we need from our infrastructure?
    IoT promises prediction and optimization but often delivers only monitoring. The right solution lets you analyze data and serve information in time to change business outcomes, which means it must be built on real-time analytics.
  • 10. IoT: Driven by Data
  • 11. Polling Question 2 (single choice)
    What area of the real-time data chain does your organization need the most help with?
      A. Data ingest
      B. Data processing
      C. Data serving
      D. All of the above
  • 12. Traditional Hadoop Databases Leave a Gap
    Use cases that fall between HDFS and HBase have been difficult to manage.
    [Chart axes: Pace of Data (unchanging/append-only to fast-changing/frequent updates) vs. Pace of Analysis (up to real time)]
      • HDFS: fast scans, analytics and processing of stored data; arbitrary storage (active archive)
      • HBase: fast online updates & data serving
      • Analytic gap: fast analytics on fast-changing or frequently updated data, which requires complex hybrid architectures
  • 13. The Trouble with Lambda
    New data flows down two paths that must be merged:
      • Batch layer: the data lake (HDFS) precomputes views via batch recompute
      • Speed layer: stream or micro-batch processing increments "real-time" views
      • Serving layer: batch and real-time views are merged for the data application
    Technologies shown: Hadoop, Storm/Spark, HBase, Impala
    Problems: code must be kept in sync across both layers, and restatement is difficult.
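The "code must be kept in sync" problem is easiest to see in a toy Lambda pipeline. The sketch below uses hypothetical per-device event counts: the batch and speed layers each implement the same counting logic separately, and the serving layer merges their outputs at query time.

```python
from collections import Counter
from typing import Dict, Iterable


def batch_view(all_events: Iterable[Dict[str, str]]) -> Counter:
    """Batch layer: recompute per-device counts from the full master dataset."""
    return Counter(e["device"] for e in all_events)


def speed_view(recent_events: Iterable[Dict[str, str]]) -> Counter:
    """Speed layer: incremental counts for events not yet covered by a batch run.

    This is a second implementation of the same business logic; any change
    to how events are counted must be made here and in batch_view.
    """
    view: Counter = Counter()
    for e in recent_events:
        view[e["device"]] += 1
    return view


def serve(batch: Counter, speed: Counter) -> Counter:
    """Serving layer: merge both views at query time."""
    return batch + speed
```

Restating a past event (for example, a corrected reading) means rerunning the batch recompute, because neither view can be updated in place. An updatable analytic store lets a single table take both the streaming writes and the analytic queries, which is the simplification the next slides argue for.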
  • 14. Updateable Analytic Storage: Simple Real-Time Analytics and Updates with Apache Kudu
    Kudu: storage for fast analytics on fast data
      • Simplified architecture for building real-time analytic applications
      • Designed for next-generation hardware, for faster analytic performance across frameworks
      • Native Hadoop storage engine
    Flexibility to use the right tools for the right use case in one platform
      • Only analytic database for Hadoop with Kudu + Impala
      • Simple real-time applications with Kudu + Spark
    Use cases: time series data, machine data analytics, online reporting
    Platform stack:
      • Integrate: structured (Sqoop), unstructured (Kafka, Flume)
      • Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase), object (cloud)
      • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), other (Kite)
      • Unified services: resource management (YARN), security (Sentry, RecordService)
  • 15. Kudu: Fast Analytics on Fast-Changing Data
    A new storage engine enables new Hadoop use cases: Kudu fills the gap.
    [Chart axes: Pace of Data vs. Pace of Analysis, with Kudu placed in the analytic gap between HDFS and HBase]
      • HDFS: fast scans, analytics and processing of stored data; arbitrary storage (active archive)
      • HBase: fast online updates & data serving
      • Kudu: fast analytics on fast-changing or frequently updated data
    Modern analytic applications often require complex data flows and difficult integration work to move data between HBase and HDFS.
  • 16. Better Together: Kudu Benefits from Integration with the Apache Ecosystem
    Spark: stream processing for Kudu
      • Open standard for real-time stream processing
      • Effective for automating decision processes and machine learning
      • Use cases include time series data and machine data analytics
    Impala: high-performance BI & SQL for Kudu
      • Open standard for interactive SQL queries
      • Powers analytic database workloads with flexibility, scale, and an open architecture
      • Use cases include online reporting
  • 17. Why Kudu, Why Cloudera?
    A simultaneous combination of sequential and random reads and writes
    Time series data:
      • Can you insert time series data in real time?
      • How long does it take to prepare it for analysis?
      • Can you get results and act fast enough to change outcomes?
    Machine data analytics:
      • Can you handle large volumes of machine-generated data?
      • Do you have the tools to identify problems or threats?
      • Can your system do machine learning?
  • 18. Kudu Increases the Value of Time Series Data
    Time series workload: inserts, updates, scans, lookups
    Workload examples: streaming market data; IoT; fraud detection & prevention; risk monitoring; connected cars
    Time series data is most valuable if you can analyze it to change outcomes in real time. Kudu simultaneously enables:
      • Inserting or updating time series data as it arrives
      • Analytic scans to find trends in fresh time series data
      • Lookups to quickly find the point in time where an event occurred
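Kudu tables have a primary key and accept inserts, updates, and upserts alongside analytic scans. The sketch below models that combination of operations with a sorted in-memory structure keyed by a hypothetical `(sensor_id, timestamp)` primary key. It is a conceptual illustration of the workload, not the Kudu client API, and says nothing about how Kudu stores data internally.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (sensor_id, epoch_seconds): a hypothetical primary key


class TimeSeriesTable:
    """In-memory stand-in for a primary-key-ordered table (NOT the Kudu API)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []        # kept sorted by (sensor_id, ts)
        self._rows: Dict[Key, float] = {}

    def upsert(self, sensor_id: str, ts: int, value: float) -> None:
        """Insert a new reading, or overwrite a late/corrected one in place."""
        key = (sensor_id, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)  # O(n) insert; fine for a sketch
        self._rows[key] = value

    def lookup(self, sensor_id: str, ts: int) -> Optional[float]:
        """Point lookup by full primary key; None if absent."""
        return self._rows.get((sensor_id, ts))

    def scan(self, sensor_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Range scan over [start, end) for one sensor, in time order."""
        lo = bisect.bisect_left(self._keys, (sensor_id, start))
        hi = bisect.bisect_left(self._keys, (sensor_id, end))
        return [(ts, self._rows[(sid, ts)]) for sid, ts in self._keys[lo:hi]]
```

Leading the key with the sensor ID keeps each sensor's readings contiguous and time-ordered, so a trend query over one sensor becomes a single range scan while a corrected reading is still a cheap in-place update.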
  • 19. Kudu Keeps Your Business Operational
    Machine data analytics workload: inserts, scans, lookups
    Workload examples: network threat detection; network health monitoring; application performance monitoring
    Kudu can help spot problems before they happen: real-time inserts, combined with the ability to analyze trends, identify potential problems. Kudu identifies trouble through:
      • Unlimited storage, yielding better historical trend analysis
      • Fast inserts that keep the network view up to date
      • Fast scans that identify and flag undesired states for remediation
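One simple way to "flag undesired states" from a scan is to compare each host's latest metric against its own history. The sketch below uses a z-score test; the metric, host names, and 3-standard-deviation threshold are illustrative assumptions, and it is not a method described in the slides.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_anomalies(history: Dict[str, List[float]],
                   latest: Dict[str, float],
                   z: float = 3.0) -> List[str]:
    """Return hosts whose latest value is more than z std devs from their history."""
    flagged = []
    for host, value in latest.items():
        past = history.get(host, [])
        if len(past) < 2:
            continue  # not enough history to establish a baseline
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if value != mu:
                flagged.append(host)
        elif abs(value - mu) / sigma > z:
            flagged.append(host)
    return sorted(flagged)
```

In a real pipeline, `history` would come from an analytic scan over stored machine data and `latest` from the most recent inserts; the longer the retained history, the more stable the baseline.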
  • 20. Operational DB Real-Time Architecture: Driving the Model Through Machine Learning
      1. Event occurs (IoT data sources, other data sources)
      2. Messaging (Kafka)
      3. Stream processing (Spark Streaming)
      4. Land in relational store
      5. Apply ML libraries (Spark MLlib) to build the full model/learning
    Components shown: IoT analytics, individual session, Genesis, Spark
  • 21. Operational DB Real-Time Architecture: MLlib & K-Means, Defining Microsegments via Machine Learning
    [Figure: four height-vs-weight scatter plots (panels 1–4) grouping data points into size segments labeled XS, S, M, L, XL, plus "Near Custom" and "?"]
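The slide names Spark MLlib's k-means for defining microsegments. To show what that clustering computes, here is plain-Python Lloyd's k-means on height/weight points; it is not the MLlib implementation (in Spark you would typically use `pyspark.ml.clustering.KMeans`). The explicit initial centroids are an assumption made to keep the result deterministic.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (height, weight); units are illustrative


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: return final centroids and each point's cluster index."""
    centroids: List[Point] = list(init)
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, (labels if labels is not None else [])
```

Each resulting centroid can then be given a segment name such as S or XL. Points far from every centroid are natural candidates for a "near custom" bucket; that mapping is a hypothetical reading of the figure, not something the slides specify.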
  • 22. Operational DB Real-Time Architecture: Driving Prediction and Optimization
      1. Data processed
      2. Request processed/Kudu queried
      3. Results returned
      4. Results processed
      5. Processed data returned
    Components shown: Kafka, Spark Streaming, Spark MLlib, Spark, IoT analytics, individual session, Genesis, full model/learning, IoT data sources, other data sources
  • 23. Operational DB Real-Time Architecture: Driving Prediction and Optimization
    Step 1: Data processed. Apache Spark processes the data from the event (car sensors, manufacturing, wearables, etc.), which may involve keeping a running list of the last X events.
    Step 2: Request processed/Kudu queried. A Spark application uses the data gathered in step 1 to query Kudu in a predefined manner, looking for similar patterns defined via machine learning.
    Step 3: Kudu results returned. Kudu returns the results of the step 2 query to Spark, which determines what needs to be returned to the application.
    Step 4: Results processed. Spark combines the results from Kudu with the information stored from the current event to determine the next step to feed back to the application.
    Step 5: Processed data returned. The best possible machine-generated outcome is prescribed and served to the application.
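The five steps can be sketched as a per-session handler in plain Python. Several things here are assumptions: the window size X (the slides leave it unspecified), the dictionary standing in for the ML-defined pattern table that would live in Kudu, the feature choice, and the advice strings. No Spark or Kudu APIs are used.

```python
import math
from collections import deque
from typing import Deque, Dict, Tuple

WINDOW = 5  # "last X events": X is unspecified in the slides; 5 is arbitrary

# Hypothetical stand-in for ML-defined patterns that would be stored in Kudu:
# pattern name -> (mean speed in window, braking events in window)
PATTERNS: Dict[str, Tuple[float, float]] = {
    "normal": (60.0, 0.5),
    "aggressive": (95.0, 3.0),
}
ADVICE = {"normal": "no action", "aggressive": "suggest smoother braking"}


class Session:
    """One vehicle's session; features are left unscaled for brevity."""

    def __init__(self) -> None:
        self.events: Deque[Tuple[float, int]] = deque(maxlen=WINDOW)

    def handle(self, speed: float, braked: bool) -> str:
        # Step 1: process the event, keeping a running list of the last X events.
        self.events.append((speed, int(braked)))
        avg_speed = sum(s for s, _ in self.events) / len(self.events)
        brakes = sum(b for _, b in self.events)
        # Steps 2-3: "query" the pattern store for the most similar pattern.
        best = min(PATTERNS,
                   key=lambda name: math.dist((avg_speed, brakes), PATTERNS[name]))
        # Steps 4-5: turn the matched pattern into a response for the application.
        return ADVICE[best]
```

`deque(maxlen=WINDOW)` discards the oldest event automatically, so the session state stays bounded no matter how long the stream runs.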
  • 24. Operational DB: IoT Use Case, Prediction and Optimization
    Sensor data request sent for stream processing (Kafka, Spark Streaming); data cleaned, ordered, and processed, then delivered to Kudu for modelling (Spark, Spark MLlib, full model/learning)
    Components shown: application, individual session, IoT data sources, other data sources
    Automated processes based on machine learning enable prediction and optimization at a new level.
    (Illustrative only: real models will likely have more than two dimensions.)
  • 25. Key IoT Use Cases
  • 26. Case Study (Data-Driven Products): Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime
    Transportation » Predictive maintenance » Improved service » Data-driven products
      • Real-time visibility into 300,000+ trucks to improve uptime and vehicle performance
      • OnCommand Connection collects telematics and geolocation data across the fleet
      • Reduced maintenance costs from $0.12–$0.15 per mile to $0.03 per mile
      • Centralizing data from 13 systems with varying frequencies and semantic definitions
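A quick check of what the quoted maintenance figures imply. Working in integer cents avoids floating-point rounding. The 100,000-mile figure is an arbitrary round number chosen for illustration; it is not from the slide.

```python
from typing import Tuple

AFTER_CENTS = 3  # $0.03 per mile after the change (from the slide)


def cost_reduction(before_cents: int, after_cents: int = AFTER_CENTS) -> Tuple[int, float]:
    """Return (cents saved per mile, fractional reduction)."""
    saved = before_cents - after_cents
    return saved, saved / before_cents


if __name__ == "__main__":
    for before in (12, 15):  # the slide's $0.12-$0.15 "before" range
        saved, frac = cost_reduction(before)
        # 100,000 miles is an arbitrary illustration, not a figure from the slide.
        print(f"${before / 100:.2f} -> ${AFTER_CENTS / 100:.2f}: {saved} cents/mile saved "
              f"({frac:.0%}), ${saved * 100_000 // 100:,} per 100,000 miles")
```

The quoted range corresponds to a 75–80% reduction in per-mile maintenance cost.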
  • 27. Case Study (Data-Driven Process, Data-Driven Products): Predictive Maintenance on Industrial-Grade Turbines for Hydropower Stations
    Predictive maintenance » Industrial IoT » Lowered downtime » Lowered costs
    Challenge:
      • Gather, store, and analyze noise levels from turbines for anomaly detection
    Solution:
      • Cloudera platform used to gather and analyze acoustic data/audio files coming from the turbines in real time
      • Diagnostic solution used to monitor turbine health and predict failures in advance
  • 28. Case Study (Data-Driven Process, Data-Driven Products): Connected Car Telematics for Insurance
    Telematics » Connected vehicles » Insurance telematics » Predictive analytics
    #1 telematics provider, with 130 billion miles of driving data collected from black boxes in connected cars
    Challenge:
      • Drive analytics on 12 million miles of driving data collected every hour
    Solution:
      • Telematics solution based on Cloudera to process data from black boxes
      • Analytics on driving behavior, risks, location, braking patterns, contextual elements, and crash information
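To give a sense of the scale these two figures imply, here is some simple arithmetic. It assumes the 12-million-miles-per-hour rate is constant, which is a simplification: the historical rate was probably lower, so the "days to reach the total" value is only a lower bound on how long collection took.

```python
MILES_PER_HOUR_COLLECTED = 12_000_000  # from the slide
TOTAL_MILES = 130_000_000_000          # from the slide

# Assumes a constant collection rate (a simplification).
per_second = MILES_PER_HOUR_COLLECTED / 3600
per_day = MILES_PER_HOUR_COLLECTED * 24
per_year = per_day * 365
days_to_total = TOTAL_MILES / per_day

if __name__ == "__main__":
    print(f"{per_second:,.0f} miles of driving data per second")
    print(f"{per_day:,} miles per day, {per_year:,} miles per year")
    print(f"{days_to_total:,.0f} days to reach the total at today's rate")
```

At that rate the platform ingests roughly 3,300 miles of driving data every second, or about 105 billion miles per year.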
  • 29. Powering a Variety of IoT Use Cases: connected vehicles, usage-based insurance, industrial IoT, predictive maintenance, smart cities/ports, oil & gas, aerospace & aviation, smart healthcare
  • 30. Connected Car Demo
  • 31. Connected Car: Demo Architecture
    Connected Car Simulator → Data Ingest & Pipeline (MQTT-Kafka bridge, StreamSets Data Collector) → Cloudera Enterprise Data Hub → BI & Visualization
    Streaming data fields: time, VIN, location, mileage, speed, acceleration, brakes applied?, turn signal on?, lane departed?, collision detected?, hazard detected?
    Background: the Cloudera Enterprise Data Hub platform stack shown on slide 8
  • 32. Connected Car: Demo Architecture (detail)
    Connected Car Simulator → Data Ingest & Pipeline (MQTT-Kafka bridge, StreamSets Data Collector) → Cloudera Enterprise Data Hub → BI & Visualization
    Components shown: pub-sub messaging system, real-time processing engine, data storage layer, search, interactive SQL engine (with two paths labeled #1 and #2)
    Streaming data fields: time, VIN, location, mileage, acceleration, speed, brakes applied?, turn signal on?, lane departed?, collision detected?, hazard detected?
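The streaming data fields listed for the demo can be modeled as a typed record and serialized as JSON, for example as an MQTT payload relayed to Kafka. The snake_case field names, the ISO-8601 timestamp, and the split of "Location" into latitude/longitude are all assumptions. The slides do not specify the demo's wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CarEvent:
    time: str               # ISO-8601 timestamp (assumed format)
    vin: str
    latitude: float         # "Location" split into lat/lon (assumption)
    longitude: float
    mileage: float
    speed: float
    acceleration: float
    brakes_applied: bool
    turn_signal_on: bool
    lane_departed: bool
    collision_detected: bool
    hazard_detected: bool


def to_message(event: CarEvent) -> bytes:
    """Serialize an event as UTF-8 JSON (a hypothetical payload format)."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def from_message(payload: bytes) -> CarEvent:
    """Parse a payload back into a CarEvent; unknown or missing fields raise TypeError."""
    return CarEvent(**json.loads(payload.decode("utf-8")))
```

If these messages are written to Kafka, using the VIN as the message key would send each car's events to the same partition under Kafka's default partitioner, which preserves per-car ordering for downstream processing.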
  • 33. Thank You