Mark Rittman, Independent Analyst, MJR Analytics
DATA INTEGRATION AND DATA WAREHOUSING FOR CLOUD, BIG DATA AND IOT:
WHAT’S NEW, WHAT’S COMING … AND WHAT’S MISSING RIGHT NOW
BIG DATA WORLD, LONDON
London, March 2017
•Oracle ACE Director, Independent Analyst
•Past ODTUG Exec Board Member + Oracle Scene Editor
•Author of two books on Oracle BI
•Co-founder & CTO of Rittman Mead
•15+ Years in Oracle BI, DW, ETL + now Big Data
•Host of the Drill to Detail Podcast (www.drilltodetail.com)
•Based in Brighton, works in London, UK
About The Presenter
A BIT OF HISTORY…
•Data warehouses provided a unified view of the business
•Single place to store key data and metrics
•Joined-up view of the business
•Aggregates and conformed dimensions
•ETL routines to load, cleanse and conform data
•BI tools for simple, guided access to information
•Tabular data access using SQL-generating tools
•Drill paths, hierarchies, facts, attributes
•Fast access to pre-computed aggregates
•Packaged BI for fast-start ERP analytics
[Diagram: source systems (Core ERP Platform, Retail, Banking, Call Center, E-Commerce, CRM) running on Oracle, MongoDB, Sybase, IBM DB/2 and MS SQL Server; a Data Warehouse with an ODS / Foundation Layer and an Access & Performance Layer; Business Intelligence Tools]
Data Warehousing Back in the Mid-2000s
How Traditional RDBMS Data Warehousing Scaled-Up
•Shared-Everything Architectures (e.g. Oracle RAC, Exadata)
•Shared-Nothing Architectures (e.g. Teradata, Netezza)
•Google needed to store and query their vast amount of server log files
•And wanted to do so using cheap, commodity hardware
•Google File System and MapReduce designed together for this use
Around the Same Time…
•GFS optimised for the particular task at hand: computing PageRank for sites
•Streaming reads for PageRank calcs, block writes for crawler whole-site dumps
•Master node only holds metadata
•Stops client/master I/O being a bottleneck, also acts as traffic controller for clients
•Simple design, optimised for a specific Google need
•MapReduce focused on simple computations on an abstraction framework
•Select & filter (map) and aggregate (reduce) functions, easy to distribute on a cluster
•MapReduce abstracted cluster compute, GFS abstracted cluster storage
•Projects that inspired Apache Hadoop + HDFS
Google File System + MapReduce Key Innovations
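To make the map (select & filter) and reduce (aggregate) split concrete, here is a minimal single-process sketch in Python. It is not Google's or Hadoop's implementation: the log lines, the `status_mapper` and `count_reducer` functions and the `run_job` helper are all hypothetical names invented for illustration, and a real cluster would run the map and reduce phases on many nodes in parallel.

```python
from collections import defaultdict

# Hypothetical server log lines in the form "<status> <url>" (illustrative data only).
LOG_LINES = [
    "200 /index.html",
    "404 /missing",
    "200 /about.html",
    "500 /checkout",
    "200 /index.html",
]


def status_mapper(line):
    """Map step: select the status field and filter out 404s."""
    status, _url = line.split(" ", 1)
    if status != "404":
        yield status, 1


def count_reducer(_key, values):
    """Reduce step: aggregate all values emitted for one key."""
    return sum(values)


def map_phase(records, mapper):
    """Apply the mapper to every record; each record is independent,
    which is what makes this phase easy to distribute."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group mapper output by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(lines):
    """Chain map, shuffle and reduce in one process."""
    return reduce_phase(shuffle(map_phase(lines, status_mapper)), count_reducer)


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

Because the mapper sees one record at a time and the reducer sees one key at a time, neither needs to know how the work is spread across the cluster, which is the abstraction the slide refers to.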
•A way of storing (non-relational) data cheaply, on easily expandable storage
•Gave us a way of scaling beyond TB-size without paying $$$
•First use-cases were offline storage, active archive of data
Hadoop’s Original Appeal to Data Warehouse Owners
•Driven by pace of business, and user demands for more agility and control
•Traditional IT-governed data loading not always appropriate
•Not all data needed to be modelled right away
•Not all data was suited to storing in tabular form
•New ways of analyzing data beyond SQL
•Graph analysis
•Machine learning
Data Warehousing and ETL Needed Some Agility
•Hadoop started by being synonymous with MapReduce, and Java coding
•But YARN (Yet Another Resource Negotiator) broke this dependency
•Hadoop now just handles resource management
•Multiple different query engines can run against data in-place
•General-purpose (e.g. MapReduce)
•Graph processing
•Machine Learning
•Real-Time Processing
Hadoop 2.0 - Enabling Multiple Query Engines
•Storing data in the format it arrived in, and then applying schema at query time
•Suits data that may be analysed in different ways by different tools
•In addition, some datatypes may have schema embedded in the file format
•Key benefit - fast-arriving data of unknown value can get to users earlier
•Made possible by tools such as Apache Hive + SerDes, Apache Drill and self-describing file formats, HDFS storage
Advent of Schema-on-Read, and Data Lakes
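The schema-on-read idea above can be sketched in a few lines of Python. This is a toy illustration rather than how Hive SerDes or Drill work internally: the raw JSON events, the two schemas and the `read_with_schema` function are hypothetical, and the point is only that the stored data stays untouched while each reader applies its own schema at query time.

```python
import json

# Raw events stored exactly as they arrived (hypothetical IoT data).
# Fields vary between records and values are untyped strings.
RAW_EVENTS = [
    '{"device": "thermostat-1", "temp_c": "21.5", "ts": "2017-03-15T09:00:00"}',
    '{"device": "thermostat-2", "temp_c": "19.0", "ts": "2017-03-15T09:00:05", "firmware": "1.2"}',
    '{"device": "doorbell-7", "ts": "2017-03-15T09:00:09"}',
]

# Two different schemas over the same files: column name -> type to cast to.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}
FIRMWARE_SCHEMA = {"device": str, "firmware": str}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and project it onto the schema at read time.

    Columns missing from a record come back as None; fields not named
    in the schema are simply ignored rather than rejected at load time.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            row[column] = cast(value) if value is not None else None
        yield row


if __name__ == "__main__":
    for row in read_with_schema(RAW_EVENTS, TEMPERATURE_SCHEMA):
        print(row)
```

The same raw lines can be read again with `FIRMWARE_SCHEMA` by a different tool, with no reload or remodelling step in between.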
•Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Solves the problem of how to store new types of data + choose best time/way to process it
•Hadoop/NoSQL increasingly used for all store/transform/query tasks
Data Warehousing Circa 2010 : The “Data Lake”
[Diagram: data reservoir architecture on a Hadoop Platform. Labelled components: Data Transfer (File Based Integration, Stream Based Integration, ETL Based Integration, Data streams); Data Reservoir; Data Factory; Data Access; Business Intelligence Tools; Marketing / Sales Applications (Models, Machine Learning, Segments); Discovery & Development Labs (safe & secure discovery and development environment; data sets and samples; models and programs). Data labels: Operational Data (Transactions, Customer Master Data); Unstructured Data (Voice + Chat Transcripts); Raw Customer Data (data stored in the original format, usually files, such as SS7, ASN.1, JSON etc.); Mapped Customer Data (data sets produced by mapping and transforming raw data).]
DATA WAREHOUSING & BIG DATA TODAY…
•On-premise Hadoop, even with simple resilient clustering, will hit limits
•Clusters can reach 5,000+ nodes and need to scale up for demand peaks etc.
•Scale limits are encountered way beyond those for DWs…
•… but future is elastically-scaled, query and compute-as-a-service
On-Premise Big Data Analytics Hits Its Limits
Oracle Big Data Cloud Compute Edition
Free $300 developer credit at: https://cloud.oracle.com/en_US/tryit
•New generation of big data platform services from Google, Amazon, Oracle
•Combines three key innovations from earlier technologies:
•Organising of data into tables and columns (from RDBMS DWs)
•Massively-scalable and distributed storage and query (from Big Data)
•Elastically-scalable Platform-as-a-Service (from Cloud)
Elastically-Scalable Data Warehouse-as-a-Service
Example Architecture: Google BigQuery
•And things come full-circle … analytics typically requires tabular data
•Google BigQuery based on DremelX massively-parallel query engine
•But stores data in columnar form and provides a SQL interface
•Solves the problem of providing DW-like functionality at scale, as-a-service
•This is the future … ;-)
BigQuery : Big Data Meets Data Warehousing
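As a rough illustration of why columnar storage suits SQL-style analytics, the sketch below pivots a few rows into columns and runs an aggregate that only touches the two columns it needs. It is a toy model, not BigQuery's storage format or API: the `ROWS` data, `to_columns` and `sum_by` are hypothetical, and the code assumes every row has the same keys.

```python
# Hypothetical order rows, as a row-oriented store would hold them.
ROWS = [
    {"order_id": 1, "country": "UK", "amount": 120.0},
    {"order_id": 2, "country": "US", "amount": 80.0},
    {"order_id": 3, "country": "UK", "amount": 45.5},
]


def to_columns(rows):
    """Pivot row dictionaries into one list per column.

    Assumes every row carries the same keys, so all column lists
    end up the same length and stay aligned by position.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_by(columns, group_col, value_col):
    """Rough equivalent of:
        SELECT group_col, SUM(value_col) FROM orders GROUP BY group_col
    Only the two named columns are read; order_id is never touched.
    """
    totals = {}
    for key, value in zip(columns[group_col], columns[value_col]):
        totals[key] = totals.get(key, 0.0) + value
    return totals


COLUMNS = to_columns(ROWS)

if __name__ == "__main__":
    print(sum_by(COLUMNS, "country", "amount"))
```

On a wide table with hundreds of columns, reading only the columns a query names is where most of the saving comes from; the SQL interface on top keeps that detail hidden from the analyst.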
DATAFLOW PIPELINES ARE THE NEW ETL…
New ways to do BI
MACHINE LEARNING & SEARCH FOR “AUTOMAGIC” SCHEMA DISCOVERY
•By definition there's lots of data in a big data system ... so how do you find the data you want?
•Google's own internal solution - GOODS ("Google Dataset Search")
•Uses crawler to discover new datasets
•ML classification routines to infer domain
•Data provenance and lineage
•Indexes and catalogs 26bn datasets
•Other users, vendors also have solutions
•Oracle Big Data Discovery
•Datameer
•Platfora
•Cloudera Navigator
Google GOODS - Catalog + Search At Google-Scale
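The crawl, classify and index steps listed above can be sketched at toy scale as follows. This is not the GOODS system: the in-memory `DATASETS`, the keyword sets in `DOMAIN_KEYWORDS` and the `crawl`, `infer_domain` and `search` functions are hypothetical, and a simple keyword-overlap rule stands in for the ML classification routines.

```python
# Hypothetical datasets: path -> CSV header line (stand-in for real files).
DATASETS = {
    "/logs/web/2017-03-01.csv": "timestamp,url,status,user_agent",
    "/sales/orders.csv": "order_id,customer_id,amount,currency",
    "/crm/customers.csv": "customer_id,name,email",
}

# Illustrative keyword sets; a real system would learn these, not hard-code them.
DOMAIN_KEYWORDS = {
    "web_logs": {"url", "status", "user_agent"},
    "sales": {"order_id", "amount", "currency"},
    "customer": {"name", "email"},
}


def infer_domain(columns):
    """Pick the domain whose keywords overlap most with the column names.

    A rule-based stand-in for ML classification; returns "unknown"
    when no keyword matches at all.
    """
    best, best_score = "unknown", 0
    for domain, keywords in DOMAIN_KEYWORDS.items():
        score = len(keywords & set(columns))
        if score > best_score:
            best, best_score = domain, score
    return best


def crawl(datasets):
    """Visit every dataset, extract its schema and build catalog entries."""
    catalog = []
    for path, header in sorted(datasets.items()):
        columns = header.split(",")
        catalog.append(
            {"path": path, "columns": columns, "domain": infer_domain(columns)}
        )
    return catalog


def search(catalog, term):
    """Return paths of datasets whose columns or inferred domain match the term."""
    return [
        entry["path"]
        for entry in catalog
        if term in entry["columns"] or term == entry["domain"]
    ]


CATALOG = crawl(DATASETS)

if __name__ == "__main__":
    print(search(CATALOG, "customer_id"))
```

Searching on a shared column such as `customer_id` also hints at which datasets could be joined, a small-scale version of the provenance and lineage questions a catalog is meant to answer.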
A NEW TAKE ON BI…
•Came out of the data science movement, as a way to "show workings"
•A set of reproducible steps that tell a story about the data
•As well as being a better command-line environment for data analysis
•One example is Jupyter, an evolution of the IPython notebook
•Supports PySpark, pandas etc.
•See also Apache Zeppelin
Web-Based Data Analysis Notebooks
AND EMERGING OPEN-SOURCE BI TOOLS AND PLATFORMS
And Emerging Open-Source BI Tools and Platforms
wp-content/uploads/2016/05/paper.pdf
And Emerging Open-Source BI Tools and Platforms … Which Is What I’m Working On Right Now
THANK YOU

More Related Content

What's hot

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 

What's hot (20)

Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Data lake
Data lakeData lake
Data lake
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 

Viewers also liked

электроное портфолио
электроное    портфолио электроное    портфолио
электроное портфолио
azzaq1
 
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
Dheeraj Vasu
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
Shinnosuke Takamichi
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
Shinnosuke Takamichi
 

Viewers also liked (20)

B. Malyshev. Legal regulation of the Police in the reform context (2016)
B. Malyshev. Legal regulation of the Police in the reform context (2016)B. Malyshev. Legal regulation of the Police in the reform context (2016)
B. Malyshev. Legal regulation of the Police in the reform context (2016)
 
электроное портфолио
электроное    портфолио электроное    портфолио
электроное портфолио
 
Corte penal internacional2_IAFJSR
Corte penal internacional2_IAFJSRCorte penal internacional2_IAFJSR
Corte penal internacional2_IAFJSR
 
Java 8 collections
Java 8  collectionsJava 8  collections
Java 8 collections
 
Lacteos el condor arequipe
Lacteos el condor  arequipeLacteos el condor  arequipe
Lacteos el condor arequipe
 
Matriz 2 fase 1 antoine_mario_gc177
Matriz 2  fase 1 antoine_mario_gc177Matriz 2  fase 1 antoine_mario_gc177
Matriz 2 fase 1 antoine_mario_gc177
 
Gr02 KIT post-emergenza
Gr02 KIT post-emergenzaGr02 KIT post-emergenza
Gr02 KIT post-emergenza
 
Electroquímica celdas ecuación de nerst-leyes de faraday
Electroquímica celdas ecuación de nerst-leyes de faradayElectroquímica celdas ecuación de nerst-leyes de faraday
Electroquímica celdas ecuación de nerst-leyes de faraday
 
(14-03-2017) Rabdomiolisis(PPT)
(14-03-2017) Rabdomiolisis(PPT)(14-03-2017) Rabdomiolisis(PPT)
(14-03-2017) Rabdomiolisis(PPT)
 
Climate Change Presentation Handout 2015
Climate Change Presentation Handout 2015Climate Change Presentation Handout 2015
Climate Change Presentation Handout 2015
 
3Com 3C17501
3Com 3C175013Com 3C17501
3Com 3C17501
 
Le rapport de la mission “Musées du XXIe siècle”
Le rapport de la mission “Musées du XXIe siècle”Le rapport de la mission “Musées du XXIe siècle”
Le rapport de la mission “Musées du XXIe siècle”
 
E. Krapivin. Accreditation of the Police in Ukraine: outcomes and conclusions...
E. Krapivin. Accreditation of the Police in Ukraine: outcomes and conclusions...E. Krapivin. Accreditation of the Police in Ukraine: outcomes and conclusions...
E. Krapivin. Accreditation of the Police in Ukraine: outcomes and conclusions...
 
Gamestorming booster 2017
Gamestorming   booster 2017Gamestorming   booster 2017
Gamestorming booster 2017
 
ZTE FDD INSTALLATION
ZTE FDD INSTALLATION ZTE FDD INSTALLATION
ZTE FDD INSTALLATION
 
Netwealth portfolio construction series - Successful value investing in small...
Netwealth portfolio construction series - Successful value investing in small...Netwealth portfolio construction series - Successful value investing in small...
Netwealth portfolio construction series - Successful value investing in small...
 
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
2 k jeyaprakash diversity of medicinal plants used by adi community in and ar...
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 
Camera surveilance 7 seminar salman
Camera surveilance 7 seminar salmanCamera surveilance 7 seminar salman
Camera surveilance 7 seminar salman
 

Similar to Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s New, What’s Coming … and What’s Missing Right Now

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
Andrew Brust
 

Similar to Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s New, What’s Coming … and What’s Missing Right Now (20)

Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architect
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
The Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersThe Hadoop Ecosystem for Developers
The Hadoop Ecosystem for Developers
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Apache drill
Apache drillApache drill
Apache drill
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 

More from Rittman Analytics

More from Rittman Analytics (16)

From Zero to One with Rittman Analytics
From Zero to One with Rittman AnalyticsFrom Zero to One with Rittman Analytics
From Zero to One with Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
User Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity ModelUser Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity Model
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Planning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data WarehousingPlanning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data Warehousing
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 

Recently uploaded

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s New, What’s Coming … and What’s Missing Right Now

  • 1. Mark Rittman, Independent Analyst, MJR Analytics DATA INTEGRATION AND DATA WAREHOUSING FOR CLOUD, BIG DATA AND IOT: 
 WHAT’S NEW, WHAT’S COMING … AND WHAT’S MISSING RIGHT NOW BIG DATA WORLD, LONDON London, March 2017
  • 2. •Oracle ACE Director, Independent Analyst •Past ODTUG Exec Board Member + Oracle Scene Editor •Author of two books on Oracle BI •Co-founder & CTO of Rittman Mead •15+ Years in Oracle BI, DW, ETL + now Big Data •Host of the Drill to Detail Podcast (www.drilltodetail.com) •Based in Brighton & work in London, UK About The Presenter 2
  • 3. A BIT OF HISTORY… 3
  • 4. •Data warehouses provided a unified view of the business •Single place to store key data and metrics •Joined-up view of the business •Aggregates and conformed dimensions •ETL routines to load, cleanse and conform data •BI tools for simple, guided access to information •Tabular data access using SQL-generating tools •Drill paths, hierarchies, facts, attributes •Fast access to pre-computed aggregates •Packaged BI for fast-start ERP analytics [Diagram labels: source systems (Oracle, MongoDB, Sybase, IBM DB/2, MS SQL Server; Core ERP Platform, Retail, Banking, Call Center, E-Commerce, CRM); Data Warehouse (ODS / Foundation Layer, Access & Performance Layer); Business Intelligence Tools] Data Warehousing Back in Mid-2000’s 4
  • 5. How Traditional RDBMS Data Warehousing Scaled-Up 5 Shared-Everything Architectures (e.g. Oracle RAC, Exadata) Shared-Nothing Architectures (e.g. Teradata, Netezza)
  • 6. •Google needed to store and query their vast amount of server log files •And wanted to do so using cheap, commodity hardware •Google File System and MapReduce designed together for this use Around the Same Time… 6
  • 7. •GFS optimised for the particular task at hand - computing PageRank for sites •Streaming reads for PageRank calcs, block writes for crawler whole-site dumps •Master node only holds metadata •Stops client/master I/O being a bottleneck, also acts as traffic controller for clients •Simple design, optimised for a specific Google need •MapReduce focused on simple computations on an abstraction framework •Select & filter (MAP) and reduce (aggregate) functions, easy to distribute on a cluster •MapReduce abstracted cluster compute, HDFS abstracted cluster storage •Projects that inspired Apache Hadoop + HDFS Google File System + MapReduce Key Innovations 7
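
The select-and-filter (map) and aggregate (reduce) split on slide 7 can be illustrated with a minimal single-process sketch. This is not Hadoop or Google code: the log format, the function names and the sample data are all assumptions made for illustration, and a real cluster would run the map and reduce calls in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log format (an assumption): "<client-ip> <url> <http-status>"
SAMPLE_LOG = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /index.html 200",
    "10.0.0.3 /about.html 404",
    "10.0.0.1 /about.html 200",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Select & filter: emit (url, 1) for successful requests only."""
    parts = line.split()
    if len(parts) == 3 and parts[2] == "200":
        yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

Because each `map_phase` call looks at one line and each `reduce_phase` call looks at one key, both phases can be distributed without the functions knowing anything about the cluster, which is the abstraction the slide describes.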
  • 8. •A way of storing (non-relational) data cheaply, in an easily expandable way •Gave us a way of scaling beyond TB-size without paying $$$ •First use-cases were offline storage, active archive of data Hadoop’s Original Appeal to Data Warehouse Owners 8
  • 9. •Driven by pace of business, and user demands for more agility and control •Traditional IT-governed data loading not always appropriate •Not all data needed to be modelled right away •Not all data suited to storing in tabular form •New ways of analyzing data beyond SQL •Graph analysis •Machine learning Data Warehousing and ETL Needed Some Agility 9
  • 10. •Hadoop started out synonymous with MapReduce and Java coding •But YARN (Yet Another Resource Negotiator) broke this dependency •Hadoop now just handles resource management •Multiple different query engines can run against data in-place •General-purpose (e.g. MapReduce) •Graph processing •Machine Learning •Real-Time Processing Hadoop 2.0 - Enabling Multiple Query Engines 10
  • 11. •Storing data in the format it arrived in, and then applying schema at query time •Suits data that may be analysed in different ways by different tools •In addition, some datatypes may have schema embedded in the file format •Key benefit - fast-arriving data of unknown value can get to users earlier •Made possible by tools such as Apache Hive + SerDes, Apache Drill and self-describing file formats, HDFS storage Advent of Schema-on-Read, and Data Lakes 11
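
Schema-on-read, as described on slide 11, can be sketched in a few lines: the raw records are kept exactly as they arrived, and each consumer applies its own schema when it reads them. This is an illustrative sketch only, not how Hive SerDes or Drill are implemented; the event fields, schema dictionaries and the `read_with_schema` helper are hypothetical names chosen for the example.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw events stored as they arrived (JSON lines); field names are assumptions.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2017-03-01T10:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0", "ts": "2017-03-01T10:00:05", "battery": 0.8}',
]

# Two different "schemas" applied to the same raw data at query time.
Schema = Dict[str, Callable[[Any], Any]]
TEMPERATURE_SCHEMA: Schema = {"device": str, "temp_c": float}
BATTERY_SCHEMA: Schema = {"device": str, "battery": float}


def read_with_schema(raw_lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Parse each raw line and project/cast it to the requested schema.

    Records that lack a field required by the schema are skipped rather
    than rejected at load time, since nothing was enforced on write.
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


if __name__ == "__main__":
    print(list(read_with_schema(RAW_EVENTS, TEMPERATURE_SCHEMA)))
    print(list(read_with_schema(RAW_EVENTS, BATTERY_SCHEMA)))
```

The same stored lines serve both readers without any reload, which is the "analysed in different ways by different tools" point: the cost of deciding on a structure is deferred until someone actually queries the data.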
  • 12. •Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage •Flexible data storage platform with cheap storage, flexible schema support + compute •Solves the problem of how to store new types of data + choose best time/way to process it •Hadoop/NoSQL increasingly used for all store/transform/query tasks Data Warehousing Circa 2010 : The “Data Lake” 12 [Diagram labels: Data Transfer; Data Access; Data Factory; Data Reservoir; Business Intelligence Tools; Hadoop Platform; File Based Integration; Stream Based Integration; Data streams; Discovery & Development Labs (safe & secure discovery and development environment; data sets and samples; models and programs); Marketing / Sales Applications; Models; Machine Learning; Segments; Operational Data; Transactions; Customer Master Data; Unstructured Data; Voice + Chat Transcripts; ETL Based Integration; Raw Customer Data: data stored in the original format (usually files) such as SS7, ASN.1, JSON etc.; Mapped Customer Data: data sets produced by mapping and transforming raw data]
  • 13. DATA WAREHOUSING 
 & BIG DATA TODAY… 13
  • 14. •On-premise Hadoop, even with simple resilient clustering, will hit limits •Clusters can reach 5000+ nodes, need to scale-up for demand peaks etc •Scale limits are encountered way beyond those for DWs… •… but future is elastically-scaled, query and compute-as-a-service On-Premise Big Data Analytics Hits Its Limits 14 Oracle Big Data Cloud Compute Edition Free $300 developer credit at:
 https://cloud.oracle.com/en_US/tryit
  • 15. •New generation of big data platform services from Google, Amazon, Oracle •Combines three key innovations from earlier technologies: •Organising of data into tables and columns (from RDBMS DWs) •Massively-scalable and distributed storage and query (from Big Data) •Elastically-scalable Platform-as-a-Service (from Cloud) Elastically-Scalable Data Warehouse-as-a-Service 15
  • 16. Example Architecture : Google BigQuery 16
  • 17. •And things come full-circle … analytics typically requires tabular data •Google BigQuery based on DremelX massively-parallel query engine •But stores data in columnar format and provides a SQL interface •Solves the problem of providing DW-like functionality at scale, as-a-service •This is the future … ;-) BigQuery : Big Data Meets Data Warehousing 17
  • 18. DATAFLOW PIPELINES 
 ARE THE NEW ETL… 18
  • 19. New ways to do BI
  • 20. New ways to do BI
  • 21. MACHINE LEARNING & SEARCH FOR 
 “AUTOMAGIC” SCHEMA DISCOVERY 21
  • 22. New ways to do BI
  • 23. •By definition there's lots of data in a big data system ... so how do you find the data you want? •Google's own internal solution - GOODS ("Google Dataset Search") •Uses crawler to discover new datasets •ML classification routines to infer domain •Data provenance and lineage •Indexes and catalogs 26bn datasets •Other users, vendors also have solutions •Oracle Big Data Discovery •Datameer •Platfora •Cloudera Navigator Google GOODS - Catalog + Search At Google-Scale 23
  • 24. A NEW TAKE ON BI… 24
  • 25. •Came out of the data science movement, as a way to "show workings" •A set of reproducible steps that tell a story about the data •as well as being a better command-line environment for data analysis •One example is Jupyter, evolution of the IPython notebook •supports PySpark, pandas etc •See also Apache Zeppelin Web-Based Data Analysis Notebooks 25
  • 26. AND EMERGING OPEN-SOURCE
 BI TOOLS AND PLATFORMS 26
  • 27. And Emerging Open-Source
 BI Tools and Platforms wp-content/uploads/2016/05/paper.pdf
  • 28.
  • 29. And Emerging Open-Source
 BI Tools and Platforms
  • 30. … Which Is What I’m Working On Right Now 30
  • 32. Mark Rittman, Independent Analyst, MJR Analytics DATA INTEGRATION AND DATA WAREHOUSING FOR CLOUD, BIG DATA AND IOT: 
 WHAT’S NEW, WHAT’S COMING … AND WHAT’S MISSING RIGHT NOW BIG DATA WORLD, LONDON London, March 2017