(1) Hadoop has the opportunity to power next-generation big data architectures by integrating transactions, interactions, and observations from various sources.
(2) For Hadoop to fully power the big data wave, many communities must work together: acting as diligent stewards of the open source core and providing enterprise-ready solutions and services.
(3) Integrating Hadoop with existing IT investments through services, APIs, and partner ecosystems will be vitally important to unlocking the value of big data.
4. Big Data = Transactions + Interactions + Observations
[Figure: data sources grouped into nested tiers by data volume (megabytes to petabytes), plotted against increasing data variety and complexity]
ERP (Megabytes): Purchase detail, Purchase record, Payment record
CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
WEB (Terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
BIG DATA (Petabytes): User Generated Content, Mobile Web, Sentiment, Social Interactions & Feeds, User Click Stream, Spatial & GPS Coordinates, Sensors / RFID / Devices, External Demographics, Business Data Feeds, HD Video, Audio, Images, Speech to Text, Product/Service Logs, SMS/MMS
Source: Contents of above graphic created in partnership with Teradata, Inc.
5. There is still work to be done to ensure Hadoop powers the Big Data wave
6. Many Communities Must Work As One
• Be diligent stewards of the open source core
• Be tireless innovators beyond the core
• Provide robust data platform services & open APIs
• Enable ecosystem at each layer of the stack
• Make platform enterprise-ready & easy to use
Communities: Open Source, Vendors, End Users
7. Top 10 Influencers of the Decade
1. Google
2. Apple
3. Apache Software Foundation
4. Microsoft
5. Linux Foundation
6. Eclipse Foundation
7. Twitter
8. Free Software Foundation
9. Android Project
10. VMware
Source: SD Times, http://www.sdtimes.com/link/36666
8. Top 10 Influencers of the Decade
#3: Apache Software Foundation
Source: SD Times, http://www.sdtimes.com/link/36666
11. Connecting Transactions + Interactions + Observations
[Figure: data flows between data sources, the Big Data Refinery, Business Transactions & Interactions, and Business Intelligence & Analytics]
Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …
1. Classic ETL processing
2. Store, aggregate, and transform multi-structured data to unlock value
3. Share refined data and runtime models
4. Retain runtime models and historical data for ongoing refinement & analysis
5. Retain historical data to unlock additional value
12. Next-Generation Big Data Architecture
[Figure: the Big Data Refinery sits between data sources, Business Transactions & Interactions, and Business Intelligence & Analytics]
Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, … (SQL, NoSQL, NewSQL)
Business Intelligence & Analytics: Dashboards, Reports, Visualization, … (EDW, MPP, NewSQL)
Arrows powered by ETL, data movement, and data integration technologies
13. Data Services & Open APIs are Vital
Raw Hadoop data: inconsistent metadata, tool-specific access
Table access via HCatalog: aligned metadata, RESTful API
Apache HCatalog: Hadoop’s centralized metadata service
• Provide consistent metadata and data models across tools
• Share data as tables in and out of HDFS
• Enable flexible, thin-client access via RESTful APIs
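To make the thin-client point concrete, here is a minimal Python sketch of how a client might build a request URL for HCatalog's REST interface (WebHCat, formerly Templeton) to describe a table. The host name, table name, and user are made up for illustration, and port 50111 is the commonly documented WebHCat default, which a given cluster may override. The helper name `webhcat_table_url` is hypothetical and not part of any library.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, database, table, user, port=50111):
    """Build a WebHCat URL that asks HCatalog to describe one table.

    Assumption: the cluster exposes WebHCat at /templeton/v1 on `port`.
    """
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe="")
    )
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


if __name__ == "__main__":
    # Hypothetical host, table, and user; no network call is made here.
    print(webhcat_table_url("hcat.example.com", "default", "web_logs", "analyst"))
```

Any HTTP client can then issue a GET against that URL, so tools need no Hadoop libraries installed to read table metadata.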
14. Data Services & Open APIs In Action
[Figure: Website Interactions produce Web Logs; Web Logs, Order DB data, and Customer DB data flow into the Big Data Refinery]
1. Web Log files via WebHDFS APIs
2. Customer & Order data via Talend & HCatalog for schema
3. Process, analyze, and join data via Talend, Pig, & HCatalog
4. Analyze website visits by the type of end results
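The flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide names Talend, Pig, and HCatalog for the real processing, whereas the sketch swaps in plain in-memory Python for the join. The record layouts, the host name, and the helper names `webhdfs_open_url` and `visits_by_outcome` are hypothetical, and port 50070 is the NameNode HTTP default of older Hadoop releases, which may differ on a given cluster.

```python
from collections import Counter
from urllib.parse import quote


def webhdfs_open_url(host, hdfs_path, port=50070):
    """Build the WebHDFS REST URL that reads a file (op=OPEN)."""
    return "http://{}:{}/webhdfs/v1{}?op=OPEN".format(host, port, quote(hdfs_path))


def visits_by_outcome(visits, orders):
    """Join visits to orders on customer_id, then count visits per end result."""
    customers_with_orders = {order["customer_id"] for order in orders}
    outcomes = Counter()
    for visit in visits:
        if visit["customer_id"] in customers_with_orders:
            outcomes["purchased"] += 1
        else:
            outcomes["browsed_only"] += 1
    return dict(outcomes)


if __name__ == "__main__":
    # Made-up records standing in for parsed web logs and order database rows.
    visits = [
        {"customer_id": 1, "page": "/cart"},
        {"customer_id": 2, "page": "/home"},
        {"customer_id": 1, "page": "/checkout"},
    ]
    orders = [{"customer_id": 1, "order_id": 100}]
    print(webhdfs_open_url("namenode.example.com", "/logs/web/access.log"))
    print(visits_by_outcome(visits, orders))
```

In a real deployment the join would run inside the cluster over tables whose schema HCatalog supplies; the sketch only shows the shape of the analysis.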
16. Ecosystem Completes the Puzzle
Applications, Business Tools, & Dev Tools
Data Management & Movement
Infrastructure & Systems Management
17. Solution Architectures: Make Hadoop Enterprise-Ready & Easy to Use
Applications, Business Tools, & Dev Tools
Data Management & Movement
Infrastructure & Systems Management
18. Our Opportunity…and Our Role
By the end of 2015, more than half the world's data will be processed by Apache Hadoop.
1. Be diligent stewards of the open source core
2. Be tireless innovators beyond the core
3. Provide robust data platform services & open APIs
4. Enable the ecosystem at each layer of the stack
5. Make the platform enterprise-ready & easy to use