At our most recent Big Data Warehousing Meetup, we learned about the transition from Big Data 1.0, built on Hadoop 1.x and its nascent technologies, to Hadoop 2.x, where YARN enables distributed ETL, SQL, and analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian engineer covered the complete data value chain of an enterprise-ready platform: data connectivity, collection, preparation, optimization, and analytics with end-user access.
Access additional slides from this meetup here:
http://www.slideshare.net/CasertaConcepts/big-data-warehousing-meetup-january-20
For more information on our services or upcoming events, please visit http://www.actian.com/ or http://www.casertaconcepts.com/.
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
1. Big Data 2.0: ETL & Analytics
Implementing a next generation platform
Tyler Mitchell, Paul Dingman
Innovation Lab
January 2014
2. ACTIAN – PLATFORM FOR NEXT GENERATION ANALYTICS
Sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, WWW, Machine Data, Mobile, Traditional, NoSQL, SaaS
Actian Analytics Platform: Connect, Analyze, Act; Accelerators; DataFlow, Matrix, Vector
Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models
→ Rapid Time to Value
→ Unlimited Scale
→ Extreme Performance
→ Disruptive price/performance
→ Modern GUI Development
→ In-memory Analytics
→ Extends Hadoop and NoSQL analytics
→ Complements Traditional
→ 200+ data connectors
→ 600+ analytic functions
→ Full deployment choice
→ Certification with broad set of analytics tools
6. ACTIAN DATAFLOW – ETL & ANALYTICS
• Predefined operators
• Reduced IO
• In-memory operations
• Pipeline parallelism
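To make "pipeline parallelism" and "in-memory operations" concrete, here is a minimal Python sketch of the general idea: each operator runs as its own stage and hands records to the next stage through a bounded in-memory queue, so downstream stages start working before upstream stages have finished. This is not the Actian DataFlow API; the names `stage`, `run_pipeline`, and `buffer_size` are hypothetical and chosen only for illustration. It shows the structure of a pipeline rather than its performance, since CPython threads do not run CPU-bound work in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def stage(func, inbox, outbox):
    """Apply func to each record from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(records, funcs, buffer_size=4):
    """Run funcs as concurrent stages connected by bounded in-memory queues."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        # Push source records into the first queue, then signal completion.
        for record in records:
            queues[0].put(record)
        queues[0].put(_DONE)

    # The feeder runs in its own thread so that full queues cannot block
    # the caller before it starts draining the final queue.
    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two hypothetical cleansing operators chained into one pipeline.
    print(run_pipeline(["  alice ", " bob"], [str.strip, str.upper]))
```

Because each stage is a single worker reading from a FIFO queue, records leave the pipeline in the order they entered. The bounded queues keep only a few records in flight between stages, which is one way intermediate results can stay in memory instead of being written to disk between steps.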
Hadoop 2.0 – what is the big deal?
YARN – a new resource scheduler ("Yet Another Resource Negotiator")
Job Tracker and Task Tracker have been split up to increase scalability
MapReduce removed from the core architecture
Now there is a
7. Operator Library – ETL/DQ
Reading/Writing
Text Processing
Data Exploration
Data Matching
Aggregation
Filtering
Manipulation
Extreme Performance
• Runs natively on Hadoop, so 500% faster than MapReduce
Extreme Scale
• Run on a laptop
• Scale out to n number of nodes on any file system
Extreme Agility
• ETL, DQ and Analytics on Hadoop with no coding
• Move from any FS to any FS with no changes