#HadoopSummit
The Time Has Come for Big-Data-as-a-Service
Kris Applegate – Cloud and Big Data Solution Architect, Dell
Tom Phelan – Co-Founder and Chief Architect, BlueData
Agenda
• A Brief History of Hadoop
• Data Storage and Networking Evolution
• The Virtualization Revolution
• Rise of Big-Data-as-a-Service
• Big-Data-as-a-Service (BDaaS) Defined
• BDaaS – Public Cloud or On-Premises?
• Q & A
A Brief History of Hadoop
In the Beginning (circa 2003) …
• Networks were slow (1 Gigabit per second maximum)
• Siloed storage was expensive (proprietary and often required special hardware)
• Local HDDs were cheap and fast enough for big data needs
Source:
http://static.googleusercontent.com/media/researc
Bringing the Compute to the Data
[Diagram: Compute and Storage → Co-Locate Compute & Storage]
Hadoop and HDFS are Born
Network Improvements
Data Compression Options in HDFS
Source:
www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom
Result: Is Disk-Locality Irrelevant?
Source: https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/disk-irrelevant_hotos2011.pdf
“Less relevant” may be more accurate:
• Faster data center networks
• Distributed/non-distributed caching platforms
  • Example: Alluxio (formerly Tachyon)
• Compute and storage separation
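A rough back-of-the-envelope sketch of why faster networks make disk locality less relevant. The link speeds (1, 10, and 40 Gbit/s) are the ones this deck mentions; the 100 MB/s sequential HDD throughput is an illustrative assumption, not a figure from the slides, and protocol overhead is ignored.

```python
# Illustrative arithmetic only. HDD_MB_PER_S is an assumed value,
# not a number taken from the presentation.
HDD_MB_PER_S = 100  # assumed sequential throughput of one local HDD


def link_mb_per_s(link_gbit_per_s: float) -> float:
    """Convert an ideal link speed from Gbit/s to MB/s (no protocol overhead)."""
    return link_gbit_per_s * 1000 / 8


def disks_per_link(link_gbit_per_s: float, disk_mb_per_s: float = HDD_MB_PER_S) -> float:
    """How many disks' worth of sequential reads one link could carry."""
    return link_mb_per_s(link_gbit_per_s) / disk_mb_per_s


if __name__ == "__main__":
    for gbit in (1, 10, 40):
        print(f"{gbit:>2} Gbit/s link ~ {disks_per_link(gbit):.2f} disks of streaming reads")
```

Under these assumptions a 1 Gbit/s link carries little more than one disk's worth of streaming reads, while a 10 or 40 Gbit/s link can keep up with many disks at once, which is the intuition behind separating compute from storage.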
BDaaS and Cloud
• Virtualization / “cloud” technology is not absolutely required
• But realistically … the flexibility and elasticity of BDaaS cannot be economically provided without these underlying technologies
The Virtualization Revolution
VMware, KVM, Docker, Hyper-V, LXC
The Virtualization Revolution
Virtualization enabled several key benefits including:
• Automation, flexibility, elasticity
• Cost reduction and consolidation
• Higher utilization, less hardware overprovisioning
• Multi-tenancy
• Security
• VxLAN
• Fault isolation
The Virtualization Revolution
But … the overhead of virtualizing storage and networking within a hypervisor makes it difficult to meet the performance needs of Big Data workloads (SLAs, QoS)
The Virtualization Revolution
• Linux Containers
• OS virtualization reduces CPU, memory, network, and storage virtualization overhead
• Docker file format makes containers easy to use and share
Rise of Big-Data-as-a-Service
Big Data New Realities
Big Data Traditional Assumptions
• Bare-metal
• Disk-locality
• HDFS on local disks
Big Data New Realities
• Containers
• Compute and storage separation
• In-place access on remote data stores
New Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
Journey to BDaaS
• 2002: Initial release of VMware ESX
• 2003: Google paper
• 2004: Big Data era begins
• 2007: Hadoop release 0.14.1
• 2008: Initial release of Linux containers
• 2009: Dell DCS delivers first Big Data server
• 2009: Amazon launches EMR
• 2011: Dell first to launch optimized Apache Hadoop solution
• 2012: Hadoop 1.0.2 Snappy Compression
• 2012: 10 Gbit networking in data center
• 2013: Dell Hadoop Performance Analysis
• 2013: Initial release of Docker
• 2014: VxLANs available
• 2014: BlueData wins Strata + Hadoop World Showcase
• 2015: BlueData EPIC 2.0 with Docker
• 2015: 40 Gb networking in data center
• 2016: BDaaS available on-prem or cloud
BDaaS – The Time Has Come
All the pieces are now available:
• Fast network hardware and good data compression
  • Compute and storage separation
  • Low overhead virtualization (containers)
  • Ability to run network and storage-intensive workloads
• No sacrifice in performance
• Demand from end users for agility, flexibility, & speed
Big-Data-as-a-Service Defined
“A mechanism for the delivery of statistical analysis tools and information
that helps organizations understand and use insights gained from large
information sets in order to gain a competitive advantage.”
On-Demand, Self-Service, Elastic
Big Data Infrastructure, Applications, Analytics
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
Four Types of BDaaS
• Core BDaaS
• Performance BDaaS
• Feature BDaaS
• Integrated BDaaS
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
Four Types of BDaaS
Core BDaaS
• Minimal platform, such as Hadoop with YARN
Performance BDaaS
• “Downwards” vertical integration
• Includes optimized infrastructure
• Tight integration with Core BDaaS
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
Four Types of BDaaS
Feature BDaaS
• “Upwards” vertical integration
• Includes features beyond Hadoop
• Support for multiple Core BDaaS providers
Integrated BDaaS
• Full vertical integration and optimization
• Includes both Performance BDaaS & Feature BDaaS
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
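The four-way classification can be read as two independent yes/no questions: does the offering integrate "downwards" into optimized infrastructure, and does it integrate "upwards" with features beyond Hadoop? The sketch below encodes that reading; the names `BDaaSType` and `classify` are hypothetical, introduced only for illustration, and the mapping is an interpretation of the slides rather than part of the cited definition.

```python
from enum import Enum


class BDaaSType(Enum):
    """The four BDaaS types named on the slides."""
    CORE = "Core BDaaS"
    PERFORMANCE = "Performance BDaaS"
    FEATURE = "Feature BDaaS"
    INTEGRATED = "Integrated BDaaS"


def classify(optimized_infrastructure: bool, features_beyond_hadoop: bool) -> BDaaSType:
    """Map the two integration directions onto a BDaaS type.

    optimized_infrastructure: "downwards" vertical integration.
    features_beyond_hadoop:   "upwards" vertical integration.
    """
    if optimized_infrastructure and features_beyond_hadoop:
        return BDaaSType.INTEGRATED
    if optimized_infrastructure:
        return BDaaSType.PERFORMANCE
    if features_beyond_hadoop:
        return BDaaSType.FEATURE
    return BDaaSType.CORE


if __name__ == "__main__":
    print(classify(True, True).value)
```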
BDaaS – Public Cloud or On-Prem?
BDaaS – Public Cloud or On-Prem
Public Cloud
• Low Capex, high Opex
• “Infinite” expandability
• Less secure?
• Less control: software, SLAs, configs, etc.
On-Premises (Private Cloud)
• High Capex, low Opex
• Eventually reach resource limit
• More secure?
• More control: software, SLAs, configs, etc.
BDaaS – Workload Portability
Challenge: Public cloud services can be proprietary
Goal: Deliver API-compatible on-prem + public cloud
• BDaaS layer (e.g. BlueData)
• PaaS layer (e.g. Cloudforms, Cloud Foundry)
• API-compatible private cloud (e.g. Microsoft Azure Pack/Stack, OpenStack, VMware)
BDaaS – Public Cloud
Example Public Cloud Use Cases
• Workloads with a shorter life than 16 months* (e.g. Dev/Test)
• When data is in the cloud too
• Public-facing services
* www.dell.com/learn/us/en/555/business~solutions~whitepapers~en/documents~microsoft-private-cloud-tco-0914.pdf
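The 16-month threshold is a break-even point between up-front Capex and recurring Opex. A minimal sketch of that arithmetic follows. The function name `break_even_months` and all dollar figures are hypothetical; the example figures were chosen only so that the result lands on 16 months and are not taken from the cited whitepaper.

```python
import math


def break_even_months(capex: float, onprem_monthly: float, cloud_monthly: float) -> int:
    """First whole month at which cumulative on-prem cost is no higher than cloud cost.

    On-prem cost after m months: capex + onprem_monthly * m
    Cloud cost after m months:   cloud_monthly * m
    """
    if cloud_monthly <= onprem_monthly:
        raise ValueError("cloud is never more expensive per month; no break-even point")
    return math.ceil(capex / (cloud_monthly - onprem_monthly))


if __name__ == "__main__":
    # Hypothetical figures, picked to reproduce a 16-month break-even.
    months = break_even_months(capex=160_000, onprem_monthly=5_000, cloud_monthly=15_000)
    print(f"On-prem becomes cheaper after about {months} months")
```

With numbers like these, a workload expected to live less than the break-even period favors public cloud, and a longer-lived one favors on-premises.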
BDaaS – On-Premises / Private Cloud
Example On-Prem Use Cases
• High performance clusters
• Data security
• Data compliance
• Persistent clusters with > 16 month lifespan*
• High capacity clusters
• When SLAs are needed
* The BlueData EPIC software platform addresses this potential limitation
BlueData EPIC – Integrated BDaaS
• BDaaS software platform, using Docker containers
• Self-service, on-demand Hadoop / Spark clusters
• Bring your own application / distribution / version
• Compute and storage separation
  • Scale resources independently
  • Clusters with < 16 month lifespan well supported (e.g. transient)
  • No HDFS data ingestion penalty
• Secure multi-tenancy, Quality of Service (QoS)
Big Data On-Premises with BlueData
Traditional Big Data On-Prem
• IT
• Manufacturing, Sales, R&D, Services
• < 30% utilization
• Duplication of data
• Management complexity
• Weeks to build each cluster
• Complex, painful upgrades
BDaaS On-Prem with BlueData
• BlueData EPIC Software Platform
• Manufacturing, Sales, R&D, Services
• BI/Analytics Tools
• > 90% utilization
• No duplication of data
• Simplified management
• Multi-tenant
• Simple, instant upgrades
• Self-service, on-demand clusters
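To see what the utilization figures imply for hardware, the sketch below estimates how many servers are needed to deliver a fixed amount of useful work at a given average utilization. The function name `servers_needed` and the workload size are hypothetical, and 30% and 90% are used as stand-ins for the "< 30%" and "> 90%" figures on the slide.

```python
import math


def servers_needed(busy_server_equivalents: int, utilization_pct: int) -> int:
    """Servers required so that the given amount of work fits at this utilization.

    busy_server_equivalents: work expressed as the number of fully busy servers.
    utilization_pct:         average utilization per server, in whole percent.
    """
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in the range 1..100")
    return math.ceil(busy_server_equivalents * 100 / utilization_pct)


if __name__ == "__main__":
    work = 27  # hypothetical workload, in fully-busy-server equivalents
    for pct in (30, 90):
        print(f"{pct}% utilization: {servers_needed(work, pct)} servers")
```

Under these assumptions, the same workload needs three times as many servers at 30% utilization as at 90%.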
NEW – BDaaS On-Prem and Cloud
• BlueData announced AWS and multi-cloud strategy
  • Extending the user experience and value of BlueData to public cloud
  • Single pane of glass for on-prem and off-prem Big Data workloads
  • Initial AWS support; then MS Azure, Google Cloud Platform, others
• Support for data on-prem and compute in the cloud
  • Leverage cloud compute elasticity while keeping data on-premises
  • Eliminate challenge of data movement from on-prem to cloud
BlueData and Dell Partnership
• Joint solution for Big-Data-as-a-Service
• BlueData = Certified Dell Technology Partner
• Installed, tested, validated on Dell hardware
• Featured in Dell’s Global Customer Solution Centers
Q & A
Kris Applegate
kris_applegate@dell.com
www.dell.com/bigdata
Tom Phelan
tap@bluedata.com
www.bluedata.com

Weitere ähnliche Inhalte

Was ist angesagt?

2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDataWorks Summit
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseGwen (Chen) Shapira
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 

Was ist angesagt? (20)

2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
To The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid AnalyticsTo The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid Analytics
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 

Ähnlich wie The Time Has Come for Big-Data-as-a-Service

0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - PresentationÉric Dusablon
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingRifaHaryadi
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
Shaping the Role of a Data Lake in a Modern Data Fabric Architecture
Shaping the Role of a Data Lake in a Modern Data Fabric ArchitectureShaping the Role of a Data Lake in a Modern Data Fabric Architecture
Shaping the Role of a Data Lake in a Modern Data Fabric ArchitectureDenodo
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyPraveen Kumar
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataAltinity Ltd
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...Radhika Puthiyetath
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 

Ähnlich wie The Time Has Come for Big-Data-as-a-Service (20)

Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution Offering
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Shaping the Role of a Data Lake in a Modern Data Fabric Architecture
Shaping the Role of a Data Lake in a Modern Data Fabric ArchitectureShaping the Role of a Data Lake in a Modern Data Fabric Architecture
Shaping the Role of a Data Lake in a Modern Data Fabric Architecture
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journey
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Oracle Big Data Cloud service
Oracle Big Data Cloud serviceOracle Big Data Cloud service
Oracle Big Data Cloud service
 
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak DataClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
ClickHouse on Plug-n-Play Cloud, by Som Sikdar, Kodiak Data
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 

Mehr von BlueData, Inc.

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupBlueData, Inc.
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataBlueData, Inc.
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData, Inc.
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData, Inc.
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBlueData, Inc.
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBlueData, Inc.
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsBlueData, Inc.
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData, Inc.
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperBlueData, Inc.
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBlueData, Inc.
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made EasyBlueData, Inc.
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData, Inc.
 

Mehr von BlueData, Inc. (18)

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes Meetup
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
The Time Has Come for Big-Data-as-a-Service

  • 1. #HadoopSummit: The Time Has Come for Big-Data-as-a-Service
      • Kris Applegate – Cloud and Big Data Solution Architect, Dell
      • Tom Phelan – Co-Founder and Chief Architect, BlueData
  • 2. Agenda
      • A Brief History of Hadoop
      • Data Storage and Networking Evolution
      • The Virtualization Revolution
      • Rise of Big-Data-as-a-Service
      • Big-Data-as-a-Service (BDaaS) Defined
      • BDaaS – Public Cloud or On-Premises?
      • Q & A
  • 4. In the Beginning (circa 2003) …
      • Networks were slow (1 Gigabit per second maximum)
      • Siloed storage was expensive (proprietary and often required special hardware)
      • Local HDDs were cheap and fast enough for big data needs
      • Source: http://static.googleusercontent.com/media/researc
  • 5. Bringing the Compute to the Data
      • Diagram: Compute and Storage, co-located
      • Hadoop and HDFS are Born
  • 7. Data Compression Options in HDFS
      • Source: www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom
  • 8. Result: Is Disk-Locality Irrelevant?
      • Source: https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/disk-irrelevant_hotos2011.pdf
      • "Less relevant" may be more accurate
      • Faster data center networks
      • Distributed/non-distributed caching platforms, e.g. Alluxio (Tachyon)
      • Compute and storage separation
  • 9. BDaaS and Cloud
      • Virtualization / “cloud” technology is not absolutely required
      • But realistically … the flexibility and elasticity of BDaaS cannot be economically provided without these underlying technologies
  • 11. The Virtualization Revolution
      • Virtualization enabled several key benefits including:
      • Automation, flexibility, elasticity
      • Cost reduction and consolidation
      • Higher utilization, less hardware overprovisioning
      • Multi-tenancy
      • Security
      • VxLAN
      • Fault isolation
  • 12. The Virtualization Revolution
      • But … the overhead involved in the virtualization of storage and networking within a hypervisor makes it difficult to meet the performance needs of Big Data workloads (SLAs, QoS)
  • 13. The Virtualization Revolution
      • Linux Containers
      • OS virtualization reduces CPU, memory, network, and storage virtualization overhead
      • Docker file format makes containers easy to use and share
  • 15. Big Data New Realities
      • Big Data Traditional Assumptions: bare-metal; disk-locality; HDFS on local disks
      • Big Data New Realities: containers; compute and storage separation; in-place access on remote data stores
      • New Benefits and Value: Big-Data-as-a-Service; agility and cost savings; faster time-to-insights
  • 16. Journey to BDaaS (timeline, 2002 to 2016)
      • 2002: Initial release of VMware ESX
      • 2003: Google paper
      • 2004: Big Data era begins
      • 2007: Hadoop release 0.14.1
      • 2008: Initial release of Linux containers
      • 2009: Dell DCS delivers first Big Data server
      • 2009: Amazon launches EMR
      • 2011: Dell first to launch optimized Apache Hadoop solution
      • 2012: Hadoop 1.0.2 Snappy compression
      • 2012: 10 Gbit networking in data center
      • 2013: Dell Hadoop Performance Analysis
      • 2013: Initial release of Docker
      • 2014: VxLANs available
      • 2014: BlueData wins Strata + Hadoop World Showcase
      • 2015: BlueData EPIC 2.0 with Docker
      • 2015: 40 Gb networking in data center
      • 2016: BDaaS available on-prem or cloud
  • 17. BDaaS – The Time Has Come
      • All the pieces are now available:
      • Fast network hardware and good data compression
          • Compute and storage separation
          • Low-overhead virtualization (containers)
          • Ability to run network and storage-intensive workloads
      • No sacrifice in performance
      • Demand from end users for agility, flexibility, & speed
  • 18. Big-Data-as-a-Service Defined
      • “A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.”
      • On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics
      • Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 19. Four Types of BDaaS
      • Core BDaaS
      • Performance BDaaS
      • Feature BDaaS
      • Integrated BDaaS
      • Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 20. Four Types of BDaaS
      • Core BDaaS
          • Minimal platform, such as Hadoop with YARN
      • Performance BDaaS
          • “Downwards” vertical integration
          • Includes optimized infrastructure
          • Tight integration with Core BDaaS
      • Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 21. Four Types of BDaaS
      • Feature BDaaS
          • “Upwards” vertical integration
          • Includes features beyond Hadoop
          • Support for multiple Core BDaaS providers
      • Integrated BDaaS
          • Full vertical integration and optimization
          • Includes both Performance BDaaS & Feature BDaaS
      • Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 22. BDaaS – Public Cloud or On-Prem?
  • 23. BDaaS – Public Cloud or On-Prem
      • Public Cloud
          • Low Capex, high Opex
          • “Infinite” expandability
          • Less secure?
          • Less control: software, SLAs, configs, etc.
      • On-Premises (Private Cloud)
          • High Capex, low Opex
          • Eventually reach resource limit
          • More secure?
          • More control: software, SLAs, configs, etc.
  • 24. BDaaS – Workload Portability
      • Challenge: Public cloud services can be proprietary
      • Goal: Deliver API-compatible on-prem + public cloud
          • BDaaS layer (e.g. BlueData)
          • PaaS layer (e.g. Cloudforms, Cloud Foundry)
          • API-compatible private cloud (e.g. Microsoft Azure Pack/Stack, OpenStack, VMware)
  • 25. BDaaS – Public Cloud: Example Public Cloud Use Cases
      • Workloads with a shorter life than 16 months* (e.g. Dev/Test)
      • When data is in the cloud too
      • Public-facing services
      • * www.dell.com/learn/us/en/555/business~solutions~whitepapers~en/documents~microsoft-private-cloud-tco-0914.pdf
  • 26. BDaaS – On-Premises / Private Cloud: Example On-Prem Use Cases
      • High performance clusters
      • Data security
      • Data compliance
      • Persistent clusters with > 16 month lifespan*
      • High capacity clusters
      • When SLAs are needed
      • * The BlueData EPIC software platform addresses this potential limitation
  • 27. BlueData EPIC – Integrated BDaaS
      • BDaaS software platform, using Docker containers
      • Self-service, on-demand Hadoop / Spark clusters
      • Bring your own application / distribution / version
      • Compute and storage separation
          • Scale resources independently
          • Clusters with < 16 month lifespan well supported (e.g. transient)
          • No HDFS data ingestion penalty
      • Secure multi-tenancy, Quality of Service (QoS)
  • 28. Big Data On-Premises
      • Traditional Big Data On-Prem (IT; Manufacturing, Sales, R&D, Services): < 30% utilization; duplication of data; management complexity; weeks to build each cluster; complex, painful upgrades
      • BDaaS On-Prem with BlueData (BlueData EPIC Software Platform; Manufacturing, Sales, R&D, Services; BI/Analytics Tools): > 90% utilization; no duplication of data; simplified management; multi-tenant; simple, instant upgrades; self-service, on-demand clusters with BlueData
  • 29. NEW – BDaaS On-Prem and Cloud
      • BlueData announced AWS and multi-cloud strategy
          • Extending the user experience and value of BlueData to public cloud
          • Single pane of glass for on-prem and off-prem Big Data workloads
          • Initial AWS support; then MS Azure, Google Cloud Platform, others
      • Support for data on-prem and compute in the cloud
          • Leverage cloud compute elasticity while keeping data on-premises
          • Eliminate challenge of data movement from on-prem to cloud
  • 30. BlueData and Dell Partnership
      • Joint solution for Big-Data-as-a-Service
      • BlueData = Certified Dell Technology Partner
      • Installed, tested, validated on Dell hardware
      • Featured in Dell’s Global Customer Solution Centers
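
Slides 25 and 26 use a 16-month lifespan as the dividing line between public cloud and on-premises clusters. The sketch below illustrates how such a break-even point can fall out of a simple Capex/Opex comparison. The function name `break_even_months`, the linear cost model, and all dollar figures are hypothetical, picked only so the arithmetic lands on 16 months; none of them come from the cited Dell white paper.

```python
from typing import Optional


def break_even_months(onprem_capex: float,
                      onprem_monthly_opex: float,
                      cloud_monthly_opex: float) -> Optional[float]:
    """Months after which cumulative on-prem cost drops below cloud cost.

    Simplified linear model (an assumption, not the Dell TCO methodology):
      on-prem cost(t) = onprem_capex + onprem_monthly_opex * t
      cloud cost(t)   = cloud_monthly_opex * t
    Returns None when on-prem never catches up with the cloud.
    """
    monthly_saving = cloud_monthly_opex - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return onprem_capex / monthly_saving


# Hypothetical figures chosen so the result matches the 16-month threshold.
months = break_even_months(onprem_capex=160_000,
                           onprem_monthly_opex=5_000,
                           cloud_monthly_opex=15_000)
print(months)  # 16.0
```

Under this model, a cluster that lives for less time than the returned value is cheaper in the public cloud, and a longer-lived one is cheaper on-premises.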

Editor's Notes

  1. Tom - agenda
  2. Tom
  3. Tom – 3x data replication was enough. No need for backup/recovery/geographical replication/snapshots etc.
  4. Kris
  5. Kris
  6. Kris
  7. Tom – Data locality is still important: caching, RAM & SSD tiering. Performance of random access on data working sets that exceed cache capacity will devolve to network speed. A modern data center has at least a 10 Gbit/s network, more likely 40 Gbit/s.
  8. Tom – Bare metal implementations of BDaaS are available. Costly and cumbersome.
  9. Tom – First came hardware virtualization: hypervisors (ESX, Hyper-V, KVM, Xen, etc.). Then came operating system virtualization: jails, containers, etc.
  10. Kris
  11. Kris
  12. Tom – The same Docker files can be run on-prem & in the cloud. Remember this. It will be important later.
  13. Tom
  14. Tom – Big Data users developed needs for agility, multiple clusters, remote data, and independent compute & storage scalability. Technology progressed. Did the need drive the development of the technology, or was it coincidental? Which came first, the chicken or the egg? Ultimately, it does not matter.
  15. Tom & Kris
  16. Tom – The needs and wants of the Big Data user can be met by available technology: BDaaS.
  17. Tom – There are many definitions of BDaaS. Some say it is the combination of software & data; that can be hard to grasp. We say it is a functionality stack:
  18. Tom – There are four types. Integrated BDaaS is the nirvana. The other three are stepping stones to get there. For the most part, the later types encompass the functionality of the earlier types. Each step gives more “help” to the organization with respect to the use of its data.
  19. Kris
  20. Tom – BI tools: Datameer, Platfora, CDH/HDP/Pivotal/MapR/BigInsights
  21. Tom – This is often the $64,000 question. I have data here. I have data there. Why do I need to “get” it somewhere in order to be able to use it?
  22. Tom – Before we can answer the question of how to process my data, let's look at where the data is. Typically it will be in one of two places, and each of those places has unique characteristics.
  23. Kris
  24. Kris
  25. Kris
  26. Tom – One example of BDaaS. The EPIC platform from BlueData provides BDaaS.
  27. Tom – Bring order to chaos. Stop “cluster sprawl”. We originally heard this term years ago from an early (alpha level) customer. The term has since entered common usage.
  28. Tom – This is breaking news. EPIC support for this was announced last week. It is a roadmap for the delivery of the nirvana of “Integrated BDaaS” across both on-prem and public cloud. With compute/storage separation, keep data on-prem and offload compute to the cloud; this avoids the cost, complexity, & potential risk of moving data to the public cloud.
  29. Kris
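
Note 7 says that once a working set exceeds cache capacity, random-access performance devolves to network speed. As a rough back-of-the-envelope sketch, the code below compares the link speeds mentioned in the deck (1, 10, and 40 Gbit/s). The 1 TB working set and the function name `transfer_seconds` are assumptions for illustration, and the calculation ignores protocol overhead, contention, and disk throughput.

```python
def transfer_seconds(size_bytes: int, link_gbit_per_s: float) -> float:
    """Ideal time to move size_bytes over a link of the given line rate."""
    bits = size_bytes * 8
    return bits / (link_gbit_per_s * 1e9)


ONE_TB = 10**12  # decimal terabyte; hypothetical working-set size

# Link speeds taken from the slides: circa-2003 networks vs. 2012 and 2015.
for speed in (1, 10, 40):
    print(f"{speed:>2} Gbit/s: {transfer_seconds(ONE_TB, speed):,.0f} s")
```

At these idealized line rates, the same terabyte takes 8,000 s at 1 Gbit/s, 800 s at 10 Gbit/s, and 200 s at 40 Gbit/s, which is the shift behind the "less relevant" framing of disk locality on slide 8.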