NetApp’s Open Solution for Hadoop




                                           Technology Insight Series
 John Webster
 January, 2012




                            Evaluator Group




Page 1 of 6                Copyright 2012, Evaluator Group, Inc. 

                
 


Introduction 
Apache Hadoop has gained considerable attention from the enterprise IT community as a data analytics
alternative to traditional BI systems and data warehousing. While it is not the only alternative
currently available, it has become highly visible.

However, with heightened visibility comes heightened scrutiny. Hadoop’s shortcomings have also 
become more visible to enterprise IT administrators who have expressed concern over data integrity, 
system resiliency, ease of use, and maintainability.  Now, a growing number of enterprise IT‐centric 
vendors are responding to the opportunity to offer a Hadoop‐based data analytics solution that 
conforms to the demands of a production data center environment. Here we review one such solution 
that has resulted from a partnership between NetApp and Cloudera, the commercial face of Apache 
Hadoop. 

Target Market 
The NetApp Open Solution for Hadoop consists of at least two NetApp storage arrays: the E2660, which
provides hardware RAID storage for Hadoop Data Nodes, and the FAS2000, which offers system resilience
and metadata protection capabilities to the Hadoop Name Node. The SANtricity Storage Manager is also
required. Cloudera Enterprise, including the Cloudera Management Suite, can optionally be included as
part of the solution. Hadoop servers, clients, SAS HBAs, and network switches are not included.
Therefore, this is not an appliance offering in the way that EMC Greenplum and IBM Netezza are offered
as pre-integrated, installable solutions that include all components.

The NetApp Open Solution for Hadoop augments traditional BI systems by allowing BI users to embrace a
much greater range of data types and data set sizes, as well as to perform iterative queries in real
time. And, unlike many early Hadoop implementations, Cloudera and NetApp have taken care to help users
flatten the Hadoop learning curve and accelerate time to production.

But perhaps more importantly, it is aimed at enterprise data center administrators looking for a Hadoop 
platform that can be managed in ways that are more consistent with production data center policies and 
practices. The objective is to provide an operational model that is tuned, tested, more stable and easier 
to maintain over time. 

We have been asked recently by storage administrators who are also NetApp users to suggest ways they 
can help to make emerging Hadoop environments more stable and consistent with enterprise data 
management policies regarding application availability, data protection, archive, compliance, security, 
and audit. The NetApp Open Solution for Hadoop addresses these requirements and offers a blueprint
for integrating NetApp storage arrays with Hadoop clusters that preserves Hadoop’s “shared nothing”
architecture.


The Shared Nothing Imperative 
Apache Hadoop users typically build their own parallelized computing clusters from commodity servers,
each with server-internal storage, usually in the form of a small JBOD disk array. These are commonly
referred to as “shared nothing” architectures because all processing is done in parallel by servers in
the cluster that are self-contained processing units. They communicate with one another over a common
network but otherwise share no computing resources, including memory and storage. SAN and NAS storage,
while scalable and resilient, is typically seen as lacking the I/O performance these clusters need to
rise above the capabilities of the standard data warehouse. As a result, Hadoop storage is typically
DAS.

The practitioners of New Data Analytics processes are generally hostile to shared storage. They prefer 
direct‐attached storage (DAS) in its various forms from solid state disk (SSD) to high capacity SATA disk 
buried inside parallel processing nodes. The perception of shared storage architectures—SAN and NAS—
is that they are relatively slow, complex, and above all, expensive. These qualities are not consistent 
with New Data Analytics systems that thrive on system performance, commodity infrastructure, and low 
cost. 

Real or near-real time information delivery is one of the defining characteristics of New Data Analytics.
Latency is therefore avoided wherever possible. Data in memory is good. Data on spinning disk at the
other end of an FC SAN connection is not.


NetApp for Hadoop 
The first thing to note about the NetApp Open Solution for Hadoop is that it preserves the shared
nothing architectural model. It provides DAS storage in the form of a NetApp E2660 array to each Data
Node within the Hadoop cluster. Each E2660 enclosure houses 60 disks. Because the array is configured
as four volumes of DAS, each Data Node has its own non-shared set of disks and “sees” only its own share
(see graphic below). Each Data Node is allocated fourteen disks within the E2660 array as well as
“array intelligence”: dual array controllers with hardware-assisted computation of RAID parity.
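To make “hardware-assisted computation of RAID parity” concrete, the sketch below does in software what a RAID 5 controller does in hardware for one stripe: the parity block is the byte-wise XOR of the data blocks, and any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. It illustrates the general RAID 5 technique under simplifying assumptions (one stripe, tiny equal-sized blocks) and is not NetApp's controller implementation; RAID 6 adds a second, differently computed parity block that is not shown. The function names are hypothetical.

```python
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    result = bytearray(length)
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks in a stripe must be the same length")
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def raid5_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity for one RAID 5 stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def raid5_rebuild(surviving_blocks: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # One stripe of a 13 data + 1 parity volume, using 4-byte toy blocks.
    stripe = [bytes([n] * 4) for n in range(1, 14)]
    parity = raid5_parity(stripe)
    lost = 5  # pretend the disk holding block 5 has failed
    rebuilt = raid5_rebuild(stripe[:lost] + stripe[lost + 1:], parity)
    print(rebuilt == stripe[lost])
```

Because XOR is its own inverse, the same routine serves for both computing parity and rebuilding, which is one reason the operation is cheap to offload to controller hardware.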





               Figure 1. NetApp Open Solution for Hadoop configuration (courtesy NetApp) 

The E2660 operates as four completely separate and independent storage modules that are co-located
in the same 4U chassis. A single enclosure contains a total of sixty (60) 2 TB or 3 TB, 7.2K RPM
near-line SAS drives. Each module consists of 14 disks configured by the user as either RAID 5 (13 data
+ 1 parity) or RAID 6 (12 data + 2 parity). The remaining four drives are available as global hot spares.
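The drive arithmetic above can be checked with a short calculation. The sketch assumes exactly the layout described (60 drives, four 14-drive modules, RAID 5 as 13+1 or RAID 6 as 12+2) and reports usable capacity in decimal TB before any file system or HDFS overhead; the constant and function names are illustrative only.

```python
# Layout as described for one E2660 enclosure.
DRIVES_PER_ENCLOSURE = 60
MODULES_PER_ENCLOSURE = 4  # one module per Data Node
DRIVES_PER_MODULE = 14

# Parity drives consumed per module for each RAID option.
PARITY_DRIVES = {"RAID 5": 1, "RAID 6": 2}


def global_hot_spares() -> int:
    """Drives left over after the four modules are populated."""
    return DRIVES_PER_ENCLOSURE - MODULES_PER_ENCLOSURE * DRIVES_PER_MODULE


def usable_tb_per_module(raid_level: str, drive_tb: int) -> int:
    """Capacity of the data drives in one module (decimal TB, before overhead)."""
    data_drives = DRIVES_PER_MODULE - PARITY_DRIVES[raid_level]
    return data_drives * drive_tb


if __name__ == "__main__":
    print("global hot spares:", global_hot_spares())
    for raid_level in PARITY_DRIVES:
        for drive_tb in (2, 3):
            per_module = usable_tb_per_module(raid_level, drive_tb)
            print(f"{raid_level}, {drive_tb} TB drives: "
                  f"{per_module} TB per Data Node, "
                  f"{per_module * MODULES_PER_ENCLOSURE} TB per enclosure")
```

With 3 TB drives in RAID 6, for example, each Data Node sees 12 × 3 = 36 TB of data capacity and the enclosure as a whole 144 TB.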

The NetApp FAS2000, including its Data ONTAP operating system, provides NFS-based storage for the
Hadoop Name Node server. The FAS system offers production data center quality storage for Hadoop
system metadata, a critical component in the overall functioning and resiliency of the Hadoop cluster.
NetApp describes the level of integration between the FAS system and the Name Node server as modest in
the first release of this solution. Later releases will use more ONTAP functionality and be more
tightly integrated into the Hadoop code base.
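The paper does not spell out how the Name Node is pointed at the FAS storage. In stock Hadoop of this era, one common way to keep a second copy of Name Node metadata on NFS is to list both a local directory and an NFS mount in the `dfs.name.dir` property of `hdfs-site.xml` (renamed `dfs.namenode.name.dir` in Hadoop 2); the Name Node then writes its metadata to every listed directory. The sketch below assumes that approach and simply generates the property entry. Both directory paths and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def name_dir_property(local_dir: str, nfs_dir: str,
                      key: str = "dfs.name.dir") -> str:
    """Build an hdfs-site.xml <property> listing a local and an NFS metadata directory.

    Hadoop treats the comma-separated value as a list of directories and
    keeps a full copy of the Name Node metadata in each one.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = key
    ET.SubElement(prop, "value").text = ",".join([local_dir, nfs_dir])
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths: a local disk and a mount point for the FAS NFS export.
    print(name_dir_property("/data/1/dfs/nn", "/mnt/fas/dfs/nn"))
```

On Hadoop 2 or later, the same helper can be called with `key="dfs.namenode.name.dir"`.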


Problems the Solution Addresses: 
At the Name Node Level 
The Hadoop Name Node is a well-known single point of failure: if it stops functioning, the entire
cluster can go down. The FAS2040 is used as storage for the Name Node, mitigating the loss of cluster
metadata due to Name Node failure. It functions as a single, unified repository of cluster metadata
that supports faster recovery from disk failure. It also serves as a repository for other cluster
software, including scripts, and as such can be used to simplify cluster deployment, updates, and
ongoing maintenance.




At the Data Node Level 
Standard Hadoop clusters typically rely on Data Node-based software for data protection and system
resilience. Hadoop uses a distributed, host software-based data mirroring scheme that functions across
all Data Nodes in a cluster. On data ingest, users typically specify that two additional copies of the
original data be written to two other Data Nodes in the cluster,[1] resulting in three copies of the
data within the cluster. This provides both a degree of resilience in case of failure and balanced
access (load balancing) to data across the Data Nodes in the cluster.
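The replication scheme can be illustrated with a simplified model. The sketch below places each block's replicas on distinct, randomly chosen Data Nodes and counts how many replicas land on each node, showing both properties described above: no two copies of a block share a node, and replicas spread across the cluster. It is deliberately simpler than HDFS's actual placement policy, which also takes the writing client and rack topology into account; the node names and functions are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional


def choose_replica_nodes(data_nodes: List[str], replication: int = 3,
                         rng: Optional[random.Random] = None) -> List[str]:
    """Pick `replication` distinct Data Nodes to hold copies of one block."""
    if replication > len(data_nodes):
        raise ValueError("replication count exceeds the number of Data Nodes")
    rng = rng or random.Random()
    return rng.sample(data_nodes, replication)


def replicas_per_node(num_blocks: int, data_nodes: List[str],
                      replication: int = 3, seed: int = 0) -> Counter:
    """Count how many block replicas each Data Node receives."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_blocks):
        counts.update(choose_replica_nodes(data_nodes, replication, rng))
    return counts


if __name__ == "__main__":
    nodes = [f"dn{i}" for i in range(1, 9)]  # an 8-node toy cluster
    for node, n in sorted(replicas_per_node(1000, nodes).items()):
        print(node, n)
```

With 1,000 blocks and a replication count of three, the model stores 3,000 replicas in total, spread roughly evenly over the eight nodes.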

However, with a replication count of three, every TB of data ingested yields three TB stored. In
addition, the copy process consumes cluster processing resources and internal network bandwidth that
would otherwise be available to analytic processes.
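The cost described above is simple arithmetic. The sketch below assumes the usual HDFS write pipeline with the writer running on a Data Node, so that the first copy is written locally and each additional copy crosses the cluster network once. It computes the stored footprint and that replica traffic for a given ingest volume; the function name is hypothetical and the figures ignore metadata and temporary files.

```python
def replication_cost(ingested_tb: float, replication: int = 3) -> dict:
    """Storage and network cost of ingesting data into HDFS.

    Assumes the writer runs on a Data Node, so the first copy is local and
    each of the remaining (replication - 1) copies crosses the network once.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return {
        "stored_tb": ingested_tb * replication,
        "network_tb": ingested_tb * (replication - 1),
    }


if __name__ == "__main__":
    for r in (3, 2, 1):
        print(f"replication={r}:", replication_cost(10, r))
```

A client outside the cluster would add one more transfer per block, since even the first copy then travels over the network.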

NetApp moves data protection processes, and the creation of the data replicas needed for recovery from
adverse events, off the Hadoop cluster and onto storage arrays designed to accomplish these tasks far
more efficiently. Triple mirroring within the cluster consumes server and network bandwidth. Instead,
NetApp lets administrators place Data Node data on a direct-attached NetApp E2660 array connected over
6Gb/s SAS. Doing so replaces the software triple mirror that runs at the Data Node level with hardware
RAID that runs at the array level.
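Moving protection from HDFS replicas to array RAID changes how much raw disk each ingested terabyte consumes. The sketch below multiplies the HDFS replication count by the RAID overhead (total drives divided by data drives), using exact fractions. The replication counts in the example scenarios are illustrative assumptions: the paper says only that the replica count can be reduced, not what value NetApp recommends. The function name is hypothetical.

```python
from fractions import Fraction


def raw_tb_per_ingested_tb(replication: int, data_drives: int = 1,
                           parity_drives: int = 0) -> Fraction:
    """Raw disk consumed per TB ingested: HDFS copies times RAID overhead.

    JBOD is modelled as data_drives=1, parity_drives=0 (no RAID overhead).
    """
    raid_overhead = Fraction(data_drives + parity_drives, data_drives)
    return replication * raid_overhead


if __name__ == "__main__":
    scenarios = {
        "JBOD, 3 HDFS copies": raw_tb_per_ingested_tb(3),
        "RAID 6 (12+2), 2 HDFS copies": raw_tb_per_ingested_tb(2, 12, 2),
        "RAID 5 (13+1), 2 HDFS copies": raw_tb_per_ingested_tb(2, 13, 1),
    }
    for label, ratio in scenarios.items():
        print(f"{label}: {float(ratio):.2f} TB raw per TB ingested")
```

Under these assumptions, RAID 6 with two HDFS copies consumes 2 × 14/12 ≈ 2.33 TB of raw disk per TB ingested, against 3 TB for triple replication on JBOD.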

The net result is that Hadoop Data Nodes are protected from disk failures that would otherwise cause
job failures. Support for non-disruptive, simultaneous rebuild of logical volumes means that disk
failures can be handled without disrupting the cluster and without administrator intervention. In
addition, NetApp’s use of enterprise-grade disks in the E2660 array will result in fewer disk failures
over time.
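Whether a disk failure stays invisible to Hadoop depends on how many drives in the same volume fail before a rebuild completes. The sketch below encodes the usual rule for the two RAID options described earlier: a RAID 5 (13+1) volume survives one concurrent drive failure and a RAID 6 (12+2) volume survives two, while the rebuild runs onto a global hot spare. It is a simplification that ignores read errors during rebuild and the case where all four spares are already in use; the names are hypothetical.

```python
PARITY_DRIVES = {"RAID 5": 1, "RAID 6": 2}


def volume_state(raid_level: str, concurrent_failures: int) -> str:
    """Classify a volume by how many of its drives are failed at the same time."""
    if concurrent_failures < 0:
        raise ValueError("failure count cannot be negative")
    if concurrent_failures == 0:
        return "optimal"
    if concurrent_failures <= PARITY_DRIVES[raid_level]:
        # Data is still readable from the remaining drives plus parity.
        return "degraded (rebuilding)"
    return "failed (data loss)"


if __name__ == "__main__":
    for raid_level in PARITY_DRIVES:
        for failures in range(4):
            print(raid_level, failures, volume_state(raid_level, failures))
```

The extra parity drive is what lets a RAID 6 volume keep serving its Data Node even if a second drive fails while the first rebuild is still running.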

Replacing the JBOD disks within the Data Nodes with the E2660 can also increase overall cluster
performance, because the HDFS replica count can be reduced and the storage array takes over that
protection workload. In addition, hardware RAID combined with caching at the E2660 array level adds a
further margin of performance.


Conclusion 
As mentioned earlier, the NetApp Open Solution for Hadoop differs from the offerings of data analytics
appliance vendors in that it does not include Hadoop server and client hardware. This means that
customers for this solution are free to source their own at the best price they can negotiate.
Additionally, Cloudera’s Distribution including Apache Hadoop (CDH) is available as a free download from
Cloudera. Zaloni, one of NetApp’s partners, offers the NetApp Open Solution for Hadoop together with its
own custom services and support. Hadoop is now emerging in enterprise production data centers as a new
BI tool that in some cases augments established data warehousing systems and in other cases delivers
functionality beyond the reach of the traditional data warehouse. We believe that these early Hadoop
deployments will grow in size and importance over time. It is therefore important to start with an
implementation that offers production data center quality resilience and data integrity, and that can
be scaled upward over time by adding internal storage capacity rather than adding more Data Nodes each
time more storage capacity is needed. We also note that the ability to integrate an archival storage
component for security and compliance reasons will become more critical as time goes on. The NetApp
Open Solution for Hadoop addresses these requirements by delivering enterprise data center quality
storage platforms, integrated with Hadoop, that are well known and understood by enterprise IT
administrators.

[1] Replication count is user controllable. Maintaining three copies of data has become standard
practice. However, to improve performance for large bulk data loads, users can and often do reduce the
replication count to one or two, and increase the count later.

About Evaluator Group 
Evaluator Group Inc. is dedicated to helping IT professionals and vendors create and implement strategies that make the most of 
the value of their storage and digital information. Evaluator Group services deliver in‐depth, unbiased analysis on storage 
architectures, infrastructures and management for IT professionals.  Since 1997 Evaluator Group has provided services for 
thousands of end users and vendor professionals through product and market evaluations, competitive analysis and education.  
www.evaluatorgroup.com Follow us on Twitter @evaluator_group 

                                                                          




Copyright 2012 Evaluator Group, Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying 
and recording, or stored in a database or retrieval system for any purpose without the express written consent of Evaluator Group Inc. The 
information contained in this document is subject to change without notice. Evaluator Group assumes no responsibility for errors or omissions. 
Evaluator Group makes no expressed or implied warranties in this document relating to the use or operation of the products described herein. 
In no event shall Evaluator Group be liable for any indirect, special, consequential or incidental damages arising out of or associated with any
aspect of this publication, even if advised of the possibility of such damages. The Evaluator Series is a trademark of Evaluator Group, Inc. All 
other trademarks are the property of their respective companies. 


Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Kürzlich hochgeladen (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

The NetApp Open Solution for Hadoop augments traditional BI systems by allowing BI users to embrace a much greater range of data types and data set sizes, and to run iterative queries in real time.
And, unlike many early Hadoop implementations, the Cloudera/NetApp partners have taken care to help users flatten the Hadoop learning curve and accelerate time to production.

Perhaps more importantly, the solution is aimed at enterprise data center administrators looking for a Hadoop platform that can be managed in ways consistent with production data center policies and practices. The objective is an operational model that is tuned, tested, more stable, and easier to maintain over time.

Storage administrators who are also NetApp users have recently asked us to suggest ways they can make emerging Hadoop environments more stable and more consistent with enterprise data management policies for application availability, data protection, archive, compliance, security, and audit. The NetApp Open Solution for Hadoop addresses these requirements and offers a blueprint for integrating NetApp storage arrays with Hadoop clusters that preserves Hadoop’s “shared nothing” architecture.

The Shared Nothing Imperative
Apache Hadoop users typically build their own parallel computing clusters from commodity servers, each with server-internal storage, usually in the form of a small JBOD disk array. These are commonly referred to as “shared nothing” architectures because all processing is done in parallel by self-contained servers in the cluster. The servers communicate with one another over a common network but otherwise share no cluster resources, including memory and storage. SAN and NAS storage, while scalable and resilient, is typically seen as lacking the I/O performance these clusters need to rise above the capabilities of a standard data warehouse. Hence, Hadoop storage is typically DAS.

Practitioners of New Data Analytics are generally hostile to shared storage. They prefer direct-attached storage (DAS) in its various forms, from solid state disk (SSD) to high-capacity SATA disk, buried inside parallel processing nodes. Shared storage architectures (SAN and NAS) are perceived as relatively slow, complex, and above all expensive. These qualities are inconsistent with New Data Analytics systems, which thrive on performance, commodity infrastructure, and low cost.

Real-time or near-real-time information delivery is one of the defining characteristics of New Data Analytics, so latency is avoided wherever possible. Data in memory is good. Data on spinning disk at the other end of an FC SAN connection is not.

NetApp for Hadoop
The first thing to note about the NetApp Open Solution for Hadoop is that it preserves the shared nothing architectural model. It provides DAS storage, in the form of a NetApp E2660 array, to each Data Node in the Hadoop cluster. An E2660 enclosure houses 60 disks. Because the enclosure is configured as four DAS volumes, each Data Node has its own non-shared set of disks and “sees” only its share (see Figure 1). Each Data Node is allocated fourteen disks within the E2660 array, along with “array intelligence”: dual array controllers with hardware-assisted computation of RAID parity.
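The enclosure layout can be made concrete with a little arithmetic. The sketch below is a minimal, illustrative Python model, not a NetApp tool: it splits one 60-drive E2660 enclosure into four 14-drive modules, one per Data Node, treats the remaining drives as global hot spares, and computes raw and usable capacity for the two RAID layouts NetApp supports (RAID 5 as 13 data + 1 parity, RAID 6 as 12 data + 2 parity). The helper name `plan_enclosure` and the contiguous drive-slot numbering are hypothetical, and usable capacity ignores file system overhead and the TB/TiB distinction.

```python
from dataclasses import dataclass

# Figures from the text: 60 drives per 4U enclosure, split into four
# 14-drive modules (one per Data Node); leftover drives are global hot spares.
DRIVES_PER_ENCLOSURE = 60
MODULES = 4
DRIVES_PER_MODULE = 14

# RAID layouts named in the text, as (data drives, parity drives) per module.
RAID_LAYOUTS = {"raid5": (13, 1), "raid6": (12, 2)}


@dataclass(frozen=True)
class Module:
    data_node: int      # index of the Data Node that owns this module
    drive_slots: range  # hypothetical contiguous drive-slot numbering
    raw_tb: float
    usable_tb: float


def plan_enclosure(drive_tb: float, raid: str) -> tuple[list[Module], range]:
    """Partition one enclosure into per-Data-Node modules plus hot spares."""
    data, parity = RAID_LAYOUTS[raid]
    if data + parity != DRIVES_PER_MODULE:
        raise ValueError("RAID layout must fill a 14-drive module")
    modules = [
        Module(
            data_node=n,
            drive_slots=range(n * DRIVES_PER_MODULE, (n + 1) * DRIVES_PER_MODULE),
            raw_tb=DRIVES_PER_MODULE * drive_tb,
            usable_tb=data * drive_tb,  # parity drives hold no HDFS data
        )
        for n in range(MODULES)
    ]
    spares = range(MODULES * DRIVES_PER_MODULE, DRIVES_PER_ENCLOSURE)
    return modules, spares


modules, spares = plan_enclosure(drive_tb=3.0, raid="raid6")
print(len(spares))           # 4 global hot spares
print(modules[0].usable_tb)  # 36.0 TB usable per Data Node
```

Because each module's slot range is disjoint from every other module's, no drive is shared between Data Nodes, which is the property that lets the array sit inside a shared nothing cluster.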
Figure 1. NetApp Open Solution for Hadoop configuration (courtesy NetApp)

The E2660 operates as four completely separate and independent storage modules co-located in the same 4U chassis. A single enclosure contains a total of sixty (60) 2 TB or 3 TB, 7,200 RPM near-line SAS drives. Each module consists of 14 disks configured by the user as either RAID 5 (13 data + 1 parity) or RAID 6 (12 data + 2 parity). The remaining four drives are available as global hot spares.

The NetApp FAS2000, including its Data ONTAP operating system, provides NFS-based storage for the Hadoop Name Node server. The FAS system offers production data center quality storage for Hadoop system metadata, a critical component in the overall functioning and resiliency of the Hadoop cluster. NetApp describes the level of integration between the FAS system and the Name Node server as modest in the first release of this solution. Later releases will use more ONTAP functionality and be more tightly integrated with the Hadoop code base.

Problems the Solution Addresses

At the Name Node Level
The Hadoop Name Node is a well-known single point of failure: when it stops functioning, it can shut down the entire cluster. The FAS2040 is used as storage for the Name Node, mitigating the loss of cluster metadata due to Name Node failure. It functions as a single, unified repository of cluster metadata that supports faster recovery from disk failure. It also serves as a repository for other cluster software, including scripts, and as such can be used to simplify cluster deployment, updates, and ongoing maintenance.
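The paper does not describe exactly how the Name Node is pointed at the FAS system. In the Hadoop 1.x and CDH3 releases of this period, a common approach was to list both a local directory and an NFS mount in the `dfs.name.dir` property of `hdfs-site.xml`; the Name Node then writes its namespace image and edit log to every listed directory, so one copy survives the loss of the local disk. The sketch below assumes that approach and renders such a fragment with Python's standard `xml.etree.ElementTree`. The function name and both paths are hypothetical; in Hadoop 2.x and later the property is named `dfs.namenode.name.dir`.

```python
import xml.etree.ElementTree as ET


def namenode_dirs_config(local_dir: str, nfs_dir: str,
                         prop: str = "dfs.name.dir") -> str:
    """Render an hdfs-site.xml fragment listing a local and an NFS metadata dir.

    The Name Node writes its image and edit log to every directory in the
    comma-separated value, so the NFS copy survives a local disk loss.
    """
    conf = ET.Element("configuration")
    prop_el = ET.SubElement(conf, "property")
    ET.SubElement(prop_el, "name").text = prop
    ET.SubElement(prop_el, "value").text = ",".join([local_dir, nfs_dir])
    return ET.tostring(conf, encoding="unicode")


# Hypothetical paths: a local disk on the Name Node host and an NFS mount
# exported by the FAS system. Pass prop="dfs.namenode.name.dir" on Hadoop 2.x+.
print(namenode_dirs_config("/data/1/dfs/nn", "/mnt/fas/dfs/nn"))
```

Keeping the local directory in the list means the Name Node still has fast local access to its metadata; the NFS entry exists purely so that an intact copy lives on the FAS system if the Name Node host is lost.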
At the Data Node Level
Standard Hadoop clusters typically rely on Data Node-based software to provide data protection and system resilience. Hadoop uses a distributed, host software-based data mirroring scheme that functions across all Data Nodes in a cluster. Upon data ingest, users typically specify that two additional copies of the original data be written to two other Data Nodes in the cluster [1], resulting in three copies of the data within the cluster. This provides both a degree of resilience in case of failure and balanced access (load balancing) to data across the Data Nodes.

However, with a replication count of three, every TB of data ingested yields three TB stored. In addition, the copy process consumes cluster processing resources and internal network bandwidth, leaving less of those resources available to analytic processes.

NetApp moves data protection processes, including the creation of the data replicas needed to recover from adverse events, off the Hadoop cluster and onto storage arrays designed to accomplish these tasks far more efficiently. Triple mirroring within the cluster consumes server and network bandwidth. Instead, NetApp allows administrators to mirror data to a directly attached NetApp E2660 array over 6 Gb/s SAS connections. Doing so replaces the software-based triple mirror running at the Data Node level with hardware RAID running at the array level.

The net result is that Hadoop Data Nodes are protected from disk failures that would otherwise cause job failures. Support for non-disruptive, simultaneous rebuild of logical volumes means disk failures can be handled without disrupting the cluster and without administrator intervention. And NetApp’s use of enterprise-grade disk in the E2660 array will result in fewer disk failures over time.
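The storage and network cost of replication discussed above can be written as raw capacity per TB ingested = r × (d + p) / d, where r is the HDFS replica count and d and p are the data and parity drives per RAID group (d = 1, p = 0 for plain JBOD). The paper does not say which replica count NetApp recommends with the E2660, so the sketch below assumes 2 for illustration, in line with the footnoted practice of running with one or two replicas. The network figure further assumes the writing client runs on a Data Node, so HDFS keeps the first replica locally and ships r − 1 copies to other nodes. Function names are hypothetical.

```python
from fractions import Fraction


def raw_tb_per_ingested_tb(replicas: int, data_drives: int = 1,
                           parity_drives: int = 0) -> Fraction:
    """Raw disk consumed per TB ingested: replica count times RAID parity overhead."""
    return replicas * Fraction(data_drives + parity_drives, data_drives)


def replica_network_tb(replicas: int, ingested_tb: float = 1.0) -> float:
    """TB copied between Data Nodes, assuming the first replica is written locally."""
    return (replicas - 1) * ingested_tb


# (replicas, data drives, parity drives); 2 replicas on the E2660 is an assumption.
scenarios = {
    "JBOD, 3 replicas": (3, 1, 0),
    "E2660 RAID 6 (12+2), 2 replicas": (2, 12, 2),
    "E2660 RAID 5 (13+1), 2 replicas": (2, 13, 1),
}
for label, (r, d, p) in scenarios.items():
    raw = float(raw_tb_per_ingested_tb(r, d, p))
    print(f"{label}: {raw:.2f} TB raw, {replica_network_tb(r):.0f} TB between nodes")
```

Under these assumptions, RAID 6 with two replicas consumes about 2.33 TB of raw disk per TB ingested versus 3 TB for triple replication, and halves the inter-node copy traffic. Keeping two HDFS replicas rather than one still protects against the loss of an entire Data Node, which array-level RAID alone does not.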
Use of the E2660 can also increase overall cluster performance, even when the JBOD disk inside the Data Nodes is replaced by the E2660, by reducing the HDFS replica count and allowing the storage array to process that workload. In addition, hardware RAID combined with caching at the E2660 array level adds a further margin of performance.

Conclusion
As mentioned earlier, the NetApp Open Solution for Hadoop differs from the offerings of data analytics appliance vendors in that it does not include Hadoop server and client hardware. Customers for this solution are therefore free to source their own at the best price they can negotiate. Additionally, Cloudera’s Distribution including Apache Hadoop (CDH) is available as a free download from Cloudera. Zaloni is one of NetApp’s partners that offers the solution while adding custom services and support for the NetApp Open Solution for Hadoop.

Hadoop is now emerging in enterprise production data centers as a new BI tool that in some cases augments established data warehousing systems and in other cases delivers functionality beyond the reach of the traditional data warehouse. We believe that these early Hadoop deployments will grow in size and importance over time. It is therefore important to start with an implementation that offers production data center quality resilience and data integrity, and that can be scaled upward over time by adding internal storage capacity rather than adding more Data Nodes each time more capacity is needed. We also note that the ability to integrate an archival storage component for security and compliance reasons will become more critical as time goes on. The NetApp Open Solution for Hadoop addresses these requirements by delivering enterprise data center quality storage platforms, integrated with Hadoop, that are well known and understood by enterprise IT administrators.

[1] Replication count is user-controllable. Maintaining three copies of data has become standard practice. However, to improve performance for large bulk data loads, users can and often do reduce the replication count to one or two, and increase it later.

About Evaluator Group
Evaluator Group Inc. is dedicated to helping IT professionals and vendors create and implement strategies that make the most of the value of their storage and digital information. Evaluator Group services deliver in-depth, unbiased analysis on storage architectures, infrastructures and management for IT professionals. Since 1997, Evaluator Group has provided services for thousands of end users and vendor professionals through product and market evaluations, competitive analysis and education.
www.evaluatorgroup.com Follow us on Twitter @evaluator_group

Copyright 2012 Evaluator Group, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or stored in a database or retrieval system for any purpose without the express written consent of Evaluator Group Inc. The information contained in this document is subject to change without notice. Evaluator Group assumes no responsibility for errors or omissions. Evaluator Group makes no expressed or implied warranties in this document relating to the use or operation of the products described herein.
In no event shall Evaluator Group be liable for any indirect, special, consequential or incidental damages arising out of or associated with any aspect of this publication, even if advised of the possibility of such damages. The Evaluator Series is a trademark of Evaluator Group, Inc. All other trademarks are the property of their respective companies.