SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Performance Comparison of Intel® Enterprise Edition for Lustre* 
software and HDFS for MapReduce Applications 
Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 
* Other names and brands may be claimed as the property of others.
Hadoop 
Introduc-on 
§ Open 
source 
MapReduce 
framework 
for 
data-­‐intensive 
compu7ng 
§ Simple 
programming 
model 
– 
two 
func7ons: 
Map 
and 
Reduce 
§ Map: 
Transforms 
input 
into 
a 
list 
of 
key 
value 
pairs 
– Map(D) 
→ 
List[Ki 
, 
Vi] 
§ Reduce: 
Given 
a 
key 
and 
all 
associated 
values, 
produces 
result 
in 
the 
form 
of 
a 
list 
of 
values 
– Reduce(Ki 
, 
List[Vi]) 
→ 
List[Vo] 
§ Parallelism 
hidden 
by 
framework 
– Highly 
scalable: 
can 
be 
applied 
to 
large 
datasets 
(Big 
Data) 
and 
run 
on 
commodity 
clusters 
§ Comes 
with 
its 
own 
user-­‐space 
distributed 
file 
system 
(HDFS) 
based 
on 
the 
local 
storage 
of 
cluster 
nodes 
2
Hadoop 
Introduc-on 
(cont.) 
§ Framework 
handles 
most 
of 
the 
execu7on 
§ Splits 
input 
logically 
and 
feeds 
mappers 
§ Par77ons 
and 
sorts 
map 
outputs 
(Collect) 
§ Transports 
map 
outputs 
to 
reducers 
(Shuffle) 
§ Merges 
output 
obtained 
from 
each 
mapper 
(Merge) 
3 
Shuffle 
Input Split Map 
Input Split Map 
Input Split Map 
Reduce 
Reduce 
Output 
Output
MapReduce Application Processing
Intel® Enterprise Edition for 
Lustre* software 
Hadoop Dist. File System 
Node Node Node 
Network Switch 
Disk Disk Disk 
OSS OSS OSS Network Switch 
OST OST OST 
Node Node Node 
* Other names and brands may be claimed as the property of others.
• Clustered, distributed 
computing and storage 
• No data replication 
• No local storage 
• Widely used for HPC 
applications 
• Data moves to the 
computation 
• Data replication 
• Local storage 
• Widely used for MR 
applications 
6 
Intel® Enterprise Edition for 
Lustre* software 
Hadoop Dist. File System 
* Other names and brands may be claimed as the property of others.
Motivation 
q Could HPC and MR co-exist? 
q Need to evaluate use of Lustre software for MR application 
processing 
* Other names and brands may be claimed as the property of others.
Using Intel® Enterprise Edition for Lustre* software with Hadoop 
HADOOP ‘ADAPTER’ FOR LUSTRE 
8 * Other names and brands may be claimed as the property of others.
Hadoop 
over 
Intel 
EE 
for 
Lustre* 
Implementa-on 
§ Hadoop 
uses 
pluggable 
extensions 
to 
work 
with 
different 
file 
system 
types 
§ Lustre 
is 
POSIX 
compliant: 
org.apache.hadoop.fs 
– Use 
Hadoop’s 
built-­‐in 
LocalFileSystem 
class 
– Uses 
na7ve 
file 
system 
support 
in 
Java 
§ Extend 
and 
override 
default 
behavior: 
LustreFileSystem 
– Defines 
new 
URL 
scheme 
for 
Lustre 
– 
lustre:/// 
– Controls 
Lustre 
striping 
info 
– Resolves 
absolute 
paths 
to 
user-­‐defined 
directory 
– Leaves 
room 
for 
future 
enhancements 
§ Allow 
Hadoop 
9 
to 
find 
it 
in 
config 
files 
FileSystem 
RawLocalFileSystem 
LustreFileSystem 
* Other names and brands may be claimed as the property of others.
MR Processing in Intel® EE for Lustre* and HDFS 
Input Output 
LUSTRE 
(Global Namespace) 
Spill File : Reducer à N : 1 
Sort Merge 
Map 
O/P 
Spill 
Map 
Shuffle Merge 
Map O/P 
Map O/P 
Map O/P Map O/P 
Input 
Reduce 
Output 
Spill 
Local Disk Local Disk 
HDFS 
Repetitive 
* Other names and brands may be claimed as the property of others.
Conclusions from Existing Evaluations 
q TestDFSIO: 100% better throughput 
q TeraSort: 10-15% better performance 
q High Speed connecting Network Needed 
q Same BOM, HDFS is better for WordCount and BigMapOutput applications 
q Large number of compute nodes may challenge Enterprise Edition for Lustre* for 
software performance 
* Other names and brands may be claimed as the property of others.
Problem Definition 
Performance comparison of Lustre and HDFS file systems for MR 
implementation of FSI workload using HPDD cluster hosted in the Intel 
BigData Lab in Swindon (UK) using Intel® Enterprise Edition for Lustre* 
software 
Audit Trail System part of FINRA security specifications (publicly 
available) is used as a representative application. 
* Other names and brands may be claimed as the property of others.
EXPERIMENTAL SETUP 
13
Hadoop + HDFS Setup 
• 1 cluster manager, 1 Name node 
(NN), 8 Data nodes (DN) 
including NN. 
• 8 nodes, each of Intel(R) Xeon(R) 
CPU E5-2695 v2 @ 2.40GHz, 
320GB cluster RAM 
• 27 TB of cluster storage 
• 10 GB network among compute 
nodes 
• Red Hat 6.5, CDH 5.0.2 and 
HDFS 
14
Hadoop + Intel EE for Lustre* software - Setup 
• 1 Resource manager (RM), 1 History 
server (HS), 8 Node managers (NM) 
including RM and HS. 
• 8 nodes, each of Intel(R) Xeon(R) 
CPU E5-2695 v2 @ 2.40GHz, 320GB 
cluster RAM 
• 165TB of usable Lustre storage 
• 10 GB network among compute 
nodes 
• Red Hat 6.5, CDH 5.0.2, Intel® 
Enterprise Edition for Lustre* 
software 2.0 
* Other names and brands may be claimed as the property of others. 15
Intel® Enterprise Edition for Lustre* 2.0 Setup 
q Four OSS, One MDS, 16 OSTs, 1 MDT. 
q OSS Node 
o CPU- Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz , Memory - 
128GB DDr3 1600mhz 
o Disk subsystem 
• 4 only LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] 
(rev 05) 
• 4 only 4TB SATA drives per controller raid 5 configuration per raid 
set 
o 4 OST per OSS node. 
* Other names and brands may be claimed as the property of others.
Cluster Parameters 
q Number of Compute nodes = 8 
q Map slots = 24 
q Reduce slots = 7 
q Rest of parameters such as Shuffle percent, Merge Percent, Sort 
Buffer are all kept as default 
q HDFS 
§ Replication Factor = 3 
q Intel® EE for Lustre* software 
§ stripe count = 1,4,16. 
§ stripe size = 4MB 
* Other names and brands may be claimed as the property of others.
Job Configuration Parameters 
q Map Split size= 1GB 
q Block size = 128MB 
q Input Data is NOT compressed 
q Output Data is NOT compressed
Workload 
q Consolidated Audit Trail System (part of FINRA application) DB 
Schema 
§ Single table with 12 columns related to share order. 
q Data consolidation query 
§ Print share order details for share orders during a date range. 
§ SELECT issue_symbol,orf_order_id, orf_order_received_ts FROM 
default.rt_query_extract WHERE issue_symbol like 'XLP' AND 
from_unixtime(cast((orf_order_received_ts/1000) as BIGINT),'yyyy-MM-ddhh:ii:ss') 
>= "2014-06-26 23:00:00" AND from_unixtime(cast((orf_order_received_ts/1000) as 
BIGINT),'yyyy-MM-ddhh:ii:ss') <= "2014-06-27 11:00:00";
Workload Implementation 
q DB is a flat file with columns separated using a token 
q Data generator to generate data for the DB 
q Tool to run queries concurrently 
q Query is implemented as Map and Reduce functions
Workload Size 
q Concurrency Tests: 
§ Query in isolation, concurrency =1 
§ Query in concurrent workload, concurrency =5 
§ Thinktime = 10% of query execution time in isolation. 
q Data Size: 
§ 100GB , 500GB, 1TB and 7TB
Performance Metric 
q MR job execution time in isolation 
q MR job average execution time in concurrent workload 
q CPU, Disk and Memory Utilization of the cluster
Performance Measurement 
q SAR data is collected from all nodes in the cluster. 
q MapReduce job log files are used for performance analysis 
q Intel® EE for Lustre* software nodes performance data is collected 
using Intel Manager 
q Hadoop performance data is collected using Intel Manager 
* Other names and brands may be claimed as the property of others.
Benchmarking Steps 
24 
Generate Data of given size 
Copy data to HDFS 
Start MR job of query on Name node 
with given concurrency 
On completion of job, collect Logs and 
performance data 
For different 
Concurrency 
levels 
For different 
Data sizes
RESULT ANALYSIS 
25
26 
Degree of Concurrency = 1 
Intel® EE for Lustre* performs better on large stripe count 
* Other names and brands may be claimed as the property of others.
27 
Degree of Concurrency = 1 
Intel® EE for Lustre* delivered 3X HDFS for optimal SC settings 
* Other names and brands may be claimed as the property of others.
28 
Degree of Concurrency = 1 
Intel® EE for Lustre* optimal SC gives 70% improvement over HDFS 
* Other names and brands may be claimed as the property of others.
Hadoop + HDFS Setup Hadoop + Intel® EE for 
Lustre* software - Setup 
29 
Nodes = 8 
Nodes = 8+5 = 13 
Performance Linear extrapolation for 
Nodes =13 
* Other names and brands may be claimed as the property of others.
30 
Number of Compute Servers = 13 
Intel® EE for Lustre* 2X better than HDFS for same BOM 
* Other names and brands may be claimed as the property of others.
31 
Degree of Concurrency = 5 
Intel® EE for Lustre* was 5.5 times better than HDFS on 7 TB data size 
* Other names and brands may be claimed as the property of others.
32 
Degree of Concurrency = 5 
Intel® EE for Lustre* was 5.5 times better than HDFS on 7 TB data size 
* Other names and brands may be claimed as the property of others.
33 
퐶표푛푐푢푟푟푒푛푡 퐽표푏 퐴푣푒푟푎푔푒 퐸푥푒푐푢푡푖표푛 푇푖푚푒/푆푖푛푔푙푒 퐽표푏 퐸푥푒푐푢푡푖표푛 푇푖푚푒  = 2.5
34 
= 4.5
35 
Intel® EE for Lustre* software > HDFS for concurrency 
* Other names and brands may be claimed as the property of others.
Conclusion 
q Increase in Stripe count improves Enterprise Edition for Lustre* 
software performance 
q Intel® EE for Lustre shows better performance for concurrent 
workload 
q Intel® EE for Lustre software = 3 X HDFS for single job 
q Intel® EE for Lustre software = 5.5 X HDFS for concurrent workload 
q Future work 
§ Impact of large number of compute nodes (i.e. OSSs <<<< Nodes) 
* Other names and brands may be claimed as the property of others.
Thank You 
rekha.singhal@tcs.com

Weitere ähnliche Inhalte

Was ist angesagt?

Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550
Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550
Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550Principled Technologies
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 
EMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Programinside-BigData.com
 
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Principled Technologies
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red_Hat_Storage
 
FAQ on Dedupe NetApp
FAQ on Dedupe NetAppFAQ on Dedupe NetApp
FAQ on Dedupe NetAppAshwin Pawar
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSTyrone Systems
 
Spectrum Scale final
Spectrum Scale finalSpectrum Scale final
Spectrum Scale finalJoe Krotz
 
Get insight from document-based distributed MongoDB databases sooner and have...
Get insight from document-based distributed MongoDB databases sooner and have...Get insight from document-based distributed MongoDB databases sooner and have...
Get insight from document-based distributed MongoDB databases sooner and have...Principled Technologies
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetAndrei Khurshudov
 
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...Principled Technologies
 
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...IBM India Smarter Computing
 
Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)rajsandhu1989
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 

Was ist angesagt? (20)

Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550
Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550
Database server comparison: Dell PowerEdge R630 vs. Lenovo ThinkServer RD550
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 
EMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data Storage
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
 
FAQ on Dedupe NetApp
FAQ on Dedupe NetAppFAQ on Dedupe NetApp
FAQ on Dedupe NetApp
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
 
Spectrum Scale final
Spectrum Scale finalSpectrum Scale final
Spectrum Scale final
 
Get insight from document-based distributed MongoDB databases sooner and have...
Get insight from document-based distributed MongoDB databases sooner and have...Get insight from document-based distributed MongoDB databases sooner and have...
Get insight from document-based distributed MongoDB databases sooner and have...
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheet
 
IBM GPFS
IBM GPFSIBM GPFS
IBM GPFS
 
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...
Consolidating Oracle database servers onto Dell PowerEdge R920 running Oracle...
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...
The IBM eX5 Memory Advantage How Additional Memory Capacity on eX5 Can Benefi...
 
Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 

Ähnlich wie Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapReduce Application

Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureDatabricks
 
Breaking IO Performance Barriers: Scalable Parallel File System for AWS
Breaking IO Performance Barriers: Scalable Parallel File System for AWSBreaking IO Performance Barriers: Scalable Parallel File System for AWS
Breaking IO Performance Barriers: Scalable Parallel File System for AWSAmazon Web Services
 
Light-weighted HDFS disaster recovery
Light-weighted HDFS disaster recoveryLight-weighted HDFS disaster recovery
Light-weighted HDFS disaster recoveryDataWorks Summit
 
Data proliferation and machine learning: The case for upgrading your servers ...
Data proliferation and machine learning: The case for upgrading your servers ...Data proliferation and machine learning: The case for upgrading your servers ...
Data proliferation and machine learning: The case for upgrading your servers ...Principled Technologies
 
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Alluxio, Inc.
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analyticsmason_s
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaEdelweiss Kammermann
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Sumeet Singh
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successDataWorks Summit
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingSam Ng
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 

Ähnlich wie Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapReduce Application (20)

Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 
Breaking IO Performance Barriers: Scalable Parallel File System for AWS
Breaking IO Performance Barriers: Scalable Parallel File System for AWSBreaking IO Performance Barriers: Scalable Parallel File System for AWS
Breaking IO Performance Barriers: Scalable Parallel File System for AWS
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Light-weighted HDFS disaster recovery
Light-weighted HDFS disaster recoveryLight-weighted HDFS disaster recovery
Light-weighted HDFS disaster recovery
 
Data proliferation and machine learning: The case for upgrading your servers ...
Data proliferation and machine learning: The case for upgrading your servers ...Data proliferation and machine learning: The case for upgrading your servers ...
Data proliferation and machine learning: The case for upgrading your servers ...
 
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
 
DDN Product Update from SC13
DDN Product Update from SC13DDN Product Update from SC13
DDN Product Update from SC13
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and Kafka
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data Processing
 
Data Science
Data ScienceData Science
Data Science
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Mehr von inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Kürzlich hochgeladen

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Kürzlich hochgeladen (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapReduce Application

  • 1. Performance Comparison of Intel® Enterprise Edition for Lustre* software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar * Other names and brands may be claimed as the property of others.
  • 2. Hadoop Introduc-on § Open source MapReduce framework for data-­‐intensive compu7ng § Simple programming model – two func7ons: Map and Reduce § Map: Transforms input into a list of key value pairs – Map(D) → List[Ki , Vi] § Reduce: Given a key and all associated values, produces result in the form of a list of values – Reduce(Ki , List[Vi]) → List[Vo] § Parallelism hidden by framework – Highly scalable: can be applied to large datasets (Big Data) and run on commodity clusters § Comes with its own user-­‐space distributed file system (HDFS) based on the local storage of cluster nodes 2
  • 3. Hadoop Introduc-on (cont.) § Framework handles most of the execu7on § Splits input logically and feeds mappers § Par77ons and sorts map outputs (Collect) § Transports map outputs to reducers (Shuffle) § Merges output obtained from each mapper (Merge) 3 Shuffle Input Split Map Input Split Map Input Split Map Reduce Reduce Output Output
  • 5. Intel® Enterprise Edition for Lustre* software Hadoop Dist. File System Node Node Node Network Switch Disk Disk Disk OSS OSS OSS Network Switch OST OST OST Node Node Node * Other names and brands may be claimed as the property of others.
  • 6. • Clustered, distributed computing and storage • No data replication • No local storage • Widely used for HPC applications • Data moves to the computation • Data replication • Local storage • Widely used for MR applications 6 Intel® Enterprise Edition for Lustre* software Hadoop Dist. File System * Other names and brands may be claimed as the property of others.
  • 7. Motivation q Could HPC and MR co-exist? q Need to evaluate use of Lustre software for MR application processing * Other names and brands may be claimed as the property of others.
  • 8. Using Intel® Enterprise Edition for Lustre* software with Hadoop HADOOP ‘ADAPTER’ FOR LUSTRE 8 * Other names and brands may be claimed as the property of others.
  • 9. Hadoop over Intel EE for Lustre* Implementa-on § Hadoop uses pluggable extensions to work with different file system types § Lustre is POSIX compliant: org.apache.hadoop.fs – Use Hadoop’s built-­‐in LocalFileSystem class – Uses na7ve file system support in Java § Extend and override default behavior: LustreFileSystem – Defines new URL scheme for Lustre – lustre:/// – Controls Lustre striping info – Resolves absolute paths to user-­‐defined directory – Leaves room for future enhancements § Allow Hadoop 9 to find it in config files FileSystem RawLocalFileSystem LustreFileSystem * Other names and brands may be claimed as the property of others.
  • 10. MR Processing in Intel® EE for Lustre* and HDFS Input Output LUSTRE (Global Namespace) Spill File : Reducer à N : 1 Sort Merge Map O/P Spill Map Shuffle Merge Map O/P Map O/P Map O/P Map O/P Input Reduce Output Spill Local Disk Local Disk HDFS Repetitive * Other names and brands may be claimed as the property of others.
  • 11. Conclusions from Existing Evaluations q TestDFSIO: 100% better throughput q TeraSort: 10-15% better performance q High Speed connecting Network Needed q Same BOM, HDFS is better for WordCount and BigMapOutput applications q Large number of compute nodes may challenge Enterprise Edition for Lustre* for software performance * Other names and brands may be claimed as the property of others.
  • 12. Problem Definition Performance comparison of Lustre and HDFS file systems for MR implementation of FSI workload using HPDD cluster hosted in the Intel BigData Lab in Swindon (UK) using Intel® Enterprise Edition for Lustre* software Audit Trail System part of FINRA security specifications (publicly available) is used as a representative application. * Other names and brands may be claimed as the property of others.
  • 14. Hadoop + HDFS Setup • 1 cluster manager, 1 Name node (NN), 8 Data nodes (DN) including NN. • 8 nodes, each of Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz, 320GB cluster RAM • 27 TB of cluster storage • 10 GB network among compute nodes • Red Hat 6.5, CDH 5.0.2 and HDFS 14
  • 15. Hadoop + Intel EE for Lustre* software - Setup • 1 Resource manager (RM), 1 History server (HS), 8 Node managers (NM) including RM and HS. • 8 nodes, each of Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz, 320GB cluster RAM • 165TB of usable Lustre storage • 10 GB network among compute nodes • Red Hat 6.5, CDH 5.0.2, Intel® Enterprise Edition for Lustre* software 2.0 * Other names and brands may be claimed as the property of others. 15
  • 16. Intel® Enterprise Edition for Lustre* 2.0 Setup q Four OSS, One MDS, 16 OSTs, 1 MDT. q OSS Node o CPU- Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz , Memory - 128GB DDr3 1600mhz o Disk subsystem • 4 only LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05) • 4 only 4TB SATA drives per controller raid 5 configuration per raid set o 4 OST per OSS node. * Other names and brands may be claimed as the property of others.
  • 17. Cluster Parameters q Number of Compute nodes = 8 q Map slots = 24 q Reduce slots = 7 q Rest of parameters such as Shuffle percent, Merge Percent, Sort Buffer are all kept as default q HDFS § Replication Factor = 3 q Intel® EE for Lustre* software § stripe count = 1,4,16. § stripe size = 4MB * Other names and brands may be claimed as the property of others.
  • 18. Job Configuration Parameters q Map Split size= 1GB q Block size = 128MB q Input Data is NOT compressed q Output Data is NOT compressed
  • 19. Workload q Consolidated Audit Trail System (part of FINRA application) DB Schema § Single table with 12 columns related to share order. q Data consolidation query § Print share order details for share orders during a date range. § SELECT issue_symbol,orf_order_id, orf_order_received_ts FROM default.rt_query_extract WHERE issue_symbol like 'XLP' AND from_unixtime(cast((orf_order_received_ts/1000) as BIGINT),'yyyy-MM-ddhh:ii:ss') >= "2014-06-26 23:00:00" AND from_unixtime(cast((orf_order_received_ts/1000) as BIGINT),'yyyy-MM-ddhh:ii:ss') <= "2014-06-27 11:00:00";
  • 20. Workload Implementation q DB is a flat file with columns separated using a token q Data generator to generate data for the DB q Tool to run queries concurrently q Query is implemented as Map and Reduce functions
  • 21. Workload Size q Concurrency Tests: § Query in isolation, concurrency =1 § Query in concurrent workload, concurrency =5 § Thinktime = 10% of query execution time in isolation. q Data Size: § 100GB , 500GB, 1TB and 7TB
  • 22. Performance Metric q MR job execution time in isolation q MR job average execution time in concurrent workload q CPU, Disk and Memory Utilization of the cluster
  • 23. Performance Measurement q SAR data is collected from all nodes in the cluster. q MapReduce job log files are used for performance analysis q Intel® EE for Lustre* software nodes performance data is collected using Intel Manager q Hadoop performance data is collected using Intel Manager * Other names and brands may be claimed as the property of others.
  • 24. Benchmarking Steps 24 Generate Data of given size Copy data to HDFS Start MR job of query on Name node with given concurrency On completion of job, collect Logs and performance data For different Concurrency levels For different Data sizes
  • 26. 26 Degree of Concurrency = 1 Intel® EE for Lustre* performs better on large stripe count * Other names and brands may be claimed as the property of others.
  • 27. 27 Degree of Concurrency = 1 Intel® EE for Lustre* delivered 3X HDFS for optimal SC settings * Other names and brands may be claimed as the property of others.
  • 28. 28 Degree of Concurrency = 1 Intel® EE for Lustre* optimal SC gives 70% improvement over HDFS * Other names and brands may be claimed as the property of others.
  • 29. Hadoop + HDFS Setup Hadoop + Intel® EE for Lustre* software - Setup 29 Nodes = 8 Nodes = 8+5 = 13 Performance Linear extrapolation for Nodes =13 * Other names and brands may be claimed as the property of others.
  • 30. 30 Number of Compute Servers = 13 Intel® EE for Lustre* 2X better than HDFS for same BOM * Other names and brands may be claimed as the property of others.
  • 31. 31 Degree of Concurrency = 5 Intel® EE for Lustre* was 5.5 times better than HDFS on 7 TB data size * Other names and brands may be claimed as the property of others.
  • 32. 32 Degree of Concurrency = 5 Intel® EE for Lustre* was 5.5 times better than HDFS on 7 TB data size * Other names and brands may be claimed as the property of others.
  • 33. 33 퐶표푛푐푢푟푟푒푛푡 퐽표푏 퐴푣푒푟푎푔푒 퐸푥푒푐푢푡푖표푛 푇푖푚푒/푆푖푛푔푙푒 퐽표푏 퐸푥푒푐푢푡푖표푛 푇푖푚푒  = 2.5
  • 35. 35 Intel® EE for Lustre* software > HDFS for concurrency * Other names and brands may be claimed as the property of others.
  • 36. Conclusion q Increase in Stripe count improves Enterprise Edition for Lustre* software performance q Intel® EE for Lustre shows better performance for concurrent workload q Intel® EE for Lustre software = 3 X HDFS for single job q Intel® EE for Lustre software = 5.5 X HDFS for concurrent workload q Future work § Impact of large number of compute nodes (i.e. OSSs <<<< Nodes) * Other names and brands may be claimed as the property of others.