WSCAD 2016 - XVII Simpósio em Sistemas
Computacionais de Alto Desempenho
Aracaju - Sergipe – Brazil
October, 7th - 2016
Igor Freitas
igor.freitas@intel.com
WSCAD 2016
Big Data Analytics
HPC != Big Data ?

Programming model: HPC: FORTRAN / C++ applications, MPI (high performance) | Big Data: Java, Python, Go, etc.* applications, Hadoop* (simple to use)
Resource manager: HPC: SLURM (supports large-scale startup) | Big Data: YARN* (more resilient to hardware failures)
File system: HPC: Lustre* (remote storage) | Big Data: HDFS*, Spark* (local storage)
Hardware: HPC: compute & memory focused, high-performance components | Big Data: storage focused, standard server components
Infrastructure: HPC: server storage with SSDs, fabric switch | Big Data: server storage with HDDs, Ethernet switch

*Other brands and names are the property of their respective owners.
Varied Resource Needs
Typical HPC workloads vs. typical Big Data workloads

Big Data Analytics
HPC in real time
• Small Data + Small Compute, e.g. data analysis
• Big Data + Small Compute, e.g. search, streaming, data preconditioning
• Small Data + Big Compute, e.g. mechanical design, multi-physics

[Chart: workloads such as high-frequency trading, numeric weather simulation, oil & gas seismic, video survey traffic monitoring, and personal digital health plotted by data vs. compute needs; the system cost balance shifts across processor, memory, interconnect, and storage.]
Trends in HPC + Big Data
• Standards: open, common environments
• Business viability: faster time-to-market, lower costs (HPC in the cloud?), better products, easy-to-maintain HW & SW, public investments
• Performance: code modernization (vector instructions), many-core, FPGA, integrated solutions (storage + network + processing + memory)
• Usability
• Portability
Business viability
HPC is Foundational to Insight
HPC underpins aerospace, biology, brain modeling, chemistry/chemical engineering, climate, computer-aided engineering, cosmology, cybersecurity, defense, digital content creation, EDA, economics/financial services, fraud detection, pharmacology, particle physics, metallurgy, manufacturing/design, life sciences, government labs, geosciences/oil & gas, genomics, fluid dynamics, social sciences (literature, linguistics, marketing), university/academic research, and weather.

• Business Innovation: high ROI, with a $515 average return per $1 of HPC investment1
• A New Science Paradigm: data-driven analytics joins theory, experimentation, and computational science
• Fundamental Discovery: advancing science and our understanding of the universe

1Source: IDC HPC and ROI Study Update (September 2015)
2Source: IDC 2015 Q1 Worldwide x86 Server Tracker vs. IDC 2015 Q1 Worldwide HPC Server Tracker
Growing Challenges in HPC: “The Walls”
• System bottlenecks: memory | I/O | storage; energy-efficient performance; space | resiliency | unoptimized software
• Divergent infrastructure: resources split among modeling and simulation | big data analytics | machine learning | visualization, all of which must be HPC optimized
• Barriers to extending usage: democratization at every scale | cloud access | exploration of new parallel programming models
HPC & the Competitiveness of Industry & Science in the USA
Public Investments
• Executive order from President Obama creating a national program of supercomputing
• HPC as a “top priority” to leverage USA competitiveness
”In order to maximize the benefits
of HPC for economic competitiveness
and scientific discovery, the United
States Government must create a
coordinated Federal strategy in HPC
research, development, and
deployment”
Executive Order, Barack Obama
Source: The White House, Office of the Press Secretary
HPC & the Competitiveness of Industry & Science in the USA
Public Investments
• U.S. makes a Top 10 supercomputer available to anyone who can
'boost' America*
• Boost American competitiveness.
• Accelerate advances in science and technology.
• Develop the country's skilled high-performance computing (HPC)
workforce.
Source: The White House, Office of the Press Secretary
China’s New Supercomputer Puts the US Even Further Behind*
Public Investments
• Sunway TaihuLight officially became the fastest supercomputer in the world
• What it really means for HPC:
• Innovation through HPC
• Government recognition of HPC competitiveness
• Software is the key!
• Performance
• Productivity
• Programmability
*Source: https://www.wired.com/2016/06/fastest-supercomputer-sunway-taihulight/
SIMD
Vector instructions are back
Performance
Democratizing HPC for Big Data workloads
Performance: Vector instructions
• In the 70s and 80s, vector machines were the rule
• Why were they considered ‘old stuff’ in the 90s?
• According to Eugene D. Brooks, the reason was simple: they were custom machines*
• The near future?
• Vectors again! But in general-purpose CPUs
• Affordable
• Easy to code
• Associated with multi-threaded programming
*Source: https://www.hpcwire.com/2016/09/26/vectors-old-became-new-supercomputing/
[Chart: Gigaflops from vector machines vs. parallel machines*]
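The return of vectors in general-purpose CPUs is visible even from high-level languages. As an illustrative sketch (ours, not from the slides) in NumPy, where whole-array expressions dispatch to compiled, vectorizable loops instead of interpreted scalar loops:

```python
import numpy as np

def saxpy_loop(a, x, y):
    # Interpreted scalar loop: one multiply-add per iteration
    out = np.empty_like(y)
    for i in range(len(y)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_vector(a, x, y):
    # Whole-array expression: NumPy dispatches to compiled C loops
    # that the compiler can vectorize with SIMD instructions
    return a * x + y

x = np.random.rand(100_000)
y = np.random.rand(100_000)
# Both forms compute the same result; the vector form is far faster
assert np.allclose(saxpy_loop(2.0, x, y), saxpy_vector(2.0, x, y))
```

The same shift, loop-by-element code rewritten as array operations, is what the deck later calls code modernization.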
Vectorization solved in 1978 ?*
Performance: Vector instructions
Source: http://lotsofcores.com/sites/lotsofcores.com/files/201404300900%20SGIUG%20Reinders%20Intel%20as%20presented.pdf
Vectorization and Threading Critical on Modern Hardware
Performance: Vector instructions
[Chart key: Serial | Vectorized | Threaded | Vectorized & Threaded]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products. For more information go to http://www.intel.com/performance Configurations at the end of this presentation.
Frameworks, Libs
Bringing HPC performance to Big Data
Performance
Intel® DAAL Overview
Industry-leading performance C++/Java/Python library for machine learning and deep learning, optimized for Intel® Architectures.

Pipeline stages: pre-processing, transformation, analysis, modeling, validation, decision making
Algorithms include: (de-)compression, PCA, statistical moments, variance matrix, QR, SVD, Cholesky, Apriori, linear regression, Naïve Bayes, SVM, classifier boosting, K-means, EM for GMM, collaborative filtering, neural networks
Target domains: scientific/engineering, web/social, business
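As a rough sketch of what one of these building blocks computes, here is PCA by the correlation method in plain NumPy. This is illustration only, not the DAAL API:

```python
import numpy as np

def pca_correlation(X, k):
    """PCA by the correlation method: eigendecompose the correlation
    matrix of the standardized data and keep the top-k components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = (Xs.T @ Xs) / X.shape[0]
    vals, vecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the top-k eigenvalues
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
vals, vecs = pca_correlation(X, 2)
assert vals.shape == (2,) and vecs.shape == (5, 2)
assert vals[0] >= vals[1]   # components sorted by explained variance
```

DAAL's value is running this kind of kernel with vectorized, cache-blocked, multi-threaded implementations, which is where the speedups on the following slides come from.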
Python* Landscape
Adoption of Python continues to grow among domain specialists and developers for its productivity benefits.
Challenge #1: Domain specialists are not professional software programmers.
Challenge #2: Python performance limits migration to production systems.
Intel’s solution is to…
 Accelerate Python performance
 Enable easy access
 Empower the community
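The performance limit in Challenge #2 comes mainly from interpreted loops; routing work to optimized native libraries (the approach behind MKL-accelerated NumPy and Intel's Python distribution) closes most of the gap. A minimal illustration (our example, not Intel's):

```python
import numpy as np

def matmul_pure_python(A, B):
    # Interpreted triple loop: the pattern that limits Python in production
    n, m, k = len(A), len(B), len(B[0])
    C = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            s = 0.0
            for p in range(m):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_fast = np.array(A) @ np.array(B)  # dispatches to an optimized BLAS
assert np.allclose(matmul_pure_python(A, B), C_fast)
```

Same answer either way; the `@` version scales to large matrices because the work happens in compiled, vectorized, threaded code.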
Example Performance: Intel® DAAL vs. Spark* MLLib
PCA Performance Boosts Using Intel® DAAL vs. Spark* MLLib on Intel® Architectures

Speedup by table size, PCA (correlation method) on an 8-node Hadoop* cluster based on Intel® Xeon® Processors E5-2697 v3:
1M x 200 → 4x | 1M x 400 → 6x | 1M x 600 → 6x | 1M x 800 → 7x | 1M x 1000 → 7x

Configuration Info - Versions: Intel® Data Analytics Acceleration Library 2016, CDH v5.3.1, Apache Spark* v1.2.0; Hardware: Intel® Xeon® Processor E5-2699 v3, 2 eighteen-core CPUs (45MB LLC, 2.3GHz), 128GB of RAM per node; Operating System: CentOS 6.6 x86_64.
* Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation.
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include
SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors.
Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 .
What’s New: Intel® DAAL 2017
• Neural Networks
• Python* API (a.k.a. PyDAAL)
– Easy installation through Anaconda or pip
• New data source connector for KDB+
• Open source project on GitHub
Fork me on GitHub:
https://github.com/01org/daal
WSCAD 2016
 Profile Python* and Mixed Python / C++ / Fortran*
 Tune latest Intel® Xeon Phi™ processors
 Quickly see three keys to HPC performance
 Optimize memory access
 Storage analysis: I/O bound or CPU bound?
 Enhanced OpenCL* and GPU profiling
 Easier remote and command line usage
 Add custom counters to the timeline
 Preview: Application and storage performance snapshots
 Intel® Advisor: Optimize vectorization for Intel® AVX-512
(with or without hardware)
New for 2017: Python*, FLOPS, Storage, and More…
Intel® VTune™ Amplifier Performance Profiler
22
New!
Optimize Memory Access
Memory Access Analysis: Intel® VTune™ Amplifier 2017
Tune data structures for performance
 Attribute cache misses to data structures
(not just the code causing the miss)
 Support for custom memory allocators
Optimize NUMA latency and scalability
 True and false sharing optimization
 Auto detect max system bandwidth
 Easier tuning of inter-socket bandwidth
Easier install, latest processors
 No special drivers required on Linux*
 Intel® Xeon Phi™ processor MCDRAM (high-
bandwidth memory) analysis
Improved!
Storage Device Analysis (HDD, SATA, or NVMe SSD)
Intel® VTune™ Amplifier
Are you I/O bound or CPU bound?
 Explore imbalance between I/O operations (async and sync) and compute.
 Storage accesses mapped to the source code.
 See when the CPU is waiting for I/O.
 Measure bus bandwidth to storage.
Latency analysis
 Tune storage accesses with the latency histogram.
 Distribution of I/O over multiple devices.
[Screenshot callouts: a slow task with I/O wait; sliders set thresholds for I/O queue depth.]
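The I/O-bound vs. CPU-bound distinction the profiler automates can be approximated by hand: compare wall-clock time with CPU time, and a large gap means the process was waiting (typically on I/O) rather than computing. A hypothetical sketch of the idea, not VTune's actual method:

```python
import time

def timed(fn):
    """Return (wall-clock seconds, CPU seconds) for fn()."""
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - w0, time.process_time() - c0

def cpu_work():
    # Pure computation: wall time roughly equals CPU time
    sum(i * i for i in range(500_000))

def wait_work():
    # Sleeping stands in for blocking I/O: wall time >> CPU time
    time.sleep(0.2)

wall_c, cpu_c = timed(cpu_work)
wall_w, cpu_w = timed(wait_work)
# Waiting work burns almost no CPU time relative to wall time
assert cpu_w < 0.5 * wall_w
```

`sleep` is used instead of real disk I/O to keep the sketch deterministic; on a real workload the same ratio flags time lost to the storage device.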
Intel® Performance Snapshots
Three Fast Ways to Discover Untapped Performance
Is your application making good use of modern
computer hardware?
 Run a test case during your coffee break.
 High-level summary shows which apps can
benefit most from code modernization and
faster storage.
Pick a performance snapshot:
 Application: For non-MPI apps
 MPI: For MPI apps
 Storage: For systems, servers, and
workstations with directly attached storage.
Free download: http://www.intel.com/performance-snapshot
Also included with Intel® Parallel Studio and Intel® VTune™ Amplifier products.
Python API (a.k.a. PyDAAL)
 Sticks closely to DAAL's overall design: object-oriented, namespace hierarchy, plug & play
 Seamless interfacing with NumPy
 Anaconda package: http://anaconda.org/intel/
 Co-exists with the proprietary version
 Apache 2.0 license
 Lives on github.com
...
# Create a NumPy array as our input
a = np.array([[1, 2, 4],
              [2, 1, 23],
              [4, 23, 1]])
# Create a DAAL Matrix using our NumPy array
m = daal.Matrix(a)
# Create an algorithm object for Cholesky decomposition using the default method
algorithm = cholesky.Batch()
# Set the input arguments of the algorithm
algorithm.input.set(cholesky.data, m)
# Compute the Cholesky decomposition
res = algorithm.compute()
# Get the computed Cholesky factor
tbl = res.get(choleskyFactor)
# Get and print the NumPy array
print(tbl.getArray())
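For a quick sanity check of a Cholesky result outside DAAL, plain NumPy works. Note that Cholesky requires a symmetric positive-definite input; the small matrix below is our own example, not the slide's:

```python
import numpy as np

# A symmetric positive-definite matrix (our example)
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = np.linalg.cholesky(A)         # lower-triangular Cholesky factor
assert np.allclose(L @ L.T, A)    # L times its transpose reconstructs A
assert np.allclose(L, np.tril(L)) # the factor is lower-triangular
```

The `L @ L.T == A` identity is the standard way to verify any Cholesky implementation's output.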
Integrated solutions
Memory + Processor + Network + Storage
Performance
Growing Need for New Class of Memory
Performance & Lower costs: Integrated solutions
Virtualization
Big Data & Cloud
In-Memory DB
OLTP
Workstation
Supply Chain
Mgmt
Enterprise
ERP
Database
Storage
HPC
“Give me a faster
storage interface”
“Allow in-memory
data to survive soft
reset or hard reboot”
“Minimal latency for
huge memory”
“Make large memory servers
less expensive”
Bridging the Memory-Storage Gap
Intel® Optane™ Technology Based on 3D XPoint™
• SSD: Intel® Optane™ SSDs deliver 5-7x the IOPS of current flagship NAND-based SSDs1
• DRAM-like performance: Intel® DIMMs based on 3D XPoint™, 1,000x faster than NAND1 with 1,000x the endurance of NAND2
• Hard-drive capacities: 10x more dense than conventional memory3

1Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND
2Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND
3Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM
Intel® Scalable
System Framework
Bridging the Memory-Storage Gap
Intel® Optane™ Technology
Performance & lower costs: integrated solutions

Memory-storage hierarchy: CPU → DDR → Intel® DIMMs (1,000x faster than NAND1, 1,000x the endurance of NAND2, 10x denser than DRAM3; data granularity: 64B cache line) → Intel® Optane™ SSD → NAND SSD → hard disk drives

1Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND
2Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND
3Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM

Intel® Scalable System Framework
Storage Evolution
Performance & lower costs: integrated solutions
• Yesterday, storage: NAND-based Intel P3700 (Fultondale) for NVMe
• Today, storage: 3D XPoint™-based Coldstream SSD for NVMe, the world's fastest NVMe SSD
• Near future, memory & storage: 3D XPoint™-based Apache Pass (AEP) for DDR4, revolutionary storage-class memory

3D XPoint™ enables the world's fastest NVMe SSD and revolutionary storage-class memory.
Code Modernization
Democratizing HPC performance for Big Data workloads

From ease of use to fine tuning of vectors:
• Intel® Data Analytics Acceleration Library
• Intel® Math Kernel Library
• Array notation: Intel® Cilk™ Plus
• Auto-vectorization
• Semi-auto vectorization: #pragma (vector, ivdep, simd)
• C/C++ vector classes (F32vec16, F64vec8)
Knights Landing Server Processor
Coprocessor | Fabric | Memory
• Memory bandwidth: ~500 GB/s STREAM
• Memory capacity: over 25x* KNC
• Resiliency: systems scalable to >100 PF
• Power efficiency: over 25% better than the card1
• I/O: up to 100 GB/s with integrated fabric
• Cost: less costly than discrete parts1
• Flexibility: limitless configurations
• Density: 3+ KNL with fabric in 1U3

*Comparison to 1st Generation Intel® Xeon Phi™ 7120P Coprocessor (formerly codenamed Knights Corner)
1Results based on internal Intel analysis using estimated power consumption and projected component pricing in the 2015 timeframe. This analysis is provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
2Comparison to a discrete Knights Landing processor and discrete fabric component.
3Theoretical density for air-cooled system; other cooling solutions and configurations will enable lower or higher density.
Three Knights Landing Products

KNL and KNL-F processors (“self-boot” Intel® Xeon Phi™ processor platform):
 Knights Landing IS the host processor
 Boots standard off-the-shelf OSs
Benefits:
 Higher performance density for highly parallel applications2
 Reduced system power consumption2
 Higher perf/Watt & perf/$3

Knights Landing coprocessor (requires an Intel® Xeon® processor host; Adams Pass platform):
 Solution for general-purpose servers and workstations
Benefits:
 Targeted for applications with larger sections of serial work1
 Upgrade path from Knights Corner as a PCIe card

1 Projections based on early product definition and as compared to prior generation Intel® Xeon Phi™ Coprocessors
2 Based on Intel internal analysis. Lower power based on power consumption estimates between (2) HCAs compared to 15W additional power for KNL-F. Higher density based on removal of PCIe slots and associated HCAs populated in those slots.
3 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Results based on internal Intel analysis using estimated theoretical Flops/s for KNL processors, along with estimated system power consumption and component pricing in the 2015 timeframe. See backup for complete system configurations.
KNL Architecture Overview
• Enhanced Intel® Atom™ cores based on the Silvermont microarchitecture: 2D mesh architecture, out-of-order cores, 3x single-thread performance vs. KNC
• ISA: Intel® Xeon® processor binary-compatible (with Broadwell)
• On-package memory (MCDRAM): up to 16GB, ~465 GB/s STREAM at launch
• Fixed bottlenecks
• Platform memory: up to 384GB (6-channel DDR4-2400 MHz)
• Tile (up to 36 per package): 2 cores, each with 2 VPUs, sharing a 1MB L2 and a hub; bi-directional tile connections (same bit width as the Xeon core interconnect)
• I/O: 36 lanes PCIe* Gen3 (x16, x16, x4), x4 DMI2 to the PCH
• Controllers: IMC (integrated memory controller), EDC (embedded DRAM controller), IIO (integrated I/O controller)
Why KNL-F? (Integrated Fabric)
Three configurations:
1. KNL with two fabric adapters: (2) x16 PCIe slots, (2) fabric controllers, (2) QSFP connectors
2. KNL with two on-board controllers: (2) x16 PCIe lanes, (2) fabric controllers, (2) QSFP connectors
3. KNL-F with Storm Lake: dual-port, 100 GB/s bi-directional, QSFP module

KNL-F benefits:1
 Lower cost: cost adder expected to be lower than (2) adapters or on-board controllers
 Lower power: only ~15W TDP adder, expected to be less than (2) adapters
 Higher density: enables a denser form factor with no slots, adapters, or on-board controllers
 Future-ready: sets the stage for future heterogeneous clusters (future Intel® Xeon® processor with integrated fabric)

Same socket for KNL and KNL-F: design a common platform with a keep-out zone and support for the additional 15W TDP.

1 Based on Intel internal estimates. Lower cost based on expected price delta between KNL and KNL-F processor, compared to two InfiniBand* or Storm Lake HCA via PCIe Express slots. Lower power based on power consumption estimates between (2) HCAs (~20W) compared to 15W additional power for KNL-F over a comparable KNL processor. Higher density based on removal of PCIe slots and HCAs populated in those slots.
Integrated Fabric CPU Requirements
Components required to support CPU with Integrated Fabric
with two ports
 (1) IFP Cable [supporting two ports]
 (1) 2-port “Carrier card” (two main options)
– PCB that plugs into a PCIe slot (aka “PCIe carrier card”)
– Custom OEM PCB with power and sideband cables
 PCIe “Carrier card” implementation requires:
– PCB, (2) IFT connectors, (2) IFT cages, sideband cable
[Diagram labels: PCIe carrier board, 2-port version (sideband cable, IFT connectors, and cages on the underside of the card); (2) Internal-to-Faceplate Processor (IFP) cable supporting two ports; each port requires (1) Internal Faceplate Transition (IFT) connector and (1) IFT cage.]

The IFT Carrier Card design kit (including BOM and design guide) is now posted on IBL (Doc #558210).
Tighter Component Integration
Integrating components (cores, graphics, fabric, FPGA, I/O, memory) benefits bandwidth, density, latency, power, and cost.
Intel® Scalable System Framework
Source: IDC 2014 (Worldwide High-Performance Systems Revenue by Applications) and https://software.intel.com/en-us/file/xeonphi-catalogpdf/download
CAE
Geosciences
Weather
Other
Mechanical Design
DCC & Distrib
Defense
University /
Academic
Government Lab
Bio-Sciences
EDA / IT / ISV
Economics /
Financial
Chem
Engineering
Balanced Applications | Memory Bandwidth Intensive | Compute Intensive
CAE
Altair RADIOSS*
Ansys* Mechanical
Matevo MinFE
SIMULIA Abaqus*
Financial Services
Binomial Options Pricing Model
Binomial SP and DP
BlackScholes Merton Formula
BlackScholes SP and DP
Monte Carlo European Options Pricing
Monte Carlo RND SP and DP
Monte Carlo SP and DP
STAC A2
Xcelerit
Bioinformatics
BLAST
Bowtie 2
Burrows Wheeler Alignment (BWA)
Cry-EM Technique
MPI-HMMER 2.3
Computational Chemistry
DiRAC Codes
GAMESS
Integral Calculation Library
NEURON
NWChem
Molecular Dynamics
AMBER
BUDE
DL_POLY
GROMACS
LAMMPS
NAMD
Geophysics
ELMER/Ice
SeisSol
SPECFEM3D Cartesian
UTBench
Climate/Weather
ADCIRC
CAMS
CFSv2
COSMO
ECHAM6
HARMONIE
HBM
MPAS
NOAA NIM
WRF
Digital Content Creation
EMBREE
Superresolution processing
Energy
Acceleware* AxRTM
DownUnder GeoSolutions
ISO3DFD
RTM Petrobras
TTI 3DFD
CFD
AVBP
FrontFlow/Blue code
LBS3D
NASA Overflow
OpenFOAM
OpenLB
ROTORSIM
SU2
TAU and TRACE
software.intel.com/XeonPhiCatalog
Intel® Xeon Phi™ Application Catalog
Over 100 applications to date listed as available or in-flight
38
Developer Tools for Knights Landing Platform

Intel® Parallel Studio XE components and their supported features in PSXE 2016 Gold:
• Intel® C/C++ and Fortran compilers 16.0: (1) the -xMIC-AVX512 compiler option enables KNL-specific optimizations, including loop optimizations and vectorization; (2) use the Intel® Fortran compiler to build for MCDRAM
• Intel® Math Kernel Library 11.3: partial optimizations for all major MKL domains (BLAS, FFT, Sparse BLAS, VML, VSL), delivered via AVX512 optimizations
• Intel® MPI 5.1.1 and ITAC 9.1: support for the KNL platform and initial performance tuning is part of Intel MPI 5.1.1
• VTune Amplifier XE 2016 (NDA package): collection on KNL targets (advanced hotspots and custom event collection based on SEP and perf; User API); analysis types for KNL profiling (advanced hotspots with full OpenMP analysis, custom events (core and uncore), Intel MPI spins, general exploration); HBM profiling on Xeon with KNL bandwidth modeling
• Advisor XE 2016 (NDA package): survey analysis for AVX512 (includes hotspot collection and compiler static data)
• Data Analytics Acceleration Library 2016: includes KNL-specific performance optimizations
• Intel® Integrated Performance Primitives 9.0: more than 70% of hot-list functions have AVX512 optimizations
Integrated solutions
Xeon + FPGA
Performance
Skylake + FPGA Target Workloads
Performance and lower costs: FPGA

FPGA activity and workload examples:
• Compute-intensive algorithms: visual understanding / deep learning classification; compression/decompression; video motion estimation; genomics (Pair HMM, Smith-Waterman); memory copy routines
• Latency-sensitive pre-filtering & processing for the CPU: bump-in-the-wire network processing; FSI market data pre-filtering; HPC radar data pre-processing; automotive video input; security appliance, targeted vSwitch
• Evolving algorithms, or stable algorithms on low-latency and inline interconnect: new compression algorithms; high compression ratios; custom crypto algorithms
Skylake + FPGA on Purley
Performance and lower costs: FPGA

[Diagram: SKL CPU and FPGA in one package, connected by UPI and PCIe 3.0; six DDR4 channels on the CPU, HSSI on the FPGA, DMI x4 to the PCH.]

• Cores: up to 28C with Intel® HT Technology
• FPGA: Altera® Arria 10 GX 1150
• Socket TDP: shared socket TDP, up to 165W SKL & up to 90W FPGA
• Socket: Socket P
• Scalability: up to 2S, with SKL-SP or SKL + FPGA SKUs
• PCH: Lewisburg: DMI3 (4 lanes); 14x USB2 ports; up to 10x USB3, 14x SATA3, 20x PCIe*3; new: Innovation Engine, 4x 10GbE ports, Intel® QuickAssist Technology

For the CPU vs. for the FPGA:
• Memory, CPU: 6 channels DDR4 (RDIMM, LRDIMM, Apache Pass DIMMs; 2666 1DPC, 2133/2400 2DPC); FPGA: low-latency access to system memory via the UPI & PCIe interconnect
• Intel® UPI, CPU: 2 channels (10.4, 9.6 GT/s); FPGA: 1 channel (9.6 GT/s)
• PCIe*, CPU: PCIe* 3.0 (8.0, 5.0, 2.5 GT/s), 32 lanes per CPU, bifurcation support x16/x8/x4; FPGA: PCIe* 3.0 (8.0, 5.0, 2.5 GT/s), 16 lanes per FPGA, bifurcation support x8
• High Speed Serial Interface, CPU: N/A; FPGA: 2x PCIe 3.0 x8, direct Ethernet (4x10 GbE, 2x40 GbE, 10x10 GbE, 2x25 GbE); different board design based on HSSI config

 Power for the FPGA is drawn from the socket & requires modified Purley platform specs
 Platform modifications include stackup, clock, power delivery, debug, power up/down sequence, misc. I/O pins
Open standards
Current State of System Software Efforts in the HPC Ecosystem
THE REALITY: We, the HPC ecosystem, will not be able to get to where we want to go without a major change in system software development.
• With system margins under pressure, there is unwillingness to invest in system software
• A desire to get exascale performance & speed up software adoption of HW innovation
• Fragmented efforts across the ecosystem: “everyone building their own solution”
• New, complex workloads (ML, Big Data, etc.) drive more complexity into the software stack
Desired Future State
A shared repository and stable HPC system software that:
• Fuels a vibrant and efficient HPC software ecosystem
• Takes advantage of hardware innovation & drives revolutionary technologies
• Eases traditional HPC application development and testing at scale
• Extends to new workloads (ML, analytics, big data)
• Accommodates new environments (i.e. cloud)
Official Members as of 6/1/2016
Goal: A common system software platform for the HPC community that works across
multiple segments and on which ecosystem partners can collaborate and innovate
HPC System Software Stack Component View
 Intra-stack APIs to allow for customization/differentiation
 External APIs to develop on and around the stack
OpenHPC to Intel® HPC Orchestrator System Software Products
OpenHPC is an open source community for HPC software. Intel seeded the community with a pre-integrated, pre-tested and validated HPC system software stack & will continue contributions along with other members of the community.

Intel will offer Intel-supported products based on the open source OpenHPC software: the Intel HPC Orchestrator products, with premium software, advanced testing, and support.

Intel HPC Orchestrator products are the realization of the software portion of Intel® Scalable System Framework.
Open source accelerating HPC + Big Data
Open Standards
• PBS Pro is now open source
• OpenHPC
• Cloud for HPC
• And how about Brazil?
• Intel Innovation Center at Rio, in partnership with AMT (www.amt.com.br)
Pay less + ease of use = democratizing HPC for Big Data
What about Brazil ?
Intel’s HPC initiatives in Brazil
Code Modernization – open-source software
• Modernizing applications to increase parallelism and
scalability
• Leverage cores, caches, threads, and vector capabilities of
microprocessors and coprocessors.
• Current centers in Brazil
Intel Modern Code Partner program
Intel Modern Code Partners
Code Modernization: driving developers to write modern code for modern hardware
• Create faster code…faster
• High-performance, scalable code
• C++, C, Fortran*, Python* and Java*
• Standards-driven parallel models: OpenMP*, MPI, and TBB
• Teaching developers how to fully exploit Xeon and Xeon Phi performance: vectors + multi-threading
More at: http://software.intel.com/moderncode
Free HPC & Big Data workshops across Brazil
Code Modernization Initiatives in the Brazilian HPC Ecosystem
• Oil & Gas, reservoir simulator at PETROBRAS: up to 10.5x performance gains in their reservoir simulator software¹
• LNCC, National Laboratory for Scientific Computing (largest HPC cluster in Latin America): up to 30x performance gain in Oil & Gas applications² (initial results; white-paper link)
• INPE/CPTEC, code modernization of BRAMS: up to 3.4x speedup via AVX (vector instructions); white-paper link
• Health & Life Sciences: up to 11x speedup in molecular dynamics, NCC/UNESP & LNCC³ (white-paper link)
  • Xeon only, original code vs. modernized code: up to 11x speedup
  • Xeon + 1 Xeon Phi (same optimized code): 1.14x speedup
Article link
Authors:
¹CENPES team and Gilvan Vieira - gilvandsv@gmail.com
²LNCC - Frederico Cabral - fredluiscabral@gmail.com
³NCC/UNESP - Silvio Stanzani - silvio.stanzani@gmail.com
Conclusions
Conclusions
• Like other products, technologies and services, “lower cost + scale + ease of use” will drive HPC to the masses
  • 1st wave: near bare-metal in the cloud (lower cost + scale)
  • 2nd wave: frameworks offering “free performance” to unlock insights (usability)
  • 3rd wave: even small and medium businesses will rely on HPC / Big Data to drive business
Big Data Analytics
Integrated solutions: HPC && Big Data

Programming model: FORTRAN / C++ applications, MPI (high performance); Python, frameworks, Java* applications, others; Hadoop* / Spark / others (simple to use)
Resource manager: HPC & Big Data-aware resource manager
File system: Lustre* with Hadoop* adapter (remote storage)
Hardware: compute & Big Data capable, scalable performance components; server storage (SSDs and burst buffers); Intel® Omni-Path Architecture infrastructure

*Other names and brands may be claimed as the property of others
Next steps for HPC & Big Data
A new paradigm in memory and storage

Today: compute node (processor, caches, local memory) → I/O node (SSD storage) → remote storage (parallel file system on hard drives)

Future: compute node (processor with in-package high-bandwidth memory*, caches, non-volatile memory) → burst buffer storage → parallel file system (hard drive storage)
• Local memory is now faster & in the processor package
• I/O node storage moves to the compute node
• Some remote data moves onto the I/O node
Bandwidth is higher and latency lower toward the top of the hierarchy; capacity is higher toward the bottom.

*cache, memory or hybrid mode
Conclusions
A Holistic Architectural Approach Is Required
Performance and capability grow over time through innovative technologies and tighter integration across compute, memory, fabric, storage, and system software, together with modernized application code (community, ISV, and proprietary). Integration spans memory, cores, graphics, fabric, FPGA, and I/O.
Links
Intel® Modern Code Developer Community
A global online community: software.intel.com/moderncode

Topics:
- Vectorization / Single Instruction, Multiple Data (SIMD)
- Multi-threading
- Multi-node / clustering
- Taking advantage of on-package high-bandwidth memory
- Increasing memory and power efficiency

Developer zone:
- Modern Code zone
- Software tools, training webinars
- How-to guides, parallel programming BKMs
- Remote access to hardware
- Support forums

Experts:
- Black Belts & Intel engineer experts
- Technical content, training: webinars, F2F, forum support
- Conferences and tradeshows: keynotes, presentations, BOFs, demos, tutorials
Machine/Deep Learning | Resources
Training Classes:
U.Oxford Class on Deep Learning
Stanford Class on Machine Learning
Google Class on Deep Learning
Intel Caffe Repo: (Support for Multi-node Training)
https://github.com/intelcaffe/caffe
Spark MLLib Repo:
http://spark.apache.org/mllib/
Intel Machine Learning Blog Posts:
Myth Busted - CPUs and Neural Network Training
Caffe Scoring on Xeon Processors
Caffe Training on Multi-node Distributed Memory Systems
Trusted Analytics Platform:
http://trustedanalytics.org/
Performance Libraries:
MKL for Neural Networks - Technical Preview
Math Kernel Library
MKL Community License
Data Analytics Acceleration Library
Links
• Intel Modern Code: https://software.intel.com/pt-br/modern-code
• Intel Developer Zone
• Intel Compiler Reference 2016
• Intel Intrinsics reference
• Guide to Auto-vectorization
• Xeon Phi™ Home Page
• Xeon Phi™ CODE RECIPES
• Intel Parallel Computing Centers
WSCAD 2016
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
63
64
Trends towards the merge of HPC + Big Data systems

  • 1. WSCAD 2016 - XVII Simpósio em Sistemas Computacionais de Alto Desempenho Aracaju - Sergipe – Brazil October, 7th - 2016 Igor Freitas igor.freitas@intel.com
  • 2. WSCAD 2016 2 Big Data Analytics HPC != Big Data ? *Other brands and names are the property of their respective owners. FORTRAN / C++ Applications MPI High Performance Java, Python, Go, etc.* Applications Hadoop* Simple to Use SLURM Supports large scale startup YARN* More resilient of hardware failures Lustre* Remote Storage HDFS*, SPARK* Local Storage Compute & Memory Focused High Performance Components Storage Focused Standard Server Components Server Storage SSDs Switch Fabric Infrastructure Modelo de Programação Resource Manager Sistema de arquivos Hardware Server Storage HDDs Switch Ethernet Infrastructure
  • 3. WSCAD 2016 Varied Resource Needs Typical HPC Workloads Typical Big Data Workloads 3 Big Data Analytics HPC in real time Small Data + Small Compute e.g. Data analysis Big Data + Small Compute e.g. Search, Streaming, Data Preconditioning Small Data + Big Compute e.g. Mechanical Design, Multi-physics Data Compute High Frequency Trading Numeric Weather Simulation Oil & Gas Seismic System cost balance Video Survey Traffic Monitor Personal Digital Health System cost balance Processor Memory Interconnect Storage
  • 4. WSCAD 2016 4 Trends in HPC + Big Data Standards Business viability Performance Code Modernization (Vector instructions) Many-core FPGA Usability Faster time-to-market Lower costs (HPC at Cloud ?) Better products Easy to maintain HW & SW Portability Open Common Environments Integrated solutions: Storage + Network + Processing + Memory Public investments
  • 6. WSCAD 2016 HPC is Foundational to Insight Aerospace Biology Brain Modeling Chemistry/Chemical Engineering Climate Computer Aided Engineering Cosmology Cybersecurity Defense Pharmacology Particle Physics Metallurgy Manufacturing / Design Life Sciences Government Lab Geosciences / Oil & Gas Genomics Fluid Dynamics 1Source: IDC HPC and ROI Study Update (September 2015) 2Source: IDC 2015 Q1 World Wide x86 Server Tracker vs IDC 2015 Q1 World Wide HPC Server Tracker Digital Content Creation, EDA, Economics/Financial Services, Fraud Detection, Social Sciences, Literature, Linguistics, Marketing, University/Academic, Weather Business Innovation A New Science Paradigm Fundamental Discovery High ROI: $515 Average Return Per $1 of HPC Investment1 Advancing Science And Our Understanding of the Universe Data-Driven Analytics Joins Theory, Experimentation, and Computational Science 6
  • 7. WSCAD 2016 Growing Challenges in HPC “The Walls” System Bottlenecks Memory | I/O | Storage Energy Efficient Performance Space | Resiliency | Unoptimized Software Divergent Infrastructure Barriers to Extending Usage Resources Split Among Modeling and Simulation | Big Data Analytics | Machine Learning | Visualization HPC Optimized Democratization at Every Scale | Cloud Access | Exploration of New Parallel Programming Models Big Data | HPC | Machine Learning | Visualization 7
  • 8. WSCAD 2016 HPC & the Competitiveness of Industry & Science in the USA Public Investments 8 • Executive order from President Obama creating the National Strategic Computing Initiative • HPC as “Top priority” to leverage USA competitiveness ”In order to maximize the benefits of HPC for economic competitiveness and scientific discovery, the United States Government must create a coordinated Federal strategy in HPC research, development, and deployment” Executive Order, Barack Obama Source: The White House Office of the Press Secretary
  • 9. WSCAD 2016 HPC & the Competitiveness of Industry & Science in the USA Public Investments 9 • U.S. makes a Top 10 supercomputer available to anyone who can 'boost' America* • Boost American competitiveness. • Accelerate advances in science and technology. • Develop the country's skilled high-performance computing (HPC) workforce. Source: The White House Office of the Press Secretary
  • 10. WSCAD 2016 10 China’s New Supercomputer Puts the US Even Further Behind* Public Investments • Sunway TaihuLight officially became the fastest supercomputer in the world • What it really means for HPC: • Innovation through HPC • Government recognition of HPC competitiveness • Software is the key ! • Performance • Productivity • Programmability *Source: https://www.wired.com/2016/06/fastest-supercomputer-sunway-taihulight/
  • 11. WSCAD 2016 SIMD Vector instructions are back 11 Performance
  • 12. WSCAD 2016 12 Democratizing HPC for Big Data workloads Performance: Vector instructions • In the 70s and 80s, vector machines were the rule • Why were they ‘old stuff’ by the 90s? • According to Eugene D. Brooks, the reason was simple: they were custom machines* • The near future? • Vectors again! But in general-purpose CPUs • Affordable • Easy to code • Associated with multi-thread programming *Source: https://www.hpcwire.com/2016/09/26/vectors-old-became-new-supercomputing/ Gigaflops from Vector Machines vs Parallel Machines*
  • 13. WSCAD 2016 13 Vectorization solved in 1978 ?* Performance: Vector instructions Source: http://lotsofcores.com/sites/lotsofcores.com/files/201404300900%20SGIUG%20Reinders%20Intel%20as%20presented.pdf
  • 14. WSCAD 2016 14 Vectorization and Threading Critical on Modern Hardware Performance: Vector instructions Vectorized & Threaded Threaded Vectorized Serial Key: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Configurations at the end of this presentation.
  • 15. WSCAD 2016 Frameworks , Libs Bringing HPC performance to Big Data 15 Performance
  • 16. WSCAD 2016 Intel® DAAL Overview Industry leading performance, C++/Java/Python library for machine learning and deep learning optimized for Intel® Architectures. (De-)Compression PCA Statistical moments Variance matrix QR, SVD, Cholesky Apriori Linear regression Naïve Bayes SVM Classifier boosting Kmeans EM GMM Collaborative filtering Neural Networks Pre-processing Transformation Analysis Modeling Decision Making Scientific/Engineering Web/Social Business Validation
  • 17. WSCAD 2016 Python* Landscape Challenge #1: Domain specialists are not professional software programmers. Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge #2: Python performance limits migration to production systems
  • 18. WSCAD 2016 Python* Landscape Challenge #1: Domain specialists are not professional software programmers. Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge #2: Python performance limits migration to production systems Intel’s solution is to… • Accelerate Python performance • Enable easy access • Empower the community
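The performance gap behind Challenge #2 is easy to make concrete. The sketch below (ours, not from the deck) times the same reduction written as an interpreted Python loop and as a single NumPy call; NumPy dispatches to compiled, vectorized BLAS (MKL, in Intel's Python distribution), which is the kind of "free performance" an accelerated stack delivers without changing user code.

```python
import time
import numpy as np

# Illustrative sketch: the same dot product as an interpreted loop
# vs. one call into NumPy's compiled, vectorized kernel.
n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
dot_loop = 0.0
for i in range(n):              # interpreted: one element per iteration
    dot_loop += a[i] * b[i]
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
dot_vec = float(np.dot(a, b))   # single call into vectorized BLAS
t_vec = time.perf_counter() - t0

# Same result, typically orders of magnitude apart in runtime.
assert abs(dot_loop - dot_vec) < 1e-6 * n
print(f"loop: {t_loop:.3f}s  numpy: {t_vec:.5f}s")
```

The measured ratio varies by machine and BLAS backend, but the structural point stands: moving work out of the interpreter into vectorized libraries is how Python workloads reach production performance.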
  • 19. WSCAD 2016 Example Performance: Intel® DAAL vs. Spark* MLLib 19
  • 20. WSCAD 2016 PCA Performance Boosts Using Intel® DAAL vs. Spark* MLLib on Intel® Architectures 20 4X 6X 6X 7X 7X 0 2 4 6 8 1M x 200 1M x 400 1M x 600 1M x 800 1M x 1000 Speedup Table size PCA (correlation method) on an 8-node Hadoop* cluster based on Intel® Xeon® Processors E5-2697 v3 Configuration Info - Versions: Intel® Data Analytics Acceleration Library 2016, CDH v5.3.1, Apache Spark* v1.2.0; Hardware: Intel® Xeon® Processor E5-2699 v3, 2 Eighteen-core CPUs (45MB LLC, 2.3GHz), 128GB of RAM per node; Operating System: CentOS 6.6 x86_64. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 .
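The benchmark above uses PCA's "correlation method". As a hedged single-node sketch of what that method computes (this is plain NumPy, not the Intel® DAAL or Spark* MLlib API, and `pca_correlation` is our own name): standardize the columns, form the correlation matrix, and take its top-k eigenpairs.

```python
import numpy as np

def pca_correlation(X, k):
    """PCA via the correlation method: standardize columns, build the
    p x p correlation matrix, return its top-k eigenpairs.
    Sketch only; DAAL implements this natively (and distributed)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = (Xs.T @ Xs) / len(X)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh: ascending order
    order = np.argsort(eigvals)[::-1]         # flip to descending
    return eigvals[order][:k], eigvecs[:, order[:k]]

X = np.random.rand(1000, 20)   # stand-in for the 1M-row tables above
vals, vecs = pca_correlation(X, 5)
```

The correlation matrix is only p x p, so for tall, narrow tables like the 1M x 200..1000 cases benchmarked above, the eigendecomposition is cheap and the cost is dominated by the `Xs.T @ Xs` pass over the data, which is exactly where vectorized, multi-threaded kernels pay off.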
  • 21. WSCAD 2016 21 What’s New: Intel® DAAL 2017 • Neural Networks • Python* API (a.k.a. PyDAAL) – Easy installation through Anaconda or pip • New data source connector for KDB+ • Open source project on GitHub Fork me on GitHub: https://github.com/01org/daal
  • 22. WSCAD 2016 • Profile Python* and Mixed Python / C++ / Fortran* • Tune latest Intel® Xeon Phi™ processors • Quickly see three keys to HPC performance • Optimize memory access • Storage analysis: I/O bound or CPU bound? • Enhanced OpenCL* and GPU profiling • Easier remote and command line usage • Add custom counters to the timeline • Preview: Application and storage performance snapshots • Intel® Advisor: Optimize vectorization for Intel® AVX-512 (with or without hardware) New for 2017: Python*, FLOPS, Storage, and More… Intel® VTune™ Amplifier Performance Profiler 22 New!
  • 23. WSCAD 2016 23 Optimize Memory Access Memory Access Analysis: Intel® VTune™ Amplifier 2017 Tune data structures for performance • Attribute cache misses to data structures (not just the code causing the miss) • Support for custom memory allocators Optimize NUMA latency and scalability • True and false sharing optimization • Auto detect max system bandwidth • Easier tuning of inter-socket bandwidth Easier install, latest processors • No special drivers required on Linux* • Intel® Xeon Phi™ processor MCDRAM (high-bandwidth memory) analysis Improved!
  • 24. WSCAD 2016 Are you I/O bound or CPU bound? • Explore imbalance between I/O operations (async and sync) and compute. • Storage accesses mapped to the source code. • See when CPU is waiting for I/O. • Measure bus bandwidth to storage. Latency analysis • Tune storage accesses with latency histogram. • Distribution of I/O over multiple devices. 24 Storage Device Analysis (HDD, SATA, or NVMe SSD) Intel® VTune™ Amplifier New! Slow task with I/O Wait Sliders set thresholds for I/O Queue Depth
  • 25. WSCAD 2016 25 Intel® Performance Snapshots Three Fast Ways to Discover Untapped Performance Is your application making good use of modern computer hardware? • Run a test case during your coffee break. • High-level summary shows which apps can benefit most from code modernization and faster storage. Pick a performance snapshot: • Application: For non-MPI apps • MPI: For MPI apps • Storage: For systems, servers, and workstations with directly attached storage. New! New! Free download: http://www.intel.com/performance-snapshot Also included with Intel® Parallel Studio and Intel® VTune™ Amplifier products.
  • 26. WSCAD 2016 26 Stick closely with DAAL’s overall design – Object-oriented, namespace hierarchy, plug&play Seamless interfacing with NumPy Anaconda package – http://anaconda.org/intel/ Co-exists with the proprietary version Apache 2.0 license Lives on github.com Python API (a.k.a. PyDAAL) ... # Create a Numpy array as our input a = np.array([[1,2,4], [2,1,23], [4,23,1]]) # create a DAAL Matrix using our numpy array m = daal.Matrix(a) # Create algorithm objects for cholesky decomposition computing using default method algorithm = cholesky.Batch() # Set input arguments of the algorithm algorithm.input.set(cholesky.data, m) # Compute Cholesky decomposition res = algorithm.compute() # Get computed Cholesky decomposition tbl = res.get(choleskyFactor) # get and print the numpy array print(tbl.getArray()) New
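Because PyDAAL tables interoperate with NumPy, the same factorization can be cross-checked with `np.linalg.cholesky`. One caveat worth flagging: Cholesky requires a symmetric positive-definite input, and the 3x3 matrix in the slide's snippet is symmetric but not positive definite (its second leading minor is 1·1 − 2·2 = −3), so this hedged sketch builds a guaranteed-SPD matrix instead:

```python
import numpy as np

# Cholesky factorization: find lower-triangular L with L @ L.T == A.
# A must be symmetric positive definite, so construct one:
# B @ B.T is positive semi-definite, and adding 3*I makes it definite.
rng = np.random.default_rng(0)
B = rng.random((3, 3))
A = B @ B.T + 3 * np.eye(3)

L = np.linalg.cholesky(A)          # the factor DAAL's cholesky.Batch() returns
assert np.allclose(L @ L.T, A)     # reconstructs A exactly (to rounding)
assert np.allclose(L, np.tril(L))  # factor is lower-triangular
print(L)
```

A quick verification like this is a cheap sanity check when porting between PyDAAL's table objects and plain NumPy arrays.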
  • 27. WSCAD 2016 Integrated solutions Memory + Processor + Network + Storage 27 Performance
  • 28. WSCAD 2016 28 Growing Need for New Class of Memory Performance & Lower costs: Integrated solutions Virtualization Big Data & Cloud In-Memory DB OLTP Workstation Supply Chain Mgmt Enterprise ERP Database Storage HPC “Give me a faster storage interface” “Allow in-memory data to survive soft reset or hard reboot” “Minimal latency for huge memory” “Make large memory servers less expensive”
  • 29. WSCAD 2016 Bridging the Memory-Storage Gap Intel® Optane™ Technology Based on 3D XPoint™ SSD Intel® Optane™ SSDs 5-7x Current Flagship NAND-Based SSDs (IOPS)1 DRAM-like performance Intel® DIMMs Based on 3D-XPoint™ 1,000x Faster than NAND1 1,000x the Endurance of NAND2 Hard drive capacities 10x More Dense than Conventional Memory3 1 Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND 2 Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND 3 Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM Intel® Scalable System Framework
  • 30. WSCAD 2016 CPU DDR INTEL® DIMMS Intel® Optane™ SSD NAND SSD Hard Disk Drives 1000X faster than NAND1 1000X endurance of NAND2 10X denser than DRAM3 30 Intel® Scalable System Framework Bridging the Memory-Storage Gap Intel® Optane™ Technology Performance & Lower costs: Integrated solutions 1 Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND 2 Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND 3 Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM Data granularity: 64B cacheline
  • 31. WSCAD 2016 Yesterday Today Near Future 31 Storage Evolution Performance & Lower costs: Integrated solutions Memory & Storage Storage NAND based Intel P3700 (Fultondale) for NVMe 3D XPoint™ based Coldstream SSD for NVMe 3D XPoint™ based Apache Pass (AEP) for DDR4 Revolutionary Storage Class Memory World’s Fastest NVMe SSD 3D XPoint enables world’s fastest NVMe SSD and revolutionary storage class memory
  • 32. WSCAD 2016 32 Code Modernization Democratizing HPC performance for Big Data workloads Ease of use Fine tuning Vectors Intel® Math Kernel Library Array Notation: Intel® Cilk™ Plus Auto vectorization Semi-auto vectorization: #pragma (vector, ivdep, simd) C/C++ Vector Classes (F32vec16, F64vec8) Intel® Data Analytics Acceleration Library Coprocessor Fabric Memory Memory Bandwidth ~500 GB/s STREAM Memory Capacity Over 25x* KNC Resiliency Systems scalable to >100 PF Power Efficiency Over 25% better than card1 I/O Up to 100 GB/s with int fabric Cost Less costly than discrete parts1 Flexibility Limitless configurations Density 3+ KNL with fabric in 1U3 Knights Landing *Comparison to 1st Generation Intel® Xeon Phi™ 7120P Coprocessor (formerly codenamed Knights Corner) 1Results based on internal Intel analysis using estimated power consumption and projected component pricing in the 2015 timeframe. This analysis is provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. 2Comparison to a discrete Knights Landing processor and discrete fabric component. 3Theoretical density for air-cooled system; other cooling solutions and configurations will enable lower or higher density. Server Processor
  • 33. WSCAD 2016 KNL and KNL-F Processors: • Knights Landing IS the host processor • Boots standard off-the-shelf OS’s Benefits: • Higher performance density for highly parallel applications2 • Reduced system power consumption2 • Higher perf/Watt & perf/$$3 Knights Landing Coprocessor: • Solution for general purpose servers and workstations Benefits: • Targeted for applications with larger sections of serial work1 • Upgrade path from Knights Corner as PCIe card Knights Landing Processor “Self-boot” Intel® Xeon Phi™ processor platform 1 Projections based on early product definition and as compared to prior generation Intel® Xeon Phi™ Coprocessors 2 Based on Intel internal analysis. Lower power based on power consumption estimates between (2) HCAs compared to 15W additional power for KNL-F. Higher density based on removal of PCIe slots and associated HCAs populated in those slots. 3 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. 2 Results based on internal Intel analysis using estimated theoretical Flops/s for KNL processors, along with estimated system power consumption and component pricing in the 2015 timeframe. See backup for complete system configurations. KNL-F KNL Three Knights Landing Products Knights Landing Coprocessor Requires Intel® Xeon® processor host Adams Pass Platform KNL Coprocessor
  • 34. WSCAD 2016 DDR4 x4 DMI2 to PCH 36 Lanes PCIe* Gen3 (x16, x16, x4) MCDRAM MCDRAM MCDRAM MCDRAM DDR4 TILE: (up to 36) Tile IMC (integrated memory controller) EDC (embedded DRAM controller) IIO (integrated I/O controller) KNL Package Enhanced Intel® Atom™ cores based on Silvermont Microarchitecture • 2D Mesh Architecture • Out-of-Order Cores • 3X single-thread vs. KNC ISA Intel® Xeon® Processor Binary-Compatible (w/Broadwell) On-package memory Up to 16GB, ~465 GB/s STREAM at launch Fixed Bottlenecks Platform Memory Up to 384GB (6ch DDR4-2400 MHz) 2VPU Core 2VPU Core 1MB L2 HUB KNL Architecture Overview Bi-directional tile connections (same bit width as Xeon core interconnect) 34
  • 35. WSCAD 2016 F CONNECTOR • Lower cost: cost adder expected to be lower than (2) adapters or on-board controllers • Lower power: only ~15W TDP adder, which is expected to be less than (2) adapters • Higher density: enables denser form factor – no slots, adapters, on-board controllers • Future-ready: sets stage for future hetero clusters (future Intel® Xeon® processor w/ int fabric) 1. KNL with TWO Fabric Adapters (2) x16 PCIe slots (2) x16 PCIe lanes 2. KNL w/ TWO on-board controllers 3. KNL-F with Storm Lake Fabric controller Fabric controller QSFP connector QSFP connector 1 Based on Intel internal estimates. Lower cost based on expected price delta between KNL and KNL-F processor, compared to two InfiniBand* or Storm Lake HCA via PCIe Express slots. Lower power based on power consumption estimates between (2) HCAs (~20W) compared to 15W additional power for KNL-F over a comparable KNL processor. Higher density based on removal of PCIe slots and HCAs populated in those slots. QSFP connector QSFP connector QSFP module Same socket for KNL and KNL-F Design common platform with keep-out zone and to support additional 15W TDP. KNL-F Benefits:1 Why KNL-F? (Integrated Fabric) Dual-Port 100 GB/s bi-directional 35
  • 36. WSCAD 2016 Integrated Fabric CPU Requirements Components required to support CPU with Integrated Fabric with two ports • (1) IFP Cable [supporting two ports] • (1) 2-port “Carrier card” (two main options) – PCB that plugs into a PCIe slot (aka “PCIe carrier card”) – Custom OEM PCB with power and sideband cables • PCIe “Carrier card” implementation requires: – PCB, (2) IFT connectors, (2) IFT cages, sideband cable PCIe carrier board, 2-port version (sideband cable and IFT connectors and cages on underside of the card) (2) Internal-to-Faceplate Processor (IFP) cable supporting two-ports (1) Internal Faceplate Transition (IFT) Connector EACH port requires: (1) IFT Cage 36 IFT Carrier Card design kit (including BOM and design guide) is now posted on IBL (Doc#558210)
  • 37. WSCAD 2016 Tighter Component Integration Benefits Bandwidth Density Latency Power Cost Cores Graphics Fabric FPGA I/O Memory Intel® Scalable System Framework 37
  • 38. WSCAD 2016 Source: IDC 2014 (Worldwide High-Performance Systems Revenue by Applications) and https://software.intel.com/en-us/file/xeonphi-catalogpdf/download CAE Geosciences Weather Other Mechanical Design DCC & Distrib Defense University / Academic Government Lab Bio-Sciences EDA / IT / ISV Economics / Financial Chem Engineering Balanced ApplicationsMemory Bandwidth Intensive Compute Intensive CAE Altair RADIOSS* Ansys* Mechanical Matevo MinFE SIMULIA Abaqus* Financial Services Binomial Options Pricing Model Binomial SP and DP BlackScholes Merton Formula BlackScholes SP and DP Monte Carlo European Options Pricing Monte Carlo RND SP and DP Monte Carlo SP and DP STAC A2 Xcelerit Bioinformatics BLAST Bowtie 2 Burrows Wheeler Alignment (BWA) Cry-EM Technique MPI-HMMER 2.3 Computational Chemistry DiRAC Codes GAMESS Integral Calculation Library NEURON NWChem Molecular Dynamics AMBER BUDE DL_POLY GROMACS LAMMPS NAMD Geophysics ELMER/Ice SeisSol SPECFEM3D Cartesian UTBench Climate/Weather ADCIRC CAMS CFSv2 COSMO ECHAM6 HARMONIE HBM MPAS NOAA NIM WRF Digital Content Creation EMBREE Superresolution processing Energy Acceleware* AxRTM DownUnder GeoSolutions ISO3DFD RTM Petrobras TTI 3DFD CFD AVBP FrontFlow/Blue code LBS3D NASA Overflow OpenFOAM OpenLB ROTORSIM SU2 TAU and TRACE software.intel.com/XeonPhiCatalog Intel® Xeon Phi™ Application Catalog Over 100 applications to date listed as available or in-flight 38
  • 39. WSCAD 2016 Developer Tools for Knights Landing Platform Intel Parallel Studio XE Component Supported features in PSXE 2016 Gold Intel ® C/C++ and Fortran compilers 16.0 1) -xMIC-AVX512 compiler option enables KNL specific optimizations, including loop optimizations and vectorization 2) Use Intel® Fortran compiler to build for MCDRAM Intel® Math Kernel Library 11.3 Partial optimizations for all major MKL domains (BLAS, FFT, Sparse BLAS, VML, VSL) are delivered via AVX512 optimizations. Intel® MPI 5.1.1 and ITAC 9.1 Support for KNL platform and initial performance tuning is part of Intel MPI 5.1.1 VTune Amplifier XE 2016 (NDA package) Collection on KNL targets: advanced hotspots and custom event collection based on SEP and perf; User API; Analysis types for KNL profiling: advanced hotspots with full OpenMP analysis, custom events (core and uncore) Intel MPI spins, general exploration HBM profiling on Xeon with KNL Bandwidth modeling Advisor XE 2016 (NDA Package) Survey analysis for AVX512 (includes hotspot collection and compiler static data) Data Analytics Acceleration Library 2016 Includes KNL-specific performance optimizations Intel® Integrated Performance Primitives 9.0 More than 70% of hot list functions have AVX512 optimizations
  • 40. WSCAD 2016 Integrated solutions Xeon + FPGA 40 Performance
  • 41. WSCAD 2016 41 Skylake + FPGA Target Workloads Performance and Lower costs: FPGA FPGA Activity Workload Examples Compute intensive algorithms  Visual Understanding/Deep Learning classification  Compression/decompression  Video Motion Estimation  Genomics (Pair HMM, Smith Waterman)  Memory copy routines Latency sensitive pre-filtering & processing for CPU  Bump in the wire network processing  FSI market data pre-filtering  HPC Radar data pre-processing  Automotive video input  Security appliance, targeted Vswitch Evolving algorithms or stable algorithms on low latency and inline interconnect  New compression algorithms  High compression ratios  Custom crypto algorithms
  • 42. WSCAD 2016 42 Skylake + FPGA on Purley Performance and Lower costs: FPGA PCIe3.0x16 UPI2 Prog I/F UPI0 PCIe3.0x16 UPI1 DMIx4 DDR4 PCIe 3.0 x8 PCIe 3.0 x8 HSSI SKL FPGA DDR4 DDR4 DDR4 DDR4 DDR4 Cores Up to 28C with Intel® HT Technology FPGA Altera® Arria 10 GX 1150 Socket TDP Shared socket TDP Up to 165W SKL & Up to 90W FPGA Socket Socket P Scalability Up to 2S – with SKL-SP or SKL + FPGA SKUs PCH Lewisburg: DMI3 – 4 lanes; 14xUSB2 ports Up to: 10xUSB3; 14xSATA3, 20xPCIe*3 New: Innovation Engine, 4x10GbE ports, Intel® QuickAssist Technology For CPU For FPGA Memory 6 channels DDR4 RDIMM, LRDIMM, Apache Pass DIMMs Low latency access to system memory via UPI & PCIe interconnect2666 1DPC, 2133, 2400 2DPC Intel® UPI 2 channels (10.4, 9.6 GT/s) 1 channel (9.6 GT/s) PCIe* PCIe* 3.0 (8.0, 5.0, 2.5 GT/s) PCIe* 3.0 (8.0, 5.0, 2.5 GT/s) 32 lanes per CPU Bifurcation support: x16, x8, x4 16 lanes per FPGA Bifurcation support: x8 High Speed Serial Interface (Different board design based on HSSI config) N/A 2xPCIe 3.0 x8 Direct Ethernet (4x10 GbE, 2x40 GbE, 10x10 GbE, 2x25 GbE)  Power for FPGA is drawn from socket & requires modified Purley platform specs  Platform Modifications include Stackup, Clock, Power Delivery, Debug, Power up/down sequence, Misc. I/O pins
  • 44. WSCAD 2016 Current State of System Software Efforts in HPC Ecosystem 44 THE REALITY: We, the HPC ecosystem, will not be able to get to where we want to go without a major change in system software development. With system margins under pressure, unwillingness to invest in system software A desire to get exascale performance & speed up software adoption of HW innovation Fragmented efforts across the ecosystem – “Everyone building their own solution.” New complex workloads (ML, Big Data, etc.) drive more complexity into the software stack
  • 45. WSCAD 2016 Stable HPC System Software that: Fuels a vibrant and efficient HPC software ecosystem Takes advantage of hardware innovation & drives revolutionary technologies Eases traditional HPC application development and testing at scale Extends to new workloads (ML, analytics, big data) Accommodates new environments (i.e. cloud) A Shared Repository Desired Future State
  • 46. WSCAD 2016 Official Members as of 6/1/2016 Goal: A common system software platform for the HPC community that works across multiple segments and on which ecosystem partners can collaborate and innovate 46
  • 47. WSCAD 2016 HPC System Software Stack Component View 47 • Intra-stack APIs to allow for customization/differentiation • External APIs to develop on and around the stack
  • 48. WSCAD 2016 48 OpenHPC to Intel® HPC Orchestrator system software products Intel HPC Orchestrator products • Premium Software • Advanced testing • Support An open source community for HPC software Intel seeded the community with a pre-integrated, pre-tested and validated HPC system software stack & will continue contributions along with other members of the community Intel will offer Intel-supported products based on the open source OpenHPC software Intel HPC Orchestrator products are the realization of the software portion of Intel® Scalable System Framework Intel® Scalable System Framework
  • 49. WSCAD 2016 49 Open Source accelerating HPC + Big Data Open Standards • PBS Pro is now Open • OpenHPC • Cloud for HPC • And how about Brazil? • Intel Innovation Center at Rio – partnership with AMT (www.amt.com.br) Pay less + ease of use = democratizing HPC for Big Data
  • 51. WSCAD 2016 Intel’s HPC initiatives in Brazil Code Modernization – Open source software 51 • Modernizing applications to increase parallelism and scalability • Leverage cores, caches, threads, and vector capabilities of microprocessors and coprocessors. • Current centers in Brazil
  • 52. WSCAD 2016 Intel Modern Code Partner program 52 Intel Modern Code Partners Code Modernization – driving developers to write modern code for modern hardware • Create Faster Code…Faster • High Performance Scalable Code • C++, C, Fortran*, Python* and Java* • Standards-driven parallel models: • OpenMP*, MPI, and TBB • To teach developers how to fully exploit Xeon and Xeon Phi performance: vectors + multi-threading More at: http://software.intel.com/moderncode Free HPC & Big Data Workshops across Brazil
  • 53. WSCAD 2016 Code Modernization initiatives in the Brazilian HPC Ecosystem Oil & Gas - Reservoir Simulator at PETROBRAS LNCC - National Laboratory for Scientific Computing Largest HPC cluster in Latin America INPE/CPTEC Code Modernization of BRAMS • Up to 10.5x performance gains in their Reservoir Simulator software¹ • Up to 30x performance gain in Oil & Gas applications² • Up to 3.4x speedup via AVX (vector instructions) • Link white-paper • Initial results – white-paper link Health & Life Sciences • Up to 11x speedup in Molecular Dynamics – NCC/UNESP & LNCC – white-paper link • Xeon only: • Original code vs Modernized code: up to 11x speedup • Xeon + 1 Xeon Phi (same optimized code) • 1.14x speedup Article link Authors: ¹CENPES team and Gilvan Vieira - gilvandsv@gmail.com ²LNCC - Frederico Cabral - fredluiscabral@gmail.com ³NCC/UNESP - Silvio Stanzani silvio.stanzani@gmail.com
  • 55. WSCAD 2016 55 Conclusions • As with other products, technologies and services: “Lower cost + Scale + Ease of use” will drive HPC to the masses 1st wave: near bare-metal at the Cloud (lower cost + scale) 2nd wave: Frameworks offering “free performance” to unlock insights (usability) 3rd wave: Even small and medium businesses will rely on HPC / Big Data to drive business
  • 56. WSCAD 2016 56 Big Data Analytics Integrated solutions: HPC && Big Data *Other names and brands may be claimed as the property of others HPC Big Data FORTRAN / C++ Applications MPI High Performance Python, Frameworks, Java* Applications, Others Hadoop* / Spark / Others Simple to Use Lustre* with Hadoop* Adapter Remote Storage Compute & Big Data Capable Scalable Performance Components Server Storage (SSDs and Burst Buffers) Intel® Omni-Path Architecture Infrastructure Programming Model Resource Manager File System Hardware HPC & Big Data-Aware Resource Manager
  • 57. WSCAD 2016 Next steps for HPC & Big Data New paradigm in memory and storage Processor Compute Node I/O Node Remote Storage Compute Today Caches Local Memory SSD Storage Parallel File System (Hard Drive Storage) Higher Bandwidth, Lower Latency and Capacity Some remote data moves onto I/O node I/O Node storage moves to compute node Local memory is now faster & in processor package Compute Future Caches Non-Volatile Memory Burst Buffer Storage Parallel File System (Hard Drive Storage) In-Package High Bandwidth Memory* *cache, memory or hybrid mode
  • 58. WSCAD 2016 Conclusions A Holistic Architectural Approach is Required Compute Memory Fabric Storage PERFORMANCE | CAPABILITY TIME System Software Innovative Technologies Tighter Integration Application Modernized Code Community ISV Proprietary System Memory Cores Graphics Fabric FPGA I/O 58
  • 60. WSCAD 2016 14 A Global Online Community Intel® Modern Code Developer Community Developer zone - Vectorization/Single Instruction, Multiple Data (SIMD) - Multi-Threading - Multi Node/Clustering - Take Advantage of On-Package High-Bandwidth Memory - Increase Memory and Power Efficiency Topics Experts software.intel.com/moderncode - Modern Code Zone - Software Tools, Training Webinars - How-to guides, Parallel Programming BKMs - Remote Access to Hardware - Support Forums - Black Belts, & Intel Engineer Experts - Technical Content, Training -Webinars, F2F, Forum Support - Conference and Tradeshows: Keynotes, Presentations, BOFs, Demos, Tutorials
  • 61. WSCAD 2016 61 Machine/Deep Learning | Resources Training Classes: U.Oxford Class on Deep Learning Stanford Class on Machine Learning Google Class on Deep Learning Intel Caffe Repo: (Support for Multi-node Training) https://github.com/intelcaffe/caffe Spark MLLib Repo: http://spark.apache.org/mllib/ Intel Machine Learning Blog Posts: Myth Busted - CPUs and Neural Network Training Caffe Scoring on Xeon Processors Caffe Training on Multi-node Distributed Memory Systems Trusted Analytics Platform: http://trustedanalytics.org/ Performance Libraries: MKL for Neural Networks - Technical Preview Math Kernel Library MKL Community License Data Analytics Acceleration Library
  • 62. WSCAD 2016 62 Links • Intel Modern Code: https://software.intel.com/pt-br/modern-code • Intel Developer Zone • Intel Compiler Reference 2016 • Intel Intrinsics reference • Guide to Auto-vectorization • Xeon Phi™ Home Page • Xeon Phi™ CODE RECIPES • Intel Parallel Computing Centers
  • 63. WSCAD 2016 Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors.
Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 63