Azure Brain: 4th paradigm, scientific
discovery & (really) big data (REC201)
Gabriel Antoniu
Senior Research Scientist, Inria
Head of the KerData Project-Team, Inria Rennes – Bretagne Atlantique
Radu Tudoran
PhD student, ENS Cachan – Brittany
KerData Project-Team, Inria Rennes – Bretagne Atlantique
Feb. 12, 2013
INRIA’s strategy in Cloud Computing
INRIA is among the leaders in Europe in the area of distributed computing and HPC
• Long history of research on distributed systems, HPC, Grids
• Now several activities on virtualized environments/cloud infrastructures
• Culture of multidisciplinary research
• Culture of exploration tools (owner of massively parallel machines since 1987, large scale
testbeds such as Grid’5000)
• Strong involvement in national, European and international collaborative projects
• Strong collaboration history with industry (Joint Microsoft Research – Inria Centre, IBM,
EDF, Bull, etc.)
- 2
Clouds: where within Inria ?
• Networks, Systems and Services, Distributed Computing
• Perception, Cognition, Interaction
• Applied Mathematics, Computation and Simulation
• Algorithmics, Programming, Software and Architecture
• Computational Sciences for Biology, Medicine and the Environment
- 3
Some project-teams involved in Cloud Computing
INRIA Nancy – Grand Est
INRIA Grenoble – Rhône-Alpes
INRIA Sophia Antipolis – Méditerranée
INRIA Rennes – Bretagne Atlantique
INRIA Bordeaux – Sud-Ouest
INRIA Lille – Nord Europe
INRIA Saclay – Île-de-France
INRIA Paris – Rocquencourt
KERDATA: Data Storage and Processing
MYRIADS: Autonomous Distributed Systems
ASCOLA: Languages and virtualization
CEPAGE: task management
AVALON: middleware & programming
MESCAL: models & tools
REGAL: Large Scale dist. systems
ALGORILLE: algorithms & models
OASIS: programming
ZENITH: Scientific Data Management
- 4
Initiatives to support Cloud Computing and HPC within Inria
Why dedicated initiatives to support HPC/Clouds ?
• Project-teams are geographically dispersed
• Project-teams belong to different domains
• Researchers from scientific computing need access to the latest research results
related to tools, libraries, runtime systems, …
• Researchers from “computer science” need access to applications to test their
ideas as well as to find new ideas !
Concept of “Inria Large Scale Initiatives”
• Enable ambitious projects linked with the strategic plan
• Promote an interdisciplinary approach
• Mobilize the expertise of Inria researchers around key challenges
- 5
CLOUD COMPUTING@
INRIA RENNES BRETAGNE ATLANTIQUE
- 6
Some Research Focus Areas
Software architecture and infrastructure for cloud computing
• Autonomic service management, resource management, SLA, sky
computing: Myriads
• Big Data storage and management, MapReduce: KerData
• Hybrid Cloud and P2P systems, privacy: ASAP
Advanced usage for specific application communities
• Bioinformatics: GENSCALE
• Cloud for medical imaging: EasyMed project (IRT B-Com): Visages
- 7
Contrail EU project
Goal: develop an integrated approach to virtualization offering
• services for federating IaaS clouds
• elastic PaaS services on top of federated clouds
Overview: provide tools for
• managing federation of multiple heterogeneous IaaS clouds
• offering a secure yet usable platform for end users through federated identity management
• supporting SLAs and quality of service (QoS) for satisfying stringent business requirements for
using the cloud
[Figure: Contrail architecture. Applications run on top of a federation API and federation core, which sit above resource providers, storage providers, a network provider and a public cloud.]
Contrail is an open source cloud computing
software stack compliant with cloud
standards
http://contrail-project.eu
- 9
Contrail EU project
http://contrail-project.eu
http://contrail.projects.ow2.org/xwiki/bin/view/Main/WebHome
- 10
Other Research Activities on Cloud Computing
Snooze: an autonomic energy-efficient IaaS management system
Scalability
• Distributed VM management system
• Self-organizing & self-healing hierarchy
Energy conservation
• Idle nodes in power-saving mode
• Holistic approach to favor idle nodes
VM management algorithms
• Energy-efficient VM placement
• Under-load / overload mitigation
• Automatic node power-cycling and wake-up
Open source software under the GNU GPLv2 license
http://snooze.inria.fr
Resilin: Elastic MapReduce on
multiple clouds (sky computing)
Goals
• Creation of MapReduce execution
platforms on top of multiple clouds
• Elasticity of the platforms
• Support all kinds of Hadoop jobs
• Support different Hadoop versions
Interfaces
• Amazon EMR for users
• Libcloud with underlying IaaS providers
Open source software under GNU Affero
GPL license
http://resilin.inria.fr
- 11
KerData: Dealing with the Data Deluge
Deliver the capability to mine,
search and analyze this data in
near real time
Science itself is evolving
Credits: Microsoft
- 12
The Data Science:
The 4th Paradigm for Scientific Discovery
• Thousand years ago: description of natural phenomena
• Last few hundred years: Newton’s laws, Maxwell’s equations…
• Last few decades: simulation of complex phenomena
• Today and the future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (from instruments, sensors, humans…); distributed communities
Credits: Dennis Gannon
13
Research Focus:
How to efficiently store, share and process data
for new-generation, data-intensive applications?
• Scientific challenges
• Massive data (1 object = 1 TB)
• Geographically distributed
• Fine-grain access (MB) for reading and writing
• High concurrency (10³ concurrent clients)
• Without locking
- Major goal: high-throughput under heavy concurrency
- Our contribution
Design and implementation of distributed algorithms
Validation with real apps on real platforms with real users
• Applications
• Massive data analysis: clouds (e.g. MapReduce)
• Post-Petascale HPC simulations: supercomputers
- 15
BlobSeer: A Software Platform for Scalable,
Distributed BLOB Management
Started in 2008, 6 PhD theses (Gilles Kahn/SPECIF PhD Thesis Award in 2011)
Main goal: high-throughput data access under heavy concurrency
Three key ideas
•Decentralized metadata management
•Lock-free concurrent writes (enabled by versioning)
- Write = create new version of the data
•Data and metadata “patching” rather than updating
A back-end for higher-level data management systems
•Short term: highly scalable distributed file systems
•Middle term: storage for cloud services
Our approach
•Design and implementation of distributed algorithms
•Experiments on the Grid’5000 grid/cloud testbed
•Validation with “real” apps on “real” platforms: Nimbus, Azure, OpenNebula clouds…
http://blobseer.gforge.inria.fr/
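The "write = create new version" idea can be illustrated with a minimal, single-process sketch. This is not BlobSeer's implementation or API: the class `VersionedBlob`, its methods and the tiny page size are hypothetical, and a plain Python list stands in for the decentralized metadata. Each write patches a copy of the page map and publishes a new immutable snapshot, so a reader holding an older version number never sees partial updates.

```python
from typing import Dict, List

PAGE_SIZE = 4  # bytes per page; tiny on purpose so the example is easy to follow


class VersionedBlob:
    """Toy versioned BLOB: every write publishes a new immutable snapshot."""

    def __init__(self) -> None:
        # versions[v] maps page index -> page content for snapshot v.
        self.versions: List[Dict[int, bytes]] = [{}]

    def write(self, offset: int, data: bytes) -> int:
        """Patch the latest snapshot and return the new version number."""
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("this sketch only supports page-aligned writes")
        # Copy the page map (metadata), not the pages: untouched pages are shared.
        pages = dict(self.versions[-1])
        for i in range(0, len(data), PAGE_SIZE):
            pages[(offset + i) // PAGE_SIZE] = data[i:i + PAGE_SIZE]
        self.versions.append(pages)
        return len(self.versions) - 1

    def read(self, version: int, offset: int, size: int) -> bytes:
        """Read from a given snapshot; pages never written read as zeros."""
        pages = self.versions[version]
        first, last = offset // PAGE_SIZE, (offset + size - 1) // PAGE_SIZE
        raw = b"".join(pages.get(p, b"\0" * PAGE_SIZE) for p in range(first, last + 1))
        start = offset - first * PAGE_SIZE
        return raw[start:start + size]


blob = VersionedBlob()
v1 = blob.write(0, b"AAAABBBB")
v2 = blob.write(4, b"CCCC")
```

Reading version `v1` still returns the original bytes after the second write, which is the property that lets readers and writers proceed without locking each other out. The sketch does not demonstrate real concurrency.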
- 16
Impact of BlobSeer: MapReduce
BlobSeer improves Hadoop
• Gain (execution time) : 35%
ANR MapReduce Project (2010-2014)
• Lead: G. Antoniu (KerData)
• Partners: INRIA (AVALON), Argonne National Lab, U. Illinois Urbana-Champaign, IBM,
JLPC, IBCP, MEDIT
• Strong collaboration with the Nimbus team from Argonne National Lab
- BlobSeer integrated with the Nimbus cloud toolkit
- BlobSeer used for efficient VM deployment and snapshotting
• Validation : Grid’5000 with Nimbus, FutureGrid (USA), Open Cirrus (USA)
http://mapreduce.inria.fr
- 17
The A-Brain Project: Data-Intensive Processing on
Microsoft Azure Clouds
Application
• Large-scale joint genetic and
neuroimaging data analysis
Goal
• Assess and understand the
variability between individuals
Approach
• Optimized data processing on
Microsoft’s Azure clouds
Inria teams involved
• KerData (Rennes)
• Parietal (Saclay)
Framework
• Joint MSR-Inria Research Center
• MS involvement: Azure teams,
EMIC
18
Genetic information: SNPs
G G
T G
T T
T G
G G
MRI brain images
Clinical / behaviour
The Imaging Genetics Challenge:
Comparing Heterogeneous Information
Here we focus
on this link
- 19
Neuroimaging-genetics: The Problem
• Several brain diseases have a genetic origin, or their occurrence/severity is related to genetic factors
• Genetics is important to understand & predict response to treatment
• Genetic variability is captured in DNA micro-array data
Gene → Image: p(image | genetic)
20
Imaging Genetics Methodological Issues
Brain image (Y): ~10^5-10^6 variables
– Anatomical MRI
– Functional MRI
– Diffusion MRI
Genetic data (X): ~10^5-10^6 variables
– DNA array (SNP/CNV)
– gene expression data
– others...
Number of subjects: ~2000
- 21
A BIG DATA Challenge …
Azure can help…
[Figure: data volume estimate (voxels, SNPs, permutations, values stored as doubles; 5%-10% useful) and computation estimate (timespan on a single machine; estimation for A-Brain on Azure with 350 cores; storage capacity estimations with 350 cores)]
Imaging Genetics Methodological Issues
• Multivariate methods: predict a brain characteristic with many genetic variables
• Elastic net regularization: combination of ℓ1 and ℓ2 penalties → sparse loadings
• Parameter setting: internal cross-validation/bootstrap
• Performance evaluated using permutations
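For reference, the textbook form of the elastic net objective is sketched below. The notation is assumed here rather than taken from the slides: y is the brain characteristic to predict, X the matrix of genetic variables, β the coefficient vector, and λ1, λ2 the two penalty weights (typically chosen by the internal cross-validation mentioned above).

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1
  + \lambda_2 \lVert \beta \rVert_2^2
```

The ℓ1 term drives many coefficients to exactly zero, which gives the sparse loadings, while the ℓ2 term stabilizes the fit when genetic variables are strongly correlated.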
23
A-Brain as Map-Reduce Processing
- 24
A-Brain as Map-Reduce Data Processing
25
Efficient Procedures for Statistics
Example: voxelwise Genome Wide Association Studies (vGWAS)
• 740 subjects
• ~ 50,000 voxels
• ~ 500,000 SNPs
• 10,000 permutations
→ ~ 12,000 hours of computation
→ ~ 1.8 PB of statistical scores
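The storage figure can be checked with a back-of-the-envelope computation. The sketch below assumes one 8-byte double-precision score per (voxel, SNP, permutation) triple; that layout is an assumption, not something stated on the slide.

```python
voxels = 50_000
snps = 500_000
permutations = 10_000
bytes_per_score = 8  # assumption: one IEEE 754 double per score

n_scores = voxels * snps * permutations
total_bytes = n_scores * bytes_per_score

petabytes = total_bytes / 10**15   # decimal petabytes
pebibytes = total_bytes / 2**50    # binary pebibytes

print(f"{n_scores:.2e} scores, {petabytes:.2f} PB ({pebibytes:.2f} PiB)")
```

Under that assumption the total is 2 × 10^15 bytes, about 1.8 when expressed in binary units, which is consistent with the ~1.8 figure quoted above.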
- 26
Efficient Procedures for Statistics
Example: Ridge regression with cross-validation loops
• Some costly computations (SVD ~ 60 sec) are used 1-2 million times and cannot be kept in memory.
~ 60-120 × 10^6 sec spent on SVDs (1.9-3.8 years)
→ An efficient distributed cache can achieve huge speedup!
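The arithmetic behind these figures, and the effect of caching, can be sketched as follows. The function `expensive_svd` is a hypothetical stand-in that only counts how often it runs, and `functools.lru_cache` is a single-process substitute for the distributed cache the slide calls for; in the real setting the results would live in distributed storage because they do not fit in one node's memory.

```python
from functools import lru_cache

SECONDS_PER_YEAR = 365.25 * 24 * 3600
svd_seconds = 60

# Total cost if every use recomputes the SVD (1 to 2 million uses).
for uses in (1_000_000, 2_000_000):
    total = uses * svd_seconds
    print(f"{uses:,} uses -> {total:.1e} s = {total / SECONDS_PER_YEAR:.1f} years")

calls = 0  # number of times the expensive computation actually runs


@lru_cache(maxsize=None)
def expensive_svd(fold_id: int) -> str:
    """Hypothetical stand-in for a costly SVD, keyed by cross-validation fold."""
    global calls
    calls += 1
    return f"svd-of-fold-{fold_id}"


# 1000 requests that touch only 10 distinct folds: 10 computations, 990 cache hits.
for request in range(1000):
    expensive_svd(request % 10)
```

The speedup depends entirely on how often the same decomposition is requested again; the 10-fold reuse pattern here is illustrative only.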
- 27
TomusBlobs approach
- 28
Requirements for a cloud storage / data management
High throughput under heavy concurrency
Fine grain access
Scalability / Elasticity
Data availability
Transparency
Design principles
Data locality – use the local storage
No modification on the cloud middleware
Loose coupling between storage and applications
Storage hierarchy
- 29
TomusBlobs - Architecture
- 30
Computation nodes
Architecture contd.
System components
Initiator
- Cloud specific
- Generic stub
- Properties: Scaling; Self configuration
Distributed Storage
- Aggregates the virtual disks
- Not depending on a specific solution
Client API
- Cloud specific API
- Expose the operation transparently
Initiator
Local
Disk
Application
Client API
TB
entity
VM snapshot
Customizable
Environment
- 31
TomusBlobs Evaluation
• Scenario: Single reader / writer
• Data transfer from memory to storage
• Metric: Client IO throughput
TomusBlobs Evaluation
Cumulative read throughput Cumulative write throughput
• Scenario: Multiple readers / writers
• Throughput limited by bandwidth
• Read 4X ; Write 5X
- 34
TomusBlobs as a Storage Backend for
Sharing Application Data in MapReduce
App
API
App App App App
API API API API
TomusBlobs
- 35
TomusMapReduce Evaluation
• Scenario: Increase the problem size
• Optimize computation by managing better intermediate data
- 36
Iterative MapReduce - Daytona
• Merge step
• In-memory caching of static data
• Cache-aware hybrid scheduling using queues as well as a bulletin board (special table)
[Figure: iterative MapReduce flow. Job Start; Map and Combine tasks reading from a data cache; Reduce; Merge; "Add iteration?" decision: Yes leads to hybrid scheduling of the new iteration, No leads to Job Finish.]
Credits: Dennis Gannon
- 37
Beyond MapReduce
• Unique result with parallel reduce phase
• No central control entity
• No synchronization barrier
[Figure: several Map tasks feeding Reducers]
- 38
Zoom on the Reduction Ratio
• Compute the minimum of a set of large matrices (7.5 GB) using 30 mappers
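A minimal single-process sketch of the iterative reduce idea: intermediate results are combined in groups of at most `reduction_ratio` until a single result remains, so no reducer has to receive every map output. The function name `iterative_reduce` and the small integer vectors standing in for large matrices are illustrative assumptions, not the actual implementation. The combine operation must be associative and commutative (as minimum is) for the grouping not to affect the result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def iterative_reduce(items: Sequence[T],
                     combine: Callable[[Sequence[T]], T],
                     reduction_ratio: int) -> T:
    """Combine items in groups of up to reduction_ratio until one remains."""
    if reduction_ratio < 2:
        raise ValueError("reduction_ratio must be at least 2")
    if not items:
        raise ValueError("nothing to reduce")
    level: List[T] = list(items)
    while len(level) > 1:
        # Each group could be handled by an independent reducer in parallel.
        level = [combine(level[i:i + reduction_ratio])
                 for i in range(0, len(level), reduction_ratio)]
    return level[0]


def elementwise_min(vectors: Sequence[List[int]]) -> List[int]:
    """Element-wise minimum of equally sized vectors."""
    return [min(column) for column in zip(*vectors)]


# 30 "mapper outputs", each a small vector standing in for a large matrix.
mapper_outputs = [[(7 * m + j) % 31 for j in range(4)] for m in range(30)]
result = iterative_reduce(mapper_outputs, elementwise_min, reduction_ratio=5)
```

With 30 inputs and a ratio of 5, the first level produces 6 partial results, the second level 2, and the third the final one. A smaller ratio means more levels but less data per reducer.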
- 39
Azure integration
- 40
The Most Frequent Words benchmark
•Input data size varies from 3.2 GB to 32 GB
•ReductionRatio = 5
- 41
Execution times for A-Brain
•Increasing number of map jobs = increasing size of data
(5 GB to 50 GB)
- 42
Beyond Single Site
processing
• Data movement across geo-distributed deployments is costly
• Minimize the size and number of transfers
• The overall aggregate must collaborate
towards reaching the goal
• The deployments work as independent
services
• The architecture can be used for scenarios
in which data is produced in different
locations
- 43
Towards a Geo-distributed
TomusBlobs approach
• TomusBlobs for intra-deployment data management
• Public Storage (Azure Blobs/Queues) for inter-deployment communication
• Iterative Reduce technique for minimizing the number of transfers (and data size)
• Balance the network bottleneck from a single data center
- 44
Multi-Site MapReduce
• 3 deployments (NE,WE,NUS)
• 1000 CPUs
• ABrain execution across multiple sites
- 45
Beyond MapReduce -
Workflow Processing
- 46
Data access patterns for workflows [1]
• Pipeline: caching; data-informed workflow
• Broadcast: replication; data size
• Reduce/Gather: co-placement of all data; data-informed workflow
• Scatter: file size awareness; data-informed workflow
[1] Vairavanathan et al., A Workflow-Aware Storage System: An Opportunity Study, http://ece.ubc.ca/~matei/papers/ccgrid2012.pdf
- 47
eScience Central
(Newcastle University)
- 48
Generic Worker Walkthrough
(Microsoft ATLE)
[Figure: Generic Worker architecture. Elements shown: researcher, client code, Job Management Service (OGF BES, OGF JSDL, SOAP WS-*), GW Driver on each VM with a pluggable runtime environment and runtime business logic, algorithm, local storage (HD), shared storage for input and output files, BLOB storage, Job Details table, Job Index table, notification listeners (accounting, status change, etc.), Notification Service, Scaling Service. Legend: application code, GW services & SDKs, existing components.]
Use of interoperable standard protocols and data schemas!
- 49
Credits: Microsoft
Defining the scope
Assumptions about the workflows
• Batch jobs: workflows are composed of batch jobs with well-defined data passing schemas
• Files: the input and the output of the batch jobs are files
• ID: the batch jobs and their inputs and outputs can be uniquely identified in the system
Most workflows fit in this subclass
Idea: manage files inside the deployment
- 50
The Concept
Components
• Metadata Registry
• Transfer Module
File Metadata Registry
File Name   Locations
F1          VM1
F2          VM1, VM2
F3          VM2
[Figure: VM 1 (local disk: F1, F2) and VM 2 (local disk: F3), each with a Transfer Module]
(1) Register(F1, VM1)
(2) GetLocation(F1)
(3) DownloadFile(F1)
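The three steps, Register, GetLocation and DownloadFile, can be sketched in a single process as follows. All class and method names are hypothetical, a dictionary stands in for the metadata registry, and a direct copy between two in-memory "disks" stands in for the transfer module.

```python
from typing import Dict, Set


class MetadataRegistry:
    """Maps a file name to the set of VMs that hold a copy."""

    def __init__(self) -> None:
        self._locations: Dict[str, Set[str]] = {}

    def register(self, file_name: str, vm: str) -> None:
        self._locations.setdefault(file_name, set()).add(vm)

    def get_locations(self, file_name: str) -> Set[str]:
        return set(self._locations.get(file_name, set()))


class VM:
    """A node with a local disk and a toy transfer module."""

    def __init__(self, name: str, registry: MetadataRegistry) -> None:
        self.name = name
        self.registry = registry
        self.disk: Dict[str, bytes] = {}

    def write_file(self, file_name: str, data: bytes) -> None:
        self.disk[file_name] = data
        self.registry.register(file_name, self.name)        # step (1)

    def fetch_file(self, file_name: str, peers: Dict[str, "VM"]) -> bytes:
        if file_name in self.disk:                           # local hit, no transfer
            return self.disk[file_name]
        locations = self.registry.get_locations(file_name)   # step (2)
        if not locations:
            raise FileNotFoundError(file_name)
        source = peers[sorted(locations)[0]]                 # pick any holder
        data = source.disk[file_name]                        # step (3)
        self.write_file(file_name, data)                     # new replica is registered
        return data


registry = MetadataRegistry()
vms = {name: VM(name, registry) for name in ("VM1", "VM2")}
vms["VM1"].write_file("F1", b"payload")
fetched = vms["VM2"].fetch_file("F1", vms)
```

After the fetch, the registry lists both VM1 and VM2 as locations of F1, mirroring how F2 appears on two VMs in the table above.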
- 51
Characteristics of the components
Metadata Registry
Role
• Hold the location of files within the deployment
Data type
• Key-value pairs (file identification; retrieval information)
Accessibility
• Accessible by all nodes
Solutions
• Azure Caching Preview, Azure Tables, InMemory DB
Transfer Module
Role
• Transfer files from one node to another
Data type
• Files
Accessibility
• Each VM has such a module
• Applications access the local module
• The modules interact across nodes
Solutions
• FTP, Torrent, InMemory, HTTP, etc.
Idea:
Adopt multiple transfer solutions
Adapt to the context: select the one that fits best
- 52
Transfer methods
Method and observations
InMemory
• Caching data
• In-memory data offers fast access
• GBs of memory capacity per deployment
• Small files
BitTorrent
• Replicas for file dissemination
• Collaborative reads
• New way of staging in data
FTP
• TCP transfer
• Medium and large files
• Potential for interoperability
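Selecting "the one that fits best" can be sketched as a small decision function. The thresholds and the function name `choose_transfer_method` are hypothetical; in practice the cut-off values would come from measurements on the deployment.

```python
KB = 1024
MB = 1024 * KB

# Hypothetical thresholds; real values would come from measurements.
SMALL_FILE_LIMIT = 1 * MB
BROADCAST_MIN_RECEIVERS = 3


def choose_transfer_method(file_size: int, receivers: int) -> str:
    """Pick a transfer method from file size and number of receiving nodes."""
    if file_size < 0 or receivers < 1:
        raise ValueError("file_size must be >= 0 and receivers >= 1")
    if file_size <= SMALL_FILE_LIMIT:
        return "InMemory"    # small files: serve from the in-memory cache
    if receivers >= BROADCAST_MIN_RECEIVERS:
        return "BitTorrent"  # many readers: replicas and collaborative reads
    return "FTP"             # medium and large point-to-point transfers


for size, receivers in [(64 * KB, 1), (200 * MB, 10), (200 * MB, 1)]:
    print(size, receivers, choose_transfer_method(size, receivers))
```

The rules follow the observations listed above: small files go through memory, broadcast-like patterns through BitTorrent, and the remaining medium and large transfers through FTP.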
- 53
[Figure: VM snapshot containing the adaptive storage. Elements shown: MetaDataRegistry in VM memory; Transfer Module services (FTP, Torrent, InMemory); replication queue and replication service; FTP, torrent tracker and peer; local disk.]
- 54
[Figure: two applications exchanging file F1 through the adaptive storage API, with Azure Caching holding the metadata. Operations shown: Create, Write(F1), Upload(F1), WriteMetadata, GetMetadata, Download(F1), Read(F1). Each node has memory and local storage.]
- 55
Scenario 2 – Large files ; replication enabled
[Figure: transfer time (sec) versus size of a single file (50 to 250 MB) for DirectLink, Torrent, Adaptive and AzureBlobs]
• Torrents are superior for broadcast when replicas are used
• DirectLink is faster for pipeline (reduction tree)
• Adaptive storage can choose the best strategy each time
- 56
NCBI Blast for Azure
Seamless Experience
• Evaluate data and invoke computational models
from Excel.
• Computationally heavy analysis done close to
large database of curated data.
• Scalable for large, bursty, computationally heavy analyses.
[Figure: NCBI BLAST on Azure. The user selects DBs and an input sequence through a Web Role; an Input Splitter Worker Role, BLAST Execution Worker Roles #1 to #n and a Combiner Worker Role are shown, together with Azure Blob Storage holding Genome DB 1 to Genome DB K and the BLAST DB configuration.]
Credits: Dennis Gannon
- 57
BLAST analysis – data management
component
[Figure: download and upload time (sec, 0 to 120) versus number of BLAST jobs (5 to 60), comparing Adaptive storage with AzureBlobs]
• Database files – 1.6 GB
• Input size – 800 MB
• 50 nodes
- 58
Scalable Storage on Clouds: Open Issues
Understanding price-performance trade-offs
• Consistency, availability, performance, cost, security,
quality of service, energy consumption
• Autonomy, adaptive consistency
• Dynamic elasticity
• Trade-offs exposed to the user
High performance variability
- Understand it, model it, cope with it
Deployment/application launching time is high
Latency of data accesses is still an issue
Data movements are expensive
Cope with tightly-coupled applications
Cope with various cloud programming models
Virtualization overhead
Benchmarking
Performance modeling
Self-optimization for cost reduction
- Elastic scale down
Security and privacy
- 59
Extreme scale matters, but it is not the only concern
Other focus areas
– Affordability and usability of intermediate size systems
– Pervasiveness of usage across the entire industry, including Small and
Medium Enterprises (SMEs) and ISVs
– New HPC deployments (e.g. Big Data, HPC in Clouds)
– HPC and Cloud usage expansion, fostering the development of consultancy,
expertise and service business / end-user support
– Facilitating the creation of start-ups and the development of the SME sector
(hw/sw supply side)
– Education and training (inc. engineering skills for industry)
Cloud Computing@INRIA
Strategic Research Agenda
- 60
- 61
Contacts: Gabriel.Antoniu@inria.fr, Radu.Tudoran@inria.fr

More Related Content

What's hot

Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Frederic Desprez
 
Building an Outsourcing Ecosystem for Science
Building an Outsourcing Ecosystem for ScienceBuilding an Outsourcing Ecosystem for Science
Building an Outsourcing Ecosystem for ScienceEuroCloud
 
Bergman Enabling Computation for neuro ML external
Bergman Enabling Computation for neuro ML externalBergman Enabling Computation for neuro ML external
Bergman Enabling Computation for neuro ML externalazlefty
 
LambdaFabric for Machine Learning Acceleration
LambdaFabric for Machine Learning AccelerationLambdaFabric for Machine Learning Acceleration
LambdaFabric for Machine Learning AccelerationKnuEdge
 
The Pacific Research Platform Two Years In
The Pacific Research Platform Two Years InThe Pacific Research Platform Two Years In
The Pacific Research Platform Two Years InLarry Smarr
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Grid Computing by Mireille Raad
Grid Computing by Mireille RaadGrid Computing by Mireille Raad
Grid Computing by Mireille RaadArabNet ME
 
Tutorial on Hybrid Data Infrastructures: D4Science as a case study
Tutorial on Hybrid Data Infrastructures: D4Science as a case studyTutorial on Hybrid Data Infrastructures: D4Science as a case study
Tutorial on Hybrid Data Infrastructures: D4Science as a case studyBlue BRIDGE
 
Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Robert Grossman
 
Using e-Infrastructures for Biodiversity Conservation
Using e-Infrastructures for Biodiversity ConservationUsing e-Infrastructures for Biodiversity Conservation
Using e-Infrastructures for Biodiversity ConservationBlue BRIDGE
 
Inroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermaInroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermagargishankar1981
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking FunctionalityNicholas Loulloudes
 
Gridcomputingppt
GridcomputingpptGridcomputingppt
Gridcomputingpptnavjasser
 
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...Frederic Desprez
 

What's hot (19)

Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 
Virtualization for HPC at NCI
Virtualization for HPC at NCIVirtualization for HPC at NCI
Virtualization for HPC at NCI
 
Building an Outsourcing Ecosystem for Science
Building an Outsourcing Ecosystem for ScienceBuilding an Outsourcing Ecosystem for Science
Building an Outsourcing Ecosystem for Science
 
Bergman Enabling Computation for neuro ML external
Bergman Enabling Computation for neuro ML externalBergman Enabling Computation for neuro ML external
Bergman Enabling Computation for neuro ML external
 
LambdaFabric for Machine Learning Acceleration
LambdaFabric for Machine Learning AccelerationLambdaFabric for Machine Learning Acceleration
LambdaFabric for Machine Learning Acceleration
 
The Pacific Research Platform Two Years In
The Pacific Research Platform Two Years InThe Pacific Research Platform Two Years In
The Pacific Research Platform Two Years In
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Grid Computing by Mireille Raad
Grid Computing by Mireille RaadGrid Computing by Mireille Raad
Grid Computing by Mireille Raad
 
Grid computing
Grid computingGrid computing
Grid computing
 
Tutorial on Hybrid Data Infrastructures: D4Science as a case study
Tutorial on Hybrid Data Infrastructures: D4Science as a case studyTutorial on Hybrid Data Infrastructures: D4Science as a case study
Tutorial on Hybrid Data Infrastructures: D4Science as a case study
 
Grid Presentation
Grid PresentationGrid Presentation
Grid Presentation
 
Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011
 
Using e-Infrastructures for Biodiversity Conservation
Using e-Infrastructures for Biodiversity ConservationUsing e-Infrastructures for Biodiversity Conservation
Using e-Infrastructures for Biodiversity Conservation
 
Cyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and BeyondCyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and Beyond
 
Inroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermaInroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar verma
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionality
 
Gridcomputingppt
GridcomputingpptGridcomputingppt
Gridcomputingppt
 
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...
SILECS/SLICES - Super Infrastructure for Large-Scale Experimental Computer Sc...
 

Viewers also liked (20)

Mastering Brain Power
Mastering Brain PowerMastering Brain Power
Mastering Brain Power
 
Thetahealing
ThetahealingThetahealing
Thetahealing
 
an introduction to Distem
an introduction to Disteman introduction to Distem
an introduction to Distem
 
A New Paradigm: Finding Knowledge Within
A New Paradigm: Finding Knowledge WithinA New Paradigm: Finding Knowledge Within
A New Paradigm: Finding Knowledge Within
 
2013 ia summit
2013 ia summit2013 ia summit
2013 ia summit
 
Telomeres: The Real Biologic Clock
Telomeres: The Real Biologic ClockTelomeres: The Real Biologic Clock
Telomeres: The Real Biologic Clock
 
Paradigms
ParadigmsParadigms
Paradigms
 
Data Science and the Fourth Paradigm by Torben Bach Pedersen
Data Science and the Fourth Paradigm by Torben Bach PedersenData Science and the Fourth Paradigm by Torben Bach Pedersen
Data Science and the Fourth Paradigm by Torben Bach Pedersen
 
The fourth paradigm in safety
The fourth paradigm in safetyThe fourth paradigm in safety
The fourth paradigm in safety
 
Paradigms
ParadigmsParadigms
Paradigms
 
Telomere seminar dec2012_new
Telomere seminar dec2012_newTelomere seminar dec2012_new
Telomere seminar dec2012_new
 
Train the Mind Virtue as medicine through meditation
Train the Mind Virtue as medicine through meditationTrain the Mind Virtue as medicine through meditation
Train the Mind Virtue as medicine through meditation
 
Sound: BINAURAL AUDIO
Sound: BINAURAL AUDIOSound: BINAURAL AUDIO
Sound: BINAURAL AUDIO
 
Binaural Audio
Binaural AudioBinaural Audio
Binaural Audio
 
Linkedin Copy An Introduction To Silva Um Training
Linkedin Copy An Introduction To Silva Um TrainingLinkedin Copy An Introduction To Silva Um Training
Linkedin Copy An Introduction To Silva Um Training
 
ThetaHealing®Workshops
ThetaHealing®WorkshopsThetaHealing®Workshops
ThetaHealing®Workshops
 
Introduction to binaural beats
Introduction to binaural beatsIntroduction to binaural beats
Introduction to binaural beats
 
The beta obsession
The beta obsessionThe beta obsession
The beta obsession
 
Ch7 alteredstates Reg. Psych
Ch7 alteredstates  Reg. PsychCh7 alteredstates  Reg. Psych
Ch7 alteredstates Reg. Psych
 
Miracle grow your brain
Miracle grow your brainMiracle grow your brain
Miracle grow your brain
 

Similar to Azure Brain: 4th paradigm, scientific discovery & (really) big data

NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGGeoffrey Fox
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical ExcellenceNETWAYS
 
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...OpenNebula Project
 
Progress of the Helix Nebula Science Cloud PCP Project
Progress of the Helix Nebula Science Cloud PCP ProjectProgress of the Helix Nebula Science Cloud PCP Project
Progress of the Helix Nebula Science Cloud PCP ProjectHelix Nebula The Science Cloud
 
Challenges for Standardization Cloud Computing and Big Data IOT
Challenges for Standardization Cloud Computing and Big Data IOTChallenges for Standardization Cloud Computing and Big Data IOT
Challenges for Standardization Cloud Computing and Big Data IOTSubha421414
 
information system.pptx
information system.pptxinformation system.pptx
information system.pptxAmarSalih4
 
SILECS: Super Infrastructure for Large-scale Experimental Computer Science
SILECS: Super Infrastructure for Large-scale Experimental Computer ScienceSILECS: Super Infrastructure for Large-scale Experimental Computer Science
Azure Brain: 4th paradigm, scientific discovery & (really) big data

  • 1. Azure Brain: 4th paradigm, scientific discovery & (really) big data (REC201). Gabriel Antoniu, Senior Research Scientist, Inria; Head of the KerData Project-Team, Inria Rennes – Bretagne Atlantique. Radu Tudoran, PhD student, ENS Cachan – Brittany; KerData Project-Team, Inria Rennes – Bretagne Atlantique. Feb. 12, 2013
  • 2. INRIA’s strategy in Cloud Computing. INRIA is among the leaders in Europe in the area of distributed computing and HPC • Long history of research on distributed systems, HPC and Grids • Now several activities on virtualized environments/cloud infrastructures • Culture of multidisciplinary research • Culture of exploration tools (owner of massively parallel machines since 1987, large-scale testbeds such as Grid’5000) • Strong involvement in national, European and international collaborative projects • Strong history of collaboration with industry (Joint Microsoft Research – Inria Centre, IBM, EDF, Bull, etc.)
  • 3. Clouds: where within Inria? Research domains shown: Applied Mathematics, Computation and Simulation; Algorithmics, Programming, Software and Architecture; Networks, Systems and Services, Distributed Computing; Perception, Cognition, Interaction; Computational Sciences for Biology, Medicine and the Environment
  • 4. Some project-teams involved in Cloud Computing. Inria centres: Nancy – Grand Est; Grenoble – Rhône-Alpes; Sophia Antipolis – Méditerranée; Rennes – Bretagne Atlantique; Bordeaux – Sud-Ouest; Lille – Nord Europe; Saclay – Île-de-France; Paris – Rocquencourt. Project-teams: KERDATA (data storage and processing); MYRIADS (autonomous distributed systems); ASCOLA (languages and virtualization); CEPAGE (task management); AVALON (middleware & programming); MESCAL (models & tools); REGAL (large-scale distributed systems); ALGORILLE (algorithms & models); OASIS (programming); ZENITH (scientific data management)
  • 5. Initiatives to support Cloud Computing and HPC within Inria. Why dedicated initiatives to support HPC/Clouds? • Project-teams are geographically dispersed • Project-teams belong to different domains • Researchers from scientific computing need access to the latest research results related to tools, libraries, runtime systems, … • Researchers from “computer science” need access to applications to test their ideas as well as to find new ideas! Concept of “Inria Large Scale Initiatives” • Enable ambitious projects linked with the strategic plan • Promote an interdisciplinary approach • Mobilize the expertise of Inria researchers around key challenges
  • 6. CLOUD COMPUTING @ INRIA RENNES – BRETAGNE ATLANTIQUE
  • 7. Some Research Focus Areas. Software architecture and infrastructure for cloud computing • Autonomic service management, resource management, SLA, sky computing: Myriads • Big Data storage and management, MapReduce: KerData • Hybrid Cloud and P2P systems, privacy: ASAP. Advanced usage for specific application communities • Bioinformatics: GENSCALE • Cloud for medical imaging, EasyMed project (IRT B-Com): Visages
  • 9. Contrail EU project. Goal: develop an integrated approach to virtualization offering • services for federating IaaS clouds • elastic PaaS services on top of federated clouds. Overview: provide tools for • managing federations of multiple heterogeneous IaaS clouds • offering a secure yet usable platform for end users through federated identity management • supporting SLAs and quality of service (QoS) to satisfy stringent business requirements for using the cloud. [Figure: applications on top of "Federation API + Fed. core" layers, over resource providers, storage providers, a network provider and a public cloud.] Contrail is an open source cloud computing software stack compliant with cloud standards: http://contrail-project.eu
  • 11. Other Research Activities on Cloud Computing. Snooze: an autonomic energy-efficient IaaS management system. Scalability • Distributed VM management system • Self-organizing & self-healing hierarchy. Energy conservation • Idle nodes in power-saving mode • Holistic approach to favor idle nodes. VM management algorithms • Energy-efficient VM placement • Under-load / overload mitigation • Automatic node power-cycling and wake-up. Open source software under the GNU GPLv2 license (http://snooze.inria.fr). Resilin: Elastic MapReduce on multiple clouds (sky computing). Goals • Creation of MapReduce execution platforms on top of multiple clouds • Elasticity of the platforms • Support all kinds of Hadoop jobs • Support different Hadoop versions. Interfaces • Amazon EMR for users • Libcloud with underlying IaaS providers. Open source software under the GNU Affero GPL license (http://resilin.inria.fr)
  • 12. KerData: Dealing with the Data Deluge. Deliver the capability to mine, search and analyze this data in near real time. Science itself is evolving. Credits: Microsoft
  • 13. The Data Science: The 4th Paradigm for Scientific Discovery. Thousand years ago: description of natural phenomena. Last few hundred years: Newton’s laws, Maxwell’s equations… [equation shown: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]. Last few decades: simulation of complex phenomena. Today and the future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (from instruments, sensors, humans…); distributed communities. Credits: Dennis Gannon
  • 15. Research Focus: How to efficiently store, share and process data for new-generation, data-intensive applications? Scientific challenges • Massive data (1 object = 1 TB) • Geographically distributed • Fine-grain access (MB) for reading and writing • High concurrency (10³ concurrent clients) • Without locking. Major goal: high throughput under heavy concurrency. Our contribution • Design and implementation of distributed algorithms • Validation with real apps on real platforms with real users. Applications • Massive data analysis: clouds (e.g. MapReduce) • Post-Petascale HPC simulations: supercomputers
  • 16. BlobSeer: A Software Platform for Scalable, Distributed BLOB Management. Started in 2008, 6 PhD theses (Gilles Kahn/SPECIF PhD Thesis Award in 2011). Main goal: optimized for data accesses under heavy concurrency. Three key ideas • Decentralized metadata management • Lock-free concurrent writes (enabled by versioning): write = create a new version of the data • Data and metadata “patching” rather than updating. A back-end for higher-level data management systems • Short term: highly scalable distributed file systems • Medium term: storage for cloud services. Our approach • Design and implementation of distributed algorithms • Experiments on the Grid’5000 grid/cloud testbed • Validation with “real” apps on “real” platforms: Nimbus, Azure, OpenNebula clouds… http://blobseer.gforge.inria.fr/
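The versioning idea on this slide (a write creates a new version as a patch instead of updating data in place) can be illustrated with a toy in-memory sketch. This is not BlobSeer's code or API: the class `VersionedBlob`, its methods, and the helper `concurrent_writes` are hypothetical names, and the small lock that hands out version numbers is an assumption of this sketch standing in for whatever ordering mechanism a real system uses.

```python
import threading


class VersionedBlob:
    """Toy blob: each write is stored as a patch that defines a new version."""

    def __init__(self, size):
        self._size = size
        self._patches = []  # patch i (0-based) defines version i + 1
        # Assumption of this sketch: a tiny lock serializes only the
        # assignment of version numbers, not the data transfer itself.
        self._publish_lock = threading.Lock()

    def latest(self):
        """Return the most recent version number (0 = empty blob)."""
        return len(self._patches)

    def write(self, offset, data):
        """Record a patch and return the version number it creates."""
        if offset < 0 or offset + len(data) > self._size:
            raise ValueError("write outside blob bounds")
        patch = (offset, bytes(data))  # prepared without any coordination
        with self._publish_lock:
            self._patches.append(patch)
            return len(self._patches)

    def read(self, version, offset, length):
        """Read from an immutable snapshot; older versions stay readable."""
        if not 0 <= version <= len(self._patches):
            raise ValueError("unknown version")
        snapshot = bytearray(self._size)
        for patch_offset, data in self._patches[:version]:
            snapshot[patch_offset:patch_offset + len(data)] = data
        return bytes(snapshot[offset:offset + length])


def concurrent_writes(blob, payloads):
    """Issue one write per (offset, data) pair from separate threads."""
    threads = [threading.Thread(target=blob.write, args=p) for p in payloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    blob = VersionedBlob(8)
    v1 = blob.write(0, b"abcd")
    v2 = blob.write(2, b"XY")
    print(v1, blob.read(v1, 0, 4))
    print(v2, blob.read(v2, 0, 4))
```

Because a reader always names the version it wants, it never observes a half-applied write, which is the property that lets writers proceed without locking the data itself.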
  • 17. Impact of BlobSeer: MapReduce. BlobSeer improves Hadoop • Gain (execution time): 35%. ANR MapReduce Project (2010-2014) • Lead: G. Antoniu (KerData) • Partners: INRIA (AVALON), Argonne National Lab, U. Illinois Urbana-Champaign, IBM, JLPC, IBCP, MEDIT • Strong collaboration with the Nimbus team from Argonne National Lab: BlobSeer integrated with the Nimbus cloud toolkit; BlobSeer used for efficient VM deployment and snapshotting • Validation: Grid’5000 with Nimbus, FutureGrid (USA), Open Cirrus (USA). http://mapreduce.inria.fr
  • 18. The A-Brain Project: Data-Intensive Processing on Microsoft Azure Clouds. Application • Large-scale joint genetic and neuroimaging data analysis. Goal • Assess and understand the variability between individuals. Approach • Optimized data processing on Microsoft’s Azure clouds. Inria teams involved • KerData (Rennes) • Parietal (Saclay). Framework • Joint MSR-Inria Research Center • MS involvement: Azure teams, EMIC
  • 19. The Imaging Genetics Challenge: Comparing Heterogeneous Information. Data sources shown: genetic information (SNPs, e.g. G G T G T T T G G G); MRI brain images; clinical / behaviour. Here we focus on this link (indicated in the figure)
  • 20. Neuroimaging-genetics: The Problem • Several brain diseases have a genetic origin, or their occurrence/severity is related to genetic factors • Genetics is important to understand & predict response to treatment • Genetic variability is captured in DNA micro-array data Gene → Image: p(image | genetic)
  • 21. Imaging Genetics Methodological Issues Brain image Y (~10⁵–10⁶): – Anatomical MRI – Functional MRI – Diffusion MRI Genetic data X (~10⁵–10⁶): – DNA array (SNP/CNV) – gene expression data – others... Other figure label: ~2000
  • 22. A BIG DATA Challenge … Azure can help… Data: double permutation voxels SNPs 5%-10% useful Computation: Estimate timespan on single machine Estimation for A-Brain on Azure (350 cores) Storage capacity estimations (350 cores)
  • 23. Imaging Genetics Methodological Issues • Multivariate methods: predict a brain characteristic with many genetic variables • Elastic net regularization: combination of ℓ1 and ℓ2 penalties → sparse loadings • Parameter setting: internal cross-validation/bootstrap • Performance evaluated using permutations
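For reference, the elastic net estimator mentioned above is commonly written as follows. The notation is an assumption and does not come from the slides: y is the brain characteristic to predict, X the matrix of genetic variables for n subjects, β the loadings, and λ₁, λ₂ ≥ 0 the penalty weights (the parameters that the slide says are set by internal cross-validation or bootstrap).

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda_1 \lVert \beta \rVert_1
  \;+\; \lambda_2 \lVert \beta \rVert_2^2
```

The ℓ1 term drives many loadings exactly to zero (the "sparse loadings" of the slide), while the ℓ2 term stabilizes the fit when genetic variables are strongly correlated.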
  • 24. A-Brain as Map-Reduce Processing - 24
  • 25. A-Brain as Map-Reduce Data Processing 25
  • 26. Efficient Procedures for Statistics Example: voxelwise Genome Wide Association Studies (vGWAS) • 740 subjects • ~50,000 voxels • ~500,000 SNPs • 10,000 permutations → ~12,000 hours of computation → ~1.8 PB of statistical scores
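The storage estimate on the slide above can be reproduced with a short calculation. The 8 bytes per score (one double-precision value) is an assumption, as is the reading that one score is produced per (voxel, SNP, permutation) triple; under those assumptions the slide's ~1.8 figure matches the total expressed in binary petabytes.

```python
voxels = 50_000
snps = 500_000
permutations = 10_000
bytes_per_score = 8  # assumption: one double-precision float per score

# One statistical score per (voxel, SNP, permutation) triple.
scores = voxels * snps * permutations
total_bytes = scores * bytes_per_score

print(f"{scores:.2e} scores")                    # 2.50e+14 scores
print(f"{total_bytes / 1e15:.1f} PB (decimal)")  # 2.0 PB (decimal)
print(f"{total_bytes / 2**50:.1f} PiB (binary)") # 1.8 PiB (binary)
```

The number of subjects (740) affects the cost of computing each score but not the number of scores stored.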
  • 27. Efficient Procedures for Statistics Example: Ridge regression with cross-validation loops • Some costly computations (SVD, ~60 sec each) are used 1–2 million times and cannot be kept in memory: ~60–120 × 10⁶ sec of SVD computation in total (1.9–3.8 years) → An efficient distributed cache can achieve a huge speedup!
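The caching argument above can be sketched as a compute-once store keyed by an identifier of the input. Everything named here is hypothetical (`ResultCache`, `fake_svd`, the fold identifiers), and a plain dict stands in for the distributed cache the slide calls for. The last lines redo the slide's arithmetic for the cost without reuse.

```python
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


class ResultCache:
    """Compute-once cache keyed by an identifier of the input (hypothetical).

    A real deployment would back `_store` with a shared, distributed store;
    a plain dict is used here so the sketch stays self-contained.
    """

    def __init__(self) -> None:
        self._store: Dict[Hashable, object] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], T]) -> T:
        if key in self._store:
            self.hits += 1
            return self._store[key]  # type: ignore[return-value]
        self.misses += 1
        value = compute()
        self._store[key] = value
        return value


def fake_svd(fold: int) -> str:
    """Stand-in for the ~60 s SVD of one cross-validation fold."""
    return f"svd-of-fold-{fold}"


cache = ResultCache()
folds = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # the same folds are revisited
results = [cache.get_or_compute(f, lambda f=f: fake_svd(f)) for f in folds]

# Cost if every use recomputed the SVD, using the slide's figures.
seconds_per_svd = 60
seconds_per_year = 365.25 * 24 * 3600
years_low = 1_000_000 * seconds_per_svd / seconds_per_year
years_high = 2_000_000 * seconds_per_svd / seconds_per_year
print(cache.misses, cache.hits)  # 3 6
print(f"{years_low:.1f} to {years_high:.1f} years without reuse")  # 1.9 to 3.8 years without reuse
```

In this toy run only three of the nine requests trigger a computation; the benefit in practice depends on how often the same decomposition is requested and on the cost of fetching it from the cache.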
  • 29. Requirements for a cloud storage / data management High throughput under heavy concurrency Fine grain access Scalability / Elasticity Data availability Transparency Design principles Data locality – use the local storage No modification on the cloud middleware Loose coupling between storage and applications Storage hierarchy - 29
  • 30. TomusBlobs - Architecture - 30 Computation nodes
  • 31. Architecture contd. System components Initiator - Cloud specific - Generic stub - Properties: Scaling; Self configuration Distributed Storage - Aggregates the virtual disks - Not depending on a specific solution Client API - Cloud specific API - Expose the operation transparently Initiator Local Disk Application Client API TB entity VM snapshot Customizable Environment - 31
  • 32. TomusBlobs Evaluation • Scenario: Single reader / writer • Data transfer from memory to storage • Metric: Client IO throughput
  • 33. TomusBlobs Evaluation Cumulative read throughput Cumulative write throughput • Scenario: Multiple readers / writers • Throughput limited by bandwidth • Read 4X ; Write 5X - 34
  • 34. TomusBlobs as a Storage Backend for Sharing Application Data in MapReduce App API App App App App API API API API TomusBlobs - 35
  • 35. TomusMapReduce Evaluation • Scenario: Increase the problem size • Optimize computation by managing better intermediate data - 36
  • 36. Iterative MapReduce - Daytona • Merge Step • In-Memory Caching of static data • Cache aware hybrid scheduling using Queues as well as using a bulletin board (special table) [Diagram: Job Start → Map/Combine tasks (with Data Cache) → Reduce → Merge → Add Iteration? Yes: hybrid scheduling of the new iteration; No: Job Finish] Credits: Dennis Gannon
  • 37. Beyond MapReduce • Unique result with parallel reduce phase • No central control entity • No synchronization barrier Map Reducer Map Map Map Map Reducer - 38
  • 38. Zoom on the Reduction Ratio • Compute the minimum of a set of large matrices (7.5 GB) using 30 mappers
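The role of the reduction ratio can be sketched with a small tree reduction: each reducer combines up to `ratio` partial results, and the process repeats until a single result remains. The function names are hypothetical, "minimum of a set of matrices" is read here as an element-wise minimum (an assumption), and the levels run sequentially in one process, whereas the slides describe reducers working in parallel without a central control entity.

```python
from typing import List

Matrix = List[List[float]]


def elementwise_min(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise minimum of two matrices of the same shape."""
    return [[min(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def reduce_group(group: List[Matrix]) -> Matrix:
    """Work done by one reducer: fold its inputs into a single matrix."""
    result = group[0]
    for m in group[1:]:
        result = elementwise_min(result, m)
    return result


def tree_reduce(partials: List[Matrix], ratio: int) -> Matrix:
    """Reduce partial results level by level, `ratio` inputs per reducer."""
    if ratio < 2:
        raise ValueError("ratio must be at least 2")
    if not partials:
        raise ValueError("nothing to reduce")
    level = partials
    while len(level) > 1:
        level = [reduce_group(level[i:i + ratio])
                 for i in range(0, len(level), ratio)]
    return level[0]


# 30 mapper outputs (tiny 1x2 matrices standing in for the large ones).
mapper_outputs = [[[float(i), float(30 - i)]] for i in range(30)]
final = tree_reduce(mapper_outputs, ratio=5)  # levels: 30 -> 6 -> 2 -> 1
```

A smaller ratio means more reducers per level but more levels; a larger ratio means fewer levels but more data arriving at each reducer.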
  • 40. The Most Frequent Words benchmark •Input data size varies from 3.2 GB to 32 GB •ReductionRatio = 5 - 41
  • 41. Execution times for A-Brain •Increasing number of map jobs = increasing size of data (5 GB to 50 GB) - 42
  • 42. Beyond Single Site processing • Data movement across geo-distributed deployments is costly • Minimize the size and number of transfers • The overall aggregate must collaborate towards reaching the goal • The deployments work as independent services • The architecture can be used for scenarios in which data is produced in different locations
  • 43. Towards a Geo-distributed TomusBlobs approach • TomusBlobs for intra-deployment data management • Public Storage (Azure Blobs/Queues) for inter-deployment communication • Iterative Reduce technique for minimizing the number of transfers (and data size) • Balance the network bottleneck of a single data center
  • 44. Multi-Site MapReduce • 3 deployments (NE,WE,NUS) • 1000 CPUs • ABrain execution across multiple sites - 45
  • 45. Beyond MapReduce - Workflow Processing - 46
  • 46. Data access patterns for workflows [1] Pipeline (Input → Output): caching; data-informed workflow. Broadcast (Input → Output): replication; data size. Reduce/Gather (Input → Output): co-placement of all data; data-informed workflow. Scatter (Input → Output): file size awareness; data-informed workflow. [1] Vairavanathan et al. A Workflow-Aware Storage System: An Opportunity Study http://ece.ubc.ca/~matei/papers/ccgrid2012.pdf
  • 48. Generic Worker Walkthrough (Microsoft ATLE) Local storage Client code Researcher Job Management Service Algorithm HD GW Driver Pluggable Runtime Environment Runtime Business Logic Job Details Table Job Index Table Notification Listeners (Accounting, Status Change, etc..) BLOB Storage Notification Service Scaling Service OGF BES VM SOAP WS–* Use of interoperable standard protocols and data schemas! OGF JSDL Application Code GW Services & SDKs Existing Components Input Files Output Files Shared Storage - 49 Credits: Microsoft
  • 49. Defining the scope ID Files Batch jobs Assumptions about the workflows Workflows are composed of batch jobs with well-defined data passing schemas The input and the output of the batch jobs are files The batch jobs and their inputs and outputs can be uniquely identified in the system Most workflows fit in this subclass Idea: manage files inside the deployment - 50
  • 50. The Concept File Name Locations F1 VM1 F2 VM1,VM2 F3 VM2 VM 1 VM 2 Local Disk Local Disk F1F2 F3 Transfer Module File Metadata Registry (1) Register (F1,VM1) (2) GetLocation(F1) (3) DownloadFile(F1) F1 F2 • Metadata Registry • Transfer Module Components Transfer Module - 51
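The three numbered steps on the slide above (register, get location, download) can be sketched in a few lines. The class and method names are hypothetical, local disks are modelled as dictionaries, and the "transfer" is an in-process copy; the slides leave the actual transfer protocol open.

```python
from typing import Dict, List


class MetadataRegistry:
    """Maps a file name to the VMs that hold a copy (hypothetical sketch)."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, file_name: str, vm_name: str) -> None:
        holders = self._locations.setdefault(file_name, [])
        if vm_name not in holders:
            holders.append(vm_name)

    def get_location(self, file_name: str) -> List[str]:
        return list(self._locations.get(file_name, []))


class VM:
    """A node with a local disk, modelled as a dict of file name -> bytes."""

    def __init__(self, name: str, registry: MetadataRegistry,
                 peers: Dict[str, "VM"]) -> None:
        self.name = name
        self.local_disk: Dict[str, bytes] = {}
        self._registry = registry
        self._peers = peers
        peers[name] = self

    def write_file(self, file_name: str, data: bytes) -> None:
        self.local_disk[file_name] = data
        self._registry.register(file_name, self.name)         # (1) Register

    def read_file(self, file_name: str) -> bytes:
        if file_name in self.local_disk:
            return self.local_disk[file_name]
        holders = self._registry.get_location(file_name)      # (2) GetLocation
        if not holders:
            raise FileNotFoundError(file_name)
        data = self._peers[holders[0]].local_disk[file_name]  # (3) DownloadFile
        # Keep the copy locally and advertise it as a new replica.
        self.local_disk[file_name] = data
        self._registry.register(file_name, self.name)
        return data


registry = MetadataRegistry()
peers: Dict[str, VM] = {}
vm1 = VM("VM1", registry, peers)
vm2 = VM("VM2", registry, peers)

vm1.write_file("F1", b"intermediate results")
fetched = vm2.read_file("F1")
locations = registry.get_location("F1")  # ['VM1', 'VM2']
```

After the download, F1 is listed on both VMs, which mirrors the multi-location entry for F2 in the slide's table.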
  • 51. Characteristics of the components Metadata Registry: Role • Hold the location of files within the deployment Data type • Key-value pairs (file identification; retrieval information) Accessibility • Accessible by all nodes Solutions • Azure Caching Preview, Azure Tables, InMemory DB Transfer Module: Role • Transfer files from one node to another Data type • Files Accessibility • Each VM has such a module • Applications access the local module • The modules interact across nodes Solutions • FTP; Torrent; InMemory, HTTP etc. Idea: Adopt multiple transfer solutions Adapt to the context: select the one that fits best
  • 52. Transfer methods (Method: Observations) InMemory: • Caching data • InMemory data offers fast access • GBs of memory capacity per deployment • Small files BitTorrent: • Replicas for file dissemination • Collaborative reads • New way of staging in data FTP: • TCP transfer • Medium and large files • Potential for interoperability
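The observations above suggest a simple selection rule, sketched below. The function `choose_method`, its parameters and the 1 MB threshold are assumptions, since the slides only distinguish "small" from "medium and large" files. The broadcast rule follows the evaluation slide, which reports torrents as superior for broadcast when replicas are used; the actual adaptive storage may decide differently.

```python
from enum import Enum


class Method(Enum):
    IN_MEMORY = "InMemory"
    BITTORRENT = "BitTorrent"
    FTP = "FTP"


# Hypothetical threshold; the slides only say "small" vs "medium and large".
SMALL_FILE_BYTES = 1 * 1024 * 1024


def choose_method(file_size: int, readers: int, replicas: int = 1) -> Method:
    """Pick a transfer method from file size and access pattern."""
    if file_size < 0 or readers < 1 or replicas < 1:
        raise ValueError("file_size must be >= 0, readers and replicas >= 1")
    if file_size <= SMALL_FILE_BYTES:
        return Method.IN_MEMORY      # small files: served from memory
    if readers > 1 and replicas > 1:
        return Method.BITTORRENT     # broadcast with replicas: collaborative reads
    return Method.FTP                # medium/large point-to-point transfer


MB = 1024 * 1024
small = choose_method(10 * 1024, readers=1)
broadcast = choose_method(100 * MB, readers=8, replicas=3)
pipeline = choose_method(100 * MB, readers=1)
```

Keeping the rule in one function makes it easy to swap in measured thresholds or additional methods (for example HTTP) without touching the callers.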
  • 53. VM Snapshot VMMemory MetaDataRegistry Adaptive Storage FTP Torrent InMemory Transfer Module Services Replication Queue Replication FTP Tracker Peer Local Disk - 54
  • 54. F1 Azure Caching Adaptive Storage Adaptive Storage App App F1 Create Upload(F1) GetMetadata Read(F1) Memory Memory Local Storage Local Storage Read (F1) WriteMetadata Write(F1) Download (F1) API API - 55
  • 55. Scenario 2 – Large files; replication enabled [Plot: time (sec) vs. size of a single file (MB, 50 to 250) for DirectLink, Torrent, Adaptive and AzureBlobs] • Torrents are superior for broadcast when replicas are used • DirectLink is faster for pipeline (reduction tree) • Adaptive storage can choose the best strategy each time
  • 56. NCBI Blast for Azure Seamless Experience • Evaluate data and invoke computational models from Excel. • Computationally heavy analysis done close to large database of curated data. • Scalable for large surges of computationally heavy analysis. selects DBs and input sequence Web Role Input Splitter Worker Role BLAST Execution Worker Role #n…. Combiner Worker Role Genome DB 1 Genome DB K BLAST DB Configuration Azure Blob Storage BLAST Execution Worker Role #1 Credits: Dennis Gannon
  • 57. BLAST analysis – data management component [Plot: time (sec) vs. number of BLAST jobs (5 to 60) for Download Adaptive, Download AzureBlobs, Upload Adaptive and Upload AzureBlobs] • Database files – 1.6 GB • Input size – 800 MB • 50 nodes
  • 58. Scalable Storage on Clouds: Open Issues Understanding price-performance trade-offs • Consistency, availability, performance, cost, security, quality of service, energy consumption • Autonomy, adaptive consistency • Dynamic elasticity • Trade-offs exposed to the user High performance variability - Understand it, model it, cope with it Deployment/application launching time is high Latency of data accesses is still an issue Data movements are expensive Cope with tightly-coupled applications Cope with various cloud programming models Virtualization overhead Benchmarking Performance modeling Self-optimization for cost reduction - Elastic scale down Security and privacy - 59
  • 59. Extreme scale does matter BUT not only Other focus areas – Affordability and usability of intermediate size systems – Pervasiveness of usage across the entire industry, including Small and Medium Enterprises (SMEs) and ISVs – New HPC deployments (e.g. Big Data, HPC in Clouds) – HPC and Cloud usage expansion, fostering the development of consultancy, expertise and service business / end-user support – Facilitating the creation of start-ups and the development of the SME sector (hw/sw supply side) – Education and training (inc. engineering skills for industry) Cloud Computing@INRIA Strategic Research Agenda - 60
  • 60. Azure Brain: 4th paradigm, scientific discovery & (really) big data (REC201) Gabriel Antoniu Senior Research Scientist, Inria Head of the KerData Project-Team, Inria Rennes – Bretagne Atlantique Radu Tudoran PhD student, ENS Cachan – Brittany KerData Project-Team, Inria Rennes – Bretagne Atlantique Contacts: Gabriel.Antoniu@inria.fr, Radu.Tudoran@inria.fr