Memory Speed Big Data Analytics: Alluxio vs Apache IgniteIrfan Elahi - Deloitte 1
Memory Speed Big Data Analytics
Alluxio vs Apache Ignite
Irfan Elahi
About Me
• Working as a Consultant at Deloitte (Analytics Service Line)
• 4+ years of experience in Big Data and Machine Learning across multiple verticals
• Recent Deloitte projects at Australia’s biggest telecom company:
  • Architecting one of the largest Hadoop deployments in the cloud, employing in-memory computation technologies
  • Developing an enterprise-grade stream processing system based on the Hadoop stack, employing NoSQL data stores
• Premium Udemy Instructor with 12,000+ students from 131 countries
• Technical Reviewer of an upcoming Hadoop book published by Apress
• Let’s connect: ielahi@deloitte.com.au | linkedin.com/in/irfanelahi | @elahi_irfan
Agenda
• Big Data – The Evolution and Beyond
• In-Memory Computation Trend – Overview
• Unaddressed Challenges in Big Data
• Introduction and Deep Dive Comparison of Alluxio and Apache Ignite
Big Data: The Evolution and Beyond
Scale-up
• Make individual, expensive machines bigger
• Challenges:
  • No collocated processing
  • Limited scalability
Scale-out
• Add more machines
• Challenges:
  • Programming complexity
  • Partial failures
Hadoop @ MapReduce
• Scalable, economical and fault-tolerant processing and storage
• Fit for offline computation
• Disk I/O bound
Hadoop @ Spark
• Distributed in-memory computation framework
• Fit for offline + online computation
• Many challenges remain unaddressed
Memory-Centric Platforms
• Enable advanced caching and memory management features to optimize performance and address many of these challenges
Name of the game:
Memory is the new disk!
In-Memory Trend: Overview
Driving Factor: Economics
• Memory is much faster than disk (approx. 3000x)
• Cost of memory is decreasing
• Memory per node is increasing
Challenges:
• Memory is still more expensive than disk (approx. 80x)
• Memory is still limited
• Not all data is memory-worthy
and that’s not all…
Driving Factor: Limitations of Traditional Paradigms
Intermittent disk I/O and serialization costs in traditional computing platforms (e.g. MapReduce) cause:
• High latency
• Inefficiency in executing iterative algorithms in analytics (e.g. machine learning, graph and network analysis)
• Inefficiency in interactive data mining
• Infeasibility for innovative use cases like stream processing
Impact: Innovative Technologies and Processing Patterns
• New processing patterns: Batch -> Event Driven
• New processing technologies: MapReduce -> Spark, Hive -> Impala
• New storage technologies: HDFS -> Alluxio | IGFS
• New use cases: real-time stream processing, IoT
Unaddressed Challenges: Overview
On-Heap Memory Constraints:
• On-heap memory in memory-centric platforms (e.g. Spark) is limited, which causes resource pressure
• Data resilience is compromised when applications crash, and recovering the data causes expensive disk I/O
• Inter-process data/state sharing still relies on HDFS I/O, which causes performance issues
Many Compute to Storage Integration Paradox:
• Managing an increasing number of compute and storage platforms increases complexity
• Adding or removing systems requires application changes, which impacts the DevOps lifecycle
• Data locality gets compromised
Missing SQL and Transactional Support on Hadoop:
Many leading Big Data platforms still don’t support:
• ACID-compliant transactions
• ANSI SQL compliance
• Indexing
• In-place mutation
Potential Missing Pieces of Puzzle:
• Alluxio
• Apache Ignite
Alluxio
• Launched in 2012 by UC Berkeley AMPLab
• Formerly known as Tachyon
• Licensed under Apache License 2.0
• Approximately 500 contributors
• Deployed at Yahoo, Baidu, Intel and Samsung, to name a few
A distributed, scalable storage layer virtualized across multiple storage systems under a unified namespace to facilitate data access at memory speed
Ignite
• First release in early 2015
• August 2015: second-fastest project to graduate from the Apache Incubator, after Spark
• Licensed under Apache License 2.0
• Approximately 120 contributors
• Deployed at IBM, Siemens, Citibank, Barclays and Nielsen, to name a few
A distributed key-value store and scalable in-memory computing platform with powerful SQL, key-value and processing APIs
Deep Dive Comparison
• Alluxio
• Apache Ignite
Architecture
Alluxio
Master Nodes:
• Manage file system metadata
• Can be primary or secondary
• HA supported via ZooKeeper
Worker Nodes:
• Store data in the form of blocks
• No rebalancing of blocks when new nodes are added
• Send heartbeats to master nodes
Requires an Under File System (UFS) (e.g. HDFS, S3) for operation
[Diagram: Alluxio master node(s) and worker node(s); Ignite servers in logical groups A and B]
Apache Ignite
Optional Node Roles:
• Servers (default | equal by design | multiple servers can run on one host)
• Clients (explicitly defined | connect to servers for computation)
Logical Grouping:
• User-configurable node roles via attributes registered by nodes at start-up
• Registered attributes can be leveraged for dynamic logical grouping based on predicates (e.g. CPU utilization > 50%) for localizing processes and jobs
No Name-Node Architecture:
• When used as IGFS, no centralized metadata management (like the HDFS NameNode or Alluxio’s master nodes) is needed
• Hashing is used to determine data locality
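To make the "no NameNode" point concrete, here is a minimal sketch of hash-based placement, assuming a rendezvous (highest-random-weight) scheme. The slide only says that hashing is used; the object name, the node names and the choice of MurmurHash3 are illustrative assumptions, and Ignite's real affinity function works on partitions and differs in detail.

```scala
import scala.util.hashing.MurmurHash3

object RendezvousSketch {
  // Score a (key, node) pair; the node with the highest score owns the key.
  def score(key: String, node: String): Int =
    MurmurHash3.stringHash(key + "@" + node)

  // Any client can compute the owner locally, so no central metadata lookup is needed.
  // The node name is part of the sort key so that ties are broken deterministically.
  def ownerOf(key: String, nodes: Seq[String]): String = {
    require(nodes.nonEmpty, "at least one node is needed")
    nodes.maxBy(node => (score(key, node), node))
  }
}
```

With this scheme the owner of a key depends only on the key and the set of node names, so every node reaches the same answer, and removing a node that does not own a key leaves that key where it is.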
Architecture (Continued)
Alluxio
Configuration:
• Requires explicitly specifying master node(s) and worker node(s) in the configuration files
• Adding nodes requires restarting the cluster
Interfaces:
• Alluxio Shell
• Web interface (also allows browsing the Alluxio file system)
Apache Ignite
Configuration:
• Doesn’t require explicit specification of nodes in configuration files
• Nodes discover each other automatically when started
• Supported node discovery methods:
  • Multicast
  • Static IP based
• Cluster can be scaled without restarting
• Supported deployment modes: Shared or Embedded
Interfaces:
• Visor Command Line: for viewing topology, node metrics and cache statistics, and for administering the cluster
• Web interface: needs to be installed separately
Architecture (Continued)
[Screenshots: Alluxio Shell (Alluxio) and Visor Command Line (Apache Ignite)]
Architecture (Continued)
[Screenshot: Alluxio Web Interface]
Integration with Data Stores
Apache Ignite
• Provides two modes of persistence in addition to in-memory:
  • Native Persistence (disk only)
  • 3rd Party Persistence (pluggable)
Native Persistence:
• Treats disk as the store for the superset of data
• Supports SSD, Flash and 3D XPoint storage
• Features like ACID compliance and SQL are supported only in this mode
3rd Party Persistence:
• Data stores like HDFS, Cassandra and JDBC-based stores are pluggable
• Involves implementing the CacheStore interface for read-through and write-through
• Supports write-behind caching for improved performance
[Diagram: Spark, MapReduce, Flink, … on top of Alluxio (exposed as file system), backed by S3, HDFS, Blob, …; SQL, Streaming, Machine Learning, Compute and Spark/MapReduce/Flink on top of Ignite (in-memory), backed by Disk (native persistence) and by HDFS, Cassandra, … (3rd party persistence)]
Alluxio
• Enables processing frameworks to interact with data from different data stores through a unified namespace and API
• Supports the following data stores as UFS: HDFS, Blob, S3, GCS, Minio, Ceph, Swift and MapR-FS, to name a few
• Different UFS are mounted at different mount points in the Alluxio namespace and then accessed seamlessly in applications
• Additional UFS storage can be added through configuration
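The mount-point idea for Alluxio can be pictured as a small lookup table. The sketch below is not Alluxio's implementation: the mount table contents, the bucket and host names, and the longest-prefix rule are assumptions chosen to illustrate how one namespace path could be translated to a path in the right under-storage.

```scala
object MountTableSketch {
  // Hypothetical mount table: namespace prefix -> under-storage URI.
  val mounts: Map[String, String] = Map(
    "/"        -> "hdfs://namenode:8020/alluxio",
    "/s3data"  -> "s3://example-bucket/data",
    "/archive" -> "hdfs://namenode:8020/archive"
  )

  // True when `path` is the mount point itself or lies beneath it.
  private def covers(mountPoint: String, path: String): Boolean =
    mountPoint == "/" || path == mountPoint || path.startsWith(mountPoint + "/")

  // Translate a namespace path using the longest matching mount point.
  def resolve(path: String): Option[String] =
    mounts.keys
      .filter(covers(_, path))
      .maxByOption(_.length)
      .map { mountPoint =>
        val rest =
          if (mountPoint == "/") path.stripPrefix("/")
          else path.stripPrefix(mountPoint).stripPrefix("/")
        val base = mounts(mountPoint).stripSuffix("/")
        if (rest.isEmpty) base else base + "/" + rest
      }
}
```

An application would only ever see paths such as /s3data/logs/2017.txt; which storage system actually holds the bytes is decided by the table, so adding or swapping a store is a change to the table rather than to the application.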
Integration with Spark
Alluxio
• Alluxio provides Hadoop-compatible file system APIs, so data can be read and written via Spark RDD’s file system related APIs
• Enables reading and writing data from different data stores (configured as UFS) via Alluxio’s unified namespace and API
• Automatically manages movement of data between Alluxio and UFS
Apache Ignite
Two ways to integrate with Spark:
• As stand-alone IGFS or as a caching layer on HDFS:
  • Ignite is exposed as an HDFS-compatible file system, so data can be read and written via Spark RDD’s file system related APIs
• As a distributed cache via IgniteContext:
  • Provides an implementation of Spark RDDs (supporting all transformations and actions)
  • Mutable RDDs (a view over the distributed cache’s content)
  • Configurable lifespan depending on Ignite’s deployment mode
[Diagram. Alluxio: S3, HDFS, Blob; Alluxio (exposed as file system); RDD’s read from / save to file system API; transformations. Ignite: Disk, …; Ignite IGFS with RDD’s read/write to/from file system API; Ignite (Distributed Cache); IgniteContext; IgniteRDD [Tuple2]; transformations; savePairs action]
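Integration with Spark (Continued)
Alluxio
// reading data through Alluxio's Hadoop-compatible file system API
val textRdd = sc.textFile("alluxio://masternode:19998/path")
// transformation: keep only the lines that mention "deloitte"
val textRdd2 = textRdd.filter(_.contains("deloitte"))
// writing data back to Alluxio
textRdd2.saveAsTextFile("alluxio://masternode:19998/destination_path")
Apache Ignite
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
// creating the IgniteContext
val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())
// creating an IgniteRDD as a view over the cache "deloitte_cache"
val cacheRdd: IgniteRDD[Integer, String] = igniteContext.fromCache("deloitte_cache")
// transformation: filter returns a plain Spark RDD of (key, value) pairs
val cacheRdd2 = cacheRdd.filter(_._2.contains("deloitte"))
// writing data: savePairs is defined on IgniteRDD and takes the pair RDD to store
cacheRdd.savePairs(cacheRdd2)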
Memory Architecture
Alluxio
[Diagram: Alluxio tiers MEM, SSD and HDD, ordered by performance]
Alluxio storage is divided into three ordered tiers:
• MEM (memory)
• SSD
• HDD
• Allows storing more data than the memory available in the cluster
• Automatically manages data between tiers
• Data is written to the top tier by default
Apache Ignite
[Diagram: Memory and Disk; Memory Regions; a Memory Segment containing Data Pages, Index Pages and a B+ Tree]
In Native Persistence, data and index storage is divided into:
• Memory (subset of data)
• Disk (superset of data)
• Data can be stored both off-heap and on-heap
• When stored off-heap, there are fewer constraints on the volume of data that can be stored, and fewer GC pauses
• Memory is further divided into Memory Regions
• Memory Regions consist of Memory Segments, which comprise Data Pages, B+ Tree Pages, Index Pages and FreeList structures
Advanced Memory Management
Alluxio
• Pinning/Unpinning: to enforce data locality in a specific tier
• Allocators: for choosing locations to write new data blocks
• Evictors: for choosing which data to move to a lower tier to free space. Supported algorithms: Greedy, LRU, LRFU, Partial LRU
• Evictors and Allocators are applied globally
• A write may fail if space can’t be freed or if the data exceeds the size of the top tier
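The interplay between tiers and an LRU evictor on the Alluxio side can be sketched in a few lines. This is an illustration under simplifying assumptions, not Alluxio's evictor: capacity is counted in blocks rather than bytes, all class and method names are made up for the example, and a block pushed out of the lowest tier is simply dropped.

```scala
import scala.collection.mutable

object TieredLruSketch {
  // One tier with a fixed capacity, counted in blocks for simplicity.
  // LinkedHashMap keeps insertion order, so the head is the least recently used block.
  final class Tier(val name: String, val capacity: Int) {
    require(capacity > 0, "capacity must be positive")
    val blocks: mutable.LinkedHashMap[String, Array[Byte]] = mutable.LinkedHashMap.empty
  }

  final class TieredStore(tiers: Vector[Tier]) {
    require(tiers.nonEmpty, "at least one tier is needed")

    // New data is written to the top tier (index 0).
    def write(id: String, data: Array[Byte]): Unit = {
      tiers.foreach(_.blocks.remove(id)) // drop any stale copy first
      put(0, id, data)
    }

    // Insert into a tier; if it is full, demote its least recently used block one tier down.
    private def put(level: Int, id: String, data: Array[Byte]): Unit = {
      val tier = tiers(level)
      if (tier.blocks.size >= tier.capacity) {
        val (victimId, victimData) = tier.blocks.head
        tier.blocks.remove(victimId)
        // A block evicted from the lowest tier is simply dropped in this sketch.
        if (level + 1 < tiers.length) put(level + 1, victimId, victimData)
      }
      tier.blocks.put(id, data)
    }

    // Reading refreshes recency by moving the block to the tail of its tier.
    def read(id: String): Option[Array[Byte]] =
      tiers.find(_.blocks.contains(id)).flatMap { tier =>
        tier.blocks.remove(id).map { data =>
          tier.blocks.put(id, data)
          data
        }
      }

    // Name of the tier currently holding the block, if any.
    def tierOf(id: String): Option[String] =
      tiers.find(_.blocks.contains(id)).map(_.name)
  }
}
```

Writing three blocks into a store whose MEM tier holds two pushes the oldest block down to SSD, which is the "move to lower tier for freeing space" behaviour in miniature.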
Apache Ignite
Supports memory policies (e.g. eviction) that apply at:
• Memory Region level (for off-heap caching)
• Entry level (for on-heap caching)
thus providing more granular control
Eviction:
• Supported algorithms for page-based eviction: Random-LRU, Random-2-LRU
• Supported algorithms for entry-level eviction: FIFO, LRU, Random
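Page-based Random-LRU trades exactness for cheap bookkeeping: instead of keeping all pages in recency order, it samples a few pages and evicts the least recently used page among the sample. The sketch below illustrates that idea only; the function name, the map of last-access timestamps and the sample size are assumptions, not Ignite's internal data structures.

```scala
import scala.util.Random

object RandomLruSketch {
  // Sample `sampleSize` page ids at random (with replacement) and return the one
  // with the oldest last-access timestamp among the sampled pages.
  def pickVictim(lastAccess: Map[Int, Long], sampleSize: Int, rng: Random): Int = {
    require(lastAccess.nonEmpty, "no pages to evict")
    val ids = lastAccess.keys.toVector
    val sample = Vector.fill(sampleSize.max(1))(ids(rng.nextInt(ids.length)))
    sample.minBy(id => lastAccess(id))
  }
}
```

The victim is not guaranteed to be the globally oldest page, only the oldest of the sampled ones, which is the price paid for not maintaining a full LRU ordering over every page in a memory region.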
Additional Capabilities: SQL Support
Alluxio
• Not supported
Apache Ignite
• Supports distributed and horizontally scalable SQL database capabilities
• Supports indexing
• ANSI-99 SQL compliant
• Supports all SQL DDL and DML commands, including UPDATE, DELETE and MERGE queries
• Resembles Kudu’s capabilities
• Counters limitations of HDFS
• Supports running queries on data spanning memory and disk. Unlike in Impala or Spark, not all of the data needs to be in memory for processing
[Diagram: Java, .NET, C++, PHP and BI tools connect via JDBC, ODBC and the Ignite SQL API to nodes that each hold data and indexes in RAM and on disk]
Additional Capabilities (Continued)
Alluxio
• Key-value APIs (not transactional/ACID compliant)
Apache Ignite
• Key-value APIs (transactional, ACID compliant)
• Compute Grid: distributed and parallel computation
• ML Grid: machine learning library on top of Apache Ignite. Currently supports limited vector and matrix algebra operations; other algorithms are on the roadmap.
• Streaming ingestion: ingesting real-time streams of data into Ignite in a distributed, scalable and fault-tolerant manner
Key Takeaways:
• For inter-process state sharing (e.g. across Spark jobs), both provide adequate functional capabilities.
• Both platforms provide automatic hot data management, but Apache Ignite provides more granular control through its per-memory-region policies.
• For use cases that involve interacting with data in multiple storage systems at memory speed, Alluxio makes more sense.
• For building real-time data and analytics pipelines, Apache Ignite makes more sense as a sink.
• For analytical use cases involving relational processing and in-place mutation at high speed, Apache Ignite makes more sense.
Questions

More Related Content

What's hot

The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsYifeng Jiang
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...Andrew Lamb
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioAlluxio, Inc.
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache HiveDataWorks Summit
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureDataWorks Summit
 

What's hot (20)

The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 

Similar to Apache Ignite vs Alluxio: Memory Speed Big Data Analytics

Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreAlluxio, Inc.
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio, Inc.
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudAlluxio, Inc.
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Alluxio, Inc.
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAlluxio, Inc.
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Alluxio, Inc.
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudAlluxio, Inc.
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudAlluxio, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Alluxio, Inc.
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Community
 

Similar to Apache Ignite vs Alluxio: Memory Speed Big Data Analytics (20)

Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any Cloud
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics

  • 1. Memory Speed Big Data Analytics: Alluxio vs Apache Ignite. Irfan Elahi, Deloitte
  • 2. About Me
      • Working as a Consultant at Deloitte (Analytics Service Line)
      • 4+ years of experience in Big Data and Machine Learning in multiple verticals
      • Recent Deloitte projects in Australia's biggest Telecom company:
          • Architecting one of the largest Hadoop deployments in cloud employing in-memory computation technologies
          • Developing enterprise-grade stream processing system based on Hadoop stack employing NoSQL data-stores
      • Premium Udemy Instructor with 12,000+ students from 131 countries
      • Technical Reviewer of an upcoming Hadoop book published by APress
      • Let's connect: ielahi@deloitte.com.au | linkedin.com/in/irfanelahi | @elahi_irfan
  • 3. Agenda
      • Big Data – The Evolution and Beyond
      • In-Memory Computation Trend – Overview
      • Unaddressed Challenges in Big Data
      • Introduction and Deep Dive Comparison of Alluxio and Apache Ignite
  • 4. Big Data: The Evolution and Beyond
      Scale-up
      • Make individual expensive machines bigger
      • Challenges: no collocated processing; limited scalability
      Scale-out
      • Add more machines
      • Challenges: programming complexity; partial failures
      Hadoop @ MapReduce
      • Scalable, economical and fault-tolerant processing and storage
      • Fit for offline computation
      • Disk I/O bound
      Hadoop @ Spark
      • Distributed in-memory computation framework
      • Fit for offline + online computation
      • Many challenges remain unaddressed
      Memory Centric Platforms
      • Enabling advanced caching and memory management features to optimize performance and address many challenges
  • 5. Name of the game: Memory is the new disk!
  • 6. In-Memory Trend: Overview
      Driving Factor: Economics
      • Memory is much faster than disk (approx. 3000x)
      • Cost of memory is decreasing
      • Memory per node is increasing
      Challenges:
      • Memory is still more expensive than disk (approx. 80x)
      • Memory is still limited
      • Not all data is memory-worthy
      and that's not all…
      Driving Factor: Traditional paradigms' limitations
      Intermittent disk I/O and serialization cost in traditional computing platforms (e.g. MapReduce) cause:
      • High latency
      • Inefficiency in executing iterative algorithms in analytics (e.g. machine learning, graph and network analysis)
      • Inefficiency in interactive data mining
      • Infeasibility for innovative use-cases like stream processing
      Impact: Innovative technologies and processing patterns
      • New processing patterns: Batch -> Event Driven
      • New processing technologies: MapReduce -> Spark, Hive -> Impala
      • New storage technologies: HDFS -> Alluxio | IGFS
      • New use-cases: real-time stream processing, IoT
  • 7. Un-addressed Challenges: Overview
      On-Heap Memory Constraints:
      • On-heap memory in memory-centric platforms (e.g. Spark) is limited, thus causing resource pressure
      • Data resilience is compromised in the event of application crashes, which causes expensive disk I/O
      • Inter-process data/state sharing still relies on HDFS I/O, thereby causing performance issues
      Many Compute to Storage Integration Paradox:
      • Managing an increasing number of compute and storage platforms increases complexity
      • Adding/removing the respective systems requires application changes, thus impacting the DevOps lifecycle
      • Data locality gets compromised
      Missing SQL and Transactional Support on Hadoop:
      Many leading Big Data platforms still don't support:
      • ACID-compliant transactions
      • ANSI SQL compliance
      • Indexing
      • In-place mutation
  • 8. Potential Missing Pieces of the Puzzle:
      • Alluxio
      • Apache Ignite
  • 9. Alluxio
      A distributed and scalable storage layer virtualized across multiple storage systems under a unified namespace to facilitate data access at memory speed
      • Launched in 2012 by UC Berkeley AMPLab
      • Formerly known as Tachyon
      • Licensed under Apache License 2.0
      • Approximately 500 contributors
      • Deployed at Yahoo, Baidu, Intel and Samsung, to name a few
  • 10. Ignite
      A distributed key-value store and scalable in-memory computing platform with powerful SQL, key-value and processing APIs
      • First release in early 2015
      • August 2015: second fastest project to graduate after Spark
      • Licensed under Apache License 2.0
      • Approximately 120 contributors
      • Deployed at IBM, Siemens, Citibank, Barclays and Nielsen, to name a few
  • 11. Deep Dive Comparison
      • Alluxio
      • Apache Ignite
  • 12. Architecture
      Alluxio:
      • Master Nodes:
          • Manage file system metadata
          • Can be Primary or Secondary
          • HA supported via ZooKeeper
      • Worker Nodes:
          • Store data in the form of blocks
          • No rebalancing of blocks upon addition of new nodes
          • Send heartbeats to Master Nodes
      • Requires an Under File System (UFS) (e.g. HDFS, S3) for operation
      Apache Ignite:
      • Optional Node Roles:
          • Servers (default | equal by design | multiple servers on one host)
          • Clients (explicitly defined | connect to servers for computation)
      • Logical Grouping:
          • User-configurable node roles via attributes registered by nodes at start-up
          • Registered attributes can be leveraged for dynamic logical grouping based on predicates (e.g. CPU utilization > 50%) for localizing processes and jobs
      • No Name-Node Architecture:
          • When used as IGFS, no centralized metadata management (e.g. like HDFS NameNode or Alluxio's Master Nodes) is needed
          • Hashing is used for data locality determination
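The "hashing is used for data locality determination" point on slide 12 can be illustrated with a small sketch. This is not Ignite's code: it is a generic rendezvous (highest-random-weight) hashing example in plain Scala, and the names `RendezvousSketch`, `score`, `partitionOf` and `ownerOf`, as well as the node ids, are hypothetical. The idea shown is that every node can compute the owner of a key locally, without asking a central metadata service.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Illustrative only: a generic rendezvous-hashing sketch, not Ignite's implementation.
object RendezvousSketch {
  // Hypothetical scoring function: first 8 bytes of MD5("node|partition") as a Long.
  def score(node: String, partition: Int): Long = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(s"$node|$partition".getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(digest).getLong
  }

  // Map a key to one of a fixed number of partitions.
  def partitionOf(key: String, partitions: Int): Int =
    Math.floorMod(key.hashCode, partitions)

  // The owner of a partition is the node with the highest score for it.
  // Any node can evaluate this locally, so no central lookup is required.
  def ownerOf(partition: Int, nodes: Seq[String]): String = {
    require(nodes.nonEmpty, "at least one node is required")
    nodes.maxBy(n => score(n, partition))
  }
}
```

A useful property of this scheme is that adding a node only moves the partitions that the new node wins; all other partitions stay where they were, which keeps rebalancing traffic small.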
  • 13. Architecture (Continued)
      Alluxio:
      • Configuration:
          • Requires explicitly specifying Master Node(s) and Worker Node(s) in the configuration files
          • Addition of nodes requires restarting the cluster
      • Interfaces:
          • Alluxio Shell
          • Web interface (also allows browsing the Alluxio file system)
      Apache Ignite:
      • Configuration:
          • Doesn't require explicit specification of nodes in configuration files
          • Nodes discover each other automatically when started
          • Supported methods for node discovery: multicast, static IP based
          • Cluster can be scaled without restarting
          • Supported deployment modes: Shared or Embedded
      • Interfaces:
          • Visor CommandLine: for viewing topology, node metrics and cache statistics, and for administering the cluster
          • Web interface: needs to be installed separately
  • 14. Architecture (Continued)
      Alluxio: Alluxio Shell
      Apache Ignite: Visor CommandLine
  • 15. Architecture (Continued)
      Alluxio: Alluxio Web Interface
  • 16. Integration with Data Stores
      Alluxio:
      • Enables processing frameworks to interact with data from different data stores with a unified namespace and API
      • Conveniently supports the following data stores as UFS: HDFS, Blob, S3, GCS, Minio, Ceph, Swift and MapR-FS, to name a few
      • The process involves mounting different UFS at different mount points in the Alluxio namespace and then accessing them seamlessly in applications
      • Addition of more UFS storage is configurable
      Apache Ignite:
      • Provides two modes of persistence in addition to in-memory:
          • Native Persistence (disk only)
          • 3rd party Persistence (pluggable)
      • Native Persistence:
          • Treats disk as the store for the superset of data
          • Supports SSD, Flash and 3D XPoint storage
          • Features like ACID compliance and SQL are supported only in this mode
      • 3rd Party Persistence:
          • Data stores like HDFS, Cassandra and JDBC-based stores are pluggable
          • Involves implementing the CacheStore interface for read/write-through
          • Supports write-behind caching for improved performance
      [Diagram: Spark, MapReduce, Flink … on top of Alluxio (exposed as file system) on top of S3, HDFS, Blob …; Spark/MapReduce/Flink, SQL, Streaming, Machine Learning, Compute … on top of Ignite (in-memory) with native persistence to disk and 3rd party persistence to HDFS, Cassandra …]
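The read-through, write-through and write-behind behaviour mentioned for Ignite's 3rd party persistence can be sketched without any Ignite dependency. The types below (`BackingStore`, `InMemoryStore`, `SketchCache`) are hypothetical stand-ins written for this illustration; they are not Ignite's `CacheStore` interface, only a minimal model of the same pattern.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an external store such as HDFS, Cassandra or a JDBC database.
trait BackingStore[K, V] {
  def load(key: K): Option[V]
  def write(key: K, value: V): Unit
}

// Toy store that counts writes so the effect of write-behind is visible.
class InMemoryStore[K, V] extends BackingStore[K, V] {
  val data: mutable.Map[K, V] = mutable.Map.empty[K, V]
  var writes: Int = 0
  def load(key: K): Option[V] = data.get(key)
  def write(key: K, value: V): Unit = { data(key) = value; writes += 1 }
}

// Minimal cache illustrating the three behaviours; not thread-safe.
class SketchCache[K, V](store: BackingStore[K, V], writeBehind: Boolean) {
  private val cache = mutable.Map.empty[K, V]
  private val pending = mutable.LinkedHashMap.empty[K, V]

  // Read-through: on a cache miss, load from the store and keep the value in memory.
  def get(key: K): Option[V] =
    cache.get(key).orElse {
      val loaded = store.load(key)
      loaded.foreach(v => cache(key) = v)
      loaded
    }

  // Write-through writes to the store immediately.
  // Write-behind only buffers the update; repeated updates to one key are coalesced.
  def put(key: K, value: V): Unit = {
    cache(key) = value
    if (writeBehind) pending(key) = value else store.write(key, value)
  }

  // Flush buffered updates to the store (a real system would do this asynchronously).
  def flush(): Unit = {
    pending.foreach { case (k, v) => store.write(k, v) }
    pending.clear()
  }
}
```

With write-behind enabled, two `put` calls on the same key followed by one `flush()` result in a single store write, which is where the performance gain comes from; the trade-off is that buffered updates can be lost if the process dies before flushing.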
  • 17. Integration with Spark
      Alluxio:
      • Alluxio provides Hadoop-compatible file system APIs, so data can be read/written via Spark RDD's file system related APIs
      • Enables reading/writing data from different data stores (configured as UFS) via Alluxio's unified namespace and API
      • Automatically manages movement of data persisted in Alluxio or UFS
      Apache Ignite:
      Two ways to integrate with Spark:
      • As stand-alone IGFS or caching layer on HDFS:
          • Ignite is exposed as HDFS, so data can be read/written via Spark RDD's file system related APIs
      • As Distributed Cache via IgniteContext:
          • Provides an implementation of Spark RDDs (supporting all transformations and actions)
          • Mutable RDDs (view over the distributed cache's content)
          • Configurable lifespan depending upon Ignite's deployment mode
      [Diagram: with Alluxio (exposed as file system, backed by S3, HDFS, Blob, Disk …), RDDs are read from and saved to the file system API with transformations in between; with Ignite, IgniteContext yields an IgniteRDD [Tuple2], transformations are applied and the savePairs action writes to Ignite (Distributed Cache), while Ignite IGFS is accessed through the RDD file system API]
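The unified namespace described above, where several under file systems are mounted at different points and then addressed through one path scheme, can be pictured as a longest-prefix lookup in a mount table. The sketch below is an assumption-laden illustration in plain Scala, not Alluxio's code; `MountTableSketch`, `resolve` and the example URIs are all hypothetical.

```scala
// Illustrative only: resolves a path in a unified namespace to a path in the
// under file system (UFS) mounted at the longest matching mount point.
object MountTableSketch {
  def resolve(mounts: Map[String, String], path: String): Option[String] = {
    // A mount point matches if the path equals it or lies below it.
    val candidates = mounts.keys.filter { m =>
      val prefix = if (m.endsWith("/")) m else m + "/"
      path == m || path.startsWith(prefix)
    }
    if (candidates.isEmpty) None
    else {
      val mountPoint = candidates.maxBy(_.length) // most specific mount wins
      val rest = path.stripPrefix(mountPoint).stripPrefix("/")
      val base = mounts(mountPoint).stripSuffix("/")
      Some(if (rest.isEmpty) base else s"$base/$rest")
    }
  }
}
```

For example, with the hypothetical mounts `"/" -> "hdfs://namenode:8020/alluxio"` and `"/s3data" -> "s3://bucket/data"`, the path `/s3data/logs/a.txt` resolves to `s3://bucket/data/logs/a.txt`, while `/tmp/x` falls through to the root mount on HDFS. Applications only ever see the left-hand paths.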
  • 18. Integration with Spark (Continued)
      Alluxio:
        // reading data (sc is an existing SparkContext):
        val textRdd = sc.textFile("alluxio://masternode:19998/path")
        // transformations:
        val textRdd2 = textRdd.filter(_.contains("deloitte"))
        // writing data:
        textRdd2.saveAsTextFile("alluxio://masternode:19998/destination_path")
      Apache Ignite:
        import org.apache.ignite.configuration.IgniteConfiguration
        import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
        // creating IgniteContext (sparkContext is an existing SparkContext):
        val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration())
        // creating IgniteRDD:
        val cacheRdd: IgniteRDD[Integer, String] = igniteContext.fromCache("deloitte_cache")
        // transformations (the result is a plain Spark RDD of key-value pairs):
        val cacheRdd2 = cacheRdd.filter(_._2.contains("deloitte"))
        // writing data: savePairs is called on the IgniteRDD and takes the pairs to store
        cacheRdd.savePairs(cacheRdd2)
  • 19. Memory Architecture
      Alluxio:
      Alluxio storage is divided into three ordered tiers, from highest to lowest performance:
      • MEM (memory)
      • SSD
      • HDD
      • Allows storing more data than the memory available in the cluster
      • Automatically manages data between tiers
      • Data is written to the top tier by default
      Apache Ignite:
      In Native Persistence, data and index storage is divided into:
      • Memory (subset of data)
      • Disk (superset of data)
      • Data can be stored both off-heap and on-heap
      • When stored off-heap, there are fewer constraints on the volume of data to be stored and fewer GC pauses
      • Memory is further divided into Memory Regions
      • Memory Regions consist of Memory Segments, which comprise Data Pages, B+ Tree Pages, Index Pages and FreeList structures
  • 20. Advanced Memory Management
    Alluxio:
      • Pinning/Unpinning: enforces data locality in a specific tier.
      • Allocators: choose the locations where new data blocks are written.
      • Evictors: choose which data to move to a lower tier to free space. Supported algorithms: Greedy, LRU, LRFU, Partial LRU.
      • Evictors and Allocators are applied globally.
      • A write may fail if space can't be freed or if the data exceeds the size of the top tier.
      [Diagram: Alluxio tiers MEM, SSD and HDD, ordered by performance.]
    Apache Ignite:
      Supports memory policies (e.g. eviction) that apply at:
      • Memory Region level (for off-heap caching)
      • Entry level (for on-heap caching)
      thus providing more granular control.
      Eviction:
      • Supported algorithms for page-based eviction: Random-LRU, Random2-LRU
      • Supported algorithms for entry-level eviction: FIFO, LRU, Random
      [Diagram: memory and disk; memory is split into Memory Regions, each made of Memory Segments that hold Data Pages and B+ Tree Index Pages.]
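The tiering and eviction behaviour listed for Alluxio on slides 19 and 20 can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Tier` and `TieredStore` classes, their method names and the capacities are hypothetical and are not Alluxio's implementation or API. It shows writes landing in the top tier, an LRU evictor moving the least recently used block down one tier to free space, and a write failing when the data is larger than the top tier.

```scala
import scala.collection.mutable

// Toy model only: hypothetical classes, not Alluxio's implementation or API.
final class Tier(val name: String, val capacity: Long) {
  // LinkedHashMap keeps insertion order, so the head is the least recently used block.
  private val blocks = mutable.LinkedHashMap.empty[String, Long]

  def free: Long = capacity - blocks.values.sum
  def contains(id: String): Boolean = blocks.contains(id)
  def put(id: String, size: Long): Unit = blocks.update(id, size)
  def remove(id: String): Unit = { blocks.remove(id); () }
  def lru: Option[(String, Long)] = blocks.headOption
  // Re-inserting a block moves it to the most recently used position.
  def touch(id: String): Unit = blocks.remove(id).foreach(size => blocks.update(id, size))
}

final class TieredStore(tiers: Vector[Tier]) {
  // New data is always written to the top tier (index 0).
  def write(id: String, size: Long): Boolean = place(0, id, size)

  // Reading a block marks it as recently used in the tier that holds it.
  def read(id: String): Boolean =
    tiers.find(_.contains(id)).exists { tier => tier.touch(id); true }

  def tierOf(id: String): Option[String] = tiers.find(_.contains(id)).map(_.name)

  // Fails if there is no such tier, the block exceeds the tier's size,
  // or space cannot be freed by demoting blocks to the next tier.
  private def place(level: Int, id: String, size: Long): Boolean =
    level < tiers.length &&
      size <= tiers(level).capacity &&
      makeRoom(level, size) && { tiers(level).put(id, size); true }

  // LRU evictor: demote least recently used blocks one tier down until `needed` fits.
  private def makeRoom(level: Int, needed: Long): Boolean = {
    val tier = tiers(level)
    if (tier.free >= needed) true
    else tier.lru match {
      case Some((victim, victimSize)) =>
        if (place(level + 1, victim, victimSize)) {
          tier.remove(victim)
          makeRoom(level, needed)
        } else false
      case None => false
    }
  }
}

object TieredStoreDemo {
  def main(args: Array[String]): Unit = {
    val store = new TieredStore(
      Vector(new Tier("MEM", 100), new Tier("SSD", 200), new Tier("HDD", 400)))
    store.write("a", 60)
    store.write("b", 40)
    store.read("a")                   // "b" becomes the LRU block in MEM
    store.write("c", 30)              // MEM is full, so "b" is demoted to SSD
    println(store.tierOf("b"))        // Some(SSD)
    println(store.tierOf("c"))        // Some(MEM)
    println(store.write("huge", 500)) // false: larger than the top tier
  }
}
```

The model applies one evictor to every tier, mirroring the slide's point that evictors are applied globally. A per-region policy of the kind described for Ignite would instead attach the eviction choice to each region.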
  • 21. Additional Capabilities: SQL Support
    Alluxio:
      Not supported
    Apache Ignite:
      • Supports distributed, horizontally scalable SQL database capabilities.
      • Supports indexing.
      • ANSI-99 SQL compliant.
      • Supports all SQL DDL and DML commands, including UPDATE, DELETE and MERGE queries.
      • Resembles Kudu's capabilities.
      • Counters limitations of HDFS.
      • Supports running queries on data that spans memory and disk. Unlike in Impala or Spark, not all of the data needs to be in memory for processing.
      [Diagram: Java, .Net, C++, PHP clients and BI tools connect through JDBC, ODBC and the Ignite SQL API to a cluster of nodes, each with RAM and disk, holding data and indexes.]
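As an illustration of the SQL capabilities listed on slide 21, the following is a minimal sketch that sends DDL and DML to Ignite over JDBC. It assumes an Ignite node is running locally with SQL DDL support, that the Ignite JDBC thin driver is on the classpath, and that the connection URL matches the deployment. The `customer` table, its columns and the object name `IgniteSqlSketch` are hypothetical.

```scala
import java.sql.DriverManager

object IgniteSqlSketch {
  // Assumption: a local Ignite node reachable through the JDBC thin driver.
  val url = "jdbc:ignite:thin://127.0.0.1/"

  // Hypothetical table and data, used only to illustrate DDL, indexing and DML.
  val statements: Seq[String] = Seq(
    "CREATE TABLE IF NOT EXISTS customer (id INT PRIMARY KEY, name VARCHAR, city VARCHAR)",
    "CREATE INDEX IF NOT EXISTS idx_customer_city ON customer (city)",
    "MERGE INTO customer (id, name, city) VALUES (1, 'Acme', 'Sydney')",
    "UPDATE customer SET city = 'Melbourne' WHERE id = 1"
  )

  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      // Run the DDL and DML statements in order.
      statements.foreach(sql => stmt.executeUpdate(sql))
      // Query the row that was mutated in place.
      val rs = stmt.executeQuery("SELECT name, city FROM customer WHERE city = 'Melbourne'")
      while (rs.next()) println(s"${rs.getString(1)} -> ${rs.getString(2)}")
    } finally conn.close()
  }
}
```

The UPDATE statement is the part that has no direct counterpart when the same data sits in plain files on HDFS, which is the in-place mutation point the deck returns to in its takeaways.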
  • 22. Additional Capabilities (Continued)
    Alluxio:
      • Key-value APIs (not transactional/ACID compliant)
    Apache Ignite:
      • Key-value APIs (transactional, ACID compliant)
      • Compute Grid: distributed and parallel computation.
      • MLGrid: machine learning library on top of Apache Ignite. It currently supports limited vector and matrix algebra operations; other algorithms are on the roadmap.
      • Streaming ingestion: ingesting real-time streams of data into Ignite in a distributed, scalable and fault-tolerant manner.
  • 23. Key Takeaways:
      • For inter-process state sharing (e.g. across Spark jobs), both provide adequate functional capabilities.
      • Both platforms provide automatic hot data management, but Apache Ignite offers more granular control through its per-memory-region policies.
      • For use cases that involve interacting with data in multiple storage systems at memory speed, Alluxio is the more convenient choice.
      • For building real-time data and analytics pipelines, Apache Ignite makes more sense as a sink.
      • For analytical use cases involving relational processing and in-place mutation at high speed, Apache Ignite makes more sense.
  • 24. Questions

Editor's Notes

  1. Traditional computation (e.g. MapReduce) requires disk I/O, which results in slow performance. Training machine learning models involves iterative steps, and intermittent disk I/O severely impacts their performance.
  3. In-place mutation. Tiered st. Traditional computation (e.g. MapReduce) requires disk I/O, which results in slow performance. Training machine learning models involves iterative steps, and intermittent disk I/O severely impacts their performance. https://www.slideshare.net/imcsummit/imcs2015-1-devopensource-inmemory-platforms?qid=f751b742-9974-420c-b0da-e4ee57eeeadd&v=&b=&from_search=34 State is not shareable across processes. Causes resource pressure. Abstractions are limited to the lifecycle of a job. Spark: RDDs don't exist beyond a job, and RDD state can't be shared across jobs. ???