Alluxio Product School Webinar
May 24. 2023
For more Alluxio Events: https://www.alluxio.io/events/
Speaker: Shouwei Chen (core maintainer & product manager, Alluxio)
As the AI landscape rapidly evolves, the advancements in generative AI technologies, such as ChatGPT, are driving a need for robust data infrastructures tailored for large language model (LLM) training and inference in the cloud. To effectively leverage the breakthroughs in LLM, organizations must ensure low latency, high concurrency, and scalability in production environments. In this Alluxio-hosted webinar, Shouwei will present on the design and implementation of a distributed caching system that addresses the I/O challenges of LLM training and inference. He will explore the unique requirements of data access patterns and offer practical best practices for optimizing the data pipeline through distributed caching in the cloud. The session will feature insights from real-world examples, such as Microsoft, Tencent, and Zhihu, as well as from the open-source community. Attendees will leave with a deeper understanding of how to harness scalable, efficient, and robust data infrastructures for LLM training and inference.
3. Join the conversation on Slack: alluxio.io/slack
1,200+ GitHub contributors & growing
10,000+ Slack community members
Top 10 Most Critical Java-Based Open Source Projects[1]
GitHub's Top 100 Most Valuable Repositories out of 96 million
[1] Google Comes Up With A Metric For Gauging Critical Open-Source Projects
Open source started at UC Berkeley AMPLab
6. I/O Challenges in ML/DL
Metadata scalability: CV training data often consists of a massive number of small raw files (billions of ~100KB photos).
Cloud API cost: a training job can read a single copy of the files tens of thousands of times, resulting in thousands of dollars in API cost per training run.
High I/O throughput: training jobs are highly concurrent and require high I/O throughput to keep GPUs and CPUs utilized.
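A back-of-envelope estimate of the cloud API cost above, assuming S3-like GET pricing of $0.0004 per 1,000 requests (an illustrative figure, not a quote, and not from the talk):

```python
# Rough cost of re-reading a small-file dataset many times during training.
# All numbers are illustrative assumptions, not Alluxio benchmarks.
files = 1_000_000_000          # e.g. a billion small image files
epochs = 10                    # each epoch re-reads every file once
price_per_1k_gets = 0.0004     # assumed S3-like GET pricing (USD)

cost = files * epochs * price_per_1k_gets / 1000
print(f"${cost:,.0f}")  # $4,000 per training run, before any transfer costs
```

This is why serving repeated reads from a cache instead of the cloud API quickly pays for itself.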
7. Using Alluxio For DL
Distributed Caching
[Architecture diagram: Alluxio distributed-caching workers serving DL training nodes through POSIX interfaces]
▪ No full data copy from the data source
▪ Serves fast DL training without HPC hardware
▪ Full data management for your unstructured machine-learning datasets
8. Architecture diagram w/ Alluxio as access layer
[Diagram: online data platform; offline data platform / data warehouse; inference cluster consuming models; offline training platform with training clusters exchanging models and training data through Alluxio]
9. [Same architecture diagram as slide 8, with Alluxio as the access layer]
Problems solved, compared to data migration:
- Expensive HPC hardware
- Custom data-migration tooling creates overhead for engineering teams
- Engineers have to manually delete outdated data in the persistence layer (cloud storage & HDFS)
- Users have to understand the data pipeline between data sources & AI/ML infra
Problems solved, compared to direct access:
- Low GPU utilization
- Cloud storage API cost and same-region / cross-region data movement cost
- Coupled architecture driven by business concerns (vendor lock-in)
10. Always-increasing expectations…
High Scalability: tens of billions of files per training job (ESSENTIAL)
High Availability: 99.99% uptime (ESSENTIAL)
High Performance: higher GPU utilization (ESSENTIAL)
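As a quick sanity check on what a 99.99% availability target means in practice, the implied downtime budget works out to under an hour per year:

```python
# Downtime budget implied by a 99.99% availability target (illustrative).
minutes_per_year = 365.25 * 24 * 60
downtime_minutes = minutes_per_year * (1 - 0.9999)
print(round(downtime_minutes, 1))  # about 52.6 minutes of downtime per year
```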
11. Motivation & Benefits
Availability
● No single point of failure
● Fault tolerance
● More friendly to K8s and cloud (no StatefulSet needed)
Performance & Scalability
● Unlimited scalability
● Supports billions of small files
● Streamlined RPC calls
Multi-Tenancy in One Cluster
● Proven cost effectiveness
● Tenant isolation
● Serverless interface
● Easy management
Pluggable Quota and Security Management
● Customizable quota management
● Token passthrough for S3/GCS/OSS
● Impersonation & Kerberos (krb5) for HDFS
12. Dora Core Components - Affinity
● The client decides which worker to route each request to
● Workers handle list/getFileStatus calls
● Clients fetch the worker list from the service periodically
● Workers persist metadata to support quota and tenant isolation
● For unsupported calls, or when a worker is down, clients can fall back to the UFS
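Client-side worker affinity of this kind is commonly built on consistent hashing. The sketch below is a minimal illustration under that assumption, not Alluxio's actual implementation: the class name, virtual-node count, and the `alive` filter standing in for the UFS-fallback path are all hypothetical.

```python
import hashlib
from bisect import bisect_right

class WorkerRing:
    """Consistent-hash ring mapping each file path to a preferred worker.

    Illustrative sketch of client-side affinity; Dora's real policy and
    parameters may differ.
    """
    def __init__(self, workers, vnodes=100):
        # Place each worker at many virtual points for even distribution.
        self.ring = sorted(
            (self._hash(f"{w}#{i}"), w)
            for w in workers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def pick(self, path, alive=None):
        """Return the first alive worker clockwise from hash(path)."""
        start = bisect_right(self.keys, self._hash(path)) % len(self.ring)
        for i in range(len(self.ring)):
            _, w = self.ring[(start + i) % len(self.ring)]
            if alive is None or w in alive:
                return w
        return None  # no worker alive: caller would fall back to the UFS

ring = WorkerRing(["worker-1", "worker-2", "worker-3"])
w = ring.pick("s3://bucket/train/img_0001.jpg")
# The same path maps to the same worker while membership is stable:
assert w == ring.pick("s3://bucket/train/img_0001.jpg")
# If that worker is down, the request moves to another live worker:
backup = ring.pick("s3://bucket/train/img_0001.jpg",
                   alive={"worker-1", "worker-2", "worker-3"} - {w})
assert backup != w
```

Because the mapping depends only on the path and the worker list, every client computes the same affinity independently, with no central scheduler in the hot path.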
13. Performance Improvement by Netty
Netty can be enabled for data transmission via configuration. Dora provides Netty-based data transmission that improves read performance by 30%-50%.
Advantages compared to gRPC:
● Fewer data copies across different thread pools
● Zero-copy transmission, avoiding Protobuf serialization
● Avoids OOM by optimizing off-heap memory usage
● Requires fewer bytes on the wire, as there is no additional HTTP header
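The zero-copy idea above can be illustrated in miniature: a slice-and-serialize path copies bytes, while a buffer view hands out the same memory. This is only an analogy for Netty's buffer handling (which is Java and off-heap), not Alluxio's code:

```python
# Illustrative only: the zero-copy idea behind Netty-style buffer handling.
data = bytearray(b"x" * 1024)

copied = bytes(data[:512])     # slicing + bytes() copies the 512 bytes
view = memoryview(data)[:512]  # a memoryview shares the same buffer

data[0] = ord("y")
assert copied[0] == ord("x")   # the copy does not see the change
assert view[0] == ord("y")     # the view does: no bytes were copied
```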
14. High Concurrent Read in Dora: I/O throughput improvements
● Up to 9X for unstructured data
● 2-15X for structured data