Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Juncheng Yang(Ph.D Candidate, @CMU)
As a cache eviction algorithm, FIFO has a lot of attractive properties, such as simplicity, speed, scalability, and flash-friendliness. The most prominent criticism of FIFO is its low efficiency (high miss ratio). In this talk, I will describe a simple, scalable FIFO-based algorithm with three static queues (S3-FIFO). Evaluated on 6594 cache traces from 14 datasets, we show that S3- FIFO has lower miss ratios than state-of-the-art algorithms across traces. Moreover, S3-FIFO’s efficiency is robust — it has the lowest mean miss ratio on 10 of the 14 datasets. FIFO queues enable S3-FIFO to achieve good scalability with 6× higher throughput compared to optimized LRU at 16 threads. Our insight is that most objects in skewed workloads will only be accessed once in a short window, so it is critical to evict them early (also called quick demotion). The key of S3-FIFO is a small FIFO queue that filters out most objects from entering the main cache, which provides a guaranteed demotion speed and high demotion precision.
2. Software cache and eviction
• Ubiquitous deployments of software caches
• page cache, block cache, database cache
• key-value cache, object cache…
• Cache metrics
• efficiency / effectiveness: miss ratio
• throughput and scalability: requests/sec
• simplicity
• A core component of cache design: eviction
2
Cachelib Pelikan
3. A long history of research centered around LRU
•Least-recently-used (LRU)
• maintain objects in a queue with last-access order
• update metadata (with locking) on each read request
• Problems
• not scalable
• not scan-resistant
3
4. A long history of research centered around LRU
•Improve LRU’s efficiency
• add more techniques/queues/metrics: LIRS[SIGMETRICS’02], LRU-K[SIGMOD’93], 2Q[VLDB’94], MQ[ATC’01],
ARC[FAST’03], TinyLFU[TOS’17], LRB[NSDI’20], CACHEUS[FAST’21]…
• sacrifice throughput and/or scalability
•Improve LRU’s throughput and scalability
• reduce #operations/locks per-request: relaxed LRU, CLOCK variants[NSDI’13], FrozenHot[Eurosys’23]
• conventional wisdom: sacrifice efficiency, our finding[HotOS’23] shows not true
•State-of-the-arts: tradeoff between efficiency and throughput
4
[HotOS’23] FIFO queues
can be better than LRU
5. An alternative: FIFO eviction algorithm
•First-in-first-out (FIFO)
•simpler
•fewer metadata
•less computation
•more scalable
•flash-friendly
5
The only drawback:
FIFO has a high miss ratio
7. Observation
• One-hit wonder: objects appeared once in the sequence
• Zipfian workloads: One-hit-wonder ratio decreases with sequence length (measured in #obj)
• Why short sequences? A cache starts eviction after seeing a short request sequence
More one-hit wonders than you would have expected
A B A C B A D A B C B A E C B D
A
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
request
time
start time end time
sequence length
(# objects)
# one-hit wonder one-hit wonder ratio
1 17 5 1 (E) 20%
1 7 4 2 (C, D) 50%
1 4 3 2 (B, C) 66%
8. Observation
• One-hit wonder: objects appeared once in the sequence
• Zipfian workloads: One-hit-wonder ratio decreases with sequence length (measured in #obj)
• Why short sequences? A cache starts eviction after seeing a short request sequence
More one-hit wonders than you would have expected
8
the one-hit-wonder ratio of week-long traces at 10% length:
72% (mean on 6594 traces)
9. Observation
9
LRU cache running Twitter workload
LRU cache running MSR workload
most objects in the cache are one-hit wonders
11. S3-FIFO design
Simple, Sc
a
l
a
ble eviction
a
lgorithm with three St
a
tic FIFO queues
11
main FIFO (90% space)
small
ghost
struct object {
…
uint8_t cnt:2;
}
*using 1-bit
f
l
a
g
a
lso is suf
f
icient for most worklo
a
ds
if cnt == 0
evict
else
reinsert
cnt--
3 on eviction
if not in ghost, else
2 on cache miss
if cnt = 1, else
3 on eviction
cnt++
1 on cache hit
12. S3-FIFO features
• Simple and robust: static queues
• Fast: no metadata update for most requests
• Scalable: no lock
• Tiny metadata: 2 bits
• Flash-friendly: sequential writes
12
Implementation: one, two or three FIFO-queues
14. Evaluation setup
• Data
• 14 datasets, 6594 traces from Twitter, Meta, Microsoft, Wikimedia, Tencent, Alibaba, major CDNs…
• 848 billion requests, 60 billion objects
• collected between 2007 and 2023
• block, key-value, object caches
• Platform
• libCacheSim, cachelib
• CloudLab with 1 million core·hours
• Data and software are all open-sourced
• Metric
• miss ratio reduction from FIFO
• throughput in Mops/sec
14
18. Efficiency
Miss r
a
tio reduction distribution
a
cross
a
ll tr
a
ces
18
More efficient than state-of-the-art algorithms, up to 72% lower miss ratio than LRU
better
.
.
-
19. 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40
Miss ratio reduction from FIFO
fiu (block)
MSR (block)
CloudPhysics (block)
Systor (block)
Tencent (block)
Alibaba (block)
Social Network KV
Meta KV
Twitter KV
CDN1
CDN2
TencentPhoto CDN
Meta CDN
Wikimedia CDN
ARC 2Q TinyLFU-0.1 LIRS CACHEUS LHD
Efficiency
Me
a
n miss r
a
tio reduction on e
a
ch d
a
t
a
set
better
19
S3-FIFO yet to be shown
20. )00 1). 2 1).- .
)2 .
.
.2 (40) 0 .
401. .
- -1 .
) .
. ) 13.
1
3)11
- -1 (.1.
1
) ) )
)-4
Efficiency
Me
a
n miss r
a
tio reduction on e
a
ch d
a
t
a
set
• efficient: the best algorithm or is close to the best
• robust: the best on 10 of the 14 datasets
better
21. Throughput and scalability
Zipf worklo
a
ds
• the fastest on a single
thread
• more scalable than
optimized LRU, 6x higher
throughput
• close to Segcache[NSDI’21]
) (
)
)(
(
(
better
21
22. More in the paper
• Why S3-FIFO is effective
• Implication for flash cache
• Byte miss ratio results
• Impact of FIFO sizes
• What if we replace FIFO with LRU
22
23. Takeaway
• Cache workloads exhibit high one-hit-wonder ratio
• most objects in the cache are not re-accessed before being evicted
• critical to remove the one-hit wonders early
• S3-FIFO: simple, scalable caching with three static FIFO queues
• reinsertion to keep popular objects, a small FIFO queue to quickly filter out one-hit wonders
• adoption
• implemented at Google, VMware, Redpanda, etc.
• many libraries for Python, Javascript, Golang, Rust, C++ are available on GitHub
23
https://s3fifo.com
juncheny@cs.cmu.edu