Storage infrastructure using HBase behind LINE messages
1.
2. Storage infrastructure using HBase behind LINE messages
NHN Japan Corp.
LINE Server Task Force
Shunsuke Nakamura
@sunsuk7tp
Hadoop Conference Japan 2013 Winter (2013-01-21)
3. To support LINE's users, we have built message storage that is:
• Large scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)
4. Outline
• About LINE
• LINE & Storage requirements
• What we achieved
• Today’s topics
– IDC online migration
– NN failover
– Stabilizing LINE message cluster
• Conclusion
5. LINE
- A global messenger powered by NHN Japan -
Devices: 5 different mobile platforms + desktop support
9. New year 2013 in Japan
Number of requests in an HBase cluster
(Chart, plotted at 1-minute intervals: usual peak hours vs. New Year 2013, roughly a 3x spike)
Users sent "Happy New Year!" greetings (あけおめ! in Japanese, 新年好! in Chinese)
A 3 times traffic explosion: LINE Storage had no problems :)
10. LINE on Hadoop
Storages for service, backup and log:
• For HBase, M/R and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI, log analysis
12. LINE service requirements
LINE is a…
Messaging Service - Should be fast
Global Service - Downtime not allowed
But it is not a simple messaging service:
Message synchronization between phone & PCs
– Messages should be kept for a while.
13. LINE’s storage requirements
• No data loss
• Eventual consistency
• Low latency
• HA
• Flexible schema
• Easy management
• Scale-out
14. Our selection is HBase
• Low latency for large amount of data
• Linearly scalable
• Relatively lower operating cost
– Replication by nature
– Automatic failover
• Data model fits our requirements
– Semi-structured
– Timestamp
15. Stored rows per day in a cluster
(Chart: billions of rows per day, y-axis from 2 to 10)
16. What we achieved with HBase
• No data loss
– Persistent
– Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
– Hundreds of billions of rows
– Write: ~ 1 ms
– Read: 1 ~ 10 ms
17. Many issues we had
• Heterogeneous storage coordination
• IDC online migration
• Flush & Compaction storms caused by "too many HLogs"
• Row & Column distribution
• Secondary Index
• Region Management
– load, size balancing
– RS Allocation
– META region
– M/R
• Monitoring for diagnostics
• Traffic bursts caused by decommissioning
• NN problems
• Performance degradation
– hotspot problem
– timeout burst
– GC problem
• Client bugs
– Thread Blocking on server failure (HBASE-6364)
18. Today’s topics
IDC online migration
NN failover
Stabilizing LINE message cluster
20. Why?
• Move whole HBase clusters and data
• For better network infrastructure
• Without downtime
21. IDC online migration
Before migration
(Diagram: the App Server writes only to src-HBase; dst-HBase is not yet used)
22. IDC online migration
• Write to both (client-level replication)
(Diagram: the App Server now writes to both src-HBase and dst-HBase; see the sketch below)
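A minimal sketch of what such a client-level dual write could look like with the HBase 0.9x Java client; the table instances, column coordinates, and the dualWrite helper are illustrative assumptions, not LINE's actual code.

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class DualWriter {
    // Write the same Put to both clusters; src and dst are HTable instances
    // opened against the source and destination clusters respectively.
    public void dualWrite(HTable src, HTable dst, byte[] row,
                          byte[] family, byte[] qualifier, byte[] value) throws IOException {
        long ts = System.currentTimeMillis();
        Put put = new Put(row);
        // Setting the timestamp explicitly keeps the two clusters byte-identical,
        // which also makes the later bulk migration idempotent.
        put.add(family, qualifier, ts, value);
        src.put(put);   // primary write to the source cluster
        dst.put(put);   // replicated write to the destination cluster
    }
}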
23. IDC online migration
• New data: Incremental replication
• Old data: Bulk migration
• dst's timestamps equal src's
(Diagram: the App Server keeps writing to both clusters while old data is copied from src-HBase to dst-HBase)
24. LINE HBase Replicator & BulkMigrator
Replicator is for incremental replication
BulkMigrator is for bulk migration
25. LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering
(Diagram: HBase's built-in replication pushes from src-HBase to dst-HBase; the LINE HBase Replicator instead pulls from src-HBase into dst-HBase)
26. LINE HBase Replicator
- A simple daemon to replicate local regions -
1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog:
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to the dst HBase
• Periodic checkpointing
• Generally, entries are replicated within seconds
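As a rough illustration of step 2, the per-entry conversion might look like the following with the 0.94-era WAL classes (HLog.Entry, HLogKey, WALEdit, KeyValue); the entry list, TARGET_TABLE filter, and convertRowKey are hypothetical stand-ins for the tracker, filter, and conversion logic, and delete markers are not handled in this sketch.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.regionserver.wal.HLog;
import org.apache.hadoop.hbase.regionserver.wal.HLogKey;
import org.apache.hadoop.hbase.util.Bytes;

public class EntryConverter {
    private static final byte[] TARGET_TABLE = Bytes.toBytes("messages"); // hypothetical table filter

    // Convert the entries of one HLog into Puts and batch them to the destination cluster.
    public void replicate(List<HLog.Entry> entries, HTable dstTable) throws IOException {
        List<Put> batch = new ArrayList<Put>();
        for (HLog.Entry entry : entries) {
            HLogKey key = entry.getKey();
            if (!Bytes.equals(key.getTablename(), TARGET_TABLE)) {
                continue;                                   // filter: replicate only the target table
            }
            for (KeyValue kv : entry.getEdit().getKeyValues()) {
                Put put = new Put(convertRowKey(kv.getRow()));
                // Keep the original timestamp so dst stays identical to src
                put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
                batch.add(put);
            }
        }
        dstTable.put(batch);                                // one batched RPC per chunk of entries
    }

    // Hypothetical rowkey conversion (identity here); LINE's converter is not public.
    private byte[] convertRowKey(byte[] srcRow) {
        return srcRow;
    }
}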
27. Bulk migration
1. MapReduce between arbitrary storages
– Map tasks only
– Read from the source, write to the destination
– Task scheduling is tricky because it depends on region allocation
2. Non-MapReduce version (BulkMigrator)
– Our own implementation
– HBase → HBase
– On each RS, scan & batch region by region (sketched below)
– Throughput throttling
– Slow, but easy to implement and debug
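A minimal sketch of the per-region "scan & batch" idea (not the actual BulkMigrator); the caching, batch size, and sleep-based throttling values are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class RegionMigrator {
    // Copy one region's rows from src to dst, preserving timestamps, with crude throttling.
    public void migrateRegion(HTable src, HTable dst, byte[] startRow, byte[] stopRow) throws Exception {
        Scan scan = new Scan(startRow, stopRow);            // scan only this region's key range
        scan.setCaching(1000);                              // scanner prefetch size (assumed)
        scan.setMaxVersions();                              // copy every version, not just the latest
        ResultScanner scanner = src.getScanner(scan);
        List<Put> batch = new ArrayList<Put>();
        try {
            for (Result result : scanner) {
                Put put = new Put(result.getRow());
                for (KeyValue kv : result.raw()) {
                    // Preserve family/qualifier/timestamp/value exactly as stored in src
                    put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
                }
                batch.add(put);
                if (batch.size() >= 1000) {                 // assumed batch size
                    dst.put(batch);
                    batch.clear();
                    Thread.sleep(50);                       // crude throughput throttling (assumed)
                }
            }
            if (!batch.isEmpty()) {
                dst.put(batch);
            }
        } finally {
            scanner.close();
        }
    }
}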
31. NameNode failure in October 2012
32. HA-NN failover failed
• The NameNode process itself was not the problem
• Incorrect leader election during network partitions
• Complicated configuration
– Easy to get wrong, difficult to control
– Pacemaker scripting was not straightforward
– A VIP is risky for HDFS
• DRBD split-brain problem
– Protocol C
– Unable to re-sync while the service is online
33. Now: In-house NN failure handling
• Bye-bye old HA-NN
– Had to restart whole HBase clusters after NN failover
• Alternative ideas
– Quorum-based leader election (Using ZK)
– Using L4 switch
– Implement our own AvatarNode
• A safer solution, at the cost of a little downtime
34. In-house NN failure handling (1)
rsync with --link-dest, run periodically
35. In-house NN failure handling (2)
36. In-house NN failure handling (3)
38. Stabilizing LINE message cluster
• Case 1: "Too many HLogs"
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS
(Overview diagram also shows: H/W failure handling, RS GC storms, performance problems)
39. Case1: “Too many HLogs”
• Effect
– MemStore flush storm
– Compaction storm
• Cause
– Different growth rates across regions
– Heterogeneous tables on one RS
• Solution
– Region balancing
– External flush scheduler (see the sketch below)
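One way such an external flush scheduler could be built is a small daemon that calls HBaseAdmin.flush() during off-peak hours so regions do not hit "too many HLogs" and force-flush at peak; the table name and schedule below are assumptions for illustration, not LINE's implementation.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushScheduler {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        while (true) {
            int hour = java.util.Calendar.getInstance().get(java.util.Calendar.HOUR_OF_DAY);
            if (hour >= 3 && hour <= 5) {                 // assumed off-peak window
                admin.flush("messages");                  // flush a table (or a single region name)
            }
            Thread.sleep(60 * 60 * 1000L);                // check once an hour
        }
    }
}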
40. Case1: Number of HLogs
(Charts of the number of HLogs over time: a better case with periodic flushes and no forced flushes across peak and off-peak hours, and a worse case with repeated forced flushes leading to a flush storm)
41. Case2: Hotspot problems
• Effect
– Excessive GC
– RS performance degradation (High CPU usage)
• Cause
– Get/Scan:
• A row or column updated too frequently
• A row with too many columns (+ tombstones)
• Solution
– Schema design and row/column key distribution are important (see the sketch below)
– Hotspot region isolation
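A common way to improve rowkey distribution is to salt the key with a hash-based prefix so frequently-updated keys spread across regions; this is an illustrative sketch, and the bucket count and key layout are assumptions, not LINE's schema.

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKey {
    private static final int BUCKETS = 16;   // assumed number of buckets / key prefixes

    // e.g. userId "12345" -> "07|12345": writes for many users no longer pile into one region
    public static byte[] salt(String userId) {
        int bucket = Math.abs(userId.hashCode()) % BUCKETS;
        return Bytes.toBytes(String.format("%02d|%s", bucket, userId));
    }
}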
42. Case3: META region workload isolation
• Effect
1. RS high CPU
2. Excessive timeouts
3. META lookup timeouts
• Cause
– Inefficient exception handling in the HBase client
– Hotspot region and META on the same RS
• Solution
– A dedicated, META-only RS (see the sketch below)
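A minimal sketch of pinning .META. onto a dedicated RegionServer with the 0.92/0.94-era HBaseAdmin.move() API; the destination server name is a placeholder, and in practice the balancer must also be kept from moving META back.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class IsolateMeta {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Encoded name of the .META.,,1 region (pre-0.96 catalog layout)
        String metaEncoded = HRegionInfo.FIRST_META_REGIONINFO.getEncodedName();
        // Destination server in "host,port,startcode" form -- a placeholder value
        String destServer = "meta-rs.example.com,60020,1358700000000";
        admin.move(Bytes.toBytes(metaEncoded), Bytes.toBytes(destServer));
    }
}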
43. Case4: Region mappings to RS
• Effect
– Region mapping is not restored on RS restart
– Some region mappings aren't restored properly after a graceful restart
• graceful_stop.sh --restart --reload
• Cause
– HBase does not support it well
• Solution
– Periodically dump the region mappings and restore them (see the sketch below)
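A rough sketch of the periodic dump-and-restore idea with a 0.94-era client; the table name and the way the mapping is persisted are assumptions, not LINE's tooling.

import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionMappingTool {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "messages");        // hypothetical table name
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Dump: record each region's encoded name and the server it is currently on
        NavigableMap<HRegionInfo, ServerName> locations = table.getRegionLocations();
        Map<String, String> saved = new HashMap<String, String>();
        for (Map.Entry<HRegionInfo, ServerName> e : locations.entrySet()) {
            saved.put(e.getKey().getEncodedName(), e.getValue().getServerName());
        }
        // ... persist `saved` to a file or ZooKeeper here ...

        // Restore: after the restart, move each region back to where it used to be
        for (Map.Entry<String, String> e : saved.entrySet()) {
            admin.move(Bytes.toBytes(e.getKey()), Bytes.toBytes(e.getValue()));
        }
    }
}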
44. Summary
• IDC online migration
– Without downtime
– LINE HBase Replicator & BulkMigrator
• NN failover
– A solution simple enough for someone who asks "What's Hadoop?"
• Stabilizing LINE message cluster
– Improved response time of RS
45. Conclusion
We won 100M users while adopting HBase.
LINE Storage is a successful example of a messaging service using HBase.