HBase Technical Deep Dive: Anatomy of a RegionServer

Deep Dive content by Hortonworks, Inc. is licensed under a
Creative Commons Attribution-ShareAlike 3.0 Unported License.
HBase
Technical Deep Dive
Sept 17 2013 – Toronto Hadoop User Group
Adam Muise
amuise@hortonworks.com

Deep Dive Agenda
• Background
– (how did we get here?)
• High-level Architecture
– (where are we?)
• Anatomy of a RegionServer
– (how does this thing work?)
• Using HBase
– (where do we go from here?)
Page 2

Background
Page 3

So what is a BigTable anyway?
• BigTable paper from Google, 2006, Dean et al.
– “Bigtable is a sparse, distributed, persistent multi-dimensional
sorted map.”
– http://research.google.com/archive/bigtable.html
• Key Features:
– Distributed storage across cluster of machines
– Random, online read and write data access
– Schemaless data model (“NoSQL”)
– Self-managed data partitions
Page 4

Modern Datasets Break Traditional Databases
Page 5
>  10x more always-connected mobile devices than seen in PC era.
>  Sensor, video and other machine generated data easily exceeds 100TB / day.
>  Traditional databases can’t serve modern application needs.

Apache HBase: The Database For Big Data
Page 6
More data is the key to richer application
experiences and deeper insights.
With HBase you can:
ü  Ingest and retain more data, to petabyte scale and beyond.
ü  Store and access huge data volumes with low latency.
ü  Store data of any structure.
ü  Use the entire Hadoop ecosystem to gain deep insight on your data.

HBase At A Glance
Page 7
1
2
4
CLIENT LAYER
HBASE LAYER
HDFS LAYER
1
Clients automatically load
balanced across the cluster.
2
Scales linearly to handle any
load.
3
Data stored in HDFS allows
automated failover.
4
Analyze data with any Hadoop
tool.
3

HBase: Real-Time Data on Hadoop
Page 8
>  Read, Write, Process and Query data in real time using Hadoop infrastructure.

HBase: High Availability
Page 9
>  Data safely protected in HDFS.
>  Failed nodes are automatically recovered.
>  No single point of failure, no manual intervention.
HBase NodeHBase Node
Replication Replication
HDFS HDFS HDFS

HBase: Multi-Datacenter Replication
Page 10
>  Replicate data to 2 or more datacenters.
>  Load balancing or disaster recovery.

HBase: Seamless Hadoop Integration
Page 11
>  HBase makes deep analytics simple using any Hadoop tool.
>  Query with Hive, process with Pig, classify with Mahout.
HDFS

Apache Hadoop in Review
• Apache Hadoop Distributed Filesystem (HDFS)
– Distributed, fault-tolerant, throughput-optimized data storage
– Uses a filesystem analogy, not structured tables
– The Google File System, 2003, Ghemawat et al.
– http://research.google.com/archive/gfs.html
• Apache Hadoop MapReduce (MR)
– Distributed, fault-tolerant, batch-oriented data processing
– Line- or record-oriented processing of the entire dataset
– “[Application] schema on read”
– MapReduce: Simplified Data Processing on Large Clusters, 2004,
Dean and Ghemawat
– http://research.google.com/archive/mapreduce.html
Page 12
For more on writing MapReduce applications, see “MapReduce
Patterns, Algorithms, and Use Cases”
http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

High-level Architecture
Page 13

Logical Architecture
• [Big]Tables consist of billions of rows, millions of
columns
• Records ordered by rowkey
– Inserts require sort, write-side overhead
– Applications can take advantage of the sort
• Continuous sequences of rows partitioned into
Regions
– Regions partitioned at row boundary, according to size (bytes)
• Regions automatically split when they grow too large
• Regions automatically distributed around the cluster
– ”Hands-free" partition management (mostly)
Page 14

Logical Architecture
Distributed, persistent partitions of a BigTable
a
b
d
c
e
f
h
g
i
j
l
k
m
n
p
o
Table A
Region 1
Region 2
Region 3
Region 4
Region Server 7
Table A, Region 1
Table A, Region 2
Table G, Region 1070
Table L, Region 25
Region Server 86
Table A, Region 3
Table C, Region 30
Table F, Region 160
Table F, Region 776
Region Server 367
Table A, Region 4
Table C, Region 17
Table E, Region 52
Table P, Region 1116
Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
Page 15

Physical Architecture
• RegionServers collocate with DataNode
– Tight MapReduce integration
– Opportunity for data-local online processing via coprocessors
(experimental)
• HBase Master process manages Region assignment
• ZooKeeper configuration glue
• Clients communicate directly with RegionServers (data
path)
– Horizontally scale client load
– Significantly harder for a single ignorant process to DOS the cluster
• DDL operations clients communicate with HBase Master
• No persistent state in Master or ZooKeeper
– Recover from HDFS snapshot
– See also: AWS Elastic MapReduce's HBase restore path
Page 16

Page 17
Physical Architecture
Distribution and Data Path
...
Zoo
Keeper
Zoo
Keeper
Zoo
Keeper
HBase
Client
JavaApp
HBase
Client
JavaApp
HBase
Client
HBase Shell
HBase
Client
REST/Thrift
Gateway
HBase
Client
JavaApp
HBase
Client
JavaApp
Region
Server
Data
Node
Region
Server
Data
Node
...
Region
Server
Data
Node
Region
Server
Data
Node
HBase
Master
Name
Node
Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online conﬁguration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in data path.

Logical Data Model
• Table as a sorted map of maps
{rowkey => {family => {qualifier => {version => value}}}}
– Think: nested OrderedDictionary (C#), TreeMap (Java)
• Basic data operations: GET, PUT, DELETE
• SCAN over range of key-values
– benefit of the sorted rowkey business
– this is how you implement any kind of "complex query”
• GET, SCAN support Filters
– Push application logic to RegionServers
• INCREMENT, CheckAnd{Put,Delete}
– Server-side, atomic data operations
– Require read lock, can be contentious
• No: secondary indices, joins, multi-row transactions
Page 18

Page 19
Logical Data Model
A sparse, multi-dimensional, sorted map
Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can me multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
1368387247 [3.6 kb png data]"thumb"cf2b
a
cf1
1368394583 7
1368394261 "hello"
"bar"
1368394583 22
1368394925 13.6
1368393847 "world"
"foo"
cf2
1368387684 "almost the loneliest number"1.0001
1368396302 "fourth of July""2011-07-04"
Table A
rowkey
column
family
column
qualifier
timestamp value

Anatomy of a RegionServer
Page 20

Storage Machinery
• RegionServers host N Regions, as assigned by Master
– Common case, Region data is local to the RegionServer/DataNode
• Each column family stored in isolation of others
– "column-family oriented” storage
– NOT the same as column-oriented storage
• Key-values managed by "HStore”
– combined view over data on disk + in-memory edits
– region manages one HStore for each column family
• On disk: key-values stored sorted in "StoreFiles”
– StoreFiles composed of ordered sequence of "Blocks”
– also carries BloomFilter to minimize Block access
• In memory: "MemStore" maintains heap of recent edits
– not to be confused with "BlockCache”
– this structure is essentially a log-structured merge tree (LSM-tree)*
with MemStore C0 and StoreFiles C1
Page 21
* http://staff.ustc.edu.cn/~jpq/paper/flash/1996-The%20Log-
Structured%20Merge-Tree%20%28LSM-Tree%29.pdf

Page 22
RegionServer
HDFS
HLog
(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
...
...
HStore
BlockCache
HRegion
...
HStoreHStore
...
Legend:
- A RegionServer contains a single WAL, single BlockCache, and multiple Regions.
- A Region contains multiple Stores, one for each Column Family.
- A Store consists of multiple StoreFiles and a MemStore.
- A StoreFile corresponds to a single HFile.
- HFiles and WAL are persisted on HDFS.
Storage Machinery
Implementing the data model

Write Path (Storage Machinery cont.)
• Write summary:
1.  Log edit to HLog (WAL)
2.  Record in MemStore
3.  ACK write
• Data events recorded to a WAL on HDFS, for durability
– After fails, edits in WAL are replayed during recovery
– WAL appends are immediate, in critical write-path
• Data collected in "MemStore", until a "flush" writes new
HFiles
– Flush is automatic, based on configuration (size, or staleness interval)
– Flush clears WAL entries corresponding to MemStore entries
– Flush is deferred, not in critical write-path
• HFiles are merge-sorted during "Compaction”
– Small files compacted into larger files
– old records discarded (major compaction only)
– Lots of disk and network IO
Page 23

Page 24
RegionServer
HDFS
HLog
(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
...
...
HStore
BlockCache
HRegion
...
HStoreHStore
...
Legend:
1. A MutateRequest is received by the RegionServer.
2. A WALEdit is appended to the HLog.
3. The new KeyValues are written to the MemStore.
4. The RegionServer acknowledges the edit with a MutateResponse.
Write Path
Storing a KeyValue
1
2
3
4

Read Path (Storage Machinery, cont.)
• Read summary:
1.  Evaluate query predicate
2.  Materialize results from Stores
3.  Batch results to client
• Scanners opened over all relevant StoreFiles + MemStore
– “BlockCache” maintains recently accessed Blocks in memory
– BloomFilter used to skip irrelevant Blocks
– Predicate matchs accumulate, sorted, return ordered rows
• Same Scanner APIs used for GET and SCAN
– Different access patterns, different optimization strategies
– SCAN:
– HDFS optimized for throughput of long sequential reads
– Consider larger Block size for more data per seek
– GET:
– BlockCache maintains hot Blocks for point access (GET)
– Consider more granular BloomFilter
Page 25

Page 26
RegionServer
HDFS
HLog
(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
...
...
HStore
BlockCache
HRegion
...
HStoreHStore
...
Legend:
1. A GetRequest is received by the RegionServer.
2. StoreScanners are opened over appropriate StoreFiles and the MemStore.
3. Blocks identiﬁed as potential matches are read from HDFS if not already in the BlockCache.
4. KeyValues are merged into the ﬁnal set of Results.
5. A GetResponse containing the Results is returned to the client.
Read Path
Serving a single read request
1 5
2
3
3
2
4

Using HBase
Page 27

For what kinds of workloads is it well suited?
• It depends on how you tune it, but…
• HBase is good for:
– Large datasets
– Sparse datasets
– Loosely coupled (denormalized) records
– Lots of concurrent clients
• Try to avoid:
– Small datasets (unless you have *lots* of them)
– Highly relational records
– Schema designs requiring transactions
Page 28

HBase Use Cases
Page 29
Flexible
Schema
Huge
Data
Volume

High
Read
Rate
High
Write
Rate

Machine-‐Generated

Data

Distributed
Messaging

Real-‐Time

Analy@cs

Object
Store

User
Proﬁle

Management

Hbase Example Use Case:
Major Hard Drive Manufacturer
Page 30
• Goal: detect defective drives before they leave the
factory.
• Solution:
– Stream sensor data to HBase as it is generated by their test
battery.
– Perform real-time analysis as data is added and deep analytics
offline.
• HBase a perfect fit:
– Scalable enough to accommodate all 250+ TB of data needed.
– Seamless integration with Hadoop analytics tools.
• Result:
– Went from processing only 5% of drive test data to 100%.

Other Example HBase Use Cases
• Facebook messaging and counts
• Time series data
• Exposing Machine Learning models (like risk sets)
• Large message set store and forward, especially in
social media
• Geospatial indexing
• Indexing the Internet
Page 31

How does it integrate with my infrastructure?
• Horizontally scale application data
– Highly concurrent, read/write access
– Consistent, persisted shared state
– Distributed online data processing via Coprocessors
(experimental)
• Gateway between online services and offline storage/
analysis
– Staging area to receive new data
– Serve online “views” on datasets in HDFS
– Glue between batch (HDFS, MR1) and online (CEP, Storm)
systems
Page 32

What data semantics does it provide?
• GET, PUT, DELETE key-value operations
• SCAN for queries
• INCREMENT, CAS server-side atomic operations
• Row-level write atomicity
• MapReduce integration
Page 33

Creating a table in HBase
#!/bin/sh

#
Small
script
to
setup
the
hbase
table
used
by
OpenTSDB.

test
-‐n
"$HBASE_HOME"
||
{
#A

echo
>&2
'The
environment
variable
HBASE_HOME
must
be
set'

exit
1
}

test
-‐d
"$HBASE_HOME"
||
{

echo
>&2
"No
such
directory:
HBASE_HOME=$HBASE_HOME"

exit
1
}

TSDB_TABLE=${TSDB_TABLE-‐'tsdb'}
UID_TABLE=${UID_TABLE-‐'tsdb-‐uid'}
COMPRESSION=$
{COMPRESSION-‐'LZO'}

exec
"$HBASE_HOME/bin/hbase"
shell
<<EOF

create
'$UID_TABLE',
#B
{NAME
=>
'id',
COMPRESSION
=>
'$COMPRESSION'},
#B
{NAME
=>
'name',

COMPRESSION
=>
'$COMPRESSION'}
#B

create
'$TSDB_TABLE',
#C
{NAME
=>
't',
COMPRESSION
=>
'$COMPRESSION'}
#C

EOF

#A
From
environment,
not
parameter

#B
Make
the
tsdb-‐uid
table
with
column
families
id
and
name

#C
Make
the
tsdb
table
with
the
t
column
family

#Script
taken
from
HBase
in
Action
-‐
Chapter
7

Page 34

Coprocessors in a nutshell
•  Two types of coprocessors: Observer and Endpoints
•  Coprocessors are java code executed in each region server
•  Observer
–  Similar to a database trigger
–  Available Observer types: RegionObserver, WALObserver, MasterObserver
–  Mainly used to extend pre/post logic within region server events, WAL events, or
DDL events
•  Endpoint
–  Sort of like a UDF
–  Extend HBase client API to make functions exposed to a user
–  Still executed on RegionServer
–  Often used for sums/aggregations (HBase packs in an aggregate example)
• BE VERY CAREFUL WITH COPROCESSORS
–  They run in your region servers and buggy code can take down your cluster
–  See HOYA details to help mitigate risk
Page 35

What about operational concerns?
• Balance memory and IO for reads
– Contention between random and sequential access
– Configure Block size, BlockCache based on access patterns
– Additional resources
– “HBase: Performance Tuners,” http://labs.ericsson.com/blog/hbase-
performance-tuners
– “Scanning in HBase,” http://hadoop-hbase.blogspot.com/2012/01/
scanning-in-hbase.html
• Balance IO for writes
– Provision hardware with more spindles/TB
– Configure L1 (compactions, region size, &c.) based on write pattern
– Balance contention between maintaining L1 and serving reads
– Additional resources
– “Configuring HBase Memstore: what you should know,” http://
blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
– “Visualizing HBase Flushes And Compactions,” http://www.ngdata.com/
visualizing-hbase-flushes-and-compactions/
Page 36

Operational Tidbits
• Decommissioning Nodes will result in a downed
server, use “graceful_stop.sh” to offload the workload
from the region server
• Use the “zk_dump” to find all of your region servers
and how your zookeeper instances are faring
• Use “status ‘summary’” or “status ‘detailed’” for a
count of live/dead servers, average load, and file
counts
• User “balancer” to automatically balance regions if
HBase is set to auto-balance
• When using “hbase hbck” to diagnose and fix issues,
RTFM!
Page 37

SQL and HBase
Hive and Phoenix over HBase
Page 38

Phoenix over HBase
• Phoenix is a SQL shim over HBase
• https://github.com/forcedotcom/phoenix
• Hbase has fast write capabilities to
Phoenix allows for fast simple query (no
joins) and fast upserts
• Phoenix implements it’s own JDBC
driver so you can use your favorite tools
Page 39

Creative Commons Attribution-ShareAlike 3.0 Unported License.© Hortonworks Inc. 2013
Phoenix over HBase
Page 40

Hive over HBase
• Hive can be used directly with HBase
• Hive uses the MapReduce InputFormat
“HBaseStorageHandler” to query from the table
• Storage Handler has hooks for
– Getting input / output formats
– Meta data operations hook: CREATE TABLE, DROP TABLE, etc
• Storage Handler is a table level concept
– Does not support Hive partitions, and buckets
• Hive does not need to include all columns from HBase
table
Page 41

Hive over HBase
Page 42

Hive over HBase
Page 43

Hive and Phoenix over HBase
> hive
add jar /usr/lib/hbase/hbase-0.94.6.1.3.0.0-107-security.jar;
add jar /usr/lib/hbase/lib/zookeeper.jar;
add jar /usr/lib/hbase/lib/protobuf-java-2.4.0a.jar;
add jar /usr/lib/hive/lib/hive-hbase-handler-0.11.0.1.3.0.0-107.jar; set hbase.zookeeper.quorum=node1.hadoop;
CREATE EXTERNAL TABLE phoenix_mobilelograw( key string,
ip string,
ts string,
code string,
d1 string,
d2 string,
d3 string,
d4 string, properties string )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,F:IP,F:TS,F:CODE,F:D1,F:D2,F:D3,F:D4,F:PROPERTIES") TBLPROPERTIES
("hbase.table.name" = "MOBILELOGRAW”);
set hive.hbase.wal.enabled=false;
INSERT OVERWRITE TABLE phoenix_mobilelograw SELECT * FROM hive_mobilelograw; set
hive.hbase.wal.enabled=true;
Page 44

Hbase Roadmap
Page 45

Hortonworks Focus Areas for HBase
Page 46
•  Simplified Operations:
•  Intelligent Compaction
•  Automated Rebalancing
•  Ambari Management:
•  Snapshot / Revert
•  Multimaster HA
•  Cross-site Replication
•  Backup / Restore
•  Ambari Monitoring:
•  Latency metrics
•  Throughput metrics
•  Heatmaps
•  Region visualizations
Simplified Operations Database Functionality
•  First-Class Datatypes
•  SQL Interface Support
•  Indexes
•  Security
•  Encryption
•  More Granular Permissions
•  Performance:
•  Stripe Compactions
•  Short Circuit Read for
Hadoop 2
•  Row and Entity Groups
•  Deeper Hive/Pig Interop

HBase Roadmap Details: Operations
Page 47
• Snapshots:
– Protect data or restore to a point in time.
• Intelligent Compaction:
– Compact when the system is lightly utilized.
– Avoid “compaction storms” that can break SLAs.
• Ambari Operational Improvements:
– Configure multi-master HA.
– Simple setup/configuration for replication.
– Manage and schedule snapshots.
– More visualizations, more health checks.

HBase Roadmap Details: Data Management
Page 48
• Datatypes:
– First-class datatypes offer performance benefits and better
interoperability with tools and other databases.
• SQL Interface (Preview):
– SQL interface for simplified analysis of data within HBase.
– JDBC driver allows embedding in existing applications.
• Security:
– Granular permissions on data within HBase.

HOYA
HBase On YARN
Page 49

HOYA?
•  The new YARN resource negotiation layer in Hadoop allows for
non-mapreduce applications to run on a Hadoop grid, why not
allow HBase to take advantage of this capability?
•  https://github.com/hortonworks/hoya/
•  HOYA is a YARN application that provisions regionservers based
on an HBase cluster configuration
•  HOYA helps to bring HBase into YARN resource management
and paves the way for advanced resource management with
HBase
•  HOYA can be used to spin up temporary HBase clusters
temporarily during MapReduce or other jobs
Page 50

A quick YARN refresher…
Page 51

The 1st Generation of Hadoop: Batch
HADOOP 1.0
Built for Web-Scale Batch Apps
Single
App

BATCH
HDFS
Single
App

INTERACTIVE
Single
App

BATCH
HDFS
•  All other usage
patterns must
leverage that same
infrastructure
•  Forces the creation
of silos for managing
mixed workloads
Single
App

BATCH
HDFS
Single
App

ONLINE

A Transition From Hadoop 1 to 2
HADOOP 1.0
HDFS

(redundant,
reliable
storage)

MapReduce

(cluster
resource
management

&
data
processing)

A Transition From Hadoop 1 to 2
HADOOP 1.0
HDFS

(redundant,
reliable
storage)

MapReduce

(cluster
resource
management

&
data
processing)

HDFS

(redundant,
reliable
storage)

YARN

(cluster
resource
management)

MapReduce

(data
processing)

Others

(data
processing)

HADOOP 2.0

The Enterprise Requirement: Beyond Batch
To become an enterprise viable data platform, customers have
told us they want to store ALL DATA in one place and interact with
it in MULTIPLE WAYS
Simultaneously & with predictable levels of service
Page 55
HDFS
(Redundant,
Reliable
Storage)

BATCH
INTERACTIVE
STREAMING
GRAPH
IN-‐MEMORY
HPC
MPI
ONLINE
OTHER

YARN: Taking Hadoop Beyond Batch
•  Created to manage resource needs across all uses
•  Ensures predictable performance & QoS for all apps
•  Enables apps to run “IN” Hadoop rather than “ON”
– Key to leveraging all other common services of the Hadoop platform:
security, data lifecycle management, etc.
Page 56
ApplicaDons
Run
NaDvely
IN
Hadoop

HDFS2
(Redundant,
Reliable
Storage)

YARN
(Cluster
Resource
Management)

BATCH

(MapReduce)

INTERACTIVE

(Tez)

STREAMING

(Storm,
S4,…)

GRAPH

(Giraph)

IN-‐MEMORY

(Spark)

HPC
MPI

(OpenMPI)

ONLINE

(HBase)

OTHER

(Search)

(Weave…)

HOYA Architecture
Page 57

Key HOYA Design Goals
1.  Create on-demand HBase clusters
2.  Maintain multiple HBase cluster configurations and
implement them as required (i.e. high-load
scenarios)
3.  Isolation – Sandbox clusters running different
versions of HBase or with different coprocessors
4.  Create transient HBase clusters for MapReduce or
other processing
5.  Elasticity of clusters for analytics, data-ingest,
project-based work
6.  Leverage the scheduling in YARN to ensure HBase
can be a good Hadoop cluster tenant
Page 58

Page 59
Time to call it an
evening. We all have
important work to
do…

Thank you….
Page 60
hbaseinaction.com
For more information, check out
HBase: The Definitive Guide
Or
HBase in Action

HBase Technical Deep Dive: Anatomy of a RegionServer

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (20)

Ähnlich wie HBase Technical Deep Dive: Anatomy of a RegionServer

Ähnlich wie HBase Technical Deep Dive: Anatomy of a RegionServer (20)

Mehr von Adam Muise

Mehr von Adam Muise (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

HBase Technical Deep Dive: Anatomy of a RegionServer