What's new in Ambari

© Hortonworks Inc. 2011 – 2015. All Rights Reserved
What’s New in Ambari?
June 2015
Yusaku Sako @ Hortonworks (Ambari PMC Chair)
Sumit Mohanty @ Hortonworks (Ambari PMC)

What’s Apache Ambari?
100% open-source platform for simplifying Hadoop cluster management and use.
Highly extensible.

Open Source Activity

Inception: AMBARI-1 (Sept, 2011)

Fast forward 4 years to today… (June, 2015)
• Latest JIRA: AMBARI-11854
• 100+ Contributors
• 50 Committers
• ~12k JIRAs filed
• ~11k JIRAs resolved
At 1.5 days per JIRA -> 66 person-years! (probably more)
• Used by hundreds of companies
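
The 66 person-year figure above can be reproduced from the resolved-JIRA count. The sketch below assumes roughly 250 working days per year, which the slide does not state; with a different assumption the result shifts accordingly.

```python
# Back-of-the-envelope check of the "66 person-years" estimate.
RESOLVED_JIRAS = 11_000       # "~11k JIRAs resolved" (from the slide)
DAYS_PER_JIRA = 1.5           # effort estimate (from the slide)
WORKING_DAYS_PER_YEAR = 250   # assumption, not stated on the slide


def person_years(jiras: int, days_per_jira: float, days_per_year: int) -> float:
    """Convert a JIRA count into person-years of effort."""
    return jiras * days_per_jira / days_per_year


estimate = person_years(RESOLVED_JIRAS, DAYS_PER_JIRA, WORKING_DAYS_PER_YEAR)
print(f"{estimate:.0f} person-years")
```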

Ambari – 4th Biggest Project* @ Apache
* Based on total JIRAs filed on a project basis out of 162 projects as of June 10, 2015
#2: Hadoop at ~28k as it is split across multiple JIRA Projects

Timeline: Past 1 Year
• Ambari 1.5.* (Apr 2014): 1218 JIRAs
• Ambari 1.6.* (May 2014): 907 JIRAs
• Ambari 1.7.* (Dec 2014): 1620 JIRAs
• Ambari 2.0.* (Apr 2015): 1784 JIRAs; current GA version (2.0.1)
• Ambari 2.1 (coming soon): 1520+ JIRAs; focus of today’s talk
Resolution of 7k+ JIRAs

What’s new?
• Rolling Upgrade
• Alerts
• Metrics
• Enhanced Dashboard
• Smart Configurations
• Views
• Kerberos Automation
• Blueprints

Rolling Upgrade

Rolling Upgrade of Stack
Side-by-Side Bits and Configs
Bits:
/usr/hdp/2.2.0.0-2041
/usr/hdp/2.2.4.2-2
/usr/hdp/2.3.0.0-3000
Configs:
/etc/hive/conf/ (initial)
/etc/hive/conf/v0 (HDP 2.2.4.2)
/etc/hive/conf/v1 (HDP 2.3)
2.2.0.0 -> 2.2.4.2 (minor jump) -> 2.3.0.0 (major jump)

Rolling Upgrade – Manage Versions
Install bits in parallel on all agents
No down-time

Rolling Upgrade – Orchestration
Not necessarily “one-click” but fully guided
Services are up the entire time
Upgrade one component at a time
Robust and fault-tolerant
Service-checks performed throughout

Rolling Upgrade – Upgrade Catalog
Grouping and order

Run custom scripts (Python and server-side)

Mark steps as skippable or retryable
All service checks are skippable; all steps are retryable

Set, move, delete, transform configurations
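
To make the ordering, retry, and skip ideas above concrete, here is a minimal sketch. It is not Ambari's upgrade code: the `Step` class, `run_group` function, and the automatic retry count are all hypothetical, and in a guided upgrade the operator would normally decide whether to retry or skip.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical upgrade step, not an Ambari class."""
    name: str
    action: Callable[[], bool]  # returns True on success
    skippable: bool = False     # e.g. a service check


def run_group(steps: List[Step], max_attempts: int = 3) -> List[str]:
    """Run steps in order; retry failures, then skip or stop."""
    log: List[str] = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            if step.action():
                log.append(f"{step.name}: ok (attempt {attempt})")
                break
        else:  # every attempt failed
            if not step.skippable:
                log.append(f"{step.name}: failed, stopping")
                break
            log.append(f"{step.name}: skipped")
    return log


# Hypothetical steps: a restart that succeeds on its second try,
# a failing service check, and a restart that succeeds immediately.
calls = {"datanode": 0}


def flaky_restart() -> bool:
    calls["datanode"] += 1
    return calls["datanode"] >= 2


log = run_group([
    Step("restart DATANODE", flaky_restart),
    Step("service check HDFS", lambda: False, skippable=True),
    Step("restart NODEMANAGER", lambda: True),
])
print("\n".join(log))
```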

Rolling Upgrade – Downgrade

Alerts

Alert – Types
Type | Description | Status | Thresholds configurable?
PORT | Watches a port based on a configuration property such as the URI. | OK, WARN, CRIT | Yes (seconds)
WEB | Watches an HTTP or HTTPS endpoint and determines connectivity and HTTP status code. | OK, WARN, CRIT | No
AGGREGATE | Aggregate of status for another alert definition. | OK, WARN, CRIT | Yes (percentage)
METRIC | Watches a metric or series of metrics in JMX and compares a mathematical result against a threshold. | OK, WARN, CRIT | Yes (variable)
SCRIPT | Uses a custom script to handle checking. | OK or CRIT | No
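
As a rough illustration of how configurable thresholds map to the OK / WARN / CRIT states listed above, consider the sketch below. The function names and the "higher is worse" comparison are assumptions made for illustration; Ambari's own alert definitions may evaluate thresholds differently.

```python
from typing import List


def evaluate(value: float, warn: float, crit: float) -> str:
    """Map a measured value to a state, assuming higher values are worse."""
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


def aggregate(states: List[str], warn_pct: float, crit_pct: float) -> str:
    """AGGREGATE-style check: state from the percentage of non-OK children.

    Counting every non-OK child is an assumption of this sketch.
    """
    if not states:
        return "OK"
    bad_pct = 100.0 * sum(s != "OK" for s in states) / len(states)
    return evaluate(bad_pct, warn_pct, crit_pct)


# PORT-style check: response time in seconds against hypothetical thresholds.
port_state = evaluate(1.5, warn=1.5, crit=5.0)

# AGGREGATE-style check: 1 of 4 hosts is not OK (25%).
cluster_state = aggregate(["OK", "CRIT", "OK", "OK"], warn_pct=10, crit_pct=30)
print(port_state, cluster_state)
```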

UI – Current Alerts
Configured by default; managed via the web client

UI – Host Alerts
Automatically refreshes
Query alert history

UI – Customization & Instances
Status text, thresholds, and interval

Metrics

Ambari Metrics Service (AMS) - Goals
Ability to collect metrics from Hadoop and other Stack services
Ability to retain metrics at a high precision for a configurable time period
Ability to automatically purge metrics after retention period
At collection time, provide clear integration point for external system
At purge time, provide clear integration point for metrics retention by external system
Should provide default options for external metrics retention
Provide tools / utilities for analyzing metrics in retention system

Ambari Metrics System - Architecture
[Architecture diagram. Components shown:]
• Metrics Sinks (NameNode, DataNode, NodeManager, RegionServer, Nimbus, Flume Agent, Kafka worker) and Metrics Monitor
• Metrics Collector: HTTP REST endpoint, Metrics API, Query Layer, Aggregators, Phoenix client
• HBase with Phoenix server
• Ambari: Dashboards, Views, REST API

Sample Stats
Total number of raw uncompressed Hadoop metrics written per day on a 300-node cluster = 100 GB
Rows in Phoenix table ~ 100 million
Raw query time: 500 rows selected (1.923 seconds)
Aggregate query time: 204 rows selected (0.19 seconds)

Aggregate query:
SELECT METRIC_NAME, APP_ID, INSTANCE_ID, TIMESTAMP, METRIC_SUM, HOSTS_COUNT, METRIC_MAX, METRIC_MIN
FROM METRIC_AGGREGATE
WHERE METRIC_NAME IN ('dfs.datanode.BytesWritten', 'dfs.datanode.BytesRead')
  AND APP_ID = 'datanode'
  AND TIMESTAMP >= 1409770831000
  AND TIMESTAMP < 1409774431000;

Raw query:
SELECT METRIC_NAME, HOSTNAME, APP_ID, INSTANCE_ID, START_TIME, METRICS
FROM METRIC_RECORD
WHERE METRIC_NAME IN ('dfs.datanode.BytesWritten', 'dfs.datanode.BytesRead')
  AND APP_ID = 'datanode'
  AND START_TIME >= 1409770831000
  AND START_TIME < 1409774431000
ORDER BY METRIC_NAME, START_TIME
LIMIT 500;
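
The time bounds in the two queries above look like epoch timestamps in milliseconds. Assuming that is the case, the short sketch below decodes them to show the window the queries cover.

```python
from datetime import datetime, timezone

# Bounds copied from the queries; treating them as epoch milliseconds
# is an assumption based on their magnitude.
START_MS = 1409770831000
END_MS = 1409774431000


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


window_minutes = (END_MS - START_MS) / 60_000
print(ms_to_utc(START_MS).isoformat())  # 2014-09-03T19:00:31+00:00
print(ms_to_utc(END_MS).isoformat())    # 2014-09-03T20:00:31+00:00
print(window_minutes)                   # 60.0
```

Both queries therefore select one hour of DataNode read/write metrics.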

Key takeaways
Using Phoenix query hints to avoid full table scans
PHOENIX-914 – Use native HBase timestamp to skip HFiles
Client-side buffering and aggregation built into Sinks and Monitor
Cluster- and host-level aggregations across various time dimensions
Table schema optimized for reads and HBase tuned to support heavy write loads
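
To illustrate what a cluster-level aggregation across a time dimension might look like, the sketch below rolls per-host records into time slices and computes the same kinds of columns that appear in the aggregate query (METRIC_SUM, HOSTS_COUNT, METRIC_MAX, METRIC_MIN). The record layout, the one-minute slice, and the "last value per host wins" rule are assumptions of this sketch, not a description of the actual aggregators.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (metric_name, hostname, timestamp_ms, value); a hypothetical record shape.
Record = Tuple[str, str, int, float]


def aggregate_cluster(records: Iterable[Record],
                      slice_ms: int = 60_000) -> Dict[Tuple[str, int], dict]:
    """Group host records into time slices and summarize across hosts."""
    buckets: Dict[Tuple[str, int], Dict[str, float]] = defaultdict(dict)
    for metric, host, ts, value in records:
        slice_start = ts - ts % slice_ms
        buckets[(metric, slice_start)][host] = value  # last value per host wins
    return {
        key: {
            "METRIC_SUM": sum(per_host.values()),
            "HOSTS_COUNT": len(per_host),
            "METRIC_MAX": max(per_host.values()),
            "METRIC_MIN": min(per_host.values()),
        }
        for key, per_host in buckets.items()
    }


# Hypothetical sample data for two DataNodes.
records = [
    ("dfs.datanode.BytesWritten", "dn1", 1409770831000, 100.0),
    ("dfs.datanode.BytesWritten", "dn2", 1409770845000, 300.0),
    ("dfs.datanode.BytesWritten", "dn1", 1409770900000, 50.0),
]
result = aggregate_cluster(records)
for key in sorted(result):
    print(key, result[key])
```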

Enhanced Dashboard

Customizable Service Dashboards
Service dashboards are now customizable in Ambari 2.1
• Create new widgets
• Graphs, Dial Gauge, Number, Template
• Customize layout
• Share widgets
Future:
• Make Layouts shareable

Recorded Demo

Easy to expose widgets for new services
Out-of-the-box widgets are defined in the stack as JSON files
No frontend code changes necessary

Smart Configurations

Hadoop Configuration Challenges
• Too many configurations
• Which ones are important?
• Too easy to mess up
• What are valid/reasonable values?
• What are the units?
• OK, what about dependencies?
• Gets harder with combinations of services, host assignments, enabled features, CPU/RAM/disks, etc.
• Any recommendations? What am I doing wrong?
• Smart Configs to the rescue! (Ambari 2.1)

Smart Configs Demo

Ambari Smart Configs UI
Customizable layout
- Tabs
- Sections
- Sub-sections
- Simple grid layout
(Advanced Tab contains remaining configurations)
New Widgets
- Sliders
  - Recommended
  - Minimum
  - Maximum
  - Increment Step
- Combos
  - Enumerated values
- Toggles
  - Binary options
- Spinners
  - Splits value into multiple controls. Time in milliseconds split into days, hours, minutes.
- Lists
  - Enumerated values
  - Single select
  - Multi select
Implemented
- HDFS
- YARN
- MapReduce
- Hive
- HBase
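
The spinner widget described above splits a single time value in milliseconds into days, hours, and minutes. A minimal sketch of that arithmetic follows; the function names are hypothetical, and dropping any sub-minute remainder is an assumption.

```python
def split_millis(total_ms: int) -> dict:
    """Split milliseconds into days, hours, minutes (sub-minute part dropped)."""
    total_minutes = total_ms // 60_000
    days, rem_minutes = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem_minutes, 60)
    return {"days": days, "hours": hours, "minutes": minutes}


def join_millis(days: int, hours: int, minutes: int) -> int:
    """Inverse of split_millis for whole-minute values."""
    return ((days * 24 + hours) * 60 + minutes) * 60_000


parts = split_millis(93_780_000)
print(parts)  # {'days': 1, 'hours': 2, 'minutes': 3}
```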

Stack Driven Layouts
{
  "name": "default",
  "description": "Default theme for HBASE service",
  "configuration": {
    "layouts": [
      {
        "name": "default",
        "tabs": [
          {
            "name": "settings",
            "display-name": "Settings",
            "layout": {
              "tab-columns": "3",
              "tab-rows": "3",
              "sections": [
                ...
              ]
            }
          }
        ]
      }
    ],
    "placement": {
      "configuration-layout": "default",
      "configs": [...]
    },
    "widgets": [
      {
        "config": "hbase-env/hbase_master_heapsize",
        "widget": {
          "type": "slider",
          "units": [
            {
              "unit-name": "GB"
            }
          ]
        }
      },
      ...
    ]
  }
}
• Stacks have a theme.json file
• Layout
  – Tabs
  – Sections
  – Sub-sections
• Placement
  – Config placement in sub-sections
• Widgets
  – Widget type
  – Optional units
    - Bytes (B, KB, MB, GB, TB, PB)
    - Time (Millis, Seconds, Minutes, Hours, Days, Months, Years)
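
To show how a theme like the one above could be read programmatically, the sketch below loads a trimmed copy and lists the tabs and the widget assigned to each config. The elided parts of the slide's JSON are replaced by empty lists, and `widget_index` is a hypothetical helper, not part of Ambari.

```python
import json

# Trimmed copy of the slide's theme. The elided "sections", "configs" and
# further widgets are unknown, so empty lists stand in as placeholders.
THEME_JSON = """
{
  "name": "default",
  "description": "Default theme for HBASE service",
  "configuration": {
    "layouts": [
      {"name": "default",
       "tabs": [
         {"name": "settings", "display-name": "Settings",
          "layout": {"tab-columns": "3", "tab-rows": "3", "sections": []}}
       ]}
    ],
    "placement": {"configuration-layout": "default", "configs": []},
    "widgets": [
      {"config": "hbase-env/hbase_master_heapsize",
       "widget": {"type": "slider", "units": [{"unit-name": "GB"}]}}
    ]
  }
}
"""


def widget_index(theme: dict) -> dict:
    """Map each config key to (widget type, list of unit names)."""
    index = {}
    for entry in theme["configuration"]["widgets"]:
        widget = entry["widget"]
        units = [u["unit-name"] for u in widget.get("units", [])]
        index[entry["config"]] = (widget["type"], units)
    return index


theme = json.loads(THEME_JSON)
tabs = [tab["display-name"]
        for layout in theme["configuration"]["layouts"]
        for tab in layout["tabs"]]
widgets = widget_index(theme)
print(tabs, widgets)
```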

Config Metadata and Dependencies
{
  "StackConfigurations": {
    "final": "false",
    "property_depends_on": [
      {
        "type": "yarn-site",
        "name": "yarn.nodemanager.resource.memory-mb"
      }
    ],
    "property_description": "The minimum allocation for every …",
    "property_display_name": "Minimum Container Size (Memory)",
    "property_name": "yarn.scheduler.minimum-allocation-mb",
    "property_type": [],
    "property_value": "512",
    "property_value_attributes": {
      "type": "int",
      "maximum": "5120",
      "minimum": "0",
      "unit": "MB",
      "increment_step": "256"
    },
    "type": "yarn-site.xml"
  },
  "dependencies": [
    {
      "StackConfigurationDependency": {
        "dependency_name": "hive.tez.container.size",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.map.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.reduce.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    }
    …
  ]
}
• Extended Metadata
  • Defined in property_value_attributes
  • Holds non-UI metadata about value range, increment, unit, etc.
• Dependencies
  • Models bi-directional relationship between configs
  • Depends On (property_depends_on)
    – Answers ‘which configs do I depend on?’
  • Depended By (dependencies)
    – Answers ‘which configs are dependent on me?’
  • Ambari automatically updates dependencies
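
The bi-directional relationship above can be sketched as a forward map ("depends on") that is inverted to answer "who depends on me?". The property names come from the slide; the helper functions are hypothetical, and treating `increment_step` as a strict multiple is an assumption about how the metadata might be used.

```python
from collections import defaultdict
from typing import Dict, List

# Forward direction ("which configs do I depend on?"), from the slide.
DEPENDS_ON: Dict[str, List[str]] = {
    "yarn.scheduler.minimum-allocation-mb": ["yarn.nodemanager.resource.memory-mb"],
    "hive.tez.container.size": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.map.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.reduce.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
}

# property_value_attributes for yarn.scheduler.minimum-allocation-mb.
ATTRS = {"type": "int", "maximum": "5120", "minimum": "0",
         "unit": "MB", "increment_step": "256"}


def depended_by(depends_on: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the map to answer 'which configs are dependent on me?'."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for prop, parents in depends_on.items():
        for parent in parents:
            reverse[parent].append(prop)
    return {parent: sorted(children) for parent, children in reverse.items()}


def in_range(value: int, attrs: dict) -> bool:
    """Check a value against minimum, maximum and step (step rule assumed)."""
    low, high = int(attrs["minimum"]), int(attrs["maximum"])
    step = int(attrs["increment_step"])
    return low <= value <= high and (value - low) % step == 0


dependents = depended_by(DEPENDS_ON)
print(dependents["yarn.scheduler.minimum-allocation-mb"])
print(in_range(512, ATTRS))
```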

Views

What are Views?
View Framework
• Provides various applications accessible from the Ambari Web UI – interact with the cluster via a browser from a single place for all users (cluster operators, data analysts, developers, etc.)
Easy to develop
• No need to understand Ambari core code – view development is just like creating any other web application
Easy to deploy
• Packaged as a single jar file
• Auto create / auto configure

CS Queue Manager for Cluster Operators
Capacity Scheduler
Queue Manager

HDFS File Browser for General Users
HDFS File Browser

Job Analysis for Developers
Troubleshoot Tez Jobs
Troubleshoot / Improve Hive queries

Query Editors for Data Analysts
Create, edit, execute, and analyze Hive queries
Create, edit, and execute Pig scripts

Ambari Server in Views-Only mode
• Use Views on existing clusters not under Ambari’s management
• Can use Views against multiple clusters
[Diagram: one Ambari Server manages a cluster and provides Views for it. A second Ambari Server in “Views-only” mode (aka “Stand-alone” mode) provides Views for clusters not managed by Ambari.]

Kerberos Automation

Kerberos Automation
New in Ambari 2.0:
• Have Ambari manage Kerberos principals and keytabs
• Once Kerberized, seamlessly handle:
• Adding new hosts
• Adding new components to existing hosts
• Adding new services
• Moving components to different hosts
• Works with existing MIT KDC or Active Directory

Blueprints

Automated Cluster Deployment
Simple
• Making two REST calls is all it takes to provision a cluster
Who uses it?
• Cloudbreak
• Microsoft Azure Marketplace (portal.azure.com)
• Hortonworks QA
• and many many others

Cluster Replication
Export blueprint of source cluster
Import blueprint to replicate clusters

Example: Create a 100-node Cluster
1. POST /blueprints/my-blueprint
{
  "configurations" : [
    {
      "hdfs-site" : {
        "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3"
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [
        { "name" : "NAMENODE" },
        { "name" : "RESOURCEMANAGER" },
        …
      ],
      "cardinality" : "1"
    },
    {
      "name" : "worker-host",
      "components" : [
        { "name" : "DATANODE" },
        { "name" : "NODEMANAGER" },
        …
      ],
      "cardinality" : "1+"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "multi-node-hdfs-yarn",
    "stack_name" : "HDP",
    "stack_version" : "2.0"
  }
}

2. POST /clusters/MyCluster
{
  "blueprint" : "multi-node-hdfs-yarn",
  "host_groups" : [
    {
      "name" : "master-host",
      "hosts" : [
        {
          "fqdn" : "master001.ambari.apache.org"
        }
      ]
    },
    {
      "name" : "worker-host",
      "hosts" : [
        {
          "fqdn" : "worker001.ambari.apache.org"
        },
        { … },
        …
        { … }
      ]
    }
  ]
}
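
The two REST calls above can be prepared from a script. The sketch below only builds the requests and does not send them. The server address is hypothetical, credentials are omitted, and the payloads are cut down to the parts shown on the slide. It uses the blueprint's own name in the first URL, on the assumption that the cluster template must refer to the name the blueprint was registered under; the slide's "my-blueprint" reads as a placeholder.

```python
import json
from urllib.request import Request

BASE_URL = "http://ambari.example.com:8080/api/v1"  # hypothetical server
BLUEPRINT_NAME = "multi-node-hdfs-yarn"
CLUSTER_NAME = "MyCluster"

# Payloads reduced to the fields visible on the slide (elided parts dropped).
blueprint = {
    "host_groups": [
        {"name": "master-host", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker-host", "cardinality": "1+",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
    "Blueprints": {"blueprint_name": BLUEPRINT_NAME,
                   "stack_name": "HDP", "stack_version": "2.0"},
}
cluster_template = {
    "blueprint": BLUEPRINT_NAME,
    "host_groups": [
        {"name": "master-host",
         "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
        {"name": "worker-host",
         "hosts": [{"fqdn": "worker001.ambari.apache.org"}]},
    ],
}


def post_json(path: str, body: dict) -> Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    return Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        # Ambari expects this header on modifying requests.
        headers={"X-Requested-By": "ambari"},
    )


requests_to_send = [
    post_json(f"/blueprints/{BLUEPRINT_NAME}", blueprint),   # call 1
    post_json(f"/clusters/{CLUSTER_NAME}", cluster_template),  # call 2
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Sending each request with `urllib.request.urlopen` (plus authentication) would register the blueprint and then start cluster creation.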

What’s New in Blueprint
New in Ambari 2.0:
• Supports various HA configurations:
• NameNode, ResourceManager, RegionServer, Oozie Server, Hive Metastore, HiveServer2, WebHCat Server
• Adding hosts “blueprint style” (AMBARI-8458)
New in Ambari 2.1:
• Advanced Cluster Creation – more flexible, scalable, and robust (AMBARI-10750)

Blueprint Caveats
As of Ambari 2.1:
• Stack advisor (smart/dynamic config recommendation/validation) is not used (yet)
• Can’t create Kerberized Cluster (yet)

Thank You!
Try Ambari
• Follow the Ambari Quick Start Guide (search online for “Ambari Quick Start Guide”)
Learn more
• Visit the project website (search online for “Ambari”)
Get Involved
• User Mailing List: user-subscribe@ambari.apache.org
• Developer Mailing List: dev-subscribe@ambari.apache.org
• Use JIRA to file bugs and improvement requests (search online for “Ambari JIRA”)

Q & A
Yusaku Sako yusaku@hortonworks.com (Ambari PMC)
Sumit Mohanty smohanty@hortonworks.com (Ambari PMC)
