This session, led by Michael Donnelly, will teach you how to take your Splunk deployment to the next level. Learn about Splunk high-availability architectures with Splunk Search Head Clustering and Index Replication. Additionally, learn how to manage your deployment with Splunk’s operational and management controls to manage Splunk capacity and the end-user experience.
2. 2
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future
events or the expected performance of the company. We caution you that such statements reflect our
current expectations and estimates based on factors currently known to us and that actual events or
results could differ materially. For important factors that may cause actual results to differ from those
contained in our forward-looking statements, please review our filings with the SEC. The forward-
looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or
accurate information. We do not assume any obligation to update any forward-looking statements
we may make. In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only and shall
not be incorporated into any contract or other commitment. Splunk undertakes no obligation either
to develop the features or functionality described or to include any such feature or functionality in a
future release.
3. 3
Splunk at the Next Level
Time to move beyond initial Splunk environment
• More use cases – how to tackle?
• More data – how do we scale?
• Splunk is mission critical == HA
• Global deployments
• Improving Splunk user experience
4. 4
Growing your Splunk Deployment
Many customers start with a single use case…
• Ex: Monitor the web servers
• Help ensure up-time & response times
• Track usage, errors
• Provides business value
5. 5
Growing your Splunk Deployment
Value statement for each overall service
Your services exist in a larger context than just one app, or one tier.
What is the value of the service as a whole?
What are CIO commitments for the service?
• The organization’s web site is one of the most critical parts of the business.
• Performance of the overall environment must be maintained at all times.
• Failures in any portion of the web site must be quickly identified, with
notification sent to the appropriate parties.
• Dependencies on external processes must be monitored as well.
6. 6
Growing your Splunk Deployment
The larger context
• Failure in one system cascades
• Map dependencies, estimate costs
• Use Splunk to track all dependencies.
• What happens when it is down?
Dependencies often include:
• Networking dependencies
• Shared storage
• Databases, middleware, custom apps
• Virtualization layer
7. 7
Scales to Hundreds of TBs/Day
Enterprise-Class Scale, Resilience and Interoperability
Send data from thousands of servers using any combination of Splunk Forwarders
Auto load-balanced forwarding to Splunk Indexers
Offload search load to Splunk Search Heads
8. Visibility Across Datacenters
Distributed search unifies the view
across locations
Role-based access controls how far a given
user's search will span
(Diagram: distributed search spanning New York, Tokyo, London, and Cloud datacenters)
9. 9
Product Roles
Searching and Reporting (Search Head)
Indexing and Search Services (Indexer)
Data Collection and Forwarding (Forwarder)
Indexer Cluster Master, SHC Deployer
Distributed Management / Deployment Server
License Master, Distributed Mgmt Console
(Diagram of data sources feeding Splunk: databases, networks, servers, virtual machines, smartphones and devices, custom applications, security, web servers, sensors)
12. 12
Splunk Universal Forwarder
Why use the UF over other methods?
Collect syslog / event log / custom application logs
Collect configuration files, registry settings
Collect data NOT in log files: scripted inputs on current state
Collect wire data – Splunk Stream
Faster, lower overhead than “agentless” polling
Centrally administered
… and
13. 13
Forwarder Load Balancing
Have UF balance across multiple indexers
Load Balance
– Multiple hosts in outputs
– DNS round robin
– External load balancer not needed!
Geography-based routing
Optional SSL encryption
Compression of roughly 10:1
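The load-balancing setup above can be sketched in the forwarder's outputs.conf. A minimal sketch; the group name, hostnames, and port here are examples, not values from this deck:

```ini
# outputs.conf on the Universal Forwarder
# (group name "primary_indexers", hostnames, and port 9997 are examples)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# AutoLB rotates across this list; no external load balancer required
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# Compress the forwarding channel
compressed = true
```

As the slide notes, a DNS round-robin name can stand in for the explicit host list, and SSL can optionally be enabled on the same output group.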
14. 14
Deployment Server
Central management of Splunk Forwarders
Deployment Server manages Apps, Configs
Select one or more classes for each host
Class defines apps & configs
Works by phone-home
Notes:
DS does not push forwarder binaries
Use Cluster Master to manage indexers in cluster, not DS
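The class-to-app mapping above can be sketched in serverclass.conf. The class name, app name, and hostname pattern below are hypothetical:

```ini
# serverclass.conf on the Deployment Server
# (class name, app name, and whitelist pattern are examples)
[serverClass:web_servers]
whitelist.0 = web*.example.com

[serverClass:web_servers:app:web_inputs]
# Push the "web_inputs" app to matching forwarders and restart them
stateOnClient = enabled
restartSplunkd = true
```

Forwarders phone home to the Deployment Server on the interval configured in their deploymentclient.conf and pull any apps their classes assign.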
15. 15
Forwarding Tier Design Best Practices
• Use a Syslog Server for Syslog data
• Deployment server (on a VM) for central management
• Let AutoLB distribute data across available indexers
• May need to increase UF throughput setting for high velocity sources
– Enable forceTimebasedAutoLB (for more even distribution)
– maxKBps (to adjust throttling)
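The two settings named above live in different files on the forwarder; the values shown are illustrative, not recommendations from this deck:

```ini
# limits.conf on the Universal Forwarder: raise the default 256 KBps throttle
[thruput]
maxKBps = 1024

# outputs.conf on the same forwarder: switch indexers on a fixed timer
# for more even data distribution across the tier
[tcpout]
forceTimebasedAutoLB = true
autoLBFrequency = 30
```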
Questions?
17. 17
Indexers
Dedicated indexers serve three primary roles:
Data Storage
Processing and parsing at index-time
Indexing
Data Management
Hot / warm / cold data rotation
Aging and removal
Data Retrieval
Perform search upon request, return data to search heads
18. 18
Scaling - Indexers
Sizing for index performance
Indexers are usually storage-bound
Indexers: 150 to 250 GB per day, each. (With reference HW.)
Ref HW: 12 cores (2 GHz+), 12 GB RAM, 800+ IOPs
Optimal HW (normal disk): 16 CPU cores, 48 GB RAM
Optimal HW (SSD): 24 CPU cores, 132 GB RAM
Questions?
19. 19
Tiered Storage
• Splunk supports tiered storage
• Hot / Warm buckets – put on fastest disk
• Size Hot/Warm for normal saved search durations. (7d, 30d)
• Use slower / cheaper storage (NAS?) for long term access
• Optional: Use Frozen to roll data to glacier, Hadoop, etc.
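The tiering above can be sketched in indexes.conf. The index name, mount points, and sizes here are examples (the 3 TB hot/warm cap mirrors the sizing arithmetic elsewhere in this deck):

```ini
# indexes.conf (index name, paths, and sizes are examples)
[web]
# Hot/warm buckets on the fastest disk
homePath   = /fast_ssd/splunk/web/db
# Cold buckets on slower / cheaper storage (e.g. NAS)
coldPath   = /slow_nas/splunk/web/colddb
thawedPath = /slow_nas/splunk/web/thaweddb
# Roll buckets to cold once hot/warm reaches ~3 TB
homePath.maxDataSizeMB = 3000000
# Freeze (archive or delete) after 365 days
frozenTimePeriodInSecs = 31536000
# Optional: copy frozen buckets here instead of deleting them
coldToFrozenDir = /archive/splunk/web
```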
21. 21
Scaling - Storage
Manual storage calculation
Raw data typically nets ~50% compression on disk.
Simple: rate * compression * retention / #indexers
Hot / warm requirements
– 200 GB / day * 50% * 30 days = 3TB per indexer
Cold storage requirements
– 200 GB / day * 50% * 335 days = 33.5TB per indexer
Clustering
– Changes storage story completely
22. 22
Scaling - Storage
One example of good local storage
A well configured indexer using local storage might look like:
• SSDs in RAID 5, sized for 14 days of storage
• SATA drives in RAID 5, sized for 6 months of storage
SSDs: RAID 5 provides decent performance
Spinning disks:
• Hot/Warm, RAID 1+0, 800 IOPS or faster
• Cold – RAID 5 with proper block / stripe sizing
25. 25
Delivers Mission-Critical Availability
• Data replication – maintain
searchability even if servers
go down
• Multi-site capable –
maintain searchability even
if a site goes down
• Search Affinity – optimized
searches by fetching from
the closest/fastest location
REPLICATION
Portland
Datacenter
New York
Datacenter
Clustering
26. 26
Indexer Clustering
High-Availability, Out of the Box
Splunk indexer clustering
Active-Active = better performance
Specific terms:
– Master Node / Master Cluster Node
– Peer Node
– Search Factor
– Replication Factor
Additional details: Splunk Docs, Distributed Deployment Manual
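The master/peer wiring behind those terms can be sketched in server.conf. The URI, shared secret, port, and factors below are illustrative:

```ini
# server.conf on the Master Node (values are examples)
[clustering]
mode = master
replication_factor = 3
search_factor = 2
pass4SymmKey = changeme-shared-secret

# server.conf on each Peer Node
[clustering]
mode = slave
master_uri = https://cluster-master.example.com:8089
pass4SymmKey = changeme-shared-secret

# Port peers use to receive replicated bucket data
[replication_port://9887]
```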
28. 28
How Clustering Affects Sizing
• Increased storage:
– 15% of raw usage for every replica copy
– 35% MORE to make that searchable
• Increased processing
– Incoming data to indexer is streamed to indexing peers to satisfy required
number of copies
• More hosts
– Need “replication factor” + 2 (search head, cluster master)
34. 34
SHP vs SHC
Search Head Pooling (SHP):
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues

Search Head Clustering (SHC):
• No shared storage requirement
• Replication using local storage
• Commodity hardware
• OSes: Linux or Solaris
35. 35
Search Head Clustering
1. Group search heads into a cluster
2. A captain gets elected dynamically
3. User created reports/dashboards automatically replicated
to other search heads
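The member-side setup behind those steps can be sketched in server.conf; the URIs, port, and secret here are examples:

```ini
# server.conf on each Search Head Cluster member (values are examples)
[shclustering]
mgmt_uri = https://sh1.example.com:8089
# Where members fetch apps pushed by the Deployer
conf_deploy_fetch_url = https://deployer.example.com:8089
replication_factor = 3
pass4SymmKey = changeme-shared-secret

# Port members use to replicate search artifacts
[replication_port://9200]
```

The first captain is typically bootstrapped once from the CLI (`splunk bootstrap shcluster-captain`); after that, elections happen dynamically among the members.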
37. 37
Search Tier Design Best Practices
• Minimum 3 nodes required
• ES will still require a separate Search Head or dedicated SHC
• Use LDAP/AD/SSO for user authentication
• Load Balancer configured for sticky sessions
• Must use deployer to push apps to search heads
• Confirm your applications’ support for SHC!
Questions?
38. 38
Search Head Clustering
Use “Captain” instead of “Master” to avoid confusion with Indexer Clustering
Minimum 3 nodes required.
The cluster makes key decisions based on a *majority* (consensus)
In a multi-site setup, place more nodes in the main datacenter
40. 40
(Architecture diagram: load balancer in front of a Search Head Cluster plus Deployer; clustered Peer Nodes with Cluster Master; Deployment Server; Universal Forwarders on servers; syslog and NetFlow data; Heavy Forwarders for scheduled polling via API)
41. 41
Hybrid Approach for rollout
• Add the existing Splunk
instance as a search peer
until the data retention
period has expired
• Disable scheduled searches
on the old instance
• Migrate any Summary Index
data to new Indexers
45. 45
Top 5 things to Remember
• Indexers: Storage requirements, IOPS, RAID config
• Indexer clustering: HA, DR, and site affinity!
• SHC: Minimum buy-in for a SHC is 3
• When in doubt – add another Indexer
• Excellent VM candidates:
– Master Cluster Node (Indexer clustering)
– Deployer (Search head clustering)
– Deployment Server (Central Forwarder management)
– License Master
– Distributed Management Console
By allowing Splunk Enterprise to be split into multiple roles, any portion of Splunk can be scaled as needed.
Customers are using Splunk to index hundreds of TB a day and search over petabytes of data. Splunk can take a single search and query as many indexers as are needed to complete the job, allowing you to use inexpensive commodity hardware in massively parallel clusters.
Besides achieving massive scale, splitting the roles enables users to meet location and data segmentation requirements.
Searches can be distributed from a single search head to any number of indexers. These indexers can all be local for massive parallelization for Big Data problems, or spread across a global enterprise to help you keep data wherever makes the most sense for your network, availability, and security requirements.
Splunk Enterprise can be deployed on premise, in the cloud, or a combination of both.
There is also an Amazon Machine Image available or if you don’t want to host or administer Splunk, it can be managed as a service by our experts using “Splunk Cloud”.
These are multiple logical roles, a Splunk instance can be one or more of the roles.
The search head is what most users interact with. It is the webserver and app interpreting engine that provides the primary, web-based user interface. Since most of the data interpretation happens as-needed at search time, the role of the search head is to translate user and app requests into actionable searches for its indexer(s) and display the results. The Splunk web UI is highly customizable, either through our own view and app system, or by embedding Splunk searches in your own web apps or our API. Additional search heads can be deployed to scale with user or search load.
The core of the Splunk infrastructure is indexing. An indexer does two things – it accepts and processes new data, adding it to the index and compressing it on disk. The indexer also services search requests, looking through the data it has via its indices and returning the appropriate results to the searcher over a secure compressed communication channel. Indexers scale out almost limitlessly and with almost no degradation in overall performance, allowing Splunk to scale from single-instance small deployments to truly massive Big Data challenges.
The Splunk forwarder is an optional component that can be installed to forward data from servers, desktops, mainframes, and even ARM based devices. There are two types of forwarders; the full Splunk distribution or a dedicated “Universal Forwarder”. The full Splunk distribution can be configured to filter data before transmitting, execute scripts locally, or run SplunkWeb. This gives you several options depending on the footprint size your endpoints can tolerate. The universal forwarder is an ultra-lightweight agent designed to collect data in the smallest possible footprint. Both flavors of forwarder come with automatic load balancing, SSL encryption and data compression, and the ability to route data to multiple Splunk instances or third party systems.
The Cluster Master coordinates which indexers have copies of which buckets to ensure we have met the proper number of replication and searchable copies of each bucket. All clustered Indexers check in with the Master to alert them of their status, and the status of each of their replicated indexes and buckets. It also manages the apps and configurations on clustered indexers. We will talk more about buckets later.
The Deployment Server can be used to manage your Splunk forwarders, for centrally managed data collection. More on this to come.
Listed for completeness are the license master and DMC roles, these typically coexist with other roles such as the Deployment server.
This slide shows the way we used to calculate data storage requirements.
This app makes it far easier to size the environment’s storage requirements. And it includes clustering configurations, which we’ll talk about in a sec.
Splunk’s clustering technology allows you to choose how many raw copies and searchable copies of your data you would like to keep. It also allows you to choose which indexers you want to store the copies on. This capability allows servers or even datacenters to go down without losing the ability to access the data.
In addition, the search affinity capability allows users to fetch data from the closest or fastest location where there is a copy of the data which can not only save the time it takes to do a search but bandwidth by eliminating the need to use the WAN when there is a local copy.
Default 3/2 cluster uses 3*.15 + 2*.35 = 115% of license usage for that redundancy
Processing : a little more CPU and more network
This is much better in current versions: the indexed data (tsidx, etc.) is streamed to the replica peer, rather than forcing the peer to re-index.
With Search factor / rep factor variables in the mix - what had been simple without clustering now becomes more challenging.
Demo sizing calculator if time allows. Hot/Warm vs. Cold, in different RAID configurations. Sample indexes.conf is generated too.
As discussed – default parameters require *more than* original log size
Uniform user experience among pooled search heads
No single point of failure
Search job failure aware
Does not require external storage such as NFS
Note the Deployer on this image. Deployer virtualizes very well. The deployer pushes apps & configurations to the search head cluster members.
Putting it all together
What’s the best way to roll out some of these features? It depends on the customer environment, but one common method is shown here.