Rackspace’s Enterprise Business Intelligence (EBI) group was looking for a cost-effective way to support the reporting and information needs of its internal users, who include business and operations personnel. The group also wanted to scale out new infrastructure to meet increasing business demands, house growing volumes of data, and customize data collection, while moving away from its legacy data warehouse solution. To do this, Rackspace built the Analytical Compute Grid (ACG) using Hadoop, Cassandra and PostgreSQL on an OpenStack cloud. Read more about it in this presentation.
Rackspace Analytical Compute Grid (ACG)
1. Big Data on Open Cloud
Analytical Compute Grid (ACG)
Elastic “Big Data” Infrastructure
by Natasha Gajic
March 1, 2013
2. Rackspace’s EBI Environment
Current Environment
• Windows and Linux operating systems
• Oracle and Microsoft database solutions
• Microsoft and Oracle replication technology
• SSIS
• Informatica
• Dedicated servers
“Big Data” Problem
• Cost of purchasing additional licenses
• Time required to set up new hardware
• Increased demand for DBA resources
• System performance
• System scalability
• Capacity
• Rapid data set growth
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
3. Analytical Compute Grid (ACG) Features
• Host an ever-growing set of data
• Quick data collection and retrieval
• Rapid scalability
• Ease of maintenance
• Provide standard data access API
4. Analytical Compute Grid (ACG) Features
• Ability to provide a variety of storage types:
• Columnar
• Relational
• HDFS
• Enable users to select optimal storage
type for information collected
• Leverage Rackspace® Private Cloud
powered by OpenStack® and open
source technology
10. ACG on Rackspace® Private Cloud powered by OpenStack®
Node
11. ACG on Rackspace® Private Cloud powered by OpenStack®
Node
12. ACG on Rackspace® Private Cloud powered by OpenStack®
Node
13. ACG on Rackspace® Private Cloud powered by OpenStack®
Node
14. ACG on Rackspace® Private Cloud powered by OpenStack®
Controller
15. ACG on Rackspace® Private Cloud powered by OpenStack®
Controller
16. ACG on Rackspace® Private Cloud powered by OpenStack®
Controller
17. ACG on Rackspace® Private Cloud powered by OpenStack®
API
18. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
19. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
20. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
What is ACG Indexing
Structure?
• System entry point
• Set of pointers ultimately
addressing database
entities
21. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
What is ACG Indexing
Structure?
• System entry point
• Set of pointers ultimately
addressing database
entities
Where is Indexing Structure
Located?
• It is a part of ACG so it
resides on Open Cloud
• ACG Controller manages
Indexing Structure
22. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
What Does the ACG Indexing
Structure Enable?
• Splitting of large data sets
across many instances
• Query parallelization
• Controlled data store size
• Optimal data store
configuration
• Uniform access to data
residing in various storage
types
• System scalability as it
expands horizontally and
vertically to address ever
growing data set
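The points above can be illustrated with a minimal sketch. The slides do not show how the ACG indexing structure is implemented, so everything here is an assumption made for illustration: the `IndexingStructure` class, its `max_rows_per_instance` limit, and the use of in-memory lists as stand-ins for data store instances are all hypothetical. The sketch only shows the general idea of an index that splits a data set across size-limited instances and fans a query out to the relevant instances in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class IndexingStructure:
    """Toy index (hypothetical): maps a partition key to the instances holding it."""

    def __init__(self, max_rows_per_instance):
        self.max_rows = max_rows_per_instance
        self.instances = [[]]   # each "instance" is a plain list standing in for a data store
        self.pointers = {}      # partition key -> set of instance ids

    def insert(self, key, row):
        # Open a new instance once the current one reaches its size limit,
        # which keeps every data store at a controlled size.
        if len(self.instances[-1]) >= self.max_rows:
            self.instances.append([])
        instance_id = len(self.instances) - 1
        self.instances[instance_id].append((key, row))
        self.pointers.setdefault(key, set()).add(instance_id)

    def query(self, key, predicate):
        # Only the instances the index points at are scanned, each in its own thread.
        targets = sorted(self.pointers.get(key, ()))

        def scan(instance_id):
            return [r for k, r in self.instances[instance_id]
                    if k == key and predicate(r)]

        with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
            parts = pool.map(scan, targets)
        return [row for part in parts for row in part]


index = IndexingStructure(max_rows_per_instance=3)
for i in range(10):
    month = "2013-01" if i < 6 else "2013-02"
    index.insert(month, {"device": i, "up": i % 2 == 0})

january_up = index.query("2013-01", lambda r: r["up"])
```

In this toy version the caller sees one `query` entry point regardless of which instance holds the rows, which mirrors the "system entry point" and "uniform access" ideas on the slides.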
23. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes
24. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes - Performance
Rackspace® Private Cloud
powered by OpenStack®
Creates ACG node in 30 seconds
Creates ACG nodes concurrently
Resizes ACG nodes by adding CPUs
25. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes - Performance
Rackspace® Private Cloud
powered by OpenStack®
Creates ACG node in 30 seconds
Creates ACG nodes concurrently
Resizes ACG nodes by adding CPUs
ACG
Indexing structure and controlled
data set size allow for:
Quick data distribution
Query parallelization
26. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Availability
Rackspace® Private Cloud
powered by OpenStack®
Rapidly replace failed ACG nodes
27. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Availability
Rackspace® Private Cloud
powered by OpenStack®
Rapidly replace failed ACG nodes
ACG
Deploys data store native
availability mechanisms
(replication, data distribution…)
28. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Maintainability
Rackspace® Private Cloud
powered by OpenStack®
Adding ACG nodes expands:
Storage capacity
CPU power
Memory
No DBA or system administrator
activity required
29. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Maintainability
Rackspace® Private Cloud
powered by OpenStack®
Adding ACG nodes expands:
Storage capacity
CPU power
RAM
No DBA or system administrator
activity required
ACG
Controlled data set size enables:
Optimal and stable data store
configuration
Reducing demand for managing
data store objects
Stable query execution plans
30. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Flexibility
ACG
Variety of storage types:
Columnar – Cassandra : time series
data
Relational – PostgreSQL : relational data
HDFS – Hadoop : unstructured data
Ability to select optimal storage type
for individual use case
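The mapping on this slide can be written down as a small lookup. This is only a sketch of the selection idea: the `DataKind` enum and `select_storage` function are hypothetical names, and the slides do not say whether ACG selects storage through a rule like this or leaves the choice entirely to the user.

```python
from enum import Enum


class DataKind(Enum):
    TIME_SERIES = "time series"
    RELATIONAL = "relational"
    UNSTRUCTURED = "unstructured"


# Storage type and technology per kind of data, as listed on the slide.
STORAGE_FOR = {
    DataKind.TIME_SERIES: ("Columnar", "Cassandra"),
    DataKind.RELATIONAL: ("Relational", "PostgreSQL"),
    DataKind.UNSTRUCTURED: ("HDFS", "Hadoop"),
}


def select_storage(kind):
    """Return the (storage type, technology) pair for a kind of data."""
    return STORAGE_FOR[kind]


storage_type, technology = select_storage(DataKind.TIME_SERIES)
```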
31. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Usability
ACG
Standard interfaces:
SQL language
JDBC API
ODBC
ACG Management Console
ACG Monitoring Console
Loader utility implementing:
Bulk Loader
Insert Loader
32. ACG on Rackspace® Private Cloud powered by OpenStack®
Current State
33. ACG on Rackspace® Private Cloud powered by OpenStack®
Current State
ACG Controller
• ACG Manager
• Rule Engine
• Node Manager
• ACG Management Console
• ACG Monitoring
Columnar Implementation
• Data Store Controller
• JDBC extended to work with supercolumn
• Loader integrated with Informatica
Relational Implementation
• Data Store Controller
• JDBC driver extended with distributed query rewrite
• Loader integrated with Informatica
HDFS Implementation
• Will start soon
ODBC
• In Progress
34. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
35. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Subject:
• Complex availability calculation sourcing 3
months of monitoring data and creating 1 billion
records in initial calculation
36. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 1
• Data warehouse on a Microsoft SQL Server database
• SSIS data loading
• A SQL Server instance with 24 CPUs and 250 GB RAM was
dedicated to the initial calculation
• A SQL Server stored procedure performed the
calculation
• Source and result are stored in a traditional data
warehouse structure
37. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 2
• ACG running two Cassandra clusters of 4 nodes
each
• Informatica with Cassandra bulk loader
• Each ACG node has 2 CPUs and 8 GB RAM
• Java program running on an instance with 4 CPUs
and 8 GB RAM
• Source and result are stored in a columnar
structure suitable for time series data
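To make "columnar structure suitable for time series data" more concrete, here is a small sketch of a wide-row layout, where each row groups one device's samples for one day and each column is a timestamp. The slides do not describe the actual schema or the availability calculation, so the row key `(device, day)`, the `record` and `availability` helpers, and the simple "fraction of samples that were up" formula are all assumptions for illustration. A plain dictionary stands in for the column store.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

# Row key (device, day) -> {timestamp: is_up}; mimics one wide row per device per day.
rows = defaultdict(dict)


def record(device, timestamp, is_up):
    """Store one monitoring sample as a column in the device's row for that day."""
    day = timestamp // SECONDS_PER_DAY
    rows[(device, day)][timestamp] = is_up


def availability(device, day):
    """Fraction of samples marked up in one row, or None if the row is absent."""
    samples = rows.get((device, day), {})
    if not samples:
        return None
    return sum(samples.values()) / len(samples)


# One sample per minute for an hour; every fourth sample is marked down.
for minute in range(60):
    record("web01", minute * 60, minute % 4 != 0)

hourly_availability = availability("web01", 0)
```

Because all samples for a device and day sit in a single row, the calculation reads one row and iterates over its columns, with no joins involved.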
38. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case - Result
• Calculation Duration
• Microsoft SQL Server calculation took 5 days
• ACG calculation completed in 3.5 hours
• Storage Size
• Microsoft SQL Server: 500 GB
• ACG: 20 GB
• Complexity of the calculation
• A columnar data store is optimal for time series data.
Sourcing from a columnar data store resulted in a relatively
simple Java calculation process compared to the SQL
Server stored procedure
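The figures above can be turned into ratios with a few lines of arithmetic. The input numbers come from the slides; the resource totals for the ACG environment assume that all eight Cassandra nodes and the Java instance are counted together, which the slides do not state explicitly.

```python
# Calculation duration, from the result slide.
sql_server_hours = 5 * 24          # 5 days
acg_hours = 3.5
speedup = sql_server_hours / acg_hours       # roughly 34x faster

# Storage size, from the result slide.
sql_server_storage_gb = 500
acg_storage_gb = 20
storage_ratio = sql_server_storage_gb / acg_storage_gb   # 25x smaller

# Resources, from the environment slides (assumption: every ACG node is counted).
acg_nodes = 2 * 4                  # two clusters of 4 nodes each
acg_cpus = acg_nodes * 2 + 4       # 2 CPUs per node plus the 4-CPU Java instance
acg_ram_gb = acg_nodes * 8 + 8     # 8 GB per node plus the 8 GB Java instance
sql_server_cpus, sql_server_ram_gb = 24, 250

print(f"speedup: {speedup:.1f}x, storage: {storage_ratio:.0f}x smaller")
print(f"ACG: {acg_cpus} CPUs / {acg_ram_gb} GB RAM; "
      f"SQL Server: {sql_server_cpus} CPUs / {sql_server_ram_gb} GB RAM")
```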
39. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case - Conclusion
• Selecting optimal data store for use case resulted in:
• Substantial performance improvement
• Reduced storage demand
• Simplified processes
• Ability to process terabytes of data per day close to
real-time and on-demand
• Improved trending and reporting:
• enhanced support capabilities
• improved Rackspace customer experience
• Significant cost reduction