2. Abstract
We store a lot of data in our databases, but users often do not get the full benefit
of this data because they lack good tools for analyzing it. SQL
is good for ad-hoc queries, but it becomes very complicated when you
need more complex analytical queries that return summary results. Newer
NoSQL databases also focus more on efficient processing of detailed records
than on analytical processing.
There is a range of OLAP (On-Line Analytical Processing) databases and engines
that make multi-dimensional analysis of your data at different
summary levels easier. One of the most popular open-source OLAP engines is Mondrian
(mondrian.pentaho.com), which can be put in front of your relational SQL database
and provides the MDX multi-dimensional query language, which is much better suited to
analytical purposes.
The mondrian-olap gem (soon to be released) integrates the Mondrian OLAP engine using
JRuby's Java integration. It provides a Ruby DSL for creating OLAP schemas on top of
relational database schemas, and supports analytical queries written either in MDX
or in an ActiveRecord/Arel-like query language. The talk shows how to use
it in new or existing Ruby on Rails applications and how it makes data
analysis much easier than standard ActiveRecord queries.
4. SQL query like this
SELECT product_class.product_family,
       SUM(sales.unit_sales) AS unit_sales_sum,
       SUM(sales.store_sales) AS store_sales_sum
FROM sales
  LEFT JOIN product ON sales.product_id = product.product_id
  LEFT JOIN product_class
    ON product.product_class_id = product_class.product_class_id
  LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
  LEFT JOIN customer ON sales.customer_id = customer.customer_id
WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
  AND customer.country = 'USA' AND customer.state_province = 'CA'
GROUP BY product_class.product_family
5. Could be written in MDX like this
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
[Product].children ON ROWS
FROM [Sales]
WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
6. Or in Ruby like this
olap.from('Sales').
columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
rows('[Product].children').
where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')
Get sales amount and number of units of all
products sold in California during Q1 of 2011
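To make the link between the Ruby call chain and the MDX text more concrete, here is a minimal sketch of how a fluent builder could render such a query. The class name MdxSketch and its to_mdx method are hypothetical and written only for illustration; this is not the mondrian-olap implementation, and it needs neither JRuby nor Mondrian to run.

```ruby
# Hypothetical, simplified query builder for illustration only.
# It is NOT the mondrian-olap gem; it just renders an MDX string.
class MdxSketch
  def self.from(cube)
    new(cube)
  end

  def initialize(cube)
    @cube = cube
    @columns = []  # members or set expressions for the COLUMNS axis
    @rows = []     # members or set expressions for the ROWS axis
    @slicer = []   # members that form the WHERE tuple
  end

  # Each method stores its arguments and returns self so calls can be chained.
  def columns(*members)
    @columns = members
    self
  end

  def rows(*members)
    @rows = members
    self
  end

  def where(*members)
    @slicer = members
    self
  end

  def to_mdx
    axes = []
    axes << "#{set(@columns)} ON COLUMNS" unless @columns.empty?
    axes << "#{set(@rows)} ON ROWS" unless @rows.empty?
    mdx = "SELECT #{axes.join(",\n")}\nFROM [#{@cube}]"
    mdx += "\nWHERE (#{@slicer.join(', ')})" unless @slicer.empty?
    mdx
  end

  private

  # A single expression is emitted as-is; several members become a {...} set.
  def set(members)
    members.length == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

mdx = MdxSketch.from('Sales').
  columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
  rows('[Product].children').
  where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
  to_mdx
puts mdx
```

With these inputs the sketch prints the same four-line MDX statement shown on slide 5. The real gem additionally has to execute the query against Mondrian and wrap the result, which this sketch leaves out.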
7. More complex queries
olap.from('Sales').
with_member('[Measures].[ProfitPct]').
  as('Val((Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales])',
  :format_string => 'Percent').
columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
  top_count(50, '[Measures].[Store Sales]').
where('[Time].[2011].[Q1]')
Get sales amount and profit % of top 50
products cross-joined with USA and Canada
country sales during Q1 of 2011
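For readers new to OLAP, the three operations combined in this query (a calculated member, a cross join, and a top count) can be mimicked on a tiny in-memory data set. The sketch below is plain Ruby with made-up sample rows and hypothetical helper names (FACTS, crossjoin_totals, top_count); it only illustrates the semantics and does not use Mondrian or the mondrian-olap gem.

```ruby
# Made-up fact rows for illustration:
# [product family, country, store sales, store cost]
FACTS = [
  ['Food',  'USA',    200.0, 120.0],
  ['Food',  'Canada',  50.0,  40.0],
  ['Drink', 'USA',    100.0,  50.0],
  ['Drink', 'Canada',  80.0,  60.0]
].freeze

# Cross join: one result row per (product, country) pair, with the
# measures summed per pair and ProfitPct calculated from the sums.
def crossjoin_totals(facts)
  facts.group_by { |row| row.first(2) }.map do |key, rows|
    sales = rows.sum { |r| r[2] }
    cost  = rows.sum { |r| r[3] }
    { key: key, store_sales: sales, profit_pct: (sales - cost) / sales }
  end
end

# Top count: keep the n rows with the largest value of the given measure,
# ordered from largest to smallest.
def top_count(rows, n, measure)
  rows.max_by(n) { |row| row[measure] }
end

top = top_count(crossjoin_totals(FACTS), 3, :store_sales)
top.each do |row|
  puts format('%-6s %-7s %8.2f %6.1f%%',
              *row[:key], row[:store_sales], row[:profit_pct] * 100)
end
```

An OLAP engine evaluates the same kind of expression against the cube for you, so the grouping, ranking, and calculated measure stay in one declarative query instead of hand-written aggregation code.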