Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks
1. Open Source Geospatial
Business Intelligence (GeoBI):
Definition, architectures, projects,
challenges and outlooks
Prof. Thierry Badard
3rd OSGIS Conference
Nottingham University, Nottingham, UK – June 21-22, 2011
2. Content of the presentation
● Open data / Data deluge
● BI & Geospatial BI (GeoBI)
● Definition
● Architecture
● Offerings
● Open source GeoBI solutions
● Conclusion & outlooks
12. Nowadays, companies and public bodies …
● Store and face huge amounts of data, that are very
diverse and heterogeneous
● Sales
● Stocks
● Socio-economical data
● Surveys
● …
● Impact assessment of their advertisement campaigns
● Website visits
● Social networks
13. In a world that changes rapidly ...
● This forces them to make informed, strategic
decisions ever more often and more quickly in order
to stay competitive
● So they need to analyze these huge amounts of
data rapidly, simply and efficiently
● To do so, they increasingly use BI
(Business Intelligence) tools
● These tools produce:
● Synthesis reports (PDF, Excel, ...)
● Interactive analytical dashboards
20. Location is everywhere! ;-)
● About 80% of all data stored in corporate
databases has a spatial component [Franklin, 1992]
● Mail address
● Postal/ZIP code
● Coordinates
● IP, phone number, …
● Like time, space (location) is a crucial dimension that
should be taken into account when analyzing
corporate data!
21. How to take into account location …
● It requires the right tools to observe,
handle and explore this type of information in BI
tools ...
● Let us imagine you are HSBC … or a ministry for
public safety … or the government of Haiti ...
● Do you think that cross tables or charts are the
best media to answer these types of questions?
22. The right medium is obviously a map!
25. So, GeoBI (Geospatial BI) ...
● Brings:
● The map ...
● and all the tools required to represent, navigate and
analyse the geospatial dimension
● Into the very core of the BI tools on the market
● In order to fully take into account the spatial
dimension in corporate data analysis and in the
decision processes based on these data!
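26. BI and geospatial industry
[Figure: GeoBI at the intersection of the Geospatial, Web 2.0 and Business Intelligence (BI) industries, with a "Size" row and a "Players" row]
● Size (in the order shown on the slide): $3 billion in software (out of a total of $15 billion in IT); $8.8 billion in software (up 22% in 2008)
● Players: (logos, not recoverable)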
27. Back in 2005 ...
● GeoSOA research team: research in BI, geospatial and mobility
● Purposes:
● Propose ubiquitous and mobile decision making tools which make use of the
geospatial dimension (mobile GeoBI),
● And find ways to deliver business information adapted to the context of the user
(location, environment, situation, purpose, ...)
● But, unfortunately:
● No available GeoBI tool enabled:
– The processing of large volumes of information coming from very diverse and
heterogeneous data sources
– The rapid delivery of these data to numerous concurrent users in
connected/disconnected modes (web services architecture)
– The support of standards dedicated to mobility
– Key GeoBI capabilities with a rich, comprehensive and extensible API at an
affordable cost
– A robust and complete BI architecture able to fulfill all types of
user requirements
28. Classical architecture of a BI infrastructure
● Transactional databases
● Web resources
● XML, flat files, proprietary file formats (Excel spreadsheets, …)
● LDAP
● …
29. The Data Warehouse: the crucial/central part!
● Repository of an organization’s historical data, for analysis purposes.
● Primarily intended for analysts and decision makers.
● Separate from operational (OLTP) systems (source data)
− But often stored in relational DBMS: Oracle, MSSQL, PostgreSQL, MySQL, Ingres, …
● Contents are often presented in a summarized form (e.g. key performance indicators, dashboards, OLAP client applications, reports).
− Need to define some metrics/measures
30. The Data Warehouse: the crucial/central part!
● Optimized for:
− Large volumes of data (up to terabytes);
− Fast response (<10 s) to analytical queries (vs. update speed for transactional DB):
- de-normalized data schemas (e.g. star or snowflake schemas), which introduce some redundancy to avoid time-consuming JOIN queries,
- all data are stored in the DW across time (no corrections),
- summary (aggregate) data at different levels of detail and/or time scales,
- (multi)dimensional modeling (a dimension per analysis axis): all data are interrelated according to the analysis axes (OLAP datacube paradigm)
● Focus is thus more on the analysis / correlation of large amounts of data than on retrieving/updating a precise set of data!
● Specific methods are needed to propagate updates into the DW!
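The de-normalized star schema and aggregate queries described above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table names, column names and figures are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by de-normalized dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date  (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT, state TEXT, city TEXT);
CREATE TABLE fact_sales (date_id INTEGER, store_id INTEGER, store_sales REAL);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2002, "Q1"), (2, 2002, "Q2"), (3, 2003, "Q1")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)",
                 [(1, "USA", "CA", "Beverly Hills"), (2, "USA", "WA", "Yakima")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 150.0), (3, 1, 120.0), (1, 2, 80.0), (3, 2, 90.0)])


def sales_by_year(state):
    """Aggregate the sales measure along the date dimension for one store state."""
    return conn.execute("""
        SELECT d.year, SUM(f.store_sales)
        FROM fact_sales f
        JOIN dim_date d  ON d.date_id = f.date_id
        JOIN dim_store s ON s.store_id = f.store_id
        WHERE s.state = ?
        GROUP BY d.year
        ORDER BY d.year
    """, (state,)).fetchall()


print(sales_by_year("CA"))
```

Each dimension table holds its whole hierarchy (country, state, city) in one row, so an analytical query needs only one JOIN per analysis axis.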
31. MDX query language
● MDX stands for MultiDimensional eXpressions
● Multidimensional query language
● De facto standard from Microsoft for SQL Server OLAP Services (now Analysis Services)
● Also implemented by other OLAP servers (Essbase, Mondrian) and clients (Proclarity, Excel PivotTables, Cognos, JPivot, …)
● MDX is for OLAP data cubes what SQL is for relational databases
● Looks like a SQL query but relies on a different model (close to the one used in spreadsheets)
● SELECT
{ [Measures].[Store Sales] } ON COLUMNS,
{ [Date].[2002], [Date].[2003] } ON ROWS
FROM Sales
WHERE ( [Store].[USA].[CA] )
32. Results representation
● SELECT
{ [Product].[All Products].[Drink],
[Product].[All Products].[Food] } ON COLUMNS,
{ [Store].[All Stores].[USA].[WA].[Yakima].[Store 23],
[Store].[All Stores].[USA].[CA].[Beverly Hills].[Store 6],
[Store].[All Stores].[USA].[OR].[Portland].[Store 11] } ON ROWS
FROM Warehouse
WHERE ([Time].[1997], [Measures].[Units Shipped])
● OLAP client software offers:
− Alternate representation modes (pie charts, diagrams, etc.)
− Different tools to refine queries/explore data
- Drill down, roll up, pivot, …
- Based on operators provided by MDX
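As a rough illustration of what roll-up and drill-down mean on a store hierarchy, here is a small sketch in plain Python. It does not use MDX or any OLAP server; the fact rows, the level names and the aggregate function are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, state, city, product family, units shipped).
FACTS = [
    ("USA", "WA", "Yakima", "Drink", 10),
    ("USA", "WA", "Yakima", "Food", 25),
    ("USA", "CA", "Beverly Hills", "Drink", 7),
    ("USA", "CA", "Beverly Hills", "Food", 30),
    ("USA", "OR", "Portland", "Food", 12),
]
STORE_LEVELS = ("country", "state", "city")


def aggregate(level, parent=None):
    """Sum units per member of `level`, optionally restricted to descendants of `parent`.

    `parent` is a tuple of ancestor members, e.g. ("USA",) or ("USA", "WA").
    """
    depth = STORE_LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in FACTS:
        path, units = row[:3], row[4]
        if parent is not None and path[:len(parent)] != tuple(parent):
            continue  # this row is outside the member being drilled into
        totals[path[:depth]] += units
    return dict(totals)


# Roll up to the state level, then drill down into one state.
by_state = aggregate("state")
wa_cities = aggregate("city", parent=("USA", "WA"))
print(by_state)
print(wa_cities)
```

Rolling up shortens the member path that is used as the grouping key; drilling down keeps the full path but restricts the rows to one parent member.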
33. Geospatial BI adds maps and spatial analysis!
This requires integrating the geospatial component consistently in all parts of the architecture!
35. Pentaho open source BI software stack
● Pentaho (http://www.pentaho.org)
− Pentaho Reporting
− Kettle
− Mondrian
− Weka
− + CDF: Community Dashboard Framework
− + Other projects: olap4j, JPivot, Halogen, …
36. Towards an open source geospatial BI stack
[Figure: stack diagram; recoverable labels: PostGIS, Oracle Spatial, GeoKNIME, "& integration in various dashboard and reporting tools"]
37. An ETL tool is …
● A type of software used to populate databases or data warehouses from heterogeneous data sources.
● ETL stands for:
− Extract – Extract data from data sources
− Transform – Transform data in order to correct errors, cleanse them, change their structure, make them compliant with defined standards, etc.
− Load – Load transformed data into the target DBMS
● An ETL tool should manage the insertion of new data and the updating of existing data.
● It should be able to perform transformations from:
− An OLTP system to another OLTP system
− An OLTP system to an analytical data warehouse
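The three ETL stages listed above can be sketched in a few lines of plain Python. This is not how Kettle or GeoKettle work internally (they are configured graphically, without code); it is only a hand-written illustration, and the CSV content, table name and cleansing rules are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with inconsistent formatting.
SOURCE_CSV = """store,city,sales
Store 6, beverly hills ,120.50
Store 11,PORTLAND,
Store 23,Yakima,80
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: cleanse values and drop records that fail validation."""
    clean = []
    for rec in records:
        sales = rec["sales"].strip()
        if not sales:  # missing measure: reject the record
            continue
        clean.append((rec["store"].strip(), rec["city"].strip().title(), float(sales)))
    return clean


def load(conn, rows):
    """Load: insert new rows and update existing ones (upsert on the store key)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT PRIMARY KEY, city TEXT, sales REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
result = conn.execute("SELECT store, city, sales FROM sales ORDER BY store").fetchall()
print(result)
```

Running load a second time with corrected values for the same store replaces the earlier row, which covers the "insertion of new data and updating of existing data" requirement in its simplest form.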
38. Why use an ETL tool?
● Automation of complex and repetitive data processing without writing any specific code
● Conversion between various data formats
● Migration of data from one DBMS to another
● Data feeding into various DBMS
● Population of analytical data warehouses for decision support purposes
● etc.
39. GeoKettle
● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle)
● Kettle is a metadata-driven ETL with direct execution of transformations
− No intermediate code generation!
● Support of several DBMS and file formats
− DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, ... (total of 37)
− Read/write support of various data file formats: text, Excel, Access, DBF, XML, ...
● Numerous transformation steps
● Support of methods for updating the DW
40. GeoKettle
● GeoKettle provides a true and consistent integration of the spatial component
− All steps provided by Kettle are able to deal with geospatial data types
− Some dedicated geospatial steps have been added
● First release in May 2008: 2.5.2-20080531
● Current stable version: 3.2.0-r188-20090706
● To be released shortly: GeoKettle 2.0 with many new features!
● Released under LGPL at http://www.geokettle.org
● Used in different organizations and countries:
− Some ministries, banks, insurance companies, integrators, …
− E.g. GeoETL from Inova is in fact GeoKettle! :-)
● A growing community of users and developers
41. GeoKettle
● Transformations vs. Jobs:
− Running in parallel vs. running sequentially
● All can be stored in a central repository (database)
− But each transformation or job can also be saved in a simple XML file!
● Offers different interfaces:
− Spoon: GUI for editing transformations and jobs
− Pan: command line interface for running transformations
− Kitchen: command line interface for running jobs
− Carte: Web service for the remote execution of transformations and jobs
43. GeoKettle
● Version 2.0 provides support for:
− Handling geometry data types (based on JTS)
− Accessing Geometry objects in JavaScript, which allows users to define custom
transformation steps (“Modified JavaScript Value” step)
− Topological predicates (Intersects, crosses, etc.) and aggregation operators (envelope,
union, geometry collection, ...)
− SRS definition and transformations
− Input / Output with some spatial DBMS
- Native support for Oracle, PostGIS and MySQL
- MS SQL Server 2008 and SQLite/Spatialite
- Ingres and IBM DB2 can be used but it requires some tricks
− GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data
formats and DBMS)
− Sensor Web (SOS) and metadata (CSW) services
− Cartographic preview
− Spatial analysis and advanced geo-processing
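To give a feel for the predicates and aggregation operators named above, here is a deliberately simplified sketch in plain Python. It works on axis-aligned bounding boxes only, not on full geometries, and it does not use JTS or the GeoKettle API; all function names and coordinates are hypothetical.

```python
def envelope(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b):
    """True when two bounding boxes share at least one point (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union_envelope(a, b):
    """Smallest bounding box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


# Hypothetical features, reduced to their envelopes.
parcel = envelope([(0, 0), (4, 0), (4, 3), (0, 3)])
road = envelope([(3, 2), (8, 2)])
lake = envelope([(10, 10), (12, 11)])

print(envelopes_intersect(parcel, road))
print(envelopes_intersect(parcel, lake))
print(union_envelope(parcel, road))
```

A real topological predicate tests the exact shapes; an envelope test like this one is typically only a cheap first filter before the exact computation.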
44. GeoKettle
● GeoKettle releases are aligned with those of Pentaho Data Integration (Kettle)
− GeoKettle thus benefits from all new features provided by PDI (Kettle).
● Kettle is natively designed to be deployed in cluster and web service environments.
− This makes GeoKettle a software component well suited to deployment as a service (SaaS) in cloud computing environments such as those provided by Amazon EC2.
− It thus enables scalable, distributed, on-demand processing of large and complex volumes of geospatial data in minutes for critical applications, without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
45. GeoKettle – Requirements and install
● Very simple installation procedure
● All you need is a Java Runtime Environment
− Version 5 or higher
● Just unzip the binary archive of GeoKettle ...
● And let’s go!
− Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows)
● Need help? Please visit our wiki:
− http://wiki.spatialytics.org
47. GeoKettle
● Upcoming features:
− Implementation of data matching and conflation steps/jobs to allow advanced geometric data cleansing and comparison of geospatial datasets
− Read/write support for other DBMS, GIS file formats and services
- LAS (LiDAR), ...
- WFS-T, Sensor Web (TML, SensorML, SOS-T, ...), ...
− Dedicated steps for social media (Twitter, ...), OSM, generalization, ...
− Support of the third dimension (x, y, z, M), linear reference systems, raster ...
− Raster support: a plugin integrating all capabilities provided by the Sextante library (BeETLe) is under development
48. GeoMondrian
● GeoMondrian is a "spatially-enabled" version of Pentaho Analysis Services (Mondrian)
● GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS
− i.e. a consistent and powerful support for geospatial data.
● Licensed under the EPL
● http://www.geo-mondrian.org
49. GeoMondrian
● As far as we know, it is the first implementation of a true Spatial OLAP (SOLAP) Server
− And it is an open source project! ;-)
● Provides a consistent integration of spatial objects into the OLAP data cube structure
− Instead of fetching them from an external spatial DBMS, web service or GIS file
● Implements a native Geometry data type
● Provides first spatial extensions to the MDX language
− They add spatial analysis capabilities to analytical queries
● At present, it only supports PostGIS data warehouses
− But other DBMS will be supported in the next version!
50. Spatially enabled MDX
● Goal: bring to Mondrian and MDX what SQL spatial extensions do for relational DBMS (i.e. Simple Features for SQL and implementations such as PostGIS).
● Example query: filter spatial dimension members based on distance from a feature
− SELECT
{[Measures].[Population]} on columns,
Filter(
{[Unite geographique].[Region economique].members},
ST_Distance([Unite geographique].CurrentMember.Properties("geom"),
[Unite geographique].[Province].[Ontario].Properties("geom")) < 2.0
) on rows
FROM [Recensements]
WHERE [Temps].[Rencensement 2001 (2001-2003)].[2001]
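The logic of the Filter / ST_Distance query above can be mimicked in plain Python. This is only a sketch under strong simplifications: each member's geometry is reduced to a single hypothetical point, the distance is planar, and the region names, coordinates and population figures are invented for illustration.

```python
import math

# Hypothetical spatial dimension members: name -> (x, y) representative point, in degrees.
REGIONS = {
    "Region A": (-75.7, 45.4),
    "Region B": (-79.4, 43.7),
    "Region C": (-71.2, 46.8),
    "Region D": (-123.1, 49.3),
}
# Hypothetical measure values for the same members.
POPULATION = {"Region A": 1200, "Region B": 5600, "Region C": 800, "Region D": 2400}


def distance(p, q):
    """Planar distance between two points, in the units of their coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def filter_by_distance(members, reference, threshold):
    """Keep the members whose geometry lies closer than `threshold` to `reference`."""
    return [name for name, geom in members.items() if distance(geom, reference) < threshold]


reference_point = (-76.0, 45.0)  # hypothetical stand-in for the reference feature
selected = filter_by_distance(REGIONS, reference_point, 2.0)
rows = {name: POPULATION[name] for name in selected}
print(rows)
```

The filter plays the role of the rows axis and the population lookup the role of the measure on columns; in the real query the geometries come from the "geom" member property stored in the cube.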
51. Spatially enabled MDX
● Many more possibilities:
− in-line geometry constructors (from WKT)
− member filters based on topological predicates (intersects, contains, within, …)
− spatial calculated members and measures (e.g. aggregates of spatial features, buffers)
− calculations based on scalar attributes derived from spatial features (area, length, distance, …)
53. SOLAPLayers
● SOLAPLayers is a lightweight cartographic component (framework) which enables navigation in geospatial (Spatial OLAP or SOLAP) data cubes, such as those handled by GeoMondrian.
● It aims to be integrated into existing dashboard frameworks in order to produce interactive geo-analytical dashboards.
● Such dashboards support the decision making process by including the geospatial dimension in the analysis of enterprise data.
● First version stems from a GSoC 2008 project performed under the umbrella of OSGeo.
● Licensed under BSD (client part) and EPL (server part).
● http://www.solaplayers.org
54. SOLAPLayers v1
● Version 1 was based on OpenLayers and Dojo
● It allows:
− the connection with a Spatial OLAP server such as GeoMondrian,
− some basic navigation capabilities in the geospatial data cubes,
− and the cartographic representation of some measures as static or dynamic choropleth maps or maps with proportional symbols.
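A choropleth map boils down to assigning each spatial member a color class from the value of a measure. The sketch below shows one common way to do this, equal-interval classification, in plain Python. It is not the method used by SOLAPLayers (the slides do not say which one it uses); the function name, palette and values are hypothetical.

```python
def equal_interval_classes(values, n_classes):
    """Assign each value a class index in [0, n_classes - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_classes
    # The maximum value would fall into class n_classes, so clamp it to the last class.
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


PALETTE = ["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"]  # light to dark, hypothetical colors
populations = {"Region A": 100, "Region B": 250, "Region C": 400, "Region D": 900}

classes = equal_interval_classes(list(populations.values()), len(PALETTE))
colors = {name: PALETTE[c] for name, c in zip(populations, classes)}
print(colors)
```

A "dynamic" choropleth would simply recompute the classes each time the user drills down or rolls up and the set of displayed members changes.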
57. SOLAPLayers v1
● Version 1 was mostly a proof of concept!
● It has important limitations:
− Allows only the cartographic representation (no crosstabs or charts)
− Works only for one measure and the spatial dimension!
− Offers limited navigation capabilities in the geospatial data cubes
− Is able to connect to GeoMondrian only
− Extending the framework is difficult due to the lack of flexibility and the poor documentation of Dojo
− Integration with other currently used geo-web and dashboard frameworks was difficult
− ...
58. SOLAPLayers 2.0
● So, SOLAPLayers has undergone (and is still undergoing ;-) ) a deep re-engineering!
● Version 2 is fully based on ExtJS/GeoExt (and hence OpenLayers)
− This will make its integration with other geo/web and BI/dashboard frameworks easier
− It provides some new ExtJS components dedicated to GeoBI!
− Following the application development philosophy of these geo-web frameworks, it makes the produced geo-analytical dashboards easier to create and maintain!
− Like ExtJS, it supports internationalization!
59-63. SOLAPLayers 2.0 – Architecture
[Figure, built up over five slides: the Client exchanges SOLAPJSON with the Server (built-in or LDAP authentication; OLAP4J / MDX), which can be connected to three kinds of back ends]
1 SOLAP Server (native or XML/A)
2 OLAP Server (native or XML/A) plus a geospatial data source (WFS, DBMS, ...)
− Bridge architecture
− Maximizes what is in place in organisations
− But no Geo-MDX capabilities available!
3 Geospatial data source (WFS, DBMS, ...) only
− For simple geo-dashboards
− Based on transactional data
− Thematic mapping
− No Geo-MDX and no drill-down or roll-up capabilities!
64. SOLAPLayers 2.0 – Architecture
● Many more connectors to come …
– GeoKettle
– CDA (BI domain)
– Oracle/MSSQL support
– ...
● Many more output formats to be available ...
– Image (PNG, ...)
– KML
– ...
65-66. SOLAPLayers 2.0 – Geo-dashboard made easy!
1 Define the template of the dashboard in an HTML file
2 Define your dashboard components in a JS file and map them to the div in the HTML file
80. Prediction
[Figure: recoverable labels: configuration node, KNN, actual values]
81. Conclusion & outlooks
● A rich and exciting topic ...
● And it is just an introduction ;-)
● Do not hesitate to ask for more demos!
● R&D is still required in this domain ...
● Open source allows cooperation/collaboration and makes research work easier:
students can use the infrastructure to develop their own research without
reinventing the wheel
● Based on this infrastructure, some research works on mobile GeoBI have
already been performed:
– Web services for the dissemination of geo-analytical data
– Methods for real time updating and integration of data stemming from
sensors
– Definition of mobile GeoBI context profiles
– Implementation of first mobile and context aware GeoBI clients
–…
● Towards active geospatial datawarehouses ...