These notes describe a generalised data integration architecture framework and set of capabilities.
With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented. The consequence of this is that there is frequently a mixed, inconsistent data integration topography. Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance.
Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability.
Data integration has multiple meanings and multiple ways of being used such as:
- Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies
- Integration in terms of migrating data from a source to a target system and/or loading data into a target system
- Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics
- Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target
- Integration in terms of service orientation and API management to provide access to raw data or the results of processing
There are two aspects to data integration:
1. Operational Integration – allow data to move from one operational system and its data store to another
2. Analytic Integration – move data from operational systems and their data stores into a common structure for analysis
Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture
1. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
2. Data Integration, Access, Flow, Exchange, Transfer,
Load, Share And Extract
• Set of data movements between data entities - data sources and data targets -
across the organisation’s data landscape
• Data integration is more than just extracting data from operational systems to
populate data warehouses and long-term data stores
• The movement, creation, transfer and exchange of data breathes life into the set
of organisation solutions
• Data integration is the combination of all these data flows, transfers, exchanges,
loads, extracts that occurs across the data landscape and the tools, methods and
approaches to facilitating and achieving them
• Data integration is an enterprise-level capability that should be available to all
applications and solutions
• The organisation’s data fabric should include infrastructural components and tools
that deliver these data integration facilities
• Individual solution and applications and their implementation projects should not
have to create (additional) point-to-point custom integrations
• Data interoperability and solution interoperability are closely related – you cannot
have effective solution interoperability without data interoperability
March 22, 2021 2
3. Evolution Of Data Integration
• With many organisations, data integration tends to have
evolved over time with many solution-specific tactical
approaches implemented
• The consequence is that there is frequently a mixed,
inconsistent data integration topography
• Data integrations are often poorly understood,
undocumented and difficult to support, maintain and
enhance
5. Data Integration
• Data integration has multiple meanings and multiple ways
of being used such as:
− Integration in terms of handling data transfers, exchanges,
requests for information using a variety of information movement
technologies
− Integration in terms of migrating data from a source to a target
system and/or loading data into a target system
− Integration in terms of aggregating data from multiple sources
and creating one source, with possibly date and time dimensions
added to the integrated data, for reporting and analytics
− Integration in terms of synchronising two data sources or
regularly extracting data from one data source to update a target
− Integration in terms of service orientation and API management
to provide access to raw data or the results of processing
6. Two Aspects Of Data Integration
• Overall data integration architecture needs to handle both types
• Operational Integration – allow data to move from one operational system and its data
store to another
• Analytic Integration – move data from operational systems and their data stores into a
common structure for retrieval, reporting and analysis
[Diagram: operational systems, an analytic data store and data retrieval]
7. Data Integration And Organisation Data Plumbing
[Diagram: the organisation technology solutions landscape and the data plumbing required to support the solutions landscape and solution interoperability]
8. Data Fabric, Data Landscape And Data Entities
• The data landscape is an integrated view of all data
entities within (core) and outside (extended) the
organisation through which the organisation obtains,
shares and provides data
• The data fabric is the aggregation of the data entities and
their data flows across the core and extended organisation
• Data entities are data assets that are involved in the
provisioning, storage, processing and transfer of
organisation data
− Data entities perform data-related activities across the spectrum
of data actions and events
− A data entity is a hardware or software technology component
involved in any form of data processing
9. Importance Of Data Integration In IT Architecture
• Enterprise Architecture – defines overall IT architecture for the organisation
• Data Architecture – defines the data architecture for the organisation, of which data integration and
interoperability is one element
• Solution Architecture – designs solutions in the context of overall enterprise and data architectures and the
need for solutions to access, integrate, exchange, transfer and extract data
− Effective data integration is key to solution interoperability
• Data Integration Architecture – defines a common approach to and set of enabling and implementing
technologies in the areas of data integration, access, flow, exchange, transfer, load and extract that can be
used by all IT solutions
[Diagram: Enterprise Architecture, Data Architecture, Data Integration Architecture, Solution Architecture]
10. Business And Information Technology Architecture
[Diagram: Business Strategy; Business Architecture; Business Governance; Information Technology Governance; Information Technology Strategy; Information Technology Architecture; Data Architecture; Information Technology Security Architecture; Application, Solution, Infrastructure and Service Architecture]
11. Overall Data Architecture And Capabilities
• Data Infrastructure and Storage
• Data Security, Protection, Access Control, Authentication, Authorisation
• Data Management, Governance, Architecture, Operations, Supporting Processes
• Data Reporting and Analytics, Visualisation Tools and Facilities
• Data Design, Modelling, Operational Data Stores
• Master and Reference Data Management
• Metadata Management
• Data Integration, Access, Flow, Exchange, Transfer, Transformation, Load And Extract
• Data Warehouse, Data Marts, Data Lakes
• Unstructured Data and Document Management
• External Data Sources and Interacting Parties
12. Data Integration Architecture
• Data Sources
• Data Channels
• Data Integration Security, Authentication, Authorisation
• Data Integration Operations Management, Administration
• Data Integration Development, Testing and Deployment
• External Data Sources and Targets
• Data Integration Technologies
• Data Integration Scheduler and Rules Engine
• Internal Data Sources and Targets
13. Data Integration As Part Of Overall Information
Technology Architecture
[Diagram: Overall Business and IT Architecture Context; Data Architecture Components; Data Integration Architecture Components]
14. Organisation Data Zones
• Data zones are containers for data entities with similar access
and location characteristics
[Diagram: Central Data Entities and Infrastructure Zone; Business Unit/Location Entities and Infrastructure Zone(s); Organisation Data Zone; Secure External Organisation Access Zone; Secure External Organisation Participation and Collaboration Zone; Insecure External Organisation Presentation And Access Zone]
15. Sample Organisation Data Zones
• Central Data Infrastructure – this contains the central data applications
and their associated data
• Business Unit/Location Data Infrastructure – this is an individual
organisation business unit or location and the data entities it contains
• Organisation – this data zone represents the entire organisation and it
contains all the locations and business units or functions within the
organisation
• Secure External Organisation Access – this zone contains data entities that
enable secure access from outside the organisation
• Secure External Organisation Participation and Collaboration – this is a
location outside the physical organisation boundary where data entities
that are provided by or to trusted external parties reside, including cloud
platforms
• Insecure External Organisation Presentation And Access – this represents
a location where publicly accessible data entities reside. These entities are
regarded as insecure and/or untrusted
• Integration can occur within and between data zones
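The sample zones above can be modelled as a simple enumeration so that each integration can be labelled as occurring within a zone or between zones. This is only an illustrative sketch: the `DataZone` member names and the `integration_scope` helper are assumptions introduced here and are not part of the original material.

```python
from enum import Enum


class DataZone(Enum):
    """Hypothetical labels for the sample organisation data zones."""
    CENTRAL = "Central Data Infrastructure"
    BUSINESS_UNIT = "Business Unit/Location Data Infrastructure"
    ORGANISATION = "Organisation"
    SECURE_EXTERNAL_ACCESS = "Secure External Organisation Access"
    SECURE_EXTERNAL_COLLABORATION = "Secure External Organisation Participation and Collaboration"
    INSECURE_EXTERNAL = "Insecure External Organisation Presentation And Access"


def integration_scope(source_zone: DataZone, target_zone: DataZone) -> str:
    """Label an integration as within one zone or between two zones."""
    return "within zone" if source_zone is target_zone else "between zones"


print(integration_scope(DataZone.CENTRAL, DataZone.CENTRAL))
print(integration_scope(DataZone.BUSINESS_UNIT, DataZone.SECURE_EXTERNAL_COLLABORATION))
```

A real model would probably need to distinguish individual business units or locations, since each of those may be a zone in its own right.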
16. Internal And External Data
• Data can be defined as internal or external
− Internal data is (logically) held within a source data entity
− External data is data brought into or sent out of a source data
entity to a target data entity
[Diagram: Source Data Entity and Target Data Entity; within a data entity, Internal Data with Data Load, Data Processing and New Data Generation; External Data on either side]
17. Internal And External Data
• At its core, data integration is concerned with enabling
the transition of data from internal to external states
• The internal and external state of data is separate from the
internal to external location of the source or target data
entity
− Internal – within the organisation data zones
− External – outside the organisation data zones
18. Data Integration Issues And Trends
• The data landscape has broadened: more data entities form part of the extended
organisation data landscape as more applications move to the cloud and as cloud platforms
provide facilities not currently present in organisations, such as data analytics and machine learning
• Initiatives and projects that are part of digital transformation programmes involve integrating
data between internal and external parties
• Need to reduce the latency of data integration as required response times are reduced
• Performance, resilience and availability integration requirements are increasing
• Need to deploy operational integrations more quickly to respond to business needs
• There is a wider range of data entities as the data landscape increases in complexity
• Process automation initiatives require an operational data integration platform
• Greater volume and complexity of data integrations represent a potential data loss risk unless
actively monitored and managed
• There are more data demands within the organisation, especially in the areas of analytics and
the associated data integrations from operational data sources
19. Data Trends Affecting Data Integration
• Greater volumes of operational data from increasing numbers of
different sources and providers
• Greater volumes of derived data
• More data sources both internal and external to the organisation
• Data in larger numbers of different formats
• Data with a wider range of contents
• Data being generated at different rates
• Data being generated at different times
• Data being generated with varying degrees of accuracy and reliability
and with greater fuzziness
• Data that changes constantly
• Data that is of different utility and value
20. Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
[Diagram: data movements between applications, data sources, data stores and locations: Data Load, Data Transfer, Data Exchange, Data Access, Data Extraction, Data Flow, Data Migration, Data Replication, Data Publication, Data Presentation, Data Retrieval]
21. Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
[Diagram: the same data movements (Data Load, Transfer, Exchange, Access, Extraction, Flow, Migration, Replication, Publication, Presentation, Retrieval), collectively labelled Data Integration]
22. Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
• Within any organisation, there will be many different data movements being performed in
different ways using different technologies and approaches:
− API/Web Service
− SOAP
− RPC
− SOA/ESB
− FTP
− ETL/ELT
− EDI
− AS1/2/3
− SMTP
− Database replication
− Change data capture
− IPaaS
− Stream processing
− Message queueing (MQSeries, MQTT, AMQP, Active MQ, JMS, Azure Queues, …)
− DB link
− Batch
− DDS
− OPC-UA/IEC 62541
− IEC 60870
− Proprietary technologies (such as SWIFT)
− … And many others
• Proliferation of integration technologies and approaches indicates the long-standing and
pervasive nature of data integration within information technology
23. Wider Data Integration Concerns
[Diagram: data entities across zones: Cloud Data Store (Lake, Warehouse); SaaS Applications and Data Stores; On Premises Data Applications and Data Stores; On Premises Data Warehouse; Cloud Reporting and Analysis Application; On Premises Reporting and Analysis Application; IaaS Hosted Application and Data Store; External Collaborating Party; External DMZ]
24. Wider Data Integration Scenarios And Concerns
• The data integration landscape is becoming more
heterogeneous, leading to data integration across data
zones
− Between on-premises entities
− Between on-premises and external collaborating parties
− Between external collaborating parties and cloud-based entities
− Between on-premises and cloud SaaS solutions
− Between on-premises and cloud infrastructure IaaS solutions
− Within the same cloud provider
− Between different cloud providers
• The approach to data integration and the technologies to
use has changed from a purely internal use only solution to
one encompassing a range of inter-zonal data movements
25. Data Integration Scenarios
[Diagram: the data entities from slide 23, marked with two integration scenarios: Between On-premises Entities, and Between On-premises Entities and External Collaborating Parties]
26. Data Integration Logical Components
• On Premises Data Integration
− Performs integration within and between on-premises data
entities
• Data Integration Gateway
− Enables data integration between internal and external data
entities
• External Data Integration
− Enables data integration between internal and external data
entities
− This includes between on-premises and cloud
27. Data Integration Components
[Diagram: the data entities from slide 23 with three integration components: On Premises Data Integration, Data Integration Gateway and External Data Integration]
28. Data Integration Platform
[Diagram: Data Integration Plugboard; data integration logically extends across the entire data span]
29. Data Integration, Access, Flow, Exchange, Transfer,
Load And Extract Architecture – Options
• Options
− Implement full data integration architecture
− Implement a logical meta integration architecture combining
multiple tools and technologies
− Implement multiple separate (technology or application specific)
integration platforms, with or without overall management
• Irrespective of the approach, creating and maintaining an
inventory of data integrations is an essential activity
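An inventory of data integrations can start as a simple structured list, whichever architecture option is chosen. The sketch below is one possible shape for such an inventory; the `IntegrationRecord` fields, the `undocumented` helper and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationRecord:
    """One inventory entry; every field name here is an illustrative assumption."""
    name: str
    source: str
    target: str
    technology: str  # for example "ETL/ELT", "FTP" or "API/Web Service"
    owner: str
    documented: bool = False


def undocumented(inventory: list[IntegrationRecord]) -> list[str]:
    """Return the names of integrations that still lack documentation."""
    return [record.name for record in inventory if not record.documented]


# Made-up sample entries.
inventory = [
    IntegrationRecord("orders-to-warehouse", "Order System", "Data Warehouse", "ETL/ELT", "Data Team", True),
    IntegrationRecord("hr-export", "HR System", "Payroll Provider", "FTP", "HR IT"),
]
print(undocumented(inventory))
```

Even a list this small makes it possible to ask which integrations exist, who owns them and which are undocumented.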
30. Data Integration Mediation/Wrapper/Meta Tool
• Rather than seek to have one big data integration solution,
consider the option of using multiple tools that are
(logically) integrated into a common integration
architecture
[Diagram: Meta Data Integration Platform; Individual Data Integration Tools/Applications]
31. Tool Or Meta Tool
• Meta data integration tool approach can increase
complexity without increasing flexibility or reducing cost
• Overhead of managing multiple individual integration tools
and integrating these with meta tool can be complex
32. Core And Extended Dimensions Of Data Integration
[Diagram: the core, platform and service dimensions of data integration, itemised on slide 33]
33. Dimensions Of Data Integration
• Three dimensions of data integration
− Core – operational components – the core functionality of the data integration platform
• Data Sources and Data Ingestion, Data Ingestion Rules
• Data Targets and Data Mapping/Transfer, Data Integration Rules
• Data Transport Technologies
• Interim Data Storage/Data Staging
• Data Structures, Formats and Types
• Data Transformations and Data Processing Rules
− Platform – management aspects – the operational elements of the data integration platform
• Speed, Volume, Throughput, Capacity, Scalability
• Security and Access Control
• Development, Validation, Deployment and Maintenance
• Monitoring, Administration and Management
• Scheduling and Triggering
• Logging, Analysis, Reporting, Event and Alert Management
− Service – key supporting processes and enabling components – that need to be part of any
usable data integration platform
• Service Level Management
• Capacity Management
• Availability and Continuity Management
• Platform Architecture Management
• Governance and Knowledge Management, Data Semantics
• Operations Management
34. Data Integration Core Operational Characteristics
• Data Sources and Data Ingestion, Data Ingestion Rules – the
sources of data for data integration and the rules and technologies
for processing
• Data Targets and Data Mapping/Transfer, Data Integration Rules –
the targets of data for data integration and the rules and
technologies for processing
• Data Transport Technologies – support for the range of data
integration technologies
• Interim Data Storage/Data Staging – provision of a data staging
area for asynchronous data retrieval
• Data Structures, Formats and Types – support for a range of input
and output data formats and types and the ability to convert from
one to another
• Data Transformations and Data Processing Rules – facility for
transforming source data
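As a small illustration of format conversion combined with a transformation rule, the sketch below converts CSV input to JSON output and applies an optional per-row transformation. The function name `csv_to_json`, the sample data and the choice of CSV and JSON are assumptions made for this example; the original material does not prescribe any particular formats.

```python
import csv
import io
import json


def csv_to_json(csv_text: str, transform=None) -> str:
    """Convert CSV text to a JSON array, optionally transforming each row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if transform is not None:
        rows = [transform(row) for row in rows]
    return json.dumps(rows)


def typed_row(row: dict) -> dict:
    """Example processing rule: cast the text fields to numeric types."""
    return {"id": int(row["id"]), "amount": float(row["amount"])}


source_data = "id,amount\n1,10.50\n2,3.25\n"
print(csv_to_json(source_data, transform=typed_row))
```

Keeping the transformation as a separate function mirrors the separation between format handling and processing rules in the list above.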
35. Data Integration Platform Management
Characteristics
• Speed, Volume, Throughput, Capacity, Scalability – ability of the platform
to handle the volume of data integration activity within agreed times
• Security and Access Control – provision of facilities to authenticate and
authorise data access requests and to interact with data source security
layer
• Development, Validation, Deployment and Maintenance – capability to
develop, test, deploy and manage new data integrations and changes to
existing data integrations
• Monitoring, Administration and Management – facilities to monitor the
operation of the data integration platform and manage and administer it
• Scheduling and Triggering – capacity to manage data integration
schedules and events that trigger integrations
• Logging, Analysis, Reporting, Event and Alert Management – provision of
event and activity logging, the ability to define and receive alerts and the
ability to report on and analyse event data
36. Data Integration Platform Service Characteristics
• Service Level Management – ensuring that the platform complies with
agreed data integration performance and throughput service levels
• Capacity Management – monitoring the resources used by the integration
platform and ensuring that the platform has sufficient resources
• Availability and Continuity Management – guaranteeing that the platform
meets availability needs and ensuring its continuity of operations
• Platform Architecture Management – managing the overall platform
architecture, its upgrades, the addition of new facilities and the support
for new integration technologies
• Governance and Knowledge Management, Data Semantics – managing
knowledge about data integration and providing information about data
read from sources and transferred to targets
• Operations Management – managing the provision of operational support
services for all aspects of the data integration platform
37. Logical Unified Data Integration Architecture
[Diagram: logical unified data integration architecture, showing the Core Integration Platform, Data Integration Execution, Scheduler/Rules Engine, Operational Data Integrations, Deployed Integration Operation, Deployed Data Integrations, Data Integration Gateway, Internal Access Layer, External Access Layer, External to Internal Translation, Security, Data Knowledge Store, Interim Data Store, Operational Process Usage Log, Alerting/Event Management, Dashboard/Analytics/Reporting, Management and Administration Interface, Integration Design and Development with Version Management and Control, Integration Templates and Template Library, Integration Component/Product/Tool Library, Integration Publication/Deployment, Internal Data Sources and Targets, External Data Sources and Targets]
38. Logical Unified Data Integration Architecture –
Components – 1/2
• Core Integration Platform – this orchestrates and manages the operation of data integrations
• Deployed Integration Operation – these are specific data integrations that have been developed,
tested and deployed to the Core Integration Platform
• Scheduler, Rules Engine – this component manages the definition and operation of integration schedules
and the actioning of integrations based on triggering events
• Operational Data Integrations – these are data integrations that are deployed to operation
• Data Integration Execution – this is the component of the Core Integration Platform that executes data
integrations
• Data Integration Gateway – gateway components provide communications channels to external data
sources and targets
• External Access Layer/Connectors – this allows external data sources and targets to connect to the Core
Integration Platform
• Internal Access Layer/Connectors – this allows internal data sources and targets to connect to the Core
Integration Platform
• Security – this provides support for source and target authorisation and authentication and integration
with their security layers
• Internal Data Sources and Targets – these are the data sources and targets that are local to the
platform
• External Data Sources and Targets – these are the data sources and targets that are remote from the
platform
• External to Internal Translation – this is intended to represent a facility that translates external
requests to internal addresses to provide an additional level of security
39. Logical Unified Data Integration Architecture –
Components – 2/2
• Data Knowledge Store – this stores information about the data being integrated to enable its retrieval
by subject and content
• Interim Data Store – this is a staging area for data being stored between transfer from source to target
• Operational Process Usage Log – this contains a log of integration usage and activities
• Alerting/Event Management – this allows for the definition, maintenance and handling of events and
alerts
• Dashboard/Analytics/Reporting – this provides facilities to report on platform activity and usage
• Management and Administration Interface – this allows the platform to be managed and
administered
• Deployed Data Integrations – this represents the set of active deployed integrations
• Integration Design and Development, Version Management and Control – this enables data
integrations to be developed, tested, deployed to production and subsequently updated
• Integration Templates and Template Library – this contains a library of data integration templates that
can be used and reused during development
• Integration Component /Product/Tool Library – this represents a library of integration technology
tools that can be incorporated into and used in integration run times
• Integration Publication/ Deployment – this supports the process for deploying data integrations into
production
40. Generalised Data Integration Approach
• Every data integration consists of a minimum of two (logical)
components
1. A source extract/provision half
2. A target delivery half
• The source must make the data available in some form and either
allow (enable PULL) or initiate (PUSH) the data movement to the
target
• The target then receives (PUSH) or retrieves (PULL) the data
• Direct source to target data integration involves individual point-to-
point connections, bypassing any data integration hub
• There may be an interim transformation stage where the format
and content of the provided data is changed to suit the needs of
the target
• Some Source/Target PUSH/PULL combinations imply the need for a
staging area where extracted/provided data from the source resides
before being passed to the target
− Asynchronous data integration
• Classification can be extended by allowing for multiple sources and
targets
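The two-half model can be sketched in a few lines. The example below shows one combination only, source PUSH with target PULL, where a staging area holds data between the two halves and an optional transformation is applied on the way in. The class and method names (`StagedIntegration`, `receive_from_source`, `deliver_to_target`) are hypothetical, and the in-memory queue merely stands in for a real interim data store.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


class StagedIntegration:
    """Source PUSH / target PULL integration with an interim staging area."""

    source_mode = Mode.PUSH
    target_mode = Mode.PULL

    def __init__(self, transform=lambda record: record):
        self._staging = deque()      # interim data store between the two halves
        self._transform = transform  # optional interim transformation stage

    def receive_from_source(self, record):
        """Source half: the source initiates the movement (PUSH)."""
        self._staging.append(self._transform(record))

    def deliver_to_target(self):
        """Target half: the target retrieves the oldest staged record (PULL)."""
        return self._staging.popleft() if self._staging else None


integration = StagedIntegration(transform=str.upper)
integration.receive_from_source("order-1")
integration.receive_from_source("order-2")
print(integration.deliver_to_target())
print(integration.deliver_to_target())
print(integration.deliver_to_target())  # nothing left in staging
```

Because the source and target act at different times, the staging area is what makes this combination asynchronous.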
[Diagram: 2×2 matrix of source PUSH/PULL against target PUSH/PULL]
41. Logical Data Integration Scenarios
[Diagram: data sources and data targets connected through a Data Integration Hub, with an incoming half (source PUSH or PULL) and an outgoing half (target PUSH or PULL)]
42. Integration Combinations
• There are many different integration modes/patterns depending on factors such as:
− Number of sources for a single integration
− Number of targets for a single integration
− Push or pull by source and target
− Initiator of the integration – source, target or hub
• Single Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Single Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
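The sixteen combinations listed above are the product of two cardinality choices (single or multiple sources, single or multiple targets) and two mode choices (push or pull on each half). As a quick check, the following sketch enumerates them in the same order as the list; the variable names are illustrative only.

```python
from itertools import product

cardinalities = ["Single", "Multiple"]
modes = ["Push", "Pull"]

# Targets vary slowest and sources next, matching the order of the list above.
combinations = [
    f"{sources} Source {source_mode}, {targets} Target {target_mode}"
    for targets, sources, source_mode, target_mode in product(
        cardinalities, cardinalities, modes, modes
    )
]

print(len(combinations))
for combination in combinations:
    print(combination)
```

Adding a further factor, such as the initiator of the integration (source, target or hub), would multiply the number of patterns accordingly.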
43. Single Source PUSH Single Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to target
44. Single Source PUSH Single Target PULL
• Single data source pushes data to integration hub
• Hub allows the target to pull data
45. Single Source PULL Single Target PUSH
• Data pulled from single data source
• Hub pushes data to target
46. Single Source PULL Single Target PULL
• Data pulled from single data source
• Hub allows the target to pull data
47. Multiple Source PUSH Single Target PUSH
• Multiple data sources push data to integration hub where
it is aggregated
• Hub pushes data to target
48. Multiple Source PUSH Single Target PULL
• Data pushed from multiple data sources and aggregated
• Hub allows the target to pull data
49. Multiple Source PULL Single Target PUSH
• Data pulled from multiple data sources and aggregated
• Hub pushes data to target
50. Multiple Source PULL Single Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows the target to pull data
51. Single Source PUSH Multiple Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to multiple targets
52. Single Source PUSH Multiple Target PULL
• Single data source pushes data to integration hub
• Hub allows multiple targets to pull data
53. Single Source PULL Multiple Target PUSH
• Data pulled from single data source
• Hub pushes data to multiple targets
54. Single Source PULL Multiple Target PULL
• Data pulled from single data source
• Hub allows multiple targets to pull data
55. Multiple Source PUSH Multiple Target PUSH
• Multiple data sources push data to integration hub where
it is aggregated
• Hub pushes aggregated data to multiple targets
56. Multiple Source PUSH Multiple Target PULL
• Multiple data sources push data to integration hub where
it is aggregated
• Hub allows multiple targets to pull aggregated data
57. Multiple Source PULL Multiple Target PUSH
March 22, 2021 57
Data Source Data Target
Multiple Source PULL
Multiple Target PUSH
Data Target
Data Target
• Data pulled from multiple data sources and aggregated
• Hub pushes aggregated data to multiple targets
58. Multiple Source PULL Multiple Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows multiple targets to pull aggregated data
59. Data Integration Initiation And Notification
• For source PULL/target PUSH integrations, the integration hub is
always in direct control and can synchronise the two halves of the
integration – it can initiate the data PULL and then PUSH the
resulting data
• For other combinations, the hub has less control of synchronisation
− Source PUSH/Target PUSH – integration hub can PUSH the data to the target
after it has been PUSHed by the source
− Source PULL/Target PULL – integration hub can PULL the data from the source
when the target requests it
− Source PUSH/Target PULL – integration hub must wait for source to PUSH data
before it can respond to PULL request from target
Figure: matrix of source direction (PUSH, PULL) against target direction
(PUSH, PULL), with each combination marked as fully synchronised,
partially synchronised or unsynchronised
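The control rules for the four source/target combinations can be sketched in code. This is a minimal illustration rather than part of the original material: the names `Direction`, `initiators` and `hub_controls_both` are hypothetical, and the sketch only records which party starts each half of an integration.

```python
from __future__ import annotations

from enum import Enum


class Direction(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


def initiators(source: Direction, target: Direction) -> tuple[str, str]:
    """Return who starts the source half and the target half of an integration."""
    # Source PULL: the hub asks the source for data.
    # Source PUSH: the source decides when to send data to the hub.
    source_initiator = "hub" if source is Direction.PULL else "source"
    # Target PUSH: the hub sends data on to the target.
    # Target PULL: the target decides when to ask the hub for data.
    target_initiator = "hub" if target is Direction.PUSH else "target"
    return source_initiator, target_initiator


def hub_controls_both(source: Direction, target: Direction) -> bool:
    """True only when the hub initiates both halves and can synchronise them."""
    return initiators(source, target) == ("hub", "hub")
```

Under these assumptions only source PULL/target PUSH gives the hub control of both halves. In every other combination at least one half is started by the source or the target.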
60. Synchronous And Asynchronous Data Integration
• Synchronous integration occurs where the hub initiates both
the PULLing of source data and the PUSHing of transmitted
data
• Asynchronous integration is where the source supply and the
target provision of data do not occur in sequence, or where the
triggering of the source supply or target provision events is
not controlled by the hub
• This includes subscription-type integration where the data is
retained by the hub and retrieved by subscribers
61. Data Integration Hub Data Retention
• How long should the integration hub retain data?
• The integration hub should not become one more
organisation data store where data is retained forever
• Target PULL integrations are the likely source of
accumulated, retained but unretrieved data
• The integration hub needs to include a facility to purge
unretrieved data and/or the data retention interval needs
to be specified as a data integration attribute
• Where a target makes a PULL request for data that is no longer
available, the integration hub needs to handle this, for example
by responding that the data has been purged
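A retention interval and purge facility could look something like the following sketch. It is illustrative only: `HubStore`, `RetainedItem`, the status strings and the choice to pass the current time in as a number are all assumptions made for the example, not features of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedItem:
    payload: bytes
    received_at: float  # seconds, on whatever clock the caller uses


@dataclass
class HubStore:
    retention_seconds: float  # the data retention attribute of the integration
    items: dict = field(default_factory=dict)
    purged: set = field(default_factory=set)  # keys only, so expiry can be reported

    def receive(self, key: str, payload: bytes, now: float) -> None:
        """Store data delivered by a source."""
        self.items[key] = RetainedItem(payload, now)
        self.purged.discard(key)

    def purge(self, now: float) -> list:
        """Delete data held longer than the retention interval."""
        expired = [
            key for key, item in self.items.items()
            if now - item.received_at > self.retention_seconds
        ]
        for key in expired:
            del self.items[key]
            self.purged.add(key)
        return expired

    def pull(self, key: str, now: float) -> tuple:
        """Answer a target PULL request with a status and an optional payload."""
        self.purge(now)
        if key in self.items:
            return ("OK", self.items[key].payload)
        if key in self.purged:
            return ("EXPIRED", None)  # data was held but is no longer available
        return ("NOT_AVAILABLE", None)  # data never arrived
```

Distinguishing `EXPIRED` from `NOT_AVAILABLE` lets a target tell late retrieval apart from data that the source has not yet supplied. A real hub would also need to age out the record of purged keys.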
62. Data Integration Initiation – Source PULL/Target PUSH
• Hub requests data from source and sends it to the target
63. Data Integration Initiation – Source PUSH/Target PUSH
• Hub receives data from source
• Hub pushes data to target
64. Data Integration Initiation – Source PULL/Target PULL
• Target requests data
• Hub pulls data from source
• Hub responds to pull request from target
65. Data Integration Initiation – Source PUSH/Target PULL
• Target requests data
• Hub responds that data is not available
• Source pushes data to hub and hub receives data from source
• Hub notifies target that data is available
• Target requests data
• Hub responds to pull request from target
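The source PUSH/target PULL sequence can be sketched as a small in-memory hub. This is a hedged illustration only: `PushPullHub`, its method names and the use of a callback for notification are assumptions made for the example.

```python
class PushPullHub:
    """Minimal in-memory hub for a source PUSH/target PULL integration."""

    def __init__(self):
        self._data = None
        self._waiting = []  # notification callbacks for targets that asked too early

    def target_pull(self, notify):
        """Handle a PULL request; return None if the source has not pushed yet."""
        if self._data is None:
            # Data is not available: remember how to notify this target later.
            self._waiting.append(notify)
            return None
        return self._data

    def source_push(self, data):
        """Receive data from the source, then notify any waiting targets."""
        self._data = data
        waiting, self._waiting = self._waiting, []
        for notify in waiting:
            notify()
```

A target that pulls too early receives `None`, is notified once the source has pushed, and then repeats its request.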
66. Data Integration Security
• Data integration security arises in four areas
− Source
• PUSH – source may need to authenticate with the integration hub
• PULL – integration hub may need to authenticate with data source
− Target
• PUSH – integration hub may need to authenticate with data target
• PULL – target may need to authenticate with the integration hub
• Integration hub needs to support a range of authentication
and authorisation protocols
• Integration hub also needs to support security operations
and administration
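The four security areas reduce to a lookup of which party presents credentials to which. The sketch below is illustrative: `AUTH_MATRIX` and `who_authenticates` are hypothetical names, and the table simply restates the four cases listed above.

```python
# (side, direction) -> (party presenting credentials, party verifying them)
AUTH_MATRIX = {
    ("source", "PUSH"): ("data source", "integration hub"),
    ("source", "PULL"): ("integration hub", "data source"),
    ("target", "PUSH"): ("integration hub", "data target"),
    ("target", "PULL"): ("data target", "integration hub"),
}


def who_authenticates(side: str, direction: str) -> tuple:
    """Return (caller, verifier) for one half of an integration."""
    key = (side.lower(), direction.upper())
    if key not in AUTH_MATRIX:
        raise ValueError(f"unknown side/direction: {side}/{direction}")
    return AUTH_MATRIX[key]
```

In each case the party that initiates the transfer is the one that authenticates with the other.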
67. Data Integration Security – Source PUSH
• Source authenticates with hub, identifying integration name
• Hub authenticates source and transmits authorisation and access details
• Source PUSHes data
68. Data Integration Security – Source PULL
• Hub authenticates with source, identifying integration name
• Source authenticates hub and transmits authorisation and access details
• Hub PULLs data
69. Data Integration Security – Target PUSH
• Hub authenticates with target, identifying integration name
• Target authenticates hub and transmits authorisation and access details
• Hub PUSHes data
70. Data Integration Security – Target PULL
• Target authenticates with hub, identifying integration name
• Hub authenticates target and transmits authorisation and access details
• Target PULLs data
71. Data Integration Metadata
• Data that provides information about the data integration that enables the
integration to be defined, implemented, operated, managed and monitored
• Classifications of integration metadata types
− Descriptive – information about the data integration
− Business – what the data is, its sources, targets, meaning and relationships with other data
− Structural – how the data integration is organised and operated, and how versions are maintained
− Administrative/Process – how the data integration should be managed and administered through its lifecycle stages and who can perform what operations on the metadata
− Statistical – information on actual data integration options, usage and other volumetrics
− Reference – sets of values for structured metadata fields
72. Attributes Of A Data Integration
• Each data integration has a number of attributes or sets of metadata that define its operation and use in detail
• This information is needed to define and operate the integration
• The information must be collected, stored, made available and maintained in a metadata store
Attribute – Description
− Identifier – Defines a unique integration identifier
− Related Integrations – Lists related integrations and identifies the nature of the relationships, including any dependencies
− Source(s) – Defines the source systems or locations from which the source data will be obtained
− Target(s) – Defines the target systems or locations to which the data will be delivered or made available
− Push/Pull from Source – Identifies if the data is pulled or pushed from the source
− Push/Pull from Target – Identifies if the data is pushed to or pulled by the target
− Source Data Format – Defines the format of the source data
− Target Data Format – Defines the format of the target data
− Source Protocol – Defines the interface protocol used to obtain the source data and any protocol-specific information
− Target Protocol – Defines the interface protocol used to deliver the target data and any protocol-specific information
− Validation – Lists any validations to be performed on the source data, defining whether they are blocking or non-blocking and any exception processing to be performed
− Transformation – Defines any transformation to be performed on the source data including transformation steps and any splits or aggregations performed
− Data Size – Contains an estimate of the size of the source and (transformed) target data
− Trigger – Defines the event(s) that trigger the integration, if relevant
− Frequency – Defines the expected frequency of the data integration, if relevant
− Data Retention – Defines how long the data should be retained between source and target
− Monitoring and Alerting – Lists how the integration will be monitored and how alerts will be generated based on events
− Source Access Security – Defines any security associated with accessing the data source
− Target Access Security – Defines any security associated with accessing the data target
− Audit Log – Identifies where audit information relating to the operation and use of the integration is stored
− Restart After Failure – Lists details on how the integration should be recovered and restarted after failure
− Data Sensitivity – Lists the sensitivity of the data being handled by the integration
− Ownership – Identifies the business and technical owners of the integration
− Priority – Defines any priority assigned to the integration
− Supporting Documentation – Identifies where documentation relating to the integration is available
− User Interface to View/Maintain Transferred Data – Identifies the user interface that is available to view and maintain the transferred data
− Version – Details on the current integration version and any previous versions
− Active/Inactive Flag – Indicates if the integration is active or inactive
73. Data Integration Specification
• Data integration can be logically specified as follows
{Integration {Name, Attributes}
  {Sources
    {Source1,TechnologyType,Direction,Attributes}
    {Source2,TechnologyType,Direction,Attributes}
    {…}
  }
  {Transformation {Name, Attributes}
    {Steps
      {Step1,<Processing>}
      {Step2,<Processing>}
      {…}
    }
  }
  {Targets
    {Target1,TechnologyType,Direction,Attributes}
    {Target2,TechnologyType,Direction,Attributes}
    {…}
  }
}
• The elements of the specification are:
− Integration – overall integration identifier and attributes
− Sources – set of data sources, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
− Transformation – the transformation performed on the source data to create the data sent to or made available to the target
− Targets – set of data targets, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
74. Data Integration Specification
• Attributes can be defined at the overall data integration
level or at the individual data source and target definition
level
• Technology type could be one of:
− FT – transfer a file using a file transfer protocol
− API – information is requested using an API made available by the
application
− MSG – information is exchanged using a message queueing
protocol
− ETL – data is exchanged using an ETL process
− HTTP – data is exchanged using HTTP GET/PUT
• This describes a common approach to defining data
integrations
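One way to hold such a specification in code is a handful of small record types. The sketch below is illustrative only: the class names, the `effective_attributes` helper, the rule that source- or target-level attributes override integration-level ones, and the example integration are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class TechnologyType(Enum):
    FT = "file transfer"
    API = "application API"
    MSG = "message queueing"
    ETL = "ETL process"
    HTTP = "HTTP GET/PUT"


class Direction(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


@dataclass
class Endpoint:
    """A data source or data target within an integration."""
    name: str
    technology: TechnologyType
    direction: Direction
    attributes: dict = field(default_factory=dict)


@dataclass
class Step:
    name: str
    processing: str  # free-text description of the processing performed


@dataclass
class Transformation:
    name: str
    steps: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


@dataclass
class Integration:
    name: str
    sources: list
    transformation: Transformation
    targets: list
    attributes: dict = field(default_factory=dict)

    def effective_attributes(self, endpoint: Endpoint) -> dict:
        # Assumption: endpoint-level attributes override integration-level ones.
        return {**self.attributes, **endpoint.attributes}


# Hypothetical example integration, for illustration only.
orders = Integration(
    name="orders-to-warehouse",
    sources=[Endpoint("order-system", TechnologyType.API, Direction.PULL)],
    transformation=Transformation("normalise", [Step("Step1", "map fields")]),
    targets=[Endpoint("warehouse", TechnologyType.FT, Direction.PUSH,
                      {"retention_days": 1})],
    attributes={"retention_days": 7, "priority": "normal"},
)
```

Here the target overrides the integration-wide retention interval while inheriting the priority, which is one possible reading of attributes being definable at either level.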
75. Data Integration Transformation Specification
• A transformation is a set of data processing activities that
requires one or more inputs and is performed in a structured
sequence, contingent on interim outcomes, to generate one or more
outputs and cause one or more outcomes
• Transformation is the self-contained unit that completes a
given task
• Transformation can consist of sub-processes and/or activities
• Transformation and its constituent activities, stages and steps
can be decomposed into a number of levels of detail, down to
the individual atomic level
• Transformation is primarily concerned with its outcomes and
outputs
76. Data Integration Transformation
• Transformation can be represented at different levels of detail
− At its highest level, a transformation is described by its trigger(s), required input(s), output(s) and outcome(s)
77. Data Integration Transformation
• Activities within transformation can be linked by routers that
direct flow and maintain order based on the values of output(s)
and the status of outcome(s)
Figure: three data processing activities, each with its own trigger(s),
required input(s), output(s) and outcome(s), linked by a router
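Activities linked by routers can be sketched as a small loop in which each activity returns an output and an outcome, and a routing function picks the next activity. This is a hedged illustration: `Result`, `Activity`, `run` and the two example activities are hypothetical, and the sketch has no guard against routing cycles.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    output: dict   # data passed on to the next activity
    outcome: str   # status the router inspects, e.g. "ok" or "rejected"


@dataclass
class Activity:
    name: str
    process: Callable[[dict], Result]
    route: Callable[[Result], Optional[str]]  # next activity name, or None to stop


def run(activities: dict, start: str, data: dict):
    """Run activities from `start`, following each router until it returns None."""
    trace = []
    current = start
    while current is not None:
        activity = activities[current]
        result = activity.process(data)
        trace.append((activity.name, result.outcome))
        data = result.output
        current = activity.route(result)
    return data, trace


def validate(record: dict) -> Result:
    ok = record.get("amount", -1) >= 0
    return Result(record, "ok" if ok else "rejected")


def to_cents(record: dict) -> Result:
    return Result({**record, "amount_cents": round(record["amount"] * 100)}, "ok")


ACTIVITIES = {
    # Router: continue to conversion only when validation succeeded.
    "validate": Activity("validate", validate,
                         lambda r: "to_cents" if r.outcome == "ok" else None),
    "to_cents": Activity("to_cents", to_cents, lambda r: None),
}
```

Calling `run(ACTIVITIES, "validate", record)` returns the final output together with a trace of the activities executed and their outcomes, so a rejected record stops after validation.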
78. Standardised Deployed Operational Data Integrations
Figure: components of a standardised data integration platform
− Dashboard/Analytics/Reporting
− Deployed Data Integrations
− Operational Process Usage Log
− Scheduler, Rules Engine
− Operational Data Integrations
− Integration Design and Development, Version Management and Control
− Integration Templates and Template Library
− Integration Publication/Deployment
− External Data Sources and Targets
− Internal Data Sources and Targets
− Integration Component/Product/Tool Library
− Deployed Integration Operation
− Alerting/Event Management
− Management and Administration Interface
− Internal Access Layer
− External Access Layer
− Data Knowledge Store
− Security
− Interim Data Store
− External to Internal Translation
− Data Integration Execution
− Core Integration Platform
− Data Integration Gateway
79. Next Steps
• Understand the Scope of the Current Data Integration
State
− Create an inventory of data integration technologies
− Create an inventory of existing data integrations
• Create a Future State Data Integration Architecture
− Create a data integration reference architecture
− Translate reference architecture into an implementation design
− Map implementation design to integration technologies and
products
− Map existing integrations to implementation design