Digital integration hub: Why, what and how?
Andrea Gioia, CTO at Quantyca and co-founder at Blindata.io
https://www.meetup.com/Milano-Kafka-meetup/events/282183436/
1. KAFKA Meetup - December 2021
Andrea Gioia
CTO at Quantyca
Co-Founder at Blindata
Digital Integration Hub: why, what and how?
2. Legacy systems
Some truths to face
Legacy systems are growing in size and number. They are here to stay!
If your architecture does not manage legacy systems, legacy systems will, sooner or later, manage your architecture.
3. Who am I?
Not an easy question to answer but keeping it simple...
Andrea Gioia
andrea.gioia@quantyca.it
Quantyca is a privately owned technology consulting firm based in Italy, specialized in data and metadata management
quantyca.it
Blindata is a SaaS platform that leverages data governance and compliance to empower your data management projects.
blindata.io
4. What is legacy modernization
Digital transformation continuously pushes toward the development of new
● touchpoints in an omnichannel logic (systems of engagement)
● analytical and AI-based services (systems of insight)
These new applications are usually integrated with back-office legacy systems in a point-to-point fashion.
This way of integrating the new with the legacy does not scale in the long term.
Because the legacy cannot simply be thrown away, a better integration architecture is needed in order to modernize it in place.
...and why it matters
Diagram: system-of-engagement and system-of-insight applications integrated point-to-point with the legacy systems of record, i.e. "spaghetti" integration.
5. Legacy modernization
Key business drivers
TIME-TO-MARKET AND BUSINESS AGILITY IMPROVEMENT: go beyond the limits imposed by legacy systems to improve business agility
COST AND RISK REDUCTION: rationalize integrations to reduce development and maintenance costs and to avoid uncontrolled access to data
RESILIENCE AND PERFORMANCE IMPROVEMENT: ensure the uptime of legacy systems even in the face of significant increases in workloads
6. Integration architecture #1
All new functionalities are implemented directly by extending the legacy system or by buying complementary products offered by the same vendor as the legacy system.
The integration layer, if present, is limited to an API gateway that decouples the legacy backend from the frontend applications.
Legacy systems take it all
Diagram: system-of-engagement and system-of-insight frontends → API Gateway → SoE & SoI backends implemented on the legacy systems (system of records).
Drivers addressed: time-to-market and business agility improvement · cost and risk reduction · resilience and performance improvement
7. Integration architecture #2
Integration rationalization through composite services
Diagram: system-of-engagement and system-of-insight applications → API Gateway → request-based integration layer with composite services (application services, process services, sourcing services) → legacy systems (system of records).
Integrations are rationalized through different layers of reusable and composable services.
Sourcing services wrap legacy systems, process services orchestrate business processes, and application services provide backends for frontend applications.
8. Integration architecture #2
Integration rationalization through data virtualization
Diagram: system-of-engagement and system-of-insight applications → API Gateway → request-based integration layer with a virtual DWH (application layer, business layer, physical layer) → legacy systems (system of records).
Integrations are rationalized through different layers of views served by a data virtualization application.
The physical layer wraps legacy systems, the business layer exposes the business model, and the application layer provides projections designed to facilitate consumption.
9. Integration architecture #2
Integration rationalization
Diagram: system-of-engagement and system-of-insight applications → API Gateway → request-based integration layer of a hybrid integration platform, combining composite services and a virtual DWH → legacy systems (system of records).
Composite services and data virtualization can be used in the same architecture: the former is preferred to back systems of engagement, the latter to back systems of insight.
Both solutions simplify integrations, but they do not reduce the workload on the backend systems.
10. Integration architecture #3
Data offloading
Diagram: system-of-engagement and system-of-insight applications → API Gateway → event-based integration layer of a hybrid integration platform, with a high-performance data store, microservices and metadata management → legacy systems (system of records).
Data offloaded from legacy systems is aggregated into a low-latency, high-performance data store accessible via APIs, events or batch.
The data store synchronizes with the backends via event-driven integration patterns.
11. Digital Integration Hub
Key building blocks
● Legacy systems: where the data is stored.
● Connectors: keep the legacy systems and the high-performance data store in sync, offloading all modifications to relevant data in real time.
● Event store: transforms technical events coming from connectors into domain and business events that can be consumed downstream by the high-performance data store or other consumers (event-driven integration).
● High-performance data store: stores domain-specific data, exposing a single consolidated view of entities; supports fast ingestion to reduce the eventual-consistency window; can support analytical queries.
● Services: connect to the high-performance data store for read queries and execute writes on the legacy systems by means of command events pushed to the event store (command query responsibility segregation).
● Applications: where the data is used.
12. Connectors
Data acquisition patterns
Trigger (Push Mode): good for neo-legacies, problematic for old-school legacies.
Change Data Capture (Backend Interception): the best option, but CDC connectors can be quite expensive.
Active Polling (Pull Mode): difficult to find a trade-off that satisfies both the load constraints of the legacy system and the real-time needs of applications.
Decorating Collaborator (Frontend Interception): largely cited in the literature; good in theory, problematic in practice.
Interesting source connectors for legacy modernization available for Kafka are:
○ JDBC connectors: for active polling
○ Debezium connectors: for CDC from MySQL, Postgres, …
○ Salesforce connectors: for CDC from Salesforce
○ Oracle connector: for CDC from Oracle
○ Partner connectors: for CDC from other legacies like SAP and mainframes (e.g. Qlik Replicate connector)
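As an illustration of the CDC option, a Kafka Connect source connector is registered by posting a JSON configuration to the Connect REST API. The sketch below builds such a body for the Debezium MySQL connector; host names, credentials and table names are hypothetical, and the exact property set depends on the Debezium version in use.

```python
import json

# Hypothetical hosts, credentials and tables; keys follow the Debezium
# MySQL connector configuration (Debezium 2.x uses "topic.prefix").
debezium_config = {
    "name": "legacy-orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "legacy-db.internal",   # assumption
        "database.port": "3306",
        "database.user": "cdc_user",                 # assumption
        "database.password": "changeme",             # assumption
        "database.server.id": "184054",
        "topic.prefix": "legacy",                    # namespace for CDC topics
        "table.include.list": "sales.orders,sales.order_lines",
    },
}

# Body of the POST to the Kafka Connect REST API (/connectors endpoint).
request_body = json.dumps(debezium_config)
```

Each captured table then produces change events on topics under the configured prefix (e.g. `legacy.sales.orders`).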
13. Event Store
Event driven integration
Diagram: Legacy System → Streaming Platform (Technical Events: speed & fidelity → Domain Events: trusted views → Business Events: ease of consumption) → High-Performance Data Store.
14. Event Store
Offloading patterns
One table per topic
Changes to each table are mapped to distinct topics, one topic per table. Stream joins are used to create domain events from technical events spread across different topics.
Preserving transactional coherence within aggregates can be complex when the aggregate is spread across multiple tables updated by long-running transactions.
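The join step can be sketched in plain Python; in a real deployment it would be a stream join in Kafka Streams or ksqlDB. Topic names, event shapes and the `order_id` key are hypothetical.

```python
from collections import defaultdict

def join_order_events(order_events, line_events):
    """Build one domain event per order by joining change events from the
    (hypothetical) per-table topics 'orders' and 'order_lines'."""
    lines_by_order = defaultdict(list)
    for ev in line_events:
        lines_by_order[ev["order_id"]].append(ev["after"])
    return [
        {
            "type": "OrderChanged",
            "order_id": ev["order_id"],
            "order": ev["after"],
            "lines": lines_by_order[ev["order_id"]],
        }
        for ev in order_events
    ]

# Sample technical events, one list per source table/topic.
orders = [{"order_id": "o-1", "after": {"status": "PLACED"}}]
lines = [{"order_id": "o-1", "after": {"sku": "A"}},
         {"order_id": "o-1", "after": {"sku": "B"}}]
domain = join_order_events(orders, lines)
```

The transactional-coherence problem shows up here: if the order row and its lines commit in one legacy transaction but arrive on different topics at different times, the join may momentarily emit an incomplete aggregate.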
15. Event Store
Offloading patterns
One aggregate per topic
All changes to tables that are part of the same aggregate are mapped to the same topic. The identifier of the aggregate is used to partition the topic.
It is easier to create domain events from technical events while preserving transactional coherence, even with complex aggregates or unpredictable transactional patterns.
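The key idea, partitioning by aggregate id so all events of one aggregate stay ordered on one partition, can be sketched as below. This is only an illustration: Kafka's default partitioner actually uses murmur2 on the record key, not MD5.

```python
import hashlib

def partition_for(aggregate_id: str, num_partitions: int) -> int:
    # A deterministic hash of the aggregate id routes every event of the
    # same aggregate to the same partition, preserving their relative order.
    digest = hashlib.md5(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for("order-42", 12)
p2 = partition_for("order-42", 12)  # same id -> same partition, every time
```

In practice this is achieved simply by producing every change event with the aggregate id as the record key.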
16. Event Store
Offloading patterns
Transactional outbox pattern
The legacy system is modified so that it inserts messages/events into an outbox table as part of the local transaction. The modification can be performed at code or database level (e.g. triggers or materialized views).
The connector that offloads data to the streaming platform is driven by the outbox table: the INSERT/UPDATE/DELETE on the business tables and the INSERT into the OUTBOX table commit within the same transaction.
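A minimal sketch of the outbox write path, using an in-memory SQLite database as a stand-in for the legacy store; table and event names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
             " topic TEXT, payload TEXT)")

def place_order(order_id: str) -> None:
    # The business write and the outbox insert share one local transaction:
    # the event exists if and only if the state change committed.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"type": "OrderPlaced", "order_id": order_id})),
        )

place_order("o-1")
# A CDC or polling connector would now pick up rows from the outbox table
# and publish them to the streaming platform.
pending = conn.execute("SELECT topic, payload FROM outbox").fetchall()
```

Because the outbox insert rolls back together with the business write, the pattern avoids the dual-write problem without distributed transactions.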
17. Event Store
Offloading patterns
Triggered publisher
All changes to tables that are part of the same aggregate are mapped to the same topic as technical events that contain only the aggregate id and the transaction id as payload.
For every transaction id, a stream processor queries the legacy database, extracts the modified aggregate by filtering on its id, and publishes it as the payload of a new domain event.
To reduce the workload on the legacy system, the stream processor can query a read replica. Transactional coherence within the aggregate is guaranteed by the upstream database.
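The enrichment step can be sketched as follows; the replica lookup, event shapes and field names are hypothetical (a real processor would issue a SQL query against the read replica).

```python
# Hypothetical read replica: aggregate id -> current aggregate state.
read_replica = {"order-42": {"status": "SHIPPED", "lines": [{"sku": "A"}]}}

def publish_domain_event(technical_event: dict) -> dict:
    """The technical event carries only ids; the processor re-reads the full
    aggregate from the replica and emits it as the domain event payload."""
    aggregate = read_replica[technical_event["aggregate_id"]]
    return {
        "type": "OrderChanged",
        "aggregate_id": technical_event["aggregate_id"],
        "tx_id": technical_event["tx_id"],
        "payload": aggregate,
    }

event = publish_domain_event({"aggregate_id": "order-42", "tx_id": "tx-7"})
```

Since the query runs after the transaction commits, the processor always reads a coherent aggregate state, at the cost of one lookup per transaction id.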
18. High-performance data store
Some options with pros and cons
Options: ksqlDB · Document DB · In-Memory DB · HTAP DB
PROS
+ Does not require external components
+ Low latency
+ Can handle very high throughput
+ Moving from events to state is simple and requires a small integration effort
+ Stored data can also be consumed directly by stream processors
CONS
- Not SQL compliant
- Serving external consumers has some limitations that must be managed directly by the consumers
- Not a good fit for complex analytical workloads
- TCO may not be optimal for huge data volumes
19. High-performance data store
Some options with pros and cons
PROS
+ Does not require format transformation during the whole
flow from streaming platform to services
+ Largely used by service developers, probably already
present in the architecture
+ Good fit to expose single read view of domain entities
consolidated from different sources
+ Quite easy to handle schema changes
CONS
- Not SQL compliant
- Not a good fit for complex analytical workloads
- Not a good fit to expose business entities whose access patterns from services are not predictable
- Can have some performance issues at very high throughput
20. High-performance data store
Some options with pros and cons
PROS
+ SQL compliant (some of them, not all)
+ Can handle very high throughput
+ Can handle complex analytical queries
+ Good fit to expose read views of domain events and business events as well
+ TCO can be optimized by selecting the right strategy for distributing stored data between RAM and disk
CONS
- Requires format transformation from document to relational, and then back to document, when moving data from the streaming platform to the store and then on to services
- Schema changes performed upstream must be actively managed
21. High-performance data store
Some options with pros and cons
PROS
+ Can handle very high throughput
CONS
- Not SQL compliant (in most of the cases, not all)
- Not a good fit for complex analytical workloads
- Can require format transformation when data is read from the streaming platform and then again when it is consumed by services
- TCO may not be optimal for huge data volumes
22. Closing the loop with CQRS
From services back to legacy systems
Diagram: micro/mini services READ from the High-Performance Data Store and WRITE by pushing Commands onto the Streaming Platform; the legacy system consumes the commands, and the resulting changes flow back as technical, domain and business events.
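The read/write split above can be sketched as a tiny service; the store, topic and event shapes are hypothetical stand-ins (a dict for the high-performance data store, a list for the command topic).

```python
class OrderService:
    """CQRS sketch: reads hit the high-performance data store; writes become
    command events that the legacy system consumes and applies."""

    def __init__(self, read_store: dict, command_topic: list):
        self.read_store = read_store        # high-performance data store
        self.command_topic = command_topic  # command topic on the event store

    def get_order(self, order_id: str) -> dict:
        return self.read_store[order_id]    # READ path: local, low latency

    def cancel_order(self, order_id: str) -> None:
        # WRITE path: no direct update of the read store; the legacy system
        # applies the command and the change flows back via offloading.
        self.command_topic.append({"type": "CancelOrder", "order_id": order_id})

store, commands = {"o-1": {"status": "PLACED"}}, []
svc = OrderService(store, commands)
svc.cancel_order("o-1")
```

Note that immediately after the write the read store still shows the old state; services built on this pattern must tolerate that eventual-consistency window.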
23. The legacy modernization journey
Offloading, Isolation and Refactoring
1. Legacy offloading: data is offloaded from the legacy system into the digital integration hub, which serves the applications.
2. Legacy isolation: a bubble context with an anti-corruption layer isolates new development from the legacy system.
3. Legacy refactoring: with applications decoupled, the legacy system can be refactored, or retired, behind the anti-corruption layer.
24. Takeaways
The digital integration hub can be seen as a way of decoupling systems, using data as an anti-corruption layer. Data offloaded into the integration platform becomes a first-class citizen of the new data-centric architecture.
Benefits
○ Responsive user experience
○ Offload legacy systems from expensive workloads generated by front-end services
○ Support legacy refactoring
○ Align services to business domain
○ Enable real time analytics
○ Foster a data centric approach to integration
Challenges
○ Adapting the conceptual architecture to your
specific context
○ Assembling different technology components,
possibly from different vendors
○ Operating a complex distributed and loosely coupled
architecture
○ Supporting bidirectional synchronization
○ Designing the domain data models for the business
entities
○ Developing services that can tolerate eventual
consistency
○ Managing organizational politics related to data
ownership