The emerging practice of DataOps is not just DevOps for data. According to Gartner, DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization.
The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment.
This session discusses how to add security to DataOps and DevOps.
3. 3
Ulf Mattsson
• Head of Innovation at TokenEx
• Chief Technology Officer at Protegrity
• Chief Technology Officer at Atlantic BT Security Solutions
• Chief Technology Officer at Compliance Engineering
• Developer at IBM Research and Development
• Inventor of 70+ issued US patents
• Provided products and services for
• Data Encryption and Tokenization,
• Data Discovery,
• Robotics, ERP, CRM in Manufacturing,
• Cloud Application Security Broker (CASB),
• Web Application Firewall (WAF),
• Managed Security Services,
• Security Operation Center (SOC),
• Benchmarking/Gap-analysis
5. 5
The privacy breach trend is alarming
The US Federal Trade Commission (FTC) reported that credit card
fraud topped the list of identity theft reports in 2018. The FTC received nearly
three million complaints from consumers in 2018.
The FTC received more than 167,000 reports from people who said their
information was misused on an existing account or to open a new credit
card account.
Source: Redhat / IBM
13. 13
Are IBM mainframes still used?
• 70 percent of all enterprise data globally resides on a mainframe
• Visa, for example, uses the mainframe to secure billions of credit and debit
card payments every year. In fact, mainframes process about $7 trillion in
Visa payments annually, roughly equal to the annual GDP of Japan, the
world’s third largest economy
• 71 percent of all Fortune 500 companies have their core businesses located
on a mainframe
• 96 of the world’s 100 largest banks use the mainframe
• 23 of the world’s top 25 retailers use the mainframe to make sure they can
provide their customers with customized service
• All top 10 insurers use the cloud on the mainframe to save money for their
consumers
Source: Forbes
17. 17
Software Developer Challenges
Source: OVHcloud
1. Pace of change in the software development
industry.
2. With the move to modern software development on
web, mobile and cloud, new languages, frameworks,
plug-ins, modules and components appear almost
weekly.
3. How can developers keep on top of all the options
available, and how can they ensure that the choices
they make are the right ones in the long term?
4. Building a new generation of modern applications
may require significant reskilling of the development
team.
5. For maintaining existing applications, there may be
little opportunity for developers to add new skills.
6. Some developers will embrace the change, whilst
others will prefer to stick with what they know.
18. 18
Low-code development
Source: Gartner, OVHcloud
Enterprise low-code application
platforms offer compelling
productivity gains.
• By 2024, three-quarters of large
enterprises will be using at least
four low-code development tools
for both IT application
development and citizen
development initiatives.
• By 2024, low-code application
development will be responsible
for more than 65% of application
development activity.
19. 19
Low-code development platforms
Source: OVHcloud
Faster development
• Writing less code means more apps can be built faster than ever before.
Digital transformation
• Transformation of manual and paper-based processes into cloud, desktop, web and mobile applications
for better efficiency, productivity, data accuracy and customer service.
Reducing the maintenance burden
• By simplifying application maintenance as well as development, overall life-cycle costs can be reduced,
and resources freed up to build new applications.
Move to mobile
• Satisfy the increasing demand for mobile applications across the business.
Cloud computing
• Improve availability while cutting operational costs by quickly moving applications, or parts of applications
to the cloud for better agility and elasticity.
Skills management
• Eliminate pockets of expertise and specialized skills. Allow any developers to work on any part of an
application. Eliminate resource shortages and conflicts.
Combating Shadow IT
• Accelerate the deployment of applications so that business users don’t feel they need to take matters into
their own hands. Deliver apps in days or weeks instead of months or years.
22. 22
DataOps (Gartner)
Definition:
• DataOps is a collaborative data management practice focused on improving the communication, integration and
automation of data flows between data managers and consumers across an organization.
• The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts.
• DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to
improve the use and value of data in a dynamic environment.
Position and Adoption Speed Justification:
• Currently, there are no standards or known frameworks for DataOps.
• Today's loose interpretation makes it difficult to know where to begin, what success looks like, or if organizations are
even "doing DataOps" at all.
User Advice:
• As a new practice, DataOps will be most successful on projects targeting a small scope with some level of executive
sponsorship, primarily from the CDO or other top data and analytics leader.
• Executive sponsorship will be key as DataOps represents a new way of delivering data to consumers.
• Practitioners will have to overcome the resistance to change existing practices as they introduce this concept.
23. 23
DataOps is NOT Just DevOps for Data
• One common misconception about DataOps is that it is just DevOps applied
to data analytics.
• While a little semantically misleading, the name “DataOps” has one positive
attribute.
• It communicates that data analytics can achieve what software
development attained with DevOps.
• DataOps can yield an order of magnitude improvement in quality and cycle
time when data teams utilize new tools and methodologies.
• The specific ways that DataOps achieves these gains reflect the unique
people, processes and tools characteristic of data teams (versus software
development teams using DevOps).
Source: DataKitchen
39. 39
Security Tools for DevOps
• Dynamic Application Security Testing (DAST) dynamically 'crawls' through an application's interface, testing how it reacts to various inputs.
• Manual reviews often catch obvious issues that tests, and developers, can miss.
Source: Securosis
40. 40
Security Tools for DevOps
• Static Application Security Testing (SAST) examines all code, or runtime binaries (less effective for microservices).
• Fuzz testing is essentially throwing lots of random garbage at applications and seeing whether any particular (type of) garbage causes errors.
• Vulnerability analysis covers platform configuration, patch levels or application composition to detect known vulnerabilities.
• Runtime Application Self-Protection (RASP) provides execution path scanning, monitoring and embedded application whitelisting (effective for microservices).
• Interactive Application Security Testing (IAST) provides execution path scanning, monitoring and embedded application whitelisting (emerging).
Source: Securosis, Webomates
Regression testing enhances the visibility of your build quality before
putting it into production.
Examples:
Full Regressions, Overnight Targeted Checks and Smoke Checks, executed
with manual testing, automation, crowdsourcing and artificial intelligence,
allow a software development team to quickly validate its UI and API
as well as load test it.
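The fuzz testing idea described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production fuzzer: `parse_age` and `first_char_code` are hypothetical functions under test invented for the example, and the input alphabet and run count are arbitrary assumptions.

```python
import random


def parse_age(text: str) -> int:
    """Hypothetical function under test: parse an age between 0 and 150."""
    value = int(text)  # raises ValueError on malformed input
    if value < 0 or value > 150:
        raise ValueError("age out of range")
    return value


def first_char_code(text: str) -> int:
    """Hypothetical buggy function: forgets to handle the empty string."""
    return ord(text[0])  # raises IndexError when text is empty


def fuzz(target, runs: int = 1000, seed: int = 0) -> list:
    """Feed random strings to `target` and collect unexpected failures.

    ValueError is treated as a clean rejection of bad input; any other
    exception type is recorded as a finding.
    """
    rng = random.Random(seed)  # seeded so a finding can be reproduced
    alphabet = "0123456789-+ abc\x00\n"
    findings = []
    for _ in range(runs):
        length = rng.randint(0, 8)
        data = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # expected: the target rejected garbage input
        except Exception as exc:
            findings.append((data, repr(exc)))
    return findings


if __name__ == "__main__":
    print("parse_age findings:", len(fuzz(parse_age)))
    print("first_char_code findings:", len(fuzz(first_char_code)))
```

Seeding the random generator is a deliberate choice here: it lets a failing input be replayed in a later regression run.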
41. 41
DevOps - Security for APIs and Microservices
Source: Securosis
Old: Larger monolithic apps that contain more context. SAST works well.
Shift right
Trend: Test/scan API flows, context, parameter input/output. DAST works better.
Trend: IAST is emerging
49. 49
A Framework can help organizations prepare for GDPR
IBM Framework Helps Clients Prepare for the EU's General Data Protection Regulation
50. 50
Tokenization for Cross Border Data-centric Security (EU GDPR)
• Protecting Personally Identifiable Information (PII), including names, addresses, phone, email, policy and account numbers
• Compliance with EU Cross Border Data Protection Laws
• Utilizing Data Tokenization, and centralized policy, key management, auditing, and reporting
Diagram labels: Data sources; Data Warehouse in Italy; Complete policy-enforced de-identification of sensitive data across all bank entities
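Field-level tokenization of PII, as described on this slide, can be illustrated with a toy vault-based tokenizer. This is a sketch under stated assumptions: `TokenVault` and `protect_record` are hypothetical names, the vault is a plain in-memory dictionary (a real vault would be a hardened, audited service with centralized policy and key management), and the sample record is invented.

```python
import secrets


class TokenVault:
    """Toy in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for `value`, reusing it on repeat calls."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault can do this."""
        return self._token_to_value[token]


def protect_record(record: dict, pii_fields: set, vault: TokenVault) -> dict:
    """Replace the listed PII fields with tokens, leaving other fields as-is."""
    return {
        key: vault.tokenize(value) if key in pii_fields else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    vault = TokenVault()
    customer = {"name": "Anna Rossi", "email": "anna@example.com", "balance": 100}
    print(protect_record(customer, {"name", "email"}, vault))
```

Because the token is random rather than derived from the value, the protected record can leave the country of origin while the vault, and therefore the real data, stays behind.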
52. 52
ISO Standard for Encryption and Privacy Models
Privacy enhancing data de-identification terminology and classification of techniques
Source: INTERNATIONAL STANDARD ISO/IEC 20889
• Cryptographic tools (CT)
  • Format Preserving Encryption (FPE): encrypted data has the same format
  • Homomorphic Encryption (HE): two encrypted values can be combined*
• De-identification techniques (DT)
• Formal privacy measurement models (PMM)
  • Differential Privacy (DP)
    • Server model: responses to queries can only be obtained through a software component or “middleware”, known as the “curator”**
    • Local model: the entity receiving the data is looking to reduce risk
  • K-anonymity model: ensures that for each identifier there is a corresponding equivalence class containing at least K records
*: Multi Party Computation (MPC)
**: Example Apple and Google
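The K-anonymity condition stated on this slide (every equivalence class contains at least K records) can be checked with a short function. This is a minimal sketch: `is_k_anonymous`, the choice of quasi-identifier columns, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records: list, quasi_identifiers: list, k: int) -> bool:
    """Return True if every combination of quasi-identifier values
    (an equivalence class) appears in at least k records."""
    classes = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in classes.values())


if __name__ == "__main__":
    # Invented sample data: age and zip code are already generalized.
    patients = [
        {"age": "30-39", "zip": "201**", "diagnosis": "A"},
        {"age": "30-39", "zip": "201**", "diagnosis": "B"},
        {"age": "40-49", "zip": "202**", "diagnosis": "A"},
        {"age": "40-49", "zip": "202**", "diagnosis": "C"},
    ]
    print(is_k_anonymous(patients, ["age", "zip"], k=2))
    print(is_k_anonymous(patients, ["age", "zip"], k=3))
```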
54. 54
Example of mapping of data security and privacy techniques (ISO) to different deployment models
Privacy enhancing data de-identification terminology and classification of techniques
Deployment models (columns): Data Warehouse, Centralized, Distributed, On-premises, Public Cloud, Private Cloud
De-identification techniques
• Tokenization
  • Vault-based tokenization y y
  • Vault-less tokenization y y y y y y
• Cryptographic tools
  • Format preserving encryption y y y y y
  • Homomorphic encryption y y
• Suppression techniques
  • Masking y y y y y y
  • Hashing y y y y y y
Formal privacy measurement models
• Differential Privacy
  • Server model y y y y y y
  • Local model y y y y y y
• K-anonymity model
  • L-diversity y y y y y y
  • T-closeness y y y y y y
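The local model of differential privacy listed in this mapping can be illustrated with randomized response, a classic technique chosen here for illustration and not named in the slides. Each person perturbs their own answer before it leaves their device, so the collector never sees the true value. With the two fair coin flips assumed below, a true "yes" is reported as "yes" with probability 0.75 and a true "no" with probability 0.25, which corresponds to a privacy parameter of ln 3. The function names and the simulated population are hypothetical.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Perturb one yes/no answer on the data subject's side."""
    if rng.random() < 0.5:
        return truth  # first coin: answer honestly
    return rng.random() < 0.5  # otherwise report a second, random coin


def estimate_true_rate(reports: list) -> float:
    """Recover the population rate from noisy reports.

    P(report yes) = 0.5 * p + 0.25, so p = 2 * observed - 0.5.
    """
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated population in which 30% of the true answers are "yes".
    truths = [True] * 3000 + [False] * 7000
    reports = [randomized_response(answer, rng) for answer in truths]
    print(round(estimate_true_rate(reports), 3))
```

The aggregate estimate stays close to the true rate while any single report remains deniable, which is why the local model fits every deployment column in the mapping.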
55. 55
Risk reduction and truthfulness of some de-identification techniques and models
Technique name | Data truthfulness at record level | Applicable to types of attributes | Reduces the risk of singling out | Reduces the risk of linking | Reduces the risk of inference | Group
Deterministic encryption | Yes | All attributes | No | Partially | No | Cryptographic tools
Order-preserving encryption | Yes | All attributes | No | Partially | No | Cryptographic tools
Homomorphic encryption | Yes | All attributes | No | No | No | Cryptographic tools
Masking | Yes | Local identifiers | Yes | Partially | No | Suppression
Local suppression | Yes | Identifying attributes | Partially | Partially | Partially | Suppression
Record suppression | Yes | | | | | Suppression
Sampling | Yes | N/A | Partially | Partially | Partially |
Pseudonymization | Yes | Direct identifiers | No | Partially | No |
Generalization | Yes | Identifying attributes | | | | Generalization
Rounding | Yes | Identifying attributes | No | Partially | Partially | Generalization
Top/bottom coding | Yes | Identifying attributes | No | Partially | Partially | Generalization
Noise addition | No | Identifying attributes | Partially | Partially | Partially |
Source: INTERNATIONAL STANDARD ISO/IEC 20889
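Three of the generalization techniques in the table above (generalization, rounding and top/bottom coding) are simple enough to sketch directly. The function names, bucket widths and thresholds below are illustrative assumptions, not values taken from ISO/IEC 20889.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a range such as '30-39'."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


def round_to(value: float, base: int) -> int:
    """Rounding: snap a value to the nearest multiple of `base`."""
    return base * round(value / base)


def top_bottom_code(value: float, low: float, high: float) -> float:
    """Top/bottom coding: clamp outliers to a lower and an upper threshold."""
    return max(low, min(high, value))


if __name__ == "__main__":
    print(generalize_age(37))
    print(round_to(52300, 1000))
    print(top_bottom_code(250000, 20000, 150000))
```

Each function keeps the record truthful at a coarser granularity, which matches the "Yes" in the record-level truthfulness column for these techniques.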
56. 56
How Should I Secure Different Types of Data?
Figure axes: Type of Data (Structured, Un-structured) and Use Case (Simple to Complex)
• Structured data: Tokenization of Fields
• Un-structured data: Encryption of Files
• PCI: Card Holder Data
• PHI: Protected Health Information
• PII: Personally Identifiable Information
57. 57
Total Cost and Risk of Tokenization
Example: 50% Lower Total Cost
On-premises tokenization
• Limited PCI DSS scope reduction: must still maintain a CDE with PCI data
• Higher risk: sensitive data still resident in environment
• Associated personnel and hardware costs
Cloud-based tokenization
• Significant reduction in PCI DSS scope
• Reduced risk: sensitive data removed from the environment
• Platform-focused security
• Lower associated costs: cyber insurance, PCI audit, maintenance
58. 58
Cloud transformations are accelerating
Risk Adjusted Computation
Figure: deployment options compared by risk (low to high), elasticity and compute cost (high to low), from in-house to out-sourced:
• On-premises system
• On-premises Private Cloud
• Hosted Private Cloud
• Public Cloud
59. 59
Which of the following most closely describes what ‘hybrid cloud’ means in your
organization?
Source: Forrester
60. 60
For each of the following data center and IT infrastructure components, how much outsourcing and managed services does your firm use for IT operation? (excluding systems integrators for project implementation)
Source: Forrester