When you received your Uber ‘Tuesday Evening Ride Receipt’ or Spotify’s ‘This Week’s New Music’ email, did you think about how they got there?
SendGrid’s reliable email platform delivers over 20 billion transactional and marketing emails each month on behalf of many of your favorite brands, including Uber, Airbnb, Spotify, Foursquare and NextDoor.
SendGrid was looking to evolve its data warehouse architecture to improve decision making and optimize the customer experience. It needed a scalable and reliable architecture that would let it move nimbly and efficiently with a relatively small IT organization, while supporting the needs of both business and technical users at SendGrid.
SendGrid’s Director of Enterprise Data Operations will be joining architects from Amazon Web Services (AWS) and Informatica to discuss SendGrid’s journey to a hybrid cloud architecture and how a hybrid data warehousing solution is optimized to support SendGrid’s analytics initiative. Speakers will also review common technologies and use cases being deployed in hybrid cloud today, common data management challenges in hybrid cloud and best practices for addressing these challenges.
Join us to learn:
• How to evolve to a hybrid data warehouse with Amazon Redshift for scalability, agility and cost efficiency with minimal IT resources
• Hybrid cloud data management use cases
• Best practices for addressing hybrid cloud data management challenges
SendGrid Improves Email Delivery with Hybrid Data Warehousing
1. Put your data to work with Big Data services from AWS and Informatica
2. Data is growing
1.7MB of new data will be created every second for every human being on the planet by 2020
http://www.whizpr.be/upload/medialab/21/company/Media_Presentation_2012_DigiUniverseFINAL1.pdf
A compound annual growth rate of 58%, surpassing $1 billion by 2020, is forecasted for the Hadoop market
http://www.ap-institute.com/big-data-articles/big-data-what-is-hadoop-%E2%80%93-an-explanation-for-absolutely-anyone.aspx
http://www.marketanalysis.com/?p=279
Less than 0.5% of all data is ever analyzed and used at the moment
http://www.technologyreview.com/news/514346/the-data-made-me-do-it/
3. Big Data is for everyone
The market for Big Data technologies is growing more than six times faster than the information technology market as a whole…
…and those companies who use their data well win.
4. Why AWS for Big Data?
Immediately Available
Broad and Deep Capabilities
Trusted and Secure
Scalable
5. Collect, Store, Analyze, and Visualize
It’s easy to get data to AWS, store it securely, and analyze it with the engine of your choice, without any long-term commitment or vendor lock-in
Collect
Import/Export
Snowball
Direct Connect
VM Import/Export
Store
Amazon S3
EMR
Amazon Glacier
Amazon Redshift
DynamoDB
Analyze
Amazon Kinesis
Lambda
EMR
EC2
Aurora
6. AWS provides the most complete platform for Big Data
What can you do with Big Data on AWS?
Big Data Repositories, Clickstream Analysis, ETL Offload, Machine Learning, Online Ad Serving, BI Applications
7. The Amazon Redshift view of data warehousing
10x cheaper
Easy to provision
Higher DBA productivity
10x faster
No programming
Easily leverage BI tools, Hadoop, Machine Learning, Streaming
Analysis in-line with process flows
Pay as you go, grow as you need
Managed availability & DR
Enterprise Big Data SaaS
8. The cloud can be made more secure than on-premises
High speed redundant direct connect lines
Load billions of rows in minutes
All data in private VPC
All data encrypted with private on-premises hardware keys
Encryption of data, transport, backups, partial spills
Audit of all SQL actions
Audit of all configuration changes
9. Data warehouses can support real-time data
Big data does not mean batch
Can be streamed in
Can be processed in near real time
Can be used to respond quickly to requests
You can mix and match on-premises and cloud
Custom development and managed services
Infrastructure with managed scaling, security
10. Hybrid Cloud Data Management with AWS and Informatica
Presented by Andrew McIntyre
11. Agenda
The IT Landscape and how it is changing
IT challenges with Hybrid Cloud Architecture
Customer success story with SendGrid
How Informatica can help customers migrate to a Hybrid Architecture
Why choose Informatica?
13. Why Enterprises are Adopting Cloud Architecture
Business agility requires IT agility
Cloud economics pay off in a big way
Focus on core competencies & unique value
14. Hybrid Cloud Is a Common Approach
ERP & On-Premises Apps, Traditional Relational Databases, Traditional Data Warehouse + Amazon Redshift
15. Defining Hybrid Cloud Data Management
Integrate, Cleanse, Govern, Master, Secure
Integrating data from on-premises databases, data warehouses and apps with SaaS applications and with the public cloud (AWS)
16. Data Management Challenges in Hybrid Cloud Architecture
Connectivity
Many Data Systems: Cloud & On-Prem
Reuse work across systems
Secure connection
Data Visibility
Complex data flows, less comprehension
Quality, governance, security, regulation, audits, mastering
Scalability
Support large data volume
Match infinite capacity in cloud platform
Operational Control
Monitor & Manage data in production
Ensure operational success
Monitor end to end business process
17. Informatica + AWS Use Cases
Lift and Shift: Moving on-premises databases, systems and/or DW to AWS-based workloads
Hybrid App Integration: Integrate on-premises and cloud apps with Informatica Cloud. Also known as iPaaS (integration Platform-as-a-Service)
Hybrid Data Warehousing: Load multiple data sources from cloud and/or on-premises to AWS using Informatica Cloud
18. Lift and Shift your Workloads
Use Case Summary: Moving on-premises databases, systems and/or data warehouse to AWS-based workloads
Diagram: On premise: On-premises Data Warehouse, Other Databases; Your Data Integration Platform; Firewall; Cloud: Amazon Redshift, Amazon RDS, Amazon Aurora
19. Hybrid App Integration
Use Case Summary: Integrate on-premises and cloud apps with Informatica Cloud. Also known as iPaaS (integration Platform-as-a-Service)
Diagram: On premise: On-premises Data Warehouse, On-premises Apps, Other Databases; Your Data Integration Platform; Firewall; Cloud: Amazon RDS, Amazon Redshift
20. Hybrid Data Warehousing
Use Case Summary: Load multiple data sources from cloud and/or on-premises to AWS using Informatica Cloud
Diagram: On premise: On-premises Data Warehouse, ERP & On-premises Apps, Traditional Relational Databases; other sources: Social Media, Logs, IoT; Your Data Integration Platform; Firewall; Cloud: Amazon RDS, Amazon Redshift; Analytics Tools
21. Informatica Cloud for Amazon Web Services
Amazon DynamoDB
Amazon EMR
Amazon S3
Amazon Redshift
Amazon Aurora
Amazon RDS
Informatica Cloud provides native connectivity to Amazon Web
Services for scalable, high-performance integration with any cloud
and on-premises data source.
22. Informatica Cloud and Amazon Redshift
Seamless integration with any data system on cloud and on-prem
Native, high performance data integration and synchronization
The only solution to provide “Upsert” functionality
Step by step integration wizards for non-technical users
Advanced point and click integration workflows for technical users
24. SendGrid: Company Background
Founded in 2009 after graduating from the TechStars program, SendGrid developed an industry-disrupting, cloud-based email service to solve the challenges of reliably delivering email on behalf of growing companies. Like many great solutions, SendGrid was born from the frustration of three engineers whose application emails didn’t get delivered, so they built an app for email deliverability. Today, SendGrid’s reliable email platform delivers over 25 billion transactional and marketing emails each month on behalf of many of your favorite brands, including Uber, Airbnb, Spotify, Foursquare and NextDoor.
25. Business and Technical Requirements
The emphasis for the architecture was on speed over accuracy, sustainability and growth. As a result, the architecture was already hitting the limitations of its design.
Architecture Issues
Prior to my joining the company, SendGrid had already committed to using MySQL for a new data warehouse build.
The SendGrid Data Warehouse architecture that was underway did not follow a formal data warehousing methodology. It was built specifically to support the BI tool and its features and limitations.
This resulted in an architecture that does not follow many industry-standard data warehousing best practices.
26. Business and Technical Requirements
Business Needs for Data & Analytics
Our small team is responsible for the strategic direction, design, delivery and availability of business data for corporate-wide use in measuring performance, business outcomes and decision-making capabilities. Data and analytics need to be provided in various ways and formats through effective and efficient delivery methods.
To accomplish this, the team was tasked with building a new data warehouse. We planned to start with our main data source, which houses our email event and customer information.
Meet the Team:
Director, Enterprise Data Operations
Data Warehouse Architect / ETL Developer
BI Developer
Business Systems Analyst
27. Technical Requirements
Data Warehouse Assessment Needed
Evaluate the overall data warehouse architecture and suggest required changes and improvements to:
Database technology, design and work products
ETL tool, design and work products
Database Technology:
Nimble
Cost effective
Meets storage and capacity needs
Allows the team to be self-sufficient without relying on other teams’ skill sets
ETL Tool:
Mature ETL tool to leverage for data warehousing
29. Data Warehouse Assessment
The Findings: Overview
Confirmed the assumption that using MySQL was not sustainable as a database technology
Switch to a technology that better aligns with a data warehouse infrastructure: Amazon Redshift selected
A mature ETL tool is needed for data warehousing that also provides a user-friendly tool for business communities
Informatica selected to load data into Amazon Redshift from multiple data sources, cloud and on-prem, while supporting citizen integrators
30. Data Warehouse and Analytics Conceptual Architecture
Diagram: Data Sources: Marketo, SalesForce, Zuora, Mail db, Segment*, Jira, Zendesk. ETL: Informatica (cloud or on-premises). Enterprise Data Warehouse (Amazon Redshift): Raw Data Acquisition Layer; Core Layer (Mapping Schemas, Clean Data/Metadata); Publishing Layer (Dimensional Data: Time, Revenue, Customer, SalesForce, Product Volume Usage, Product Usage). Also shown: Hadoop Cluster; Data Mining, Benchmark Data; Analytics Tools (Reporting/Analytics, Dashboards, Export Data); Test and Learn Campaigns.
31. Project Outlook
The project is in the early stages of the data warehouse build. We have set up our Amazon Redshift instance for the data warehouse and have started sourcing data from six sources, a mix of both cloud and on-premises.
We are actively using the Informatica data integration portfolio in a hybrid architecture to support ETL integration.
By the end of 2016, we will have enough data from multiple data sources in the Amazon Redshift data warehouse and our BI tool to roll out self-service analytics with a foundational view of customer, product, revenue, and email volume and usage data.
We are confident that with this approach we have set ourselves up for success in a nimble, scalable, cost-effective manner to rapidly enable business-driven insights for SendGrid!
33. Informatica Addresses Data Management Challenges
Connectivity
High-performance, out-of-the-box native connectors to any data system
Abstraction layer enables reuse
Secure
Data Visibility
Metadata-driven visual design: visibility into data flows across cloud and on-prem
Metadata: the foundation of quality, governance, security, mastering
Scalability
Inherently designed for performance at scale
iPaaS offers infinite integration capacity and bursting
Operational Control
Single point of control for production data across cloud and on-prem
Admin can monitor production data flows and flag issues early
34. Hundreds of Connectors For Every Type of Data Source
Sales & Service
Big Data
Human Resources
Web Protocols & API
ERP & Financials
B2B
Marketing
Social
IT & Admin
Analytics
35. Informatica’s 3 Key Differentiators
1. Unlock your data (Connect): hundreds of out-of-box connectors for cloud and on-prem data sources
2. UI maximizes productivity for developers & citizen integrators (Develop): visual data mapping; out-of-box templates & wizards; easy to use & highly reusable
3. Scale with performance (Deploy): optimized to process the largest data volumes; pushdown optimization; automated
36. Informatica Product Portfolio for Hybrid Cloud Management
Cloud Test Data Management
Cloud Application Integration
Cloud Data Integration
Data as a Service
Cloud Customer 360
37. Amazon Redshift Upsert – Manual Coding Method
1. Extract the data from the source
2. Put it into flat files and compress them
3. Transfer the compressed files to S3
4. Wait for S3 consistency
5. Create a staging table in Redshift
6. Copy data from S3 into the staging table
7. Inner join with the target table to delete rows to be updated
8. Insert updated rows from the staging table
9. Delete the staging table
10. Delete files from S3
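The Redshift-side work of steps 5 through 9 can be sketched as the SQL statements a hand-coded job would issue. This is a minimal illustration under stated assumptions, not Informatica's or AWS's implementation: the helper `build_upsert_sql`, the table names, the key column, the S3 prefix and the IAM role ARN are all hypothetical. It follows the common staging-table pattern (delete matching target rows with `DELETE ... USING`, then insert everything from staging) inside one transaction. Running the statements would still need a Redshift connection, and steps 1 to 4 and 10 happen outside the database.

```python
from typing import List, Sequence


def build_upsert_sql(
    target: str,
    staging: str,
    key_columns: Sequence[str],
    s3_prefix: str,
    iam_role_arn: str,
) -> List[str]:
    """Return the SQL statements for a staging-table upsert into Amazon Redshift.

    Hypothetical helper: all identifiers are assumed to be trusted,
    already-validated names; this sketch does no quoting or escaping.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")

    # Step 7: join condition matching staged rows to existing target rows.
    join_condition = " AND ".join(
        f"{target}.{col} = {staging}.{col}" for col in key_columns
    )

    return [
        # Step 5: staging table with the same column layout as the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # Step 6: bulk-load the gzip-compressed CSV flat files from S3.
        f"COPY {staging} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' CSV GZIP;",
        # Steps 7-8 share one transaction, so other sessions never see
        # the deleted rows without their replacements.
        "BEGIN TRANSACTION;",
        f"DELETE FROM {target} USING {staging} WHERE {join_condition};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END TRANSACTION;",
        # Step 9: drop the staging table.
        f"DROP TABLE {staging};",
    ]


if __name__ == "__main__":
    for statement in build_upsert_sql(
        target="accounts",                            # hypothetical target table
        staging="accounts_stage",                     # hypothetical staging table
        key_columns=["account_id"],                   # hypothetical primary key
        s3_prefix="s3://example-bucket/accounts/",    # hypothetical S3 prefix
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoad",  # hypothetical
    ):
        print(statement)
```

The sketch assumes each key appears at most once in the staged files; if the source can emit several versions of the same key, they would need to be deduplicated before the insert.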
Or, Do It In 3 Simple Steps…
38. Amazon Redshift Upsert – Informatica Cloud Method
1. Choose Upsert Operation
2. Map Your Fields
3. Run Or Schedule!
39. Informatica Cloud Amazon Redshift Architecture
Components: Informatica Cloud (Metadata, Mappings), Secure Agent, Firewall
1. Build mapping and execute job
2. Retrieve account data
3. Put account data into flat file(s)
4. Transfer compressed flat file(s) to S3
5. Initiate copy from S3
6. Load data into Amazon Redshift
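Steps 3 and 4 above (putting account data into flat files and compressing them before the transfer to S3) can be sketched with Python's standard library. This is a hypothetical illustration of the general technique, not how the Secure Agent works internally: the function name `write_gzip_csv_parts`, the round-robin split into several parts (Redshift's COPY can load multiple files in parallel) and the sample rows are assumptions, and the actual S3 upload is left out.

```python
import csv
import gzip
import io
from typing import List, Sequence


def write_gzip_csv_parts(rows: Sequence[Sequence[object]], num_parts: int) -> List[bytes]:
    """Split rows round-robin into num_parts gzip-compressed CSV flat files."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")

    # Distribute rows across parts so the files end up roughly the same size.
    buckets: List[List[Sequence[object]]] = [[] for _ in range(num_parts)]
    for index, row in enumerate(rows):
        buckets[index % num_parts].append(row)

    parts = []
    for bucket in buckets:
        text = io.StringIO(newline="")
        # The csv module quotes fields containing commas, quotes or newlines.
        csv.writer(text).writerows(bucket)
        parts.append(gzip.compress(text.getvalue().encode("utf-8")))
    return parts


if __name__ == "__main__":
    sample = [(1, "Acme, Inc."), (2, "Globex"), (3, "Initech")]  # hypothetical account rows
    for number, blob in enumerate(write_gzip_csv_parts(sample, num_parts=2)):
        # A real job would upload each blob to S3 (hypothetical key: accounts/part-0000.csv.gz).
        print(number, len(blob), "bytes compressed")
```

Files produced this way match a COPY that specifies `CSV GZIP`; the number of parts is a tuning choice and is not prescribed by the slide.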
40. iPaaS customers: 4,500
OEMs with over 1,000 customers: 70+
Transactions per month: 300B (130% growth YoY)
Integration jobs/processes per day: 1M
43. Getting Started – Amazon Web Services
www.informatica.com/products/cloud-integration/connectivity/amazon-connectors.html
4-hour trial of specific use cases
60-day trial of all functionality
www.informatica.com/products/cloud-integration/connectivity/amazon-connectors/amazon-test-drive.html
Informatica.com, Amazon Marketplace