SlideShare ist ein Scribd-Unternehmen logo
1 von 51
Downloaden Sie, um offline zu lesen
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Building Your Data Warehouse with
Amazon Redshift
Vidhya Srinivasan, AWS (vid@amazon.com)
Guest Speaker: Justin Cunningham, Yelp (s)
Data Warehouse - Challenges
Cost
Complexity
Performance
Rigidity
Petabyte scale; massively parallel
Relational data warehouse
Fully managed; zero admin
SSD & HDD platforms
As low as $1,000/TB/Year
Amazon
Redshift
Clickstream Analytics for Amazon.com
• Web log analysis for Amazon.com
– Over one petabyte workload
– Largest table: 400TB
– 2TB of data per day
• Understand customer behavior
– Who is browsing but not buying
– Which products / features are winners
– What sequence led to higher customer conversion
• Solution
– Best scale out solution – query across 1 week
– Hadoop – query across 1 month
Using Amazon Redshift
• Performance
– Scan 2.25 trillion rows of data: 14 minutes
– Load 5 billion rows data: 10 minutes
– Backfill 150 billion rows of data: 9.75 hours
– Pig  Amazon Redshift: 2 days to 1 hr
• 10B row join with 700 M rows
– Oracle  Amazon Redshift: 90 hours to 8 hrs
• Reduced number of SQLs by a factor of 3
• Cost
– 1.6 PB cluster
– 100 node dw1.8xl (3-yr RI)
– $180/hr
• Complexity
– 20% time of one DBA
• Backup
• Restore
• Resizing
Who uses Amazon Redshift?
Common Customer Use Cases
• Reduce costs by
extending DW rather than
adding HW
• Migrate completely from
existing DW systems
• Respond faster to
business
• Improve performance by
an order of magnitude
• Make more data
available for analysis
• Access business data via
standard reporting tools
• Add analytic functionality
to applications
• Scale DW capacity as
demand grows
• Reduce HW & SW costs
by an order of magnitude
Traditional Enterprise DW Companies with Big Data SaaS Companies
Selected Amazon Redshift Customers
Amazon Redshift Partners
Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB or SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2TB to 2PB
– DW2: SSD; scale from 160GB to 326TB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Direct-attached storage
• Large data block sizes
• Track of the minimum and
maximum value for each block
• Skip over blocks that don’t
contain the data needed for a
given query
• Minimize unnecessary I/O
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage
to maximize throughput
• Hardware optimized for high
performance data
processing
• Large block sizes to make the
most of each read
• Amazon Redshift manages
durability for you
Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
– HSM Support
• No direct access to compute nodes
• Audit logging & AWS CloudTrail integration
• Amazon VPC support
• SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
Amazon Redshift is 1/10th the Price of a Traditional Data Warehouse
DW1 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB
On-Demand $ 0.850 $ 3,723
1 Year Reserved Instance $ 0.215 $ 2,192
3 Year Reserved Instance $ 0.114 $ 999
DW2 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB
On-Demand $ 0.250 $ 13,688
1 Year Reserved Instance $ 0.075 $ 8,794
3 Year Reserved Instance $ 0.050 $ 5,498
Expanding Amazon Redshift’s
Functionality
Custom ODBC and JDBC Drivers
• Up to 35% higher performance than open source drivers
• Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau
• Will continue to support PostgreSQL open source drivers
• Download drivers from console
Explain Plan Visualization
User Defined Functions
• We’re enabling User Defined Functions (UDFs) so
you can add your own
– Scalar and Aggregate Functions supported
• You’ll be able to write UDFs using Python 2.7
– Syntax is largely identical to PostgreSQL UDF Syntax
– System and network calls within UDFs are prohibited
• Comes with Pandas, NumPy, and SciPy pre-
installed
– You’ll also be able import your own libraries for even more
flexibility
Scalar UDF example – URL parsing
CREATE FUNCTION f_hostname (VARCHAR url)
RETURNS varchar
IMMUTABLE AS $$
import urlparse
return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Interleaved Multi Column Sort
• Currently support Compound Sort Keys
– Optimized for applications that filter data by one
leading column
• Adding support for Interleaved Sort Keys
– Optimized for filtering data by up to eight columns
– No storage overhead unlike an index
– Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
• Records in Redshift
are stored in blocks.
• For this illustration,
let’s assume that four
records fill a block
• Records with a given
cust_id are all in one
block
• However, records
with a given prod_id
are spread across
four blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys Illustrated
• Records with a given
cust_id are spread
across two blocks
• Records with a given
prod_id are also
spread across two
blocks
• Data is sorted in
equal measures for
both keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
How to use the feature
• New keyword ‘INTERLEAVED’ when defining sort keys
– Existing syntax will still work and behavior is unchanged
– You can choose up to 8 columns to include and can query with any or
all of them
• No change needed to queries
• Benefits are significant
[ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
Amazon Redshift
Spend time with your data, not your database….
• Cost
• Performance
• Simplicity
• Use Cases
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Using Redshift at
Justin Cunningham
Technical Lead – Business Analytics and Metrics
justinc@
Evolved Data Infrastructure
Scribe S3
MySQL
EMR
with
MRJob
Python
Batches
Evolved Data Infrastructure
Scribe S3
MySQL
EMR
with
MRJob
Python
Batches
S3
python my_job.py -r emr s3://my-inputs/input.txt
EMR
Cluster
EMR
Cluster
EMR
Cluster
…
EMR
Cluster
EMR
Cluster
EMR
Cluster
…
Data
Warehouse
Cluster
Team
Cluster
Team
Cluster
Team
Cluster
Analysis
Cluster
Analysis
Cluster
Analysis
Cluster
Who Owns Clusters?
• Every Data Team – Front-end and Back-end Too
• Why so many?
– Decouples Development
– Decouples Scaling
– Limits Contention Issues
Data Loading Patterns - EMR
Scribe S3 Redshift
EMR
with
MRJob
github.com/Yelp/mrjob
Mycroft - Specialized EMR
S3 Redshift
EMR
with
MRJob
Mycroft
github.com/Yelp/mycroft
Mycroft - Specialized EMR
github.com/Yelp/mycroft
Kafka and Storm
S3 Redshift
Data
Loader
Worker
Kafka Storm
Kafka
github.com/Yelp/pyleus
Data Loading Best Practices
• Batch Updates
• Use Manifest Files
• Make Operations Idempotent
• Design for Autorecovery
Support Multiple Clusters
S3
Redshift
Data
Loader
Worker
Storm
Kafka
Redshift
Redshift
Data
Loader
Worker
ETL -> ELT
S3 RedshiftKafka StormProducer
Time Series Data – Vacuum Operation
Sorted
Sorted
Sorted
Unsorted
Region
Sorted
Region
Sorted
Sorted
Sorted
Append in Sort Key Order
Sort Unsorted
Region
Merge
Distkeys
Node
Node
Node
Node
Node
Node
business
id
name
…
business_image
id
business_id
url
…
Take Advantage of Elasticity
"The BI team wanted to calculate some expensive
analytics on a few years of data, so we just
restored a snapshot and added a bunch of nodes
for a few days"
Monitoring
Querying: Use Window Functions
More Information: http://bit.ly/1FeqDp1
SELECT AVG(event_count) OVER (
ORDER BY event_timestamp ROWS 2 PRECEDING
) AS average_count, event_count, event_timestamp
FROM events_per_second ORDER BY event_timestamp;
average_count event_count event_timestamp
50 50 1427315395
53 57 1427315396
65 88 1427315397
53 14 1427315398
58 72 1427315399
Open-Source Tools
• github.com/Yelp/mycroft
– Redshift Data Loading Orchestrator
• github.com/Yelp/mrjob
– EMR in Python
• github.com/Yelp/pyleus
– Storm Topologies in Python
SAN FRANCISCO
SAN FRANCISCO
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Weitere ähnliche Inhalte

Was ist angesagt?

The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuningCarlos del Cacho
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJADataWorks Summit
 
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...Amazon Web Services
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2inovex GmbH
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetJeno Yamma
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftAmazon Web Services
 
(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best PracticesAmazon Web Services
 

Was ist angesagt? (20)

The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
AWS RDS
AWS RDSAWS RDS
AWS RDS
 
Redshift performance tuning
Redshift performance tuningRedshift performance tuning
Redshift performance tuning
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
data warehouse vs data lake
data warehouse vs data lakedata warehouse vs data lake
data warehouse vs data lake
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
 
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...
Migrating Your Databases to AWS - Deep Dive on Amazon RDS and AWS Database Mi...
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Intro to AWS: Database Services
Intro to AWS: Database ServicesIntro to AWS: Database Services
Intro to AWS: Database Services
 
(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices
 

Ähnlich wie Building Your Data Warehouse with Amazon Redshift

Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database ServicesAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介Amazon Web Services Japan
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 

Ähnlich wie Building Your Data Warehouse with Amazon Redshift (20)

Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Kürzlich hochgeladen (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Building Your Data Warehouse with Amazon Redshift

  • 1. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Building Your Data Warehouse with Amazon Redshift Vidhya Srinivasan, AWS (vid@amazon.com) Guest Speaker: Justin Cunningham, Yelp (s)
  • 2. Data Warehouse - Challenges Cost Complexity Performance Rigidity
  • 3. Petabyte scale; massively parallel Relational data warehouse Fully managed; zero admin SSD & HDD platforms As low as $1,000/TB/Year Amazon Redshift
  • 4. Clickstream Analytics for Amazon.com • Web log analysis for Amazon.com – Over one petabyte workload – Largest table: 400TB – 2TB of data per day • Understand customer behavior – Who is browsing but not buying – Which products / features are winners – What sequence led to higher customer conversion • Solution – Best scale out solution – query across 1 week – Hadoop – query across 1 month
  • 5. Using Amazon Redshift • Performance – Scan 2.25 trillion rows of data: 14 minutes – Load 5 billion rows data: 10 minutes – Backfill 150 billion rows of data: 9.75 hours – Pig  Amazon Redshift: 2 days to 1 hr • 10B row join with 700 M rows – Oracle  Amazon Redshift: 90 hours to 8 hrs • Reduced number of SQLs by a factor of 3 • Cost – 1.6 PB cluster – 100 node dw1.8xl (3-yr RI) – $180/hr • Complexity – 20% time of one DBA • Backup • Restore • Resizing
  • 6. Who uses Amazon Redshift?
  • 7. Common Customer Use Cases • Reduce costs by extending DW rather than adding HW • Migrate completely from existing DW systems • Respond faster to business • Improve performance by an order of magnitude • Make more data available for analysis • Access business data via standard reporting tools • Add analytic functionality to applications • Scale DW capacity as demand grows • Reduce HW & SW costs by an order of magnitude Traditional Enterprise DW Companies with Big Data SaaS Companies
  • 10. Amazon Redshift Architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH • Two hardware platforms – Optimized for data processing – DW1: HDD; scale from 2TB to 2PB – DW2: SSD; scale from 160GB to 326TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 11. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375
  • 12. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375
  • 13. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 14. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Direct-attached storage • Large data block sizes • Track of the minimum and maximum value for each block • Skip over blocks that don’t contain the data needed for a given query • Minimize unnecessary I/O
  • 15. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Use direct-attached storage to maximize throughput • Hardware optimized for high performance data processing • Large block sizes to make the most of each read • Amazon Redshift manages durability for you
  • 16. Amazon Redshift has security built-in • SSL to secure data in transit • Encryption to secure data at rest – AES-256; hardware accelerated – All blocks on disks and in Amazon S3 encrypted – HSM Support • No direct access to compute nodes • Audit logging & AWS CloudTrail integration • Amazon VPC support • SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC
  • 17. Amazon Redshift is 1/10th the Price of a Traditional Data Warehouse DW1 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB On-Demand $ 0.850 $ 3,723 1 Year Reserved Instance $ 0.215 $ 2,192 3 Year Reserved Instance $ 0.114 $ 999 DW2 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB On-Demand $ 0.250 $ 13,688 1 Year Reserved Instance $ 0.075 $ 8,794 3 Year Reserved Instance $ 0.050 $ 5,498
  • 19. Custom ODBC and JDBC Drivers • Up to 35% higher performance than open source drivers • Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau • Will continue to support PostgreSQL open source drivers • Download drivers from console
  • 21. User Defined Functions • We’re enabling User Defined Functions (UDFs) so you can add your own – Scalar and Aggregate Functions supported • You’ll be able to write UDFs using Python 2.7 – Syntax is largely identical to PostgreSQL UDF Syntax – System and network calls within UDFs are prohibited • Comes with Pandas, NumPy, and SciPy pre- installed – You’ll also be able import your own libraries for even more flexibility
  • 22. Scalar UDF example – URL parsing CREATE FUNCTION f_hostname (VARCHAR url) RETURNS varchar IMMUTABLE AS $$ import urlparse return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 23. Interleaved Multi Column Sort • Currently support Compound Sort Keys – Optimized for applications that filter data by one leading column • Adding support for Interleaved Sort Keys – Optimized for filtering data by up to eight columns – No storage overhead unlike an index – Lower maintenance penalty compared to indexes
  • 24. Compound Sort Keys Illustrated • Records in Redshift are stored in blocks. • For this illustration, let’s assume that four records fill a block • Records with a given cust_id are all in one block • However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks
  • 25. 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Illustrated • Records with a given cust_id are spread across two blocks • Records with a given prod_id are also spread across two blocks • Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 26. How to use the feature • New keyword ‘INTERLEAVED’ when defining sort keys – Existing syntax will still work and behavior is unchanged – You can choose up to 8 columns to include and can query with any or all of them • No change needed to queries • Benefits are significant [ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
  • 27. Amazon Redshift Spend time with your data, not your database…. • Cost • Performance • Simplicity • Use Cases
  • 28. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Using Redshift at Justin Cunningham Technical Lead – Business Analytics and Metrics justinc@
  • 29.
  • 30. Evolved Data Infrastructure Scribe S3 MySQL EMR with MRJob Python Batches
  • 31. Evolved Data Infrastructure Scribe S3 MySQL EMR with MRJob Python Batches
  • 32. S3 python my_job.py -r emr s3://my-inputs/input.txt EMR Cluster EMR Cluster EMR Cluster … EMR Cluster EMR Cluster EMR Cluster …
  • 34. Who Owns Clusters? • Every Data Team – Front-end and Back-end Too • Why so many? – Decouples Development – Decouples Scaling – Limits Contention Issues
  • 35.
  • 36.
  • 37. Data Loading Patterns - EMR Scribe S3 Redshift EMR with MRJob github.com/Yelp/mrjob
  • 38. Mycroft - Specialized EMR S3 Redshift EMR with MRJob Mycroft github.com/Yelp/mycroft
  • 39. Mycroft - Specialized EMR github.com/Yelp/mycroft
  • 40. Kafka and Storm S3 Redshift Data Loader Worker Kafka Storm Kafka github.com/Yelp/pyleus
  • 41. Data Loading Best Practices • Batch Updates • Use Manifest Files • Make Operations Idempotent • Design for Autorecovery
  • 43. ETL -> ELT S3 RedshiftKafka StormProducer
  • 44. Time Series Data – Vacuum Operation Sorted Sorted Sorted Unsorted Region Sorted Region Sorted Sorted Sorted Append in Sort Key Order Sort Unsorted Region Merge
  • 46. Take Advantage of Elasticity "The BI team wanted to calculate some expensive analytics on a few years of data, so we just restored a snapshot and added a bunch of nodes for a few days"
  • 48. Querying: Use Window Functions More Information: http://bit.ly/1FeqDp1 SELECT AVG(event_count) OVER ( ORDER BY event_timestamp ROWS 2 PRECEDING ) AS average_count, event_count, event_timestamp FROM events_per_second ORDER BY event_timestamp; average_count event_count event_timestamp 50 50 1427315395 53 57 1427315396 65 88 1427315397 53 14 1427315398 58 72 1427315399
  • 49. Open-Source Tools • github.com/Yelp/mycroft – Redshift Data Loading Orchestrator • github.com/Yelp/mrjob – EMR in Python • github.com/Yelp/pyleus – Storm Topologies in Python
  • 51. SAN FRANCISCO ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved