Slides from the Cloudyna event in Katowice, Poland, on November 14th, 2015. Data analysis is being used to transform businesses, increase efficiency, and drive innovation. The AWS Cloud has a comprehensive portfolio of analytics services to help you process data of any volume and automate how you put that data to work for your organization. In this session we'll see how to put those services to work on structured, unstructured, and real-time data.
22. Amazon Elastic MapReduce (Amazon EMR)
Managed clusters
for Hadoop, Spark, Presto
or any other applications
in the Apache / Hadoop stack
What is Amazon EMR?
23. Amazon Elastic MapReduce (Amazon EMR)
Overview of Amazon EMR Architecture
Storage: HDFS, EMRFS, Local File System
Data Processing Frameworks: Hadoop, Spark, …
Applications and Programs: Hive, Pig, …
Cluster Resource Management: YARN, Agent, …
25. Separate Compute and Storage
Resize and shut down
Amazon EMR clusters with no data loss
Point multiple Amazon EMR clusters
at the same data in Amazon S3
Easily evolve your analytic infrastructure
as technology evolves
Leverage Amazon S3 with EMR File System (EMRFS)
[Diagram: one S3 bucket and two EMR clusters]
Amazon Elastic MapReduce (Amazon EMR)
26. Read-after-write consistency
Very fast list operations
(thanks to Amazon DynamoDB)
Transparent to applications as s3://…
[Diagram: an S3 bucket, an EMR cluster, and a DynamoDB table]
Amazon Elastic MapReduce (Amazon EMR)
EMRFS makes it easier to use Amazon S3
34. A managed service that makes it easy
to deploy, operate, and scale Elasticsearch
in the AWS Cloud
High availability, patch management, failure detection
and node replacement, backups, and monitoring
Integrated with Logstash and Kibana
Scale up and scale down your cluster to deliver optimum
performance as data and usage patterns change, paying
only for the resources you actually consume
Control access to the Elasticsearch APIs using
AWS Identity and Access Management (IAM) policies
What is Amazon ES?
Amazon Elasticsearch Service (Amazon ES)
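As a rough illustration of the IAM-based access control mentioned on this slide, the Python sketch below builds an access policy document for an Amazon ES domain. The account ID, region, domain name, and role ARN are hypothetical placeholders, and the helper name `es_access_policy` is invented for this example; the `es:ESHttp*` action names correspond to HTTP methods on the domain endpoint.

```python
import json


def es_access_policy(account_id, region, domain, principal_arn, read_only=True):
    """Build an access policy dict for an Amazon ES domain (illustrative helper)."""
    # Read-only callers get GET and HEAD; otherwise allow every HTTP method.
    actions = ["es:ESHttpGet", "es:ESHttpHead"] if read_only else ["es:ESHttp*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                # The trailing /* covers all indices and API paths under the domain.
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain}/*",
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = es_access_policy(
    account_id="123456789012",
    region="eu-west-1",
    domain="logs",
    principal_arn="arn:aws:iam::123456789012:role/kibana-readers",
)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON could be attached to the domain as its access policy; switching `read_only` to `False` widens the statement to all HTTP methods.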
39. Relational Data Warehouse
a lot faster
a lot simpler
a lot cheaper
Massively parallel + Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
What is Amazon Redshift?
Amazon Redshift
55. A Platform for Streaming Data on AWS
What is Amazon Kinesis?
Amazon Kinesis
Amazon Kinesis Streams
Amazon Kinesis Firehose
Amazon Kinesis Analytics
59. Amazon Kinesis Streams
Amazon Kinesis
Low latency I/O
Configurable retention period from 1 to 7 days
The maximum size of a data blob is 1 MB
Each shard can support:
up to 1,000 records / second and
up to 1 MB / second for writes
up to 5 transactions / second and
up to 2 MB / second for reads
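The per-shard limits above translate directly into a sizing calculation. The Python sketch below estimates how many shards a stream would need for a given workload; it assumes 1 MB means 10**6 bytes, assumes every consumer application reads the full stream, and does not model the limit of 5 read transactions per second. The function and parameter names are invented for this example.

```python
# Per-shard limits as quoted on the slide.
MAX_RECORD_BYTES = 1_000_000       # 1 MB data blob limit (assuming 1 MB = 10**6 bytes)
WRITE_RECORDS_PER_SEC = 1_000      # records per second per shard, writes
WRITE_BYTES_PER_SEC = 1_000_000    # 1 MB per second per shard, writes
READ_BYTES_PER_SEC = 2_000_000     # 2 MB per second per shard, reads


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def shards_needed(records_per_sec: int, avg_record_bytes: int, consumer_apps: int = 1) -> int:
    """Estimate the shard count from write and read throughput limits."""
    if avg_record_bytes > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 1 MB data blob limit")
    bytes_per_sec = records_per_sec * avg_record_bytes
    by_write_count = ceil_div(records_per_sec, WRITE_RECORDS_PER_SEC)
    by_write_bytes = ceil_div(bytes_per_sec, WRITE_BYTES_PER_SEC)
    # Each consumer application is assumed to read every record once.
    by_read_bytes = ceil_div(bytes_per_sec * consumer_apps, READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_count, by_write_bytes, by_read_bytes)
```

For example, 5,000 records per second averaging 500 bytes each works out to 5 shards under these assumptions, because the record-count limit dominates; 1,000 records per second of 4,000 bytes each with three consumer applications works out to 6 shards, because read bandwidth dominates.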
66. Machine learning is the technology that automatically
finds patterns in your data and uses them to make
predictions for new data points as they become
available
Your Data + Machine Learning
= Smart Applications
What is Machine Learning?
Amazon Machine Learning (Amazon ML)
67. Designed for Developers
No Machine Learning skills are required
Batch predictions
Real-time predictions
Can be used by other applications via APIs
What can you do?
Amazon Machine Learning (Amazon ML)
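To make "used by other applications via APIs" concrete, here is a minimal Python sketch of a real-time prediction call. It assumes a client object that exposes a `predict` method with the `MLModelId`, `Record`, and `PredictEndpoint` parameters, as the boto3 `machinelearning` client does. The model ID, endpoint URL, and record fields are hypothetical, and `FakeMLClient` is an offline stand-in so the sketch runs without AWS credentials.

```python
def realtime_prediction(ml_client, model_id, endpoint_url, record):
    """Request one real-time prediction and return the 'Prediction' part of the response."""
    # Attribute values are sent as strings, so convert numbers before the call.
    payload = {name: str(value) for name, value in record.items()}
    response = ml_client.predict(
        MLModelId=model_id,
        Record=payload,
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]


class FakeMLClient:
    """Offline stand-in (hypothetical) that mimics the shape of a predict() response."""

    def predict(self, MLModelId, Record, PredictEndpoint):
        # Remember the arguments so the call can be inspected afterwards.
        self.last_call = {"MLModelId": MLModelId, "Record": Record, "PredictEndpoint": PredictEndpoint}
        return {"Prediction": {"predictedLabel": "1", "predictedScores": {"1": 0.87}}}


client = FakeMLClient()
prediction = realtime_prediction(
    client,
    model_id="ml-EXAMPLE",  # hypothetical model ID
    endpoint_url="https://realtime.machinelearning.us-east-1.amazonaws.com",  # placeholder
    record={"age": 42, "plan": "basic"},  # hypothetical input attributes
)
```

With real credentials, the stand-in could be swapped for `boto3.client("machinelearning")`; batch predictions follow a different, asynchronous flow that is not shown here.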
80. Data Orchestration can be a Task by Itself
[Diagram: S3 buckets, an EMR cluster, a DynamoDB table, a Redshift DB, an RDS instance, and on-premises systems]
81. Helps you reliably process and move data between
different AWS compute and storage services, as well
as on-premises data sources, at specified intervals
What is AWS Data Pipeline?
AWS Data Pipeline
82. Access your data where it’s stored, transform and
process it at scale, and efficiently transfer the results
to other AWS services
What is AWS Data Pipeline?
AWS Data Pipeline
83. Helps you migrate databases to AWS easily and
securely: the source database remains fully
operational during the migration, minimizing
downtime to applications that rely on the database
What is AWS Database Migration Service?
AWS Database Migration Service
[Diagram: customer premises, application users, and AWS, connected over the Internet or a VPN, with AWS Database Migration Service]
84. Migrate off Oracle and SQL Server
Move your tables, views, stored procedures and DML
to MySQL, MariaDB, and Amazon Aurora
AWS Schema Conversion Tool
AWS Database Migration Service
85. Know exactly where manual edits are needed
AWS Schema Conversion Tool
AWS Database Migration Service
90. A very fast, cloud-powered business intelligence (BI)
service that makes it easy to build visualizations,
perform ad-hoc analysis, and quickly get business
insights from your data
What is Amazon QuickSight?
Amazon QuickSight
92. Amazon QuickSight Architecture
[Diagram: Amazon QuickSight architecture]
Users and clients: Business User, QuickSight UI (Mobile Devices, Web Browsers), Partner BI products, QuickSight API
Amazon QuickSight components: Data Prep, Metadata, Suggestions, Connectors, SPICE
Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Files, Third-party
98. Use AWS Partner BI Solutions with Amazon QuickSight
Amazon QuickSight
Amazon QuickSight provides partners
a simple SQL-like interface to query the data stored
in SPICE, so that customers can continue using their
existing BI tools while benefiting from the faster
performance delivered by SPICE