Strategic Uses for Cost Efficient Long-Term Cloud Storage
1. Mas Kubo, Senior Product Manager, Amazon Glacier
Jacob Weinstein, Senior Director of Global Content Operations, Sony DADC New Media Solutions
March 21, 2017
2. Petabyte Scale Use Cases
• Satellite imagery archive and distribution
• Original raw user content preservation
• HIPAA compliant medical record archives
• Public sector regulated data backups
• Video content active archive workflow
3. Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose
The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS
4. Glacier is a powerful data storage solution that can
address the full spectrum of archive use cases
• Deep Archives: very rarely or never retrieved, such as compliance or public sector
archives
• Active Workloads: minute-level access, such as media broadcasting
• Mass Content Distribution: petabyte-scale data access, such as big data analytics
5. Glacier’s Value Proposition
• Flexible Data Access: three retrieval options, from minutes to hours
• Durable: 11 9s of durability (5 orders of magnitude better than 2 copies on tape)
• Management Features: Vault Lock, Retrieval Policies, CloudTrail
• Cost-Effective: starting at $0.004 per GB per month
• Secure: all data encrypted at rest
• Scalable: from gigabytes to exabytes
7. Amazon Glacier
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
8. Storage pricing: pay only for what you use
• Traditional provisioning: 1 PB raw storage → 800 TB usable storage → 600 TB allocated storage → 400 TB application data
• AWS Cloud Storage: Amazon Glacier starts at $0.004/GB/month (price dropped by 43% on 11/21/2016)
13. Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
16. Management features: Vault Lock
Third-party assessment: Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
18. Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways (e.g., FastGlacier)
19. Amazon Glacier – Direct access/APIs
Data upload: Create Vault → Configure Access → Upload Archives → Register Archive ID
Data retrieval: Initiate Retrieval → Async Retrieval Completion → Completion Notification → Download Data
20. Using Glacier via Lifecycle policies
Available: S3 99.99%, S3-IA 99.9%
Performant: low latency, high throughput
Durable: 99.999999999%
Scalable: elastic capacity, no preset limits
• S3 (“Hot” data: active and/or temporary data): > 0K minimum object size, ≥ 0 days minimum duration, $0.021/GB per month
• S3-IA (“Warm” data: infrequently accessed data): ≥ 128K minimum object size, ≥ 30 days minimum duration, $0.0125/GB per month, $0.01/GB retrieval
• Glacier (“Cold” data: archive and compliance data): > 0K minimum object size, ≥ 90 days minimum duration, $0.004/GB per month, retrieval starting at $0.0025/GB
22. Save money on storage
• 1 PB of storage and growing
• 1 PB for S3 Standard = $24,117 per month for storage cost
• 1 PB for S3 Standard-IA = $13,107 per month for
storage cost, 45% saving over S3 Standard
• 1 PB archived in Amazon Glacier = $4,194 per month
for storage cost, 83% saving over S3 Standard
* Assumes the highest public pricing tier
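These monthly figures follow from the per-GB rates and 1 PB = 1,048,576 GB. A short sketch to check the arithmetic; note the $0.023/GB S3 Standard rate is implied by the $24,117 figure rather than quoted directly on the slide:

```python
# Monthly cost of 1 PB at each storage class's per-GB rate.
# The S3 Standard rate of $0.023/GB is inferred from the $24,117 figure;
# the other two rates are quoted in this deck.
PB_IN_GB = 1024 * 1024  # AWS bills per GB-month

prices = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "Amazon Glacier": 0.004,
}

costs = {tier: PB_IN_GB * rate for tier, rate in prices.items()}
baseline = costs["S3 Standard"]
for tier, cost in costs.items():
    print(f"{tier}: ${cost:,.0f}/month "
          f"({1 - cost / baseline:.0%} saving vs S3 Standard)")
```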
23. Amazon Glacier: Amazon S3 lifecycle policies
• Object-level tagging for S3 objects
• Apply lifecycle rules based on object tags
• Example: transition objects to Amazon Glacier when they are 1 year old and have the object tags ‘Project=Delta’ and ‘Data type=HPI’
24. Uploading data: Internet or sneaker-net
• Internet: transfer data in a secure SSL tunnel over the public Internet
• AWS Direct Connect: dedicated bandwidth between your site and AWS
• AWS Import/Export and AWS Snowball: physical transfer of media into and out of AWS
25. Migrating Tapes to AWS
• Index Engines – announced AWS integration January 31.
• Managed tape migration service to AWS with native S3 support
• Direct indexing, reporting and access to backup data
• Supports data backed up by IBM, Dell EMC, Veritas, HP, etc.
• Cost effective migration from legacy tape to AWS S3
• Clients include: JPMC, Citi, DB, Barclays, TIAA-CREF, Rabo
AgriFinance
26. Amazon Glacier – Data Retrievals
• Expedited Retrieval: emergency access, 1-5 minutes, $0.03/GB. Use cases: last-minute play-out schedule swap; on-site tape replacement.
• Standard Retrieval: current model, 3-5 hours, $0.01/GB. Use cases: disaster recovery; off-site tape replacement.
• Bulk Retrieval: batch/bulk access, 5-12 hours, $0.0025/GB. Use case: PB-scale re-transcoding or video/image analysis.
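The three tiers map onto the `Tier` field of Glacier's `initiate_job` API. A minimal sketch, with a placeholder vault name and archive ID; the boto3 call itself is left commented since it needs credentials and a real vault:

```python
# Sketch: choosing a Glacier retrieval tier for an archive-retrieval job.

def archive_retrieval_job(archive_id: str, tier: str) -> dict:
    """Build jobParameters for a Glacier archive retrieval.

    tier is "Expedited" (1-5 min), "Standard" (3-5 h), or "Bulk" (5-12 h).
    """
    if tier not in ("Expedited", "Standard", "Bulk"):
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {
        "Type": "archive-retrieval",
        "ArchiveId": archive_id,
        "Tier": tier,
    }

params = archive_retrieval_job("EXAMPLE_ARCHIVE_ID", "Bulk")

# With credentials configured, the job would be started like this:
# import boto3
# glacier = boto3.client("glacier")
# job = glacier.initiate_job(vaultName="my-vault", jobParameters=params)
```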
27. 2017 Glacier Roadmap
• Prepaid long-term storage: discounted upfront storage pricing for 1, 3, 5, 10+ years.
• S3 lifecycle and Glacier unification: retrieval notifications, range retrievals, direct
put to Glacier storage class, permanent restore, etc.
• Batch retrievals: retrieve a list of archives with a single API call
• Query over archive: run basic queries over a data set without having to retrieve
data.
FEEDBACK WELCOME!!
30. “If physical deliveries can happen
within one hour based on
unpredictable requests, surely we
are able to exceed such
expectations digitally”
@SonyDADCNMS
31. Our migration
The Challenge
• Seamlessly migrate a platform that enables content
delivery across all devices and more than 1,200
distribution points worldwide
• Store 20 petabytes of motion picture and television
content
• Equating to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model and
infrastructure
• Investing in innovation vs. hardware
@SonyDADCNMS
Amazon EBS is designed for workloads that require persistent storage accessible by single EC2 instances. Typical use cases include boot volumes, transactional and NoSQL databases (like Microsoft Exchange, Cassandra and MongoDB), Big Data analytics platforms (like Hadoop, Amazon EMR, and HortonWorks), stream and log processing applications (like Kafka and Splunk), and data warehousing applications (like Vertica and Cassandra).
Amazon EFS provides simple, scalable file storage for sharing data between Amazon EC2 instances in the AWS Cloud. It delivers a file system interface with standard file system access semantics for Amazon EC2 instances. Amazon EFS grows and shrinks capacity automatically, and provides high throughput with consistently low latencies. Amazon EFS is designed for high availability and durability, and provides performance for a broad spectrum of workloads and applications, including Big Data and analytics, media processing workflows, content management, web serving, container storage, and home directories.
Amazon S3 is object storage designed to store and access any type of data over the Internet. It is secure, 99.999999999% durable, and scales past tens of trillions of objects. Amazon S3 is used for backup and recovery, tiered archive, user-driven content (like photos, videos, music and files), data lakes for Big Data analytics and data warehouse platforms, or as a foundation for serverless computing design.
Amazon Glacier is an extremely low-cost, highly durable storage for long-term backup and archive. Amazon Glacier is a solution for customers who want low-cost storage for infrequently accessed data. It can replace tape while assisting with compliance in highly regulated organizations like healthcare, life science, and financial services.
Amazon Cloud Data Migration services help customers migrate data into and out of the AWS Cloud in offline, online, or streaming models.
In this session, I’m going to discuss how Amazon Glacier is designed to help address these concerns. Whether this is your first time learning about Amazon Glacier and you are considering moving your archival workload to Amazon Glacier or anticipating data archival needs in the future, or you are already using Amazon Glacier and are here to learn about new features and how to make better use of it to optimize your archival workflow, you’re going to get a lot out of today’s talk. And if there’s one thing I want you to walk away with from this presentation, other than excitement that it’s finally time for the pub crawl, it’s that Amazon Glacier is an extremely low-cost, powerful, and flexible data storage solution that can address the full spectrum of your archive use cases, ranging from deep archives that are never retrieved, to active workloads with minute-level access such as media broadcasting, to petabyte-scale content distribution or big data analytics use cases.
Data stored on Amazon Glacier starts at just $0.004 per GB. This comes after our announcement last week of dropping our storage price by 43% and continues AWS’ tradition of innovating to reduce costs and to then pass those savings onto our customers.
We also announced last week the launch of new retrieval features that make it easier and more cost-effective than ever to access your Glacier data.
11 9s of durability. In mathematical terms, 11 9s of durability means that out of 10,000 objects, you might expect to lose one every 10 million years. We asked a large Hollywood studio to run the same Markov model to determine the number of 9s for two copies of data on tape, and they came back with 5-6 9s. Having 6 more 9s means that Glacier is 6 orders of magnitude more durable than two copies of your data on tape.
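The "one object per 10 million years" figure follows directly from the durability number; a quick check of the arithmetic:

```python
# With 11 9s of annual durability, how long until a fleet of 10,000 objects
# is expected to lose one?
annual_durability = 0.99999999999          # 11 nines
annual_loss_prob = 1 - annual_durability   # ~1e-11 per object per year

objects = 10_000
expected_losses_per_year = objects * annual_loss_prob   # ~1e-7
years_per_expected_loss = 1 / expected_losses_per_year  # ~10 million years

print(f"~{years_per_expected_loss:,.0f} years per expected object loss")
```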
All data is encrypted at rest
Glacier offers a suite of features from compliance, to audit logging, to cost management.
From a value-proposition standpoint
Amazon Glacier removes the need for upfront capital expenditures that, in particular for archival hardware which involves long investment horizons that increases the risk involved with choosing the right solution. With Amazon Glacier, there’s no commitment and you simply pay as you go for only the storage you use.
We remove the need for time-consuming capacity planning and on-going negotiations with multiple hardware and software vendors.
Help avoid the risk of physical media handling.
Enable you to control your geographic locality, performance and compliance.
The other thing to note is our strong history of price cuts. Normally when you buy capital equipment and the price is reduced, no one calls you and offers you a refund for what you’ve already purchased. AWS frequently cuts pricing as we continue to gain scale and realize efficiencies in our operational model. Just last year we cut S3 pricing by 65%. This year we introduced S3-IA, which provides roughly 60% savings compared to S3, and cut Glacier pricing by 30%.
S3 is highly durable. Your data is stored across three separate facilities, giving you geo-redundancy: we can sustain data loss in two facilities simultaneously and your data is still safe, providing a statistical measure of 11 9s of durability. Consider what it would take to architect for such a level of durability in your own data centers.
Data archival often involves business-critical data, and so many customers require audit logging to know who did what and when, whether that action was approved or denied, and why. Glacier customers have access to such logs via AWS CloudTrail, which can be enabled with just a few clicks in the console, and the logging applies to all Glacier APIs.
We launched Vault Lock in summer 2015, which allows customers to set compliance controls on Glacier storage containers (vaults) via a lockable policy. For example, customers who used to buy WORM storage/drives for records retention can now easily set up a Vault Lock with, say, a 7-year retention period, and Glacier will enforce the retention control such that any archive stored in the vault cannot be deleted until it has been stored for 7 years.
We recognize that data retention is one of the most common archive use cases, and we launched Vault Lock to make life simpler for these customers. However, Vault Lock does more than data retention (WORM). It can be used to enforce a number of compliance objectives, such as protection on data access. For example, a pharmaceutical company can lock its top-secret drug formula in a vault that requires 3-way multi-factor authentication for access.
Talk about our data hierarchy – customer maps to a Vault, social post is in an Archive. Retention and legal hold are set at Vault level.
Walk through the policy. Note that we set it with less than 20 lines of json.
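As a hedged sketch of what such a lockable retention policy can look like in under 20 lines of JSON (the account ID, vault name, and 7-year window are placeholders; `glacier:ArchiveAgeInDays` is the condition key Glacier provides for time-based retention):

```python
import json

# Minimal Vault Lock policy sketch: deny archive deletion until an archive
# is 2,555 days (~7 years) old. Account ID and vault name are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "deny-delete-for-7-years",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/examplevault",
        "Condition": {
            "NumericLessThan": {"glacier:ArchiveAgeInDays": "2555"}
        }
    }]
}

text = json.dumps(policy, indent=2)
print(text)

# Locking is the two-step process mentioned above: initiate-vault-lock
# returns a lock ID, and complete-vault-lock makes the policy immutable:
# import boto3
# boto3.client("glacier").initiate_vault_lock(
#     vaultName="examplevault", policy={"Policy": text})
```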
HIPAA, SOC, others
There are 3 ways to access Glacier. The first is to interact directly with Glacier using its APIs or SDK, which gives you full access to Glacier’s features and provides a great way to build applications on Glacier. Like any AWS service, always be thoughtful and diligent in setting access policies that meet your security requirements around who can upload, access, or delete your Glacier data. The second way to access Glacier, which is also one of the most popular, is via S3 lifecycle policies. If your application or data set also uses S3 storage, then using Glacier by transitioning your S3 objects into the Glacier storage class is a simple and powerful way to manage all your object storage in AWS.
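As a rough sketch of the first, direct-API path, the upload and retrieval flow looks like the following (vault and file names are illustrative; the boto3 calls are commented out because they require credentials and real resources):

```python
# The two halves of the direct-API flow, in order.
upload_steps = [
    "create_vault",             # one-time vault setup
    "set_vault_access_policy",  # configure who may upload/access/delete
    "upload_archive",           # returns the archiveId you must record
]
retrieval_steps = [
    "initiate_job",    # start an asynchronous archive-retrieval job
    "describe_job",    # poll, or subscribe an SNS topic for completion
    "get_job_output",  # download the data once the job completes
]

# import boto3
# glacier = boto3.client("glacier")
# glacier.create_vault(vaultName="media-archive")
# resp = glacier.upload_archive(vaultName="media-archive",
#                               body=open("master.mxf", "rb"))
# archive_id = resp["archiveId"]  # register this: there is no lookup by name
# job = glacier.initiate_job(vaultName="media-archive",
#                            jobParameters={"Type": "archive-retrieval",
#                                           "ArchiveId": archive_id})
# # ...after the completion notification:
# data = glacier.get_job_output(vaultName="media-archive",
#                               jobId=job["jobId"])["body"].read()
```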
Lastly, there is a large and ever-growing number of third-party vendors that integrate with Glacier, ranging from free consumer-grade tools like FastGlacier and CloudBerry to enterprise-grade applications and storage gateways like Commvault and NetApp AltaVault. One thing to keep in mind when using third-party tools is whether your archives are kept in their native format or are reformatted by the third-party tool for reasons like compression or indexing features. Many customers find it useful to have their data in its native format in AWS because it provides the flexibility to use that data for analysis or other workflows without having to use the third-party tool in the cloud to understand the way the tool formatted your data.
Highlight customer architecture and how durability, avail, performance, and scalability relate to application type
You can transition objects from S3 Standard to S3-IA after 30 days and then transition them to Glacier after 365 days.
Now, many customers keep a lot of their data in the same bucket for application design reasons, and oftentimes they only want certain sets of objects in that bucket to be transitioned to Glacier. However, since lifecycle policies apply at the bucket level, some have found it challenging to write policies that only apply to the objects they actually want transitioned. Well, I’m excited to share that yesterday S3 launched a powerful new feature that enables far more granularity when it comes to lifecycle policies. Yesterday, we announced that you can now add up to 10 tags to any of your S3 objects and that lifecycle policies can refer to these tags. In the example here, the policy still applies to the entire bucket, but now the rule only executes for objects with the tags “Project=Delta” and “Data type=HPI”. The variations are limitless.
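As a sketch, a tag-scoped lifecycle rule of that shape could be expressed like this (the bucket name and rule ID are made up; the actual call is commented out since it needs credentials and a real bucket):

```python
# Lifecycle rule: transition objects tagged Project=Delta and
# "Data type"=HPI to Glacier after 365 days.
rule = {
    "ID": "delta-hpi-to-glacier",
    "Status": "Enabled",
    "Filter": {
        "And": {
            "Tags": [
                {"Key": "Project", "Value": "Delta"},
                {"Key": "Data type", "Value": "HPI"},
            ]
        }
    },
    "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
}

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",
#     LifecycleConfiguration={"Rules": [rule]},
# )
```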
In addition, many customers have asked for the ability to manually transition data to Glacier one object at a time, regardless of age. Object tagging enables that. The way you would do it is by defining a 0-day lifecycle policy on a bucket that applies to a certain object tag, such as “Freeze it!”; i.e., any object with the tag “Freeze it!” will be transitioned to Glacier that day. Therefore, to transition a single object, you simply make the API call to add the “Freeze it!” tag.
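A sketch of that manual "freeze" pattern: a 0-day rule scoped to a tag, plus the tagging call that triggers it for one object. The bucket, key, and tag value are placeholders, and the tag name is the one used in this talk:

```python
# 0-day lifecycle rule: any object carrying the "Freeze it!" tag is
# transitioned to Glacier on the next lifecycle run.
freeze_rule = {
    "ID": "freeze-on-demand",
    "Status": "Enabled",
    "Filter": {"Tag": {"Key": "Freeze it!", "Value": "true"}},
    "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
}

# Adding the tag to a single object then queues it for transition:
# import boto3
# boto3.client("s3").put_object_tagging(
#     Bucket="my-bucket",
#     Key="reports/2016-q4.parquet",
#     Tagging={"TagSet": [{"Key": "Freeze it!", "Value": "true"}]},
# )
```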
Before diving into ways you can upload data, I’d like to briefly mention a couple of options you have when it comes to transferring data. If you have large amounts of data to transfer, such as terabytes or petabytes, either as a one-time migration (such as decommissioning an old tape library to move to the cloud) or on an ongoing basis (such as mass content distribution use cases), then AWS Direct Connect and Snowball can make that easier and more cost-effective. Direct Connect provides a dedicated, secure, high-speed connection between AWS and your datacenter, up to 10 Gbps of bandwidth per connection, and can dramatically reduce egress costs. Snowball is a low-cost way to transfer large amounts of data securely, and often faster than over the public Internet for large datasets.