SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
Welcome!
DoIT International
Practicing multi-cloud since 2010.
Agenda
1
2
3
4
5
AWS Athena
Google BigQuery
Test Drive
Summary
Q & A
2
DoIT International confidential │ Do not distribute
About me..
Vadim Solovey - CTO // DoiT International
DoIT International confidential │ Do not distribute
DoIT International confidential │ Do not distribute
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
AWS Athena vs Google BigQuery
for interactive SQL queries on large datasets (#20/16)
Vadim Solovey - CTO // DoIT International
Google Cloud Developer Expert | AWS Certified Solutions Architect
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
Athena (/əˈθiːnə/; Greek:
- the goddess of
wisdom, craft,
and war
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
OR
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
Will Athena
slay BigQuery?
Vadim Solovey - CTO // DoIT International
Google Cloud Developer Expert | AWS Certified Solutions Architect
Section Slide Template Option 2
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
your mileage may
will vary
Warning:
DoIT International confidential │ Do not distribute
AWS Athena
• Serverless Analytical Columnar Database based on Facebook’s Presto
• Data:
• External Tables (*SV, JSON, ORC, PARQUET files in S3 bucket)
• Ingestion:
• Just store files on S3
• Convert to columnar/compressed format using EMR
• ANSI SQL 2011
• Priced at $5/TB of scanned data & standard S3 storage/ops costs
• Cost Optimization -converting data into columnar format, partitioning, limit queried columns.
DoIT International confidential │ Do not distribute
Google BigQuery
• Serverless Analytical Columnar Database based on Google Dremel
• Data:
• Native Tables
• External Tables (*SV, JSON, AVRO files stored in Google Cloud Storage bucket)
• Ingestion:
• File Imports
• Streaming API (up to 100K records/sec per table)
• Federated Tables (files in bucket, Bigtable table or Google Spreadsheet)
• ANSI SQL 2011
• Priced at $5/TB of scanned data + storage + streaming (if used)
• Cost Optimization - partitioning, limit queried columns, 24-hour cache, cold data.
DoIT International confidential │ Do not distribute
Summary
Feature  Product AWS Athena Google BigQuery
Data Formats *SV, JSON, PARQUET/z, ORC/z External (*SV, JSON, AVRO) / Native
ANSI SQL Support Yes* Yes*
DDL Support Only CREATE/ALTER/DROP CREATE/UPDATE/DELETE (w/ quotas)
Underlying Technology FB Presto Google Dremel
Caching No Yes
Cold Data Pricing S3 Lifecycle Policy 50% discount after 90 days of inactivity
User Defined Functions No Yes
Data Partitioning On Any Key By DAY
Pricing $5/TB (scanned) plus S3 ops $5/TB (scanned) less cached data
DoIT International confidential │ Do not distribute
How we tested?
• Dataset
• New York Yellow Taxi Public Dataset (https://data.cityofnewyork.us) [130GB, 1.1B rows]
• Akamai Log (30GB, 1B rows]
• BigQuery [NY Taxi]
• Import of data into native table
• External table on top of 500x uncompressed CSV files in GCS bucket
• Caching: off
• AWS Athena [NY Taxi]
• Copied 500x uncompressed CSV files from GCS to S3 bucket
• Using EMR 5.2 (HIVE/PRESTO) converted the data into ORC/z and PARQUET/z formats
DoIT International confidential │ Do not distribute
Tables & Formats
BigQuery
• trips_ext (500x CSV files, 490MB each) [245GB in total]
• trips_nat (130GB total)
AWS Athena
• trips_csv (500x CSV files, 490MB each)
• trips_par (4 files, 3.2GB each)
• trips_parz (8 files, 1.7GB each)
• trips_orc (8 files, 2GB each)
• trips_orcz (8 files, 2.1GB each)
DoIT International confidential │ Do not distribute
Test Drive Summary
Query Type AWS Athens (GB/time) Google BigQuery (GB/time) t.diff %
[1] LOOKUP 48MB (4.1s) 130GB (2.0s) - 51%
[2] LOOKUP & AGGR 331MB (4.35s) 13.4GB (2.7s) - 48%
[3] GROUP/ORDER BY 5.74GB (8.85s) 8.26GB (5.4s) - 27%
[4] TEXT FUNCTIONS 606MB (11.3s) 13.6GB (2.4s) - 470%
[5] JSON FUNCTIONS 29MB (17.8s) 63.9GB (8.9s) - 100%
[6] REGEX FUNCTIONS (1.3s) 5.45GB (1.9s) + 31%
[7] FEDERATED DATA 133GB (19.4s) 133GB (36.4s) +47%
DoIT International confidential │ Do not distribute
What Athena does better than BigQuery?
Advantages:
• Can be faster than BigQuery, especially with federated/external tables
• Ability to use regex to define a schema (query files without needing to change the format)
• Can be faster and cheaper than BigQuery when using a partitioned/columnar format
• Tables can be partitioned on any column
Issues:
• It’s not easy to convert data between formats
• Doesn’t support DDL, i.e. no insert/update/delete
• Randomly giving the HIVE_UNKNOWN_ERROR
• No streaming support
• Struggles with really large datasets
DoIT International confidential │ Do not distribute
What BigQuery does better than Athena?
• It has native table support giving it better performance and more features
• It’s easy to manipulate data, insert/update records and write query results back to a table
• Querying native tables is very fast
• Easy to convert non-columnar formats into a native table for columnar queries
• Supports UDFs, although they will be available in the future for Athena
• Supports nested tables (nested and repeated fields)
• Works well for petabyte scale queries
Section Slide Template Option 2
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle something clever. People will think it’s neat.
Questions?
DoIT International confidential │ Do not distribute
[1] Lookup Query
SELECT *
FROM trips_par
WHERE vendor_id = 'VTS'
LIMIT 10
Back
DoIT International confidential │ Do not distribute
[2] Lookup & Aggregation
SELECT max(passenger_count)
FROM trips_par
WHERE vendor_id <> 'VTS'
Back
DoIT International confidential │ Do not distribute
[3] GROUP BY / ORDER BY Query
SELECT substr(string(pickup_datetime),1,7) month,
COUNT(*) trips
FROM [doit-playground:playground.trips_nat]
WHERE substr(string(pickup_datetime),1,4) = '2014'
GROUP BY 1
ORDER BY 1
Back
DoIT International confidential │ Do not distribute
[4] ‘LIKE’ Functions Query
SELECT
count(*)
FROM
log_par
WHERE
UA LIKE '%AppleWebKit%' OR
Back
DoIT International confidential │ Do not distribute
[5] JSON Functions Query
SELECT
JSON_EXTRACT(Misc_Fields,'$.network.edgeIP') AS edgeIP, COUNT(*) AS total
FROM
[doit-playground:playground.akamai_errors]
GROUP BY
edgeIP
ORDER BY total DESC
LIMIT 10
Back
DoIT International confidential │ Do not distribute
[6] Regex Functions Query
SELECT *
FROM log_par
WHERE REGEXP_MATCH(reqPath, r'msn.*-home') LIMIT 10
Back

Weitere ähnliche Inhalte

Was ist angesagt?

Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OKServerless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OKKriangkrai Chaonithi
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkVinoth Chandar
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge DatabasesLynn Langit
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxData
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectiveAlluxio, Inc.
 
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...★ Akshay Surve
 
Scylla @ Disney+ Hotstar
Scylla @ Disney+ HotstarScylla @ Disney+ Hotstar
Scylla @ Disney+ HotstarScyllaDB
 
Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasifthuguk
 
Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)★ Akshay Surve
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHParis Data Engineers !
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containerskbajda
 
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
NDC Minnesota - Analyzing StackExchange data with Azure Data LakeNDC Minnesota - Analyzing StackExchange data with Azure Data Lake
NDC Minnesota - Analyzing StackExchange data with Azure Data LakeTom Kerkhove
 
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersDisney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersScyllaDB
 
Dynamic Object Routing
Dynamic Object RoutingDynamic Object Routing
Dynamic Object RoutingCloudian
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaValery Tkachenko
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelishuguk
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubolekbajda
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 

Was ist angesagt? (20)

Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OKServerless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspective
 
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...
Building a Real-time Stream Processing Pipeline - Kinesis Data Firehose, Amaz...
 
Scylla @ Disney+ Hotstar
Scylla @ Disney+ HotstarScylla @ Disney+ Hotstar
Scylla @ Disney+ Hotstar
 
Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasift
 
Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVH
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
NDC Minnesota - Analyzing StackExchange data with Azure Data LakeNDC Minnesota - Analyzing StackExchange data with Azure Data Lake
NDC Minnesota - Analyzing StackExchange data with Azure Data Lake
 
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersDisney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
 
Dynamic Object Routing
Dynamic Object RoutingDynamic Object Routing
Dynamic Object Routing
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelis
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 

Andere mochten auch

Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryIntro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryChris Schalk
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleData Con LA
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsAndreas Raible
 

Andere mochten auch (7)

Redshift VS BigQuery
Redshift VS BigQueryRedshift VS BigQuery
Redshift VS BigQuery
 
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryIntro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & Benefits
 

Ähnlich wie AWS Athena vs. Google BigQuery for interactive SQL Queries

Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!tdc-globalcode
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudQubole
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data FactoryBizTalk360
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsIke Ellis
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopDoiT International
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsDustin Vannoy
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analyticsIke Ellis
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptxIke Ellis
 
Splitgraph: AHL talk
Splitgraph: AHL talkSplitgraph: AHL talk
Splitgraph: AHL talkSplitgraph
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...MongoDB
 
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of Gruter
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of GruterBig Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of Gruter
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of GruterData Con LA
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
 
Building Cloud-Native Applications with Microsoft Windows Azure
Building Cloud-Native Applications with Microsoft Windows AzureBuilding Cloud-Native Applications with Microsoft Windows Azure
Building Cloud-Native Applications with Microsoft Windows AzureBill Wilder
 

Ähnlich wie AWS Athena vs. Google BigQuery for interactive SQL Queries (20)

GCP for AWS Professionals
GCP for AWS ProfessionalsGCP for AWS Professionals
GCP for AWS Professionals
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!
TDC2016POA | Trilha Cloud Computing - Microsoft Azure ? From Zero To Hero!
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public Cloud
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On Workshop
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
 
Splitgraph: AHL talk
Splitgraph: AHL talkSplitgraph: AHL talk
Splitgraph: AHL talk
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Phissug s01 ep6, stretch database
Phissug s01 ep6, stretch databasePhissug s01 ep6, stretch database
Phissug s01 ep6, stretch database
 
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of Gruter
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of GruterBig Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of Gruter
Big Data Day LA 2015 - What's New Tajo 0.10 and Beyond by Hyunsik Choi of Gruter
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
Building Cloud-Native Applications with Microsoft Windows Azure
Building Cloud-Native Applications with Microsoft Windows AzureBuilding Cloud-Native Applications with Microsoft Windows Azure
Building Cloud-Native Applications with Microsoft Windows Azure
 

Mehr von DoiT International

Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules RestructuredDoiT International
 
GAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresGAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresDoiT International
 
Orchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsOrchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsDoiT International
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!DoiT International
 
An Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesAn Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesDoiT International
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?DoiT International
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best PracticesDoiT International
 
Google BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewGoogle BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewDoiT International
 
Running Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSRunning Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSDoiT International
 
Scaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofScaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofDoiT International
 
CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriDoiT International
 
Kubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherKubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherDoiT International
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)DoiT International
 

Mehr von DoiT International (16)

Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
GAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor CoresGAN training with Tensorflow and Tensor Cores
GAN training with Tensorflow and Tensor Cores
 
Orchestrating Redis & K8s Operators
Orchestrating Redis & K8s OperatorsOrchestrating Redis & K8s Operators
Orchestrating Redis & K8s Operators
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!
 
An Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure MicroservicesAn Open-Source Platform to Connect, Manage, and Secure Microservices
An Open-Source Platform to Connect, Manage, and Secure Microservices
 
Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?Is your Elastic Cluster Stable and Production Ready?
Is your Elastic Cluster Stable and Production Ready?
 
Applying ML for Log Analysis
Applying ML for Log AnalysisApplying ML for Log Analysis
Applying ML for Log Analysis
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best Practices
 
Google BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewGoogle BigQuery 101 & What’s New
Google BigQuery 101 & What’s New
 
Running Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWSRunning Production-Grade Kubernetes on AWS
Running Production-Grade Kubernetes on AWS
 
Scaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami MahloofScaling Jenkins with Kubernetes by Ami Mahloof
Scaling Jenkins with Kubernetes by Ami Mahloof
 
CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar Demri
 
Kubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen FisherKubernetes @ Nanit by Chen Fisher
Kubernetes @ Nanit by Chen Fisher
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)
 

Kürzlich hochgeladen

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 

Kürzlich hochgeladen (20)

PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 

AWS Athena vs. Google BigQuery for interactive SQL Queries

  • 1. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Welcome! DoIT International Practicing multi-cloud since 2010.
  • 3. DoIT International confidential │ Do not distribute About me.. Vadim Solovey - CTO // DoiT International
  • 4. DoIT International confidential │ Do not distribute
  • 5. DoIT International confidential │ Do not distribute
  • 6. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. AWS Athena vs Google BigQuery for interactive SQL queries on large datasets (#20/16) Vadim Solovey - CTO // DoIT International Google Cloud Developer Expert | AWS Certified Solutions Architect
  • 7. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Athena (/əˈθiːnə/; Greek: - the goddess of wisdom, craft, and war
  • 8. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. OR
  • 9. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Will Athena slay BigQuery? Vadim Solovey - CTO // DoIT International Google Cloud Developer Expert | AWS Certified Solutions Architect
  • 10. Section Slide Template Option 2 Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. your mileage may will vary Warning:
  • 11. DoIT International confidential │ Do not distribute AWS Athena • Serverless Analytical Columnar Database based on Facebook’s Presto • Data: • External Tables (*SV, JSON, ORC, PARQUET files in S3 bucket) • Ingestion: • Just store files on S3 • Convert to columnar/compressed format using EMR • ANSI SQL 2011 • Priced at $5/TB of scanned data & standard S3 storage/ops costs • Cost Optimization -converting data into columnar format, partitioning, limit queried columns.
  • 12. DoIT International confidential │ Do not distribute Google BigQuery • Serverless Analytical Columnar Database based on Google Dremel • Data: • Native Tables • External Tables (*SV, JSON, AVRO files stored in Google Cloud Storage bucket) • Ingestion: • File Imports • Streaming API (up to 100K records/sec per table) • Federated Tables (files in bucket, Bigtable table or Google Spreadsheet) • ANSI SQL 2011 • Priced at $5/TB of scanned data + storage + streaming (if used) • Cost Optimization - partitioning, limit queried columns, 24-hour cache, cold data.
  • 13. DoIT International confidential │ Do not distribute Summary Feature Product AWS Athena Google BigQuery Data Formats *SV, JSON, PARQUET/z, ORC/z External (*SV, JSON, AVRO) / Native ANSI SQL Support Yes* Yes* DDL Support Only CREATE/ALTER/DROP CREATE/UPDATE/DELETE (w/ quotas) Underlying Technology FB Presto Google Dremel Caching No Yes Cold Data Pricing S3 Lifecycle Policy 50% discount after 90 days of inactivity User Defined Functions No Yes Data Partitioning On Any Key By DAY Pricing $5/TB (scanned) plus S3 ops $5/TB (scanned) less cached data
  • 14. DoIT International confidential │ Do not distribute How we tested? • Dataset • New York Yellow Taxi Public Dataset (https://data.cityofnewyork.us) [130GB, 1.1B rows] • Akamai Log (30GB, 1B rows] • BigQuery [NY Taxi] • Import of data into native table • External table on top of 500x uncompressed CSV files in GCS bucket • Caching: off • AWS Athena [NY Taxi] • Copied 500x uncompressed CSV files from GCS to S3 bucket • Using EMR 5.2 (HIVE/PRESTO) converted the data into ORC/z and PARQUET/z formats
  • 15. DoIT International confidential │ Do not distribute Tables & Formats BigQuery • trips_ext (500x CSV files, 490MB each) [245GB in total] • trips_nat (130GB total) AWS Athena • trips_csv (500x CSV files, 490MB each) • trips_par (4 files, 3.2GB each) • trips_parz (8 files, 1.7GB each) • trips_orc (8 files, 2GB each) • trips_orcz (8 files, 2.1GB each)
  • 16. DoIT International confidential │ Do not distribute Test Drive Summary Query Type AWS Athens (GB/time) Google BigQuery (GB/time) t.diff % [1] LOOKUP 48MB (4.1s) 130GB (2.0s) - 51% [2] LOOKUP & AGGR 331MB (4.35s) 13.4GB (2.7s) - 48% [3] GROUP/ORDER BY 5.74GB (8.85s) 8.26GB (5.4s) - 27% [4] TEXT FUNCTIONS 606MB (11.3s) 13.6GB (2.4s) - 470% [5] JSON FUNCTIONS 29MB (17.8s) 63.9GB (8.9s) - 100% [6] REGEX FUNCTIONS (1.3s) 5.45GB (1.9s) + 31% [7] FEDERATED DATA 133GB (19.4s) 133GB (36.4s) +47%
  • 17. DoIT International confidential │ Do not distribute What Athena does better than BigQuery? Advantages: • Can be faster than BigQuery, especially with federated/external tables • Ability to use regex to define a schema (query files without needing to change the format) • Can be faster and cheaper than BigQuery when using a partitioned/columnar format • Tables can be partitioned on any column Issues: • It’s not easy to convert data between formats • Doesn’t support DDL, i.e. no insert/update/delete • Randomly giving the HIVE_UNKNOWN_ERROR • No streaming support • Struggles with really large datasets
  • 18. DoIT International confidential │ Do not distribute What BigQuery does better than Athena? • It has native table support giving it better performance and more features • It’s easy to manipulate data, insert/update records and write query results back to a table • Querying native tables is very fast • Easy to convert non-columnar formats into a native table for columnar queries • Supports UDFs, although they will be available in the future for Athena • Supports nested tables (nested and repeated fields) • Works well for petabyte scale queries
  • 19. Section Slide Template Option 2 Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Questions?
  • 20. DoIT International confidential │ Do not distribute [1] Lookup Query SELECT * FROM trips_par WHERE vendor_id = 'VTS' LIMIT 10 Back
  • 21. DoIT International confidential │ Do not distribute [2] Lookup & Aggregation SELECT max(passenger_count) FROM trips_par WHERE vendor_id <> 'VTS' Back
  • 22. DoIT International confidential │ Do not distribute [3] GROUP BY / ORDER BY Query SELECT substr(string(pickup_datetime),1,7) month, COUNT(*) trips FROM [doit-playground:playground.trips_nat] WHERE substr(string(pickup_datetime),1,4) = '2014' GROUP BY 1 ORDER BY 1 Back
  • 23. DoIT International confidential │ Do not distribute [4] ‘LIKE’ Functions Query SELECT count(*) FROM log_par WHERE UA LIKE '%AppleWebKit%' OR Back
  • 24. DoIT International confidential │ Do not distribute [5] JSON Functions Query SELECT JSON_EXTRACT(Misc_Fields,'$.network.edgeIP') AS edgeIP, COUNT(*) AS total FROM [doit-playground:playground.akamai_errors] GROUP BY edgeIP ORDER BY total DESC LIMIT 10 Back
  • 25. DoIT International confidential │ Do not distribute [6] Regex Functions Query SELECT * FROM log_par WHERE REGEXP_MATCH(reqPath, r'msn.*-home') LIMIT 10 Back