My presentation as a speaker at the Google Cloud Platform Big Data Summit at the Google Venice office, on how we use Google Big Query and the rest of the Big Data stack at BlueCava
BlueCava and Google Cloud Platform
1. BLUECAVA, INC. / 2015
CROSS SCREEN STARTS HERE
2. BLUECAVA, INC. / 2015
BLUECAVA
Business / Product / Challenges
3. BLUECAVA, INC. / 2015
INTRODUCTION
Reza Qorbani
CTO @ BlueCava
• Worked with the Google Big Data team for the past 1.5 years
• Moved from 100% Private Cloud to a Hybrid Environment
• Deep Integration with Big Query
Email
reza.qorbani@bluecava.com
Twitter
@qorbani
4. BLUECAVA, INC. / 2015
ABOUT – BlueCava
Open Network that Optimizes Cross-Screen Marketing
Real-time Intelligence
DISPLAY / MOBILE / VIDEO / EXCHANGE / SOCIAL
Association Graph
VALIDATION / DEMOGRAPH / LOCATION / EXCHANGE / COVERAGE
DataTech Platforms
AdTech Platforms
MARTECH PLATFORMS & SERVICES
5. BLUECAVA, INC. / 2015
ABOUT – Association Graph
Household
Consumer B Consumer A Consumer C
IDFA APN BCID
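The hierarchy on this slide (a household containing consumers, each tied to device identifiers) can be sketched as a small data structure. This is a minimal illustration only, not BlueCava's actual model: the class names, the `linked_ids` helper, and the identifier values are hypothetical; only the ID type labels (IDFA, APN, BCID) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DeviceId:
    id_type: str  # label from the slide, e.g. "IDFA", "APN", "BCID"
    value: str    # hypothetical identifier value


@dataclass
class Consumer:
    name: str
    device_ids: List[DeviceId] = field(default_factory=list)


@dataclass
class Household:
    household_id: str
    consumers: List[Consumer] = field(default_factory=list)

    def owner_index(self) -> Dict[DeviceId, str]:
        """Map every device ID in the household to its consumer's name."""
        return {d: c.name for c in self.consumers for d in c.device_ids}

    def linked_ids(self, device_id: DeviceId) -> List[DeviceId]:
        """Return the other IDs held by the consumer who owns device_id."""
        for c in self.consumers:
            if device_id in c.device_ids:
                return [d for d in c.device_ids if d != device_id]
        return []


# Hypothetical sample data mirroring the slide's three consumers.
household = Household("hh-1", [
    Consumer("Consumer A", [DeviceId("IDFA", "idfa-001"), DeviceId("BCID", "bc-001")]),
    Consumer("Consumer B", [DeviceId("APN", "apn-002")]),
    Consumer("Consumer C", [DeviceId("BCID", "bc-003")]),
])
```

Calling `household.linked_ids(DeviceId("IDFA", "idfa-001"))` returns the BCID recorded for the same consumer, which is the kind of cross-screen lookup an association graph is meant to answer.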
6. BLUECAVA, INC. / 2015
ABOUT – Coverage
100M Households
240M Consumers
600M Devices
7. BLUECAVA, INC. / 2015
ABOUT – Volume
5 TB Daily
Daily RAW Logs
250k req/sec
From Partners and Exchanges
1.3 Petabyte
Total Storage
25 Billion IDs
Including our Partner IDs
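To put these figures in proportion, here is a rough back-of-the-envelope calculation. It rests on assumptions the slide does not state: decimal units (1 TB = 10^12 bytes), the 250k req/sec rate being sustained around the clock rather than a peak, and, for the last figure only, total storage being compared directly against raw log volume.

```python
TB = 10**12  # decimal terabyte (assumption)

daily_raw_bytes = 5 * TB          # "5 TB Daily" raw logs
requests_per_sec = 250_000        # "250k req/sec", assumed sustained
total_storage_bytes = 1300 * TB   # "1.3 Petabyte" total storage

# Requests per day if the rate held for all 86,400 seconds.
requests_per_day = requests_per_sec * 86_400

# Average raw log bytes per request under that assumption.
avg_bytes_per_request = daily_raw_bytes / requests_per_day

# How many days of raw logs the total storage would equal.
days_of_raw_logs = total_storage_bytes / daily_raw_bytes

print(f"{requests_per_day:,} requests/day")
print(f"{avg_bytes_per_request:.0f} bytes/request on average")
print(f"{days_of_raw_logs:.0f} days of raw logs")
```

Under these assumptions the numbers work out to roughly 21.6 billion requests per day, a little over 230 bytes of raw log per request, and storage equal to about 260 days of raw logs. If 250k req/sec is a peak rather than an average, the per-request figure would be higher.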
8. BLUECAVA, INC. / 2015
ABOUT – Challenge
Delivery
− Generate data for customers
− Multiple extractions at a time
− Keep data for months
− Highly Available
Flexibility
− Easily run Ad-Hoc queries
− Handle lots of POCs
− Flexible to Change
− Unified Data Store
Cost
− Bandwidth Cost
− Storage Cost
− Infrastructure Cost
− Operation Cost
9. BLUECAVA, INC. / 2015
ARCHITECTURE
BlueCava Platform Overview / Before / Now / Future!
10. BLUECAVA, INC. / 2015
ARCHITECTURE – BlueCava Platform Overview
CORE INTERNAL CUSTOMER
PLATFORM
EDGEX BIDDER OPERATIONS QUALITY API PORTAL
METADATA PREPARE
LOGGING AGGREGATE
FILTER
DETECTOR
TRANSFER / PREPARE PROCESS / ASSOCIATION ANALYZE / REPORT
AG AE DB
11. BLUECAVA, INC. / 2015
ARCHITECTURE – Before
WEST (IRVINE) / EAST (ASHBURN)
CORE / CORE / CUSTOMER / INTERNAL
PLATFORM
BACKUP / DR
Geographic Load Balancing
XDC NET
12. BLUECAVA, INC. / 2015
ARCHITECTURE – Before / Challenges
Cost
− Estimate of $1.5M upfront to scale up
− High monthly bandwidth cost
− Need to extend the Operations team
Scalability
− Datacenter issues with traffic spikes
− Need to scale down after a POC finishes
Performance
− Some processes took more than a day
− Customer delivery takes 5-10 hours
− Ad-Hoc queries take hours
Storage
− Need more historical data to increase quality
− Need to keep customer data for months
− Deliver large amounts of data to customers
Complexity
− Simple tasks require Data Engineering expertise
− Customizing data output was hard
− Data Scientists need meaningful data sets
Resource Limitations
− QA/Dev environment separation
− Ad-Hoc queries create issues for production
13. BLUECAVA, INC. / 2015
ARCHITECTURE – Before / Solution
Big Query
• Big Data as a Service
• Extremely cost effective for our use-case
• Support Hierarchical Data Model
• Extremely fast
• Query using SQL
• Solve most of our Big Data challenges
• Fraction of cost (It was Unbelievable)
• Customer Delivery in Seconds!!!
• We dropped Delivery Spark Cluster (10 nodes)
• We dropped Ad-Hoc Hadoop Cluster (100x nodes)
• Offload ALL Customer Facing Jobs
• Only 2 Sprints Development (6 Weeks)
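The "Hierarchical Data Model" point refers to Big Query's support for nested and repeated fields. As a hedged sketch of what that could look like for household data, the snippet below builds a table schema in Big Query's JSON schema format, where a nested list is a field of type `RECORD` with mode `REPEATED`. All field names here are hypothetical and are not taken from BlueCava's actual tables; `leaf_paths` is an illustrative helper, not part of any Google library.

```python
import json

# Hypothetical nested schema: household -> consumers -> devices.
SCHEMA = [
    {"name": "household_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "consumers", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "consumer_id", "type": "STRING", "mode": "REQUIRED"},
        {"name": "devices", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "id_type", "type": "STRING", "mode": "REQUIRED"},
            {"name": "value", "type": "STRING", "mode": "REQUIRED"},
        ]},
    ]},
]


def leaf_paths(fields, prefix=""):
    """List dotted paths of all non-RECORD fields, as used in SQL queries."""
    paths = []
    for f in fields:
        path = prefix + f["name"]
        if f["type"] == "RECORD":
            paths.extend(leaf_paths(f["fields"], path + "."))
        else:
            paths.append(path)
    return paths


# JSON text in the shape accepted as a table schema file.
schema_json = json.dumps(SCHEMA, indent=2)
print(leaf_paths(SCHEMA))
```

Keeping the hierarchy in one table means a query can address `consumers.devices.id_type` directly instead of joining separate household, consumer, and device tables.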
14. BLUECAVA, INC. / 2015
ARCHITECTURE – Before / Solution
Cloud Storage
• Nice integration with Big Query
• No file size limit like S3
• HDFS Integration using Hadoop Connector
• Seamless Cost Saving: DRA and Nearline
• Solved most of our Storage challenges
• Simplified our file delivery
• Extremely competitive pricing
• No need for Backup :)
15. BLUECAVA, INC. / 2015
ARCHITECTURE – Before / Solution
Compute Engine
• Great Sustained Pricing
• No need for long-term contract
• Simple CLI for Automation
• BDUtil Library for Hadoop
• Elastic Environment which saved us on Cost
• 100+ nodes Hadoop under 6 minutes
• Use as On-Demand Resource as needed
• Stop purchasing more hardware!
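"Sustained Pricing" refers to Compute Engine's sustained use discounts, which lower the rate automatically the longer an instance runs in a month, with no contract. The sketch below assumes the tier schedule Google published for its standard machine types at the time (each successive quarter of the month billed at 100%, 80%, 60%, and 40% of the base rate); treat the schedule as an assumption and check current pricing documentation before relying on it. The function name is hypothetical.

```python
# Assumed tiers: (share of the month, fraction of base rate charged).
TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]


def sustained_use_cost(fraction_of_month: float) -> float:
    """Billed cost as a fraction of one full month at the undiscounted rate."""
    if not 0.0 <= fraction_of_month <= 1.0:
        raise ValueError("fraction_of_month must be between 0 and 1")
    remaining = fraction_of_month
    billed = 0.0
    for width, rate in TIERS:
        used = min(remaining, width)  # usage that falls inside this tier
        billed += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return billed


for share in (0.25, 0.5, 1.0):
    print(f"{share:.0%} of month -> {sustained_use_cost(share):.2f} of base monthly price")
```

Under this schedule an instance running the whole month is billed about 70% of the base price, while a short-lived cluster running a quarter of the month or less pays the base rate only for the time it is up, which fits the on-demand usage pattern described above.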
16. BLUECAVA, INC. / 2015
ARCHITECTURE – Now
WEST (IRVINE) / Google Cloud Platform
CORE / CUSTOMER / INTERNAL
PLATFORM
Cloud Storage
Simple DNS
Interconnect / Big Query
17. BLUECAVA, INC. / 2015
ARCHITECTURE – Future!
Cost: Move all in Cloud
Scalability: World-wide Coverage
Performance: Real-time Association
Simplify: Data Science Lab
Container Engine / Dataproc / Dataflow / Datalab
18. BLUECAVA, INC. / 2015
ARCHITECTURE – Future!
CORE REALTIME PROCESS
ASSOCIATION GRAPH
QUERY
LAB
STORAGE
INTERNAL
CUSTOMER
BATCH PROCESS