Big Data in Action

•

21 gefällt mir•4,091 views

ngonpham

Big Data technologies and applications in Vietnam

Technologie Business

Big Data in Action
Ngon Pham, Lana Engineer

Introduction
●
●
●
●
●

Introduction
Problem
Approach
Demo
Big Data in Vietnam

Introduction
● Internet-enabled devices
○ Tons of data generated every second

● Hardware becomes much cheaper
○ We can now store and process much more data

Problem
● How to process 10TB, how long and how
much?
○ Assume
■ Amazon EC2
■ HDD read at 50MB/s
■ Computation time is less than I/O time

Problem
● 1 machine, 1 core, 1 HDD
○ Time: 55.56 hours
○ Amazon Cost: $0.12 x 55.56 = $6.67

● 10 machines, 40 cores, 40 HDD
○ Time: 1.39 hours
○ Amazon Cost: $0.48 x 10 x 1.39 = $6.67
⇒ The same cost but 40x faster

Question
● How to divide data/process between
machines?
● How to make each process read data inside
the machine directly instead of another?
● How to replicate data, restore the process if
there is failure?
● Lots of task management questions...

Hadoop Approach
● Computation
○ MapReduce

MongoDB Approach
● Computation
○ SQL
○ Aggregation
○ MapReduce

Spark Approach
● Storage
○ Resilient distributed
dataset (RDD)
○ Persistent backed by
HDFS / HBase...

Spark Approach
● Computation
○ Mixed
○ In-memory
computing

Demo
● Hadoop
○ Run script to create Amazon cluster
○ Play with Hadoop / HDFS / Spark
○ Process Wikipedia data

● MongoDB
○ Collect data from different sources and analyze

Big Data in Vietnam
● Why is MongoDB popular?
○ Lots of PHP developers prefer
○ Simple to setup and use
○ Similar to MySQL

Big Data in Vietnam
● Hadoop is used by a few big local online
companies & international startups
○ Analyze tons of data
○ Create new competitive advantage
⇒ But there is a big shortage of skilled engineers

Empfohlen

Mongodb meetupEytan Daniyalzade

Scaling Up with PHP and AWSHeath Dutton ☕

Go frugal with web servicesDaniel Fireman

Omaha Rails User Group - Ec2BrightMix

Boulder JS meet up presentation for April 16Matthew Schrepel

Evolutionary software designLior Bar-On

Big Data for Startupsngonpham

Building a scalable infrastructure for social mobile web appsngonpham

Empfohlen

Mongodb meetupEytan Daniyalzade

Scaling Up with PHP and AWSHeath Dutton ☕

Go frugal with web servicesDaniel Fireman

Omaha Rails User Group - Ec2BrightMix

Boulder JS meet up presentation for April 16Matthew Schrepel

Evolutionary software designLior Bar-On

Big Data for Startupsngonpham

Building a scalable infrastructure for social mobile web appsngonpham

The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP

Piano Media - approach to data gathering and processingMartinStrycek

Big data on google platform dev fest presentationPrzemysław Pastuszka

Cloud arch patternsCorey Huinker

Speeding up Page Load Times by Using StarlingErik Osterman

Cloud accounting software ukArcus Universe Ltd

kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community

Scalable, good, cheapMarc Cluet

Bandwidth, Throughput, Iops, And Flopsbillmenger

TRHUG 2015 - Veloxity Big Data Migration Use CaseHakan Ilter

Beyond 1000 bosh Deploymentsanynines GmbH

Unit 1 - Computer memory.pptxMrAdhit1

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty

Web performance optimization - MercadoLibrePablo Moretti

From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)Ontico

Harnessing the cloud_for_saa_s_hosted_platforLuke Summerfield

OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebula Project

Scaling up with Aerospike!Anshu Prateek

Document Similarity with Cloud ComputingBryan Bende

MapReduceJoseph Duimstra, Ph.D.

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3

Training state-of-the-art general text embeddingZilliz

Weitere ähnliche Inhalte

Ähnlich wie Big Data in Action

The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP

Piano Media - approach to data gathering and processingMartinStrycek

Big data on google platform dev fest presentationPrzemysław Pastuszka

Cloud arch patternsCorey Huinker

Speeding up Page Load Times by Using StarlingErik Osterman

Cloud accounting software ukArcus Universe Ltd

kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community

Scalable, good, cheapMarc Cluet

Bandwidth, Throughput, Iops, And Flopsbillmenger

TRHUG 2015 - Veloxity Big Data Migration Use CaseHakan Ilter

Beyond 1000 bosh Deploymentsanynines GmbH

Unit 1 - Computer memory.pptxMrAdhit1

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty

Web performance optimization - MercadoLibrePablo Moretti

From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)Ontico

Harnessing the cloud_for_saa_s_hosted_platforLuke Summerfield

OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebula Project

Scaling up with Aerospike!Anshu Prateek

Document Similarity with Cloud ComputingBryan Bende

MapReduceJoseph Duimstra, Ph.D.

Ähnlich wie Big Data in Action (20)

The Dark Side Of Go -- Go runtime related problems in TiDB in production

Piano Media - approach to data gathering and processing

Big data on google platform dev fest presentation

Cloud arch patterns

Speeding up Page Load Times by Using Starling

Cloud accounting software uk

kranonit S06E01 Игорь Цинько: High load

Scalable, good, cheap

Bandwidth, Throughput, Iops, And Flops

TRHUG 2015 - Veloxity Big Data Migration Use Case

Beyond 1000 bosh Deployments

Unit 1 - Computer memory.pptx

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English

Web performance optimization - MercadoLibre

From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)

Harnessing the cloud_for_saa_s_hosted_platfor

OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com

Scaling up with Aerospike!

Document Similarity with Cloud Computing

MapReduce

Kürzlich hochgeladen

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3

Training state-of-the-art general text embeddingZilliz

Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3

"ML in Production",Oleksandr BaganFwdays

Anypoint Exchange: It’s Not Just a Repo!Manik S Magar

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

A Journey Into the Emotions of Software DevelopersNicole Novielli

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3

WordPress Websites for Engineers: Elevate Your Brandgvaughan

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy

The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3

SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

How to write a Business Continuity PlanDatabarracks

Kürzlich hochgeladen (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx

Training state-of-the-art general text embedding

Moving Beyond Passwords: FIDO Paris Seminar.pdf

"ML in Production",Oleksandr Bagan

Anypoint Exchange: It’s Not Just a Repo!

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

A Journey Into the Emotions of Software Developers

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack

Digital Identity is Under Attack: FIDO Paris Seminar.pptx

WordPress Websites for Engineers: Elevate Your Brand

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

DevoxxFR 2024 Reproducible Builds with Apache Maven

The Ultimate Guide to Choosing WordPress Pros and Cons

"Debugging python applications inside k8s environment", Andrii Soldatenko

How AI, OpenAI, and ChatGPT impact business and software.

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

How to write a Business Continuity Plan

Big Data in Action

1. Big Data in Action Ngon Pham, Lana Engineer

2. Introduction ● ● ● ● ● Introduction Problem Approach Demo Big Data in Vietnam

3. Introduction ● Internet-enabled devices ○ Tons of data generated every second ● Hardware becomes much cheaper ○ We can now store and process much more data

4. Problem ● How to process 10TB, how long and how much? ○ Assume ■ Amazon EC2 ■ HDD read at 50MB/s ■ Computation time is less than I/O time

5. Problem ● 1 machine, 1 core, 1 HDD ○ Time: 55.56 hours ○ Amazon Cost: $0.12 x 55.56 = $6.67 ● 10 machines, 40 cores, 40 HDD ○ Time: 1.39 hours ○ Amazon Cost: $0.48 x 10 x 1.39 = $6.67 ⇒ The same cost but 40x faster

6. Question ● How to divide data/process between machines? ● How to make each process read data inside the machine directly instead of another? ● How to replicate data, restore the process if there is failure? ● Lots of task management questions...

7. Approach ● Hadoop ● MongoDB ● Spark

8. Hadoop Approach ● Storage ○ HDFS

9. Hadoop Approach ● Computation ○ MapReduce

10. MongoDB Approach ● Storage ○ Document

11. MongoDB Approach ● Computation ○ SQL ○ Aggregation ○ MapReduce

12. Spark Approach ● Storage ○ Resilient distributed dataset (RDD) ○ Persistent backed by HDFS / HBase...

13. Spark Approach ● Computation ○ Mixed ○ In-memory computing

14. Demo ● Hadoop ○ Run script to create Amazon cluster ○ Play with Hadoop / HDFS / Spark ○ Process Wikipedia data ● MongoDB ○ Collect data from different sources and analyze

15. Big Data in Vietnam

16. Big Data in Vietnam ● Why is MongoDB popular? ○ Lots of PHP developers prefer ○ Simple to setup and use ○ Similar to MySQL

17. Big Data in Vietnam ● Hadoop is used by a few big local online companies & international startups ○ Analyze tons of data ○ Create new competitive advantage ⇒ But there is a big shortage of skilled engineers

18. Q&A Q&A