1) Introduction to the key Big Data concepts
1.1 The Origins of Big Data
1.2 What is Big Data?
1.3 Why is Big Data So Important?
1.4 How Is Big Data Used In Practice?
2) Introduction to the key principles of Big Data Systems
2.1 How to design a Data Pipeline in 6 steps
2.2 Using Lambda Architecture for big data processing
3) Practical case study: Chat bot with Video Recommendation Engine
4) FAQ for students
Concepts, use cases and principles to build big data systems (1)
1. Concepts, use cases and principles to build big data systems
http://www.bigdatavietnam.org
https://www.facebook.com/bigdatavn Compiled by Nguyễn Tấn Triều
2. Key Contents
1. Introduction to the key Big Data concepts
○ The Origins of Big Data
○ What is Big Data?
○ Why is Big Data So Important?
○ How Is Big Data Used In Practice?
2. Introduction to the key principles of Big Data Systems
○ How to design a Data Pipeline in 6 steps
○ Using Lambda Architecture for big data processing
3. Practical case study
○ Chat bot with Video Recommendation Engine
4. FAQ for students
3. Introduction to the key Big Data concepts
○ The Origins of Big Data
○ What is Big Data?
○ Why is Big Data so important?
○ How Is Big Data used in practice?
16. How Is Big Data Used In Practice?
Device Analytics
Which device is most commonly used?
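A minimal sketch of how the device question could be answered from an event log. The record layout and the sample values are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter

# Hypothetical session log: one record per session, with the device type.
events = [
    {"user": "u1", "device": "Android"},
    {"user": "u2", "device": "iOS"},
    {"user": "u3", "device": "Android"},
    {"user": "u4", "device": "Web"},
    {"user": "u5", "device": "iOS"},
    {"user": "u6", "device": "Android"},
]


def most_used_device(records):
    """Return (device, session_count) for the most frequent device, or None if empty."""
    counts = Counter(record["device"] for record in records)
    return counts.most_common(1)[0] if counts else None


print(most_used_device(events))
```

On a real system the same counting would run over a distributed store rather than an in-memory list, but the aggregation logic is the same.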
17. How Is Big Data Used In Practice?
Time-series Analytics
The peak hours of the system
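Peak hours can be found by bucketing event timestamps by hour of day. The sketch below assumes ISO 8601 timestamps and made-up sample data; the function name `peak_hours` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps in ISO 8601 format.
timestamps = [
    "2024-03-01T09:05:00",
    "2024-03-01T20:15:00",
    "2024-03-01T20:45:00",
    "2024-03-02T20:10:00",
    "2024-03-02T09:30:00",
    "2024-03-02T13:00:00",
]


def peak_hours(iso_timestamps, top_n=1):
    """Return the top_n (hour_of_day, event_count) pairs, busiest first."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in iso_timestamps)
    return hours.most_common(top_n)


print(peak_hours(timestamps, top_n=2))
```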
18. How Is Big Data Used In Practice?
GeoLocation Heatmap Analytics
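One common way to prepare data for a geolocation heatmap is to snap each coordinate to a grid cell and count events per cell. The sketch below assumes (latitude, longitude) pairs and a hypothetical cell size in degrees; the sample coordinates are illustrative only.

```python
import math
from collections import Counter

# Hypothetical (latitude, longitude) pairs of user events.
points = [
    (10.78, 106.70),
    (10.80, 106.65),
    (21.03, 105.85),
]


def heatmap_cells(coords, cell_deg=1.0):
    """Count events per grid cell; a cell is identified by its integer (row, col) index."""
    cells = Counter()
    for lat, lon in coords:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key] += 1
    return cells


for cell, count in heatmap_cells(points).most_common():
    print(cell, count)
```

The per-cell counts can then be passed to any plotting tool as heatmap intensities.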
19. Introduction to the key principles of Big Data Systems
○ How to design Data Pipeline in 6 steps
○ Using Lambda Architecture for big data processing
20. How to design Data Pipeline Systems
Collecting → Storing → Processing → Analyzing → Learning → Visualizing
Data engineering process: 3 tasks
1. Collecting
a. Concepts
b. Technology
2. Storing
a. Big Data Storage Concepts
b. Big Data Storage Technology
3. Processing
a. Big Data Processing Concepts
b. Big Data Processing Technology
Data Science/Machine Learning process: 3 tasks
4) Analyzing → 5) Learning → 6) Visualizing
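The six stages can be sketched as a chain of small functions, each consuming the output of the previous one. This is a toy illustration under stated assumptions: the record format (`video:<name>`), the in-memory list used as storage, and the trivial "most viewed" model are all hypothetical and stand in for real collection, storage, and learning technology.

```python
from collections import Counter


def collect():
    """1) Collecting: gather raw events (hypothetical sample, including one bad record)."""
    return ["video:cats", "video:dogs", "video:cats", "bad-record", "video:cats"]


def store(raw_events, storage):
    """2) Storing: append raw events unchanged to the storage layer."""
    storage.extend(raw_events)
    return storage


def process(storage):
    """3) Processing: drop malformed records and extract the video name."""
    return [rec.split(":", 1)[1] for rec in storage if rec.startswith("video:")]


def analyze(items):
    """4) Analyzing: aggregate view counts per video."""
    return Counter(items)


def learn(counts):
    """5) Learning: a trivial stand-in model that recommends the most viewed video."""
    return counts.most_common(1)[0][0]


def visualize(counts):
    """6) Visualizing: render a text bar chart, most viewed first."""
    return "\n".join(f"{name:<5} {'#' * n}" for name, n in counts.most_common())


def run_pipeline():
    storage = []
    counts = analyze(process(store(collect(), storage)))
    return learn(counts), visualize(counts)


recommendation, chart = run_pipeline()
print("Recommended:", recommendation)
print(chart)
```

The first three functions correspond to the data engineering tasks and the last three to the data science tasks.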
21. Data Engineer Tasks Data Analyst Tasks
Big Data Analytics Lifecycle
Collecting
Storing
Processing
Analyzing
Learning
Visualizing
26. Storing Concepts
● Clusters
● Scale-Up vs Scale-Out
● File Systems and Distributed File Systems
● NoSQL
● Sharding
● Replication
● Sharding and Replication
● CAP Theorem
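Sharding and replication from the list above can be combined as follows: a hash of the key picks a primary node (sharding), and the record is also copied to the next nodes in the list (replication). This is a minimal sketch; the node names, the use of MD5 as the hash, and the "next node in the list" replica placement are assumptions for illustration, not the scheme of any particular database.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (the same key always gives the same shard)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replica_nodes(key, nodes, replication_factor=2):
    """Return the primary node for the key followed by its replicas."""
    if replication_factor > len(nodes):
        raise ValueError("replication_factor cannot exceed the number of nodes")
    primary = shard_for(key, len(nodes))
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical cluster of three nodes.
nodes = ["node-a", "node-b", "node-c"]
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", replica_nodes(key, nodes))
```

With a replication factor of 2, each record survives the loss of any single node, at the cost of storing every record twice.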
47. When a standard relational database (Oracle, MySQL, ...) is not good enough
Example: the “analytic system” MySQL database of a startup, tracking all actions in
mobile games (iOS, Android, ...)
48. 3 common problems in Big Data Systems
1. Size: the volume of the datasets is a critical factor.
2. Complexity: the structure, behaviour and permutations of the datasets are
critical factors.
3. Technologies: the tools and techniques used to process a
sizable or complex dataset are critical factors.
49. Key ideas of Lambda Architecture in Big Data System
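As generally described, Lambda Architecture keeps an append-only master dataset, periodically recomputes a batch view from it, maintains a realtime view for events that arrived since the last batch run, and answers queries by merging both views. The sketch below illustrates that idea with simple event counts; the class and method names are hypothetical and everything is held in memory.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture for counting events per key."""

    def __init__(self):
        self.master_dataset = []         # batch layer input: append-only raw events
        self.batch_view = Counter()      # recomputed from scratch on each batch run
        self.realtime_view = Counter()   # speed layer: events since the last batch run

    def ingest(self, event):
        """New data goes to both the master dataset and the speed layer."""
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self):
        """Recompute the batch view from all raw data, then reset the realtime view."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view with the realtime view."""
        return self.batch_view[key] + self.realtime_view[key]


counter = LambdaCounter()
for event in ["video_a", "video_a", "video_b"]:
    counter.ingest(event)
counter.run_batch()
counter.ingest("video_a")          # arrives after the batch run
print(counter.query("video_a"))    # combines batch and realtime counts
```

Because the batch view is always rebuilt from the raw data, an error in the realtime view is corrected by the next batch run.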
51. Problem
● A company wants to develop a chat bot for
news recommendation
● They want to classify data into 26 standard
categories for user-friendly queries
● The engineering team has developed a
data pipeline for the system
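To make the classification step concrete, here is a minimal keyword-based sketch. It is not the method used in the case study: the three categories and their keywords are hypothetical placeholders (the slides mention 26 categories but do not list them), and a real system would more likely use a trained text classifier.

```python
# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "sports": {"football", "match", "goal"},
    "technology": {"phone", "software", "ai"},
    "business": {"market", "stock", "startup"},
}


def classify(title):
    """Return the category whose keywords overlap most with the title, or 'uncategorized'."""
    words = set(title.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def articles_in(category, titles):
    """Answer a chat bot query such as 'show me business news'."""
    return [t for t in titles if classify(t) == category]


titles = [
    "Startup raises funds as stock market rallies",
    "Late goal wins the football match",
    "Weather today",
]
print(articles_in("business", titles))
```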
56. How to learn Big Data?
1. Have lots of passion and curiosity about data
2. Knowledge of data structures, statistics and basic maths
3. Love solving complex problems with a data-driven mindset
4. Database knowledge: when to use NoSQL vs RDBMS
5. Knowledge of distributed computing
6. Linux / Open Source Tools
7. Programming languages: Python / Java / SQL / JavaScript
8. English skills
57. Big Data Job Market is really hot
https://www.class-central.com/subject/big-data
58. Some good books for self-learning
● http://sachvui.com/ebook/du-lieu-lon-big-data.281.html
● https://drive.google.com/open?id=0B3dHGVpTXDOhQXJCR01PVkpQMGM
● https://drive.google.com/file/d/1rPvfio6EkaUvGtgfQoq9p9Fa2ljOMIn1/view?usp=sharing
● https://drive.google.com/open?id=0B3dHGVpTXDOhVTBKX09NUnlLcm8