Overview of Rfx Framework / Platform
https://docs.google.com/document/d/1wutns90tuW1PGR03tXhDE_-DkrdWZtfvh9R_cJRtrXk/edit?usp=sharing
Big Data Infrastructure - TODO Tasks
Updated March 12, 2014 by Triều (@tantrieuf31)
● Module HTTP Log Server:
○ Hot deployment, restart, and shutdown of the HTTP Log Server
○ Reactive streaming for Kafka Producer (RxJava)
■ https://github.com/Netflix/RxJava/wiki/Transforming-Observables
● Module Messaging (Kafka): https://bitbucket.org/trieunt/kafka
○ Find a safer mechanism for managing configs and rotating Kafka logs (current issue: a Kafka log is moved away before the Kafka Consumer has finished reading it => the offset to resume from cannot be found => data is missing)
○ Predict the growth rate of Kafka log files to choose an optimal config for each product type (machine learning (linear regression) for system performance)
○ Create a mapping (time, offset, and binary offset files) so the files are easy to find when a reparse is needed
○ Manage and index Kafka offsets by time (hour, day, ...), so that a reparse can be started by setting the stored offset when needed (not yet implemented)
● Module Stream Data Processing: https://bitbucket.org/trieunt/rfx/wiki/Home
○ Manage worker node memory (if HeapSize is set too low => the Worker dies/restarts continuously because it does not have enough memory to handle the high log volume)
○ Mechanism for extensions/plugins/hooks into the system (separating core from applications)
○ Refactoring (reorganize the code for clarity) to separate the processing logic between:
■ parse => write to Redis (parse, count, and check rules only)
■ parse => write raw log files within one worker (parse and write raw logs only)
○ Unit Test Tools (Kafka Producer) + Test Tools (integration test) for Reactive Topologies
○ Improve the Worker's debug logging (Elasticsearch + Kibana)
○ Monitoring front end for all critical metrics:
■ worker nodes (logs, memory, restart time, running, died, uptime, downtime)
■ alert/notification
■ number of logs read from Kafka, parsed OK, checked OK, saved OK
■ free disk space and memory for workers
■ Backup Redis Data
■ Simple analytics dashboard for logs
○ New Job Server (uses Groovy scripts for easy deployment and control via Redis Pub/Sub)
■ Synchronized Data job
● Module Active Intelligence (new feature)
● Social data crawler for Facebook/Twitter/Google+ (Rfx Social Data Crawler)
● Clustering Stream Data (test case: news about traffic accidents / robberies / natural disasters), using Apache Spark http://spark.apache.org
● Realtime Visualization Engine with HTML5 WebSocket (D3.js + Netty + Akka Actor)
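
The time-to-offset index listed under Module Messaging (Kafka) is marked as not yet implemented, so the following is only a rough sketch of one way it might work, not the Rfx design. It assumes a single topic partition (Kafka offsets are per partition), hourly buckets, and an in-memory store; the class name OffsetTimeIndex and its methods are hypothetical, and Python is used purely for brevity.

```python
import bisect
from datetime import datetime, timezone


class OffsetTimeIndex:
    """Hypothetical index: hour bucket -> first offset seen in that hour.

    Covers one topic partition only; a real index would need one
    instance per partition and persistent storage.
    """

    def __init__(self):
        self._hours = []    # sorted list of epoch hours
        self._offsets = []  # first offset per hour, parallel to _hours

    def record(self, timestamp, offset):
        """Register that a message with this offset arrived at this time."""
        hour = int(timestamp.timestamp()) // 3600
        i = bisect.bisect_left(self._hours, hour)
        if i < len(self._hours) and self._hours[i] == hour:
            # Bucket already exists: keep the smallest offset of the hour.
            self._offsets[i] = min(self._offsets[i], offset)
        else:
            self._hours.insert(i, hour)
            self._offsets.insert(i, offset)

    def offset_for(self, timestamp):
        """Return the offset to restart a reparse from for the given time.

        Uses the latest hour bucket at or before the timestamp; returns
        None when nothing that early has been indexed.
        """
        hour = int(timestamp.timestamp()) // 3600
        i = bisect.bisect_right(self._hours, hour) - 1
        return self._offsets[i] if i >= 0 else None


def utc(hour, minute):
    # Helper for the example data below (date taken from the update note).
    return datetime(2014, 3, 12, hour, minute, tzinfo=timezone.utc)


index = OffsetTimeIndex()
index.record(utc(9, 5), 1000)
index.record(utc(9, 40), 1500)
index.record(utc(10, 1), 2000)

# Offset a consumer would be set to in order to reparse from 09:00 onward.
restart_offset = index.offset_for(utc(9, 59))
```

Storing only the first offset per bucket keeps the index small; a reparse then starts at the bucket boundary and may reread some messages, which is usually safer than skipping them.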
