Hung Tien Tran, Hiep Tuan Nguyen, Viet-Trung Tran
Hanoi University of Science and Technology
Introduction
 What is Geographically Weighted Regression?
 What is our work?
Source: http://desktop.arcgis.com
GWR + Spark =
- Large-scale spatial data
- Improved performance
- Distributed
Outline
 Background
 Problem
 Scalable GWR on Spark
 Experiments
 Discussion
 Conclusion
Background
 First Law of Geography - Waldo Tobler:
“Everything is related to everything else, but near things are more
related than distant things.”
 GWR model
$$y_i(u) = \beta_{0i}(u) + \beta_{1i}(u) x_{1i} + \beta_{2i}(u) x_{2i} + \dots + \beta_{mi}(u) x_{mi}$$
 The weighted least squares (WLS) estimator takes the form
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$
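The estimator can be sketched in a few lines of Python with NumPy. This is only an illustration, assuming W(u) is diagonal and is passed as a vector of weights; the function name `wls_estimate` and the toy data are made up for the example and do not come from the slides.

```python
import numpy as np

def wls_estimate(X, y, w):
    """Return beta_hat = (X^T W X)^{-1} X^T W y for a diagonal W = diag(w)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    Xw = X * w[:, None]      # each row of X scaled by its weight, i.e. W X
    xtwx = X.T @ Xw          # X^T W X
    xtwy = Xw.T @ y          # X^T W y (W is diagonal, so (W X)^T y = X^T W y)
    # Solve the normal equations instead of forming the inverse explicitly.
    return np.linalg.solve(xtwx, xtwy)

# Toy data (hypothetical): y = 1 + 2 * x exactly, first column is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = np.array([1.0, 0.8, 0.5, 0.2])
beta = wls_estimate(X, y, w)
```

Because the toy response is an exact linear function of the predictor, any set of positive weights recovers the same coefficients; with real spatial data the weights change the fit at every location u.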
Background
 Kernel function
 Gaussian function
 Bandwidth
[Figure: fixed bandwidth vs. adaptive bandwidth]
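A small sketch of how kernel weights might be computed under the two bandwidth schemes. The Gaussian form exp(-0.5 (d/h)^2) is one common choice and is an assumption here, as is taking the adaptive bandwidth to be the distance to the k-th nearest data point; the slides do not give the exact definitions used.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (assumed form): w = exp(-0.5 * (d / h)^2)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def fixed_weights(point, data, bandwidth):
    """Fixed bandwidth: the same h is used at every regression point."""
    return [gaussian_weight(math.dist(point, p), bandwidth) for p in data]

def adaptive_weights(point, data, k):
    """Adaptive bandwidth: h is the distance to the k-th nearest data point.

    Assumes that distance is strictly positive.
    """
    dists = [math.dist(point, p) for p in data]
    bandwidth = sorted(dists)[k - 1]
    return [gaussian_weight(d, bandwidth) for d in dists]

# Hypothetical coordinates, for illustration only.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
u = (0.0, 0.0)
w_fixed = fixed_weights(u, data, bandwidth=1.0)
w_adapt = adaptive_weights(u, data, k=3)
```

With a fixed bandwidth, sparse regions may contribute very few effective neighbours; the adaptive variant widens the kernel there so that each local fit sees roughly the same number of points.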
Problem
 Estimating a local model
 Bandwidth selection
 Model evaluation
Choose a kernel function
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$
Source: http://rose.bris.ac.uk
$O(n^3)$
Which bandwidth is good?
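One widely used way to answer the bandwidth question is leave-one-out cross-validation: for each candidate h, refit every local model without its own point and sum the squared prediction errors. The sketch below assumes that criterion and a Gaussian kernel; the slides do not say which selection method was used, and all names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian(d, h):
    return np.exp(-0.5 * (d / h) ** 2)

def loo_cv_score(coords, X, y, h):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    sse = 0.0
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = gaussian(d, h)
        w[i] = 0.0                      # exclude point i from its own local fit
        Xw = X * w[:, None]
        # lstsq tolerates nearly singular systems that small bandwidths can cause.
        beta = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)[0]
        sse += (y[i] - X[i] @ beta) ** 2
    return sse

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest score, plus all scores."""
    scores = {h: loo_cv_score(coords, X, y, h) for h in candidates}
    return min(scores, key=scores.get), scores

# Synthetic data (illustration only): the slope drifts with the first coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(40, 2))
x1 = rng.normal(size=40)
X = np.column_stack([np.ones(40), x1])
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x1 + rng.normal(scale=0.1, size=40)
candidates = [0.5, 1.0, 2.0, 5.0, 20.0]
best_h, scores = select_bandwidth(coords, X, y, candidates)
```

Each candidate requires n local fits over n points, which is one reason bandwidth selection dominates the running time on large datasets.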
Problem
 How to apply the model to large-scale data?
 Data points
 Features
 Regression points
Large-Scale GWR on Spark
 Why Spark?
 In-memory cluster-computing platform
 Supports parallel programming
 Applications are developed with high-level APIs
 Provides resilient distributed datasets (RDDs) and parallel operations
 Integrates with other Spark components
Large-Scale GWR on Spark
 We propose three approaches to scaling GWR
 Scaling Weighted Linear Regression (WLR)
 Parallel multiple WLR models
 Parallel Geographically Weighted Regression (combines the first two approaches)
Scalable GWR on Spark
 Naïve approach – Scaling Weighted Linear Regression
[Diagram: for each regression point (regPoint), compute the weights in parallel, fit a weighted linear regression (WLR) model in parallel, then summarize the model]
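The loop in this diagram can be sketched on a single machine as follows. This is not the authors' Spark code: NumPy vectorization stands in for the steps that the diagram marks as parallel, the Gaussian kernel is an assumption, and every name below is hypothetical.

```python
import numpy as np

def gaussian_weights(train_coords, reg_point, bandwidth):
    """Weight of every training point relative to one regression point."""
    d = np.linalg.norm(train_coords - reg_point, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def fit_wlr(X, y, w):
    """Weighted linear regression via the weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def naive_gwr(reg_coords, train_coords, X, y, bandwidth):
    """Sequential outer loop; each inner step works over the whole training set."""
    models = []
    for reg_point in reg_coords:                                   # foreach regPoint
        w = gaussian_weights(train_coords, reg_point, bandwidth)   # compute weight
        models.append(fit_wlr(X, y, w))                            # fit WLR
    return np.vstack(models)             # summary: one coefficient row per regPoint

# Toy data (hypothetical): y = 1 + 2 * x1 everywhere.
train_coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(5), x1])
y = 1.0 + 2.0 * x1
reg_coords = np.array([[0.5, 0.5], [1.5, 1.5]])
coef = naive_gwr(reg_coords, train_coords, X, y, bandwidth=1.0)
```

The outer loop is the weak point of this scheme: the regression points are processed one after another, so the number of distributed jobs grows with the number of regression points.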
Scalable GWR on Spark
 Parallel Multiple WLR models
[Diagram: the regression dataset and the training dataset are used to compute the weights; multiple WLR models are then computed in parallel and summarized]
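A single-machine sketch of the idea, with a thread pool standing in for the cluster: every regression point gets its own WLR model, and the models are fitted concurrently. The thread pool, the Gaussian kernel, and all names are assumptions made for illustration; the slides do not show how the Spark version distributes the work.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fit_local_model(reg_point, train_coords, X, y, bandwidth):
    """Fit one WLR model for a single regression point."""
    d = np.linalg.norm(train_coords - reg_point, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def parallel_wlr_models(reg_coords, train_coords, X, y, bandwidth, workers=4):
    """Fit one model per regression point, several at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fit_local_model, p, train_coords, X, y, bandwidth)
            for p in reg_coords
        ]
        # Collect in submission order so row i belongs to regression point i.
        return np.vstack([f.result() for f in futures])

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
train_coords = rng.uniform(0, 5, size=(30, 2))
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=30)
reg_coords = rng.uniform(0, 5, size=(6, 2))
coefs = parallel_wlr_models(reg_coords, train_coords, X, y, bandwidth=2.0)
```

Each task here needs the full training set, which mirrors the practical limit of this approach: it parallelizes well over regression points as long as the training data fits on every worker.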
Scalable GWR on Spark
 Parallel Geographically Weighted Regression
[Diagram: the regression dataset (R) and the training dataset (T) are combined into a dataset of (R, T) pairs, which feeds the distributed GWR computation]
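One way to read this diagram is as a map and reduce-by-key pattern: pair every regression point with every training row, compute each pair's contribution to X^T W X and X^T W Y, sum the contributions per regression point, and solve. The sketch below follows that reading on a single machine; it is an interpretation rather than the authors' algorithm, and the function names, row layout, and Gaussian kernel are assumptions.

```python
from collections import defaultdict
from itertools import product
import numpy as np

def combine(reg_points, train_rows):
    """Pair every regression point with every training row (R x T)."""
    return product(enumerate(reg_points), train_rows)

def pair_contribution(reg_point, row, bandwidth):
    """Contribution of one (R, T) pair to X^T W X and X^T W Y."""
    coord, x, label = row
    d = np.linalg.norm(np.asarray(coord) - np.asarray(reg_point))
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    x = np.asarray(x, dtype=float)
    return w * np.outer(x, x), w * x * label

def parallel_gwr(reg_points, train_rows, bandwidth):
    m = len(train_rows[0][1])
    sums = defaultdict(lambda: [np.zeros((m, m)), np.zeros(m)])
    for (rid, reg_point), row in combine(reg_points, train_rows):   # map step
        a, b = pair_contribution(reg_point, row, bandwidth)
        sums[rid][0] += a                                           # reduce by key
        sums[rid][1] += b
    # One small m x m solve per regression point.
    return {rid: np.linalg.solve(a, b) for rid, (a, b) in sums.items()}

# Toy rows (hypothetical): (coordinates, features with intercept, label), y = 1 + 2 * x1.
train_rows = [
    ((0.0, 0.0), [1.0, 0.0], 1.0),
    ((1.0, 0.0), [1.0, 1.0], 3.0),
    ((0.0, 1.0), [1.0, 2.0], 5.0),
    ((1.0, 1.0), [1.0, 3.0], 7.0),
]
reg_points = [(0.5, 0.5), (2.0, 2.0)]
models = parallel_gwr(reg_points, train_rows, bandwidth=1.0)
```

The per-pair contributions are independent and the per-key sums are associative, so both steps could be spread over partitions; the cost is that the combined dataset has |R| x |T| pairs.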
Experiments
 Environment
 Cluster: 8 nodes on Amazon Web Services
 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz
 16 GB RAM, 2x40 GB SSD
 Hadoop 2.7.2 and Spark 1.6.1
 Dataset
|-- x: double (nullable = false)
|-- y: double (nullable = false)
|-- label: double (nullable = false)
|-- features: vector (nullable = false)
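For readers less familiar with Spark schemas, one record of this dataset could be modelled in plain Python as below. Reading x and y as the spatial coordinates, label as the response, and features as the predictor vector is an assumption based on the field names; the class name, helper method, and sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SpatialRow:
    x: float                      # first coordinate (assumed)
    y: float                      # second coordinate (assumed)
    label: float                  # response variable
    features: Tuple[float, ...]   # predictor values

    def design_row(self):
        """Feature values with a leading 1.0 for the intercept term."""
        return (1.0,) + self.features

# Made-up values, not taken from the experiments.
row = SpatialRow(x=105.85, y=21.03, label=3.2, features=(0.5, 1.5))
```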
Experiments
 Testing large training dataset
[Chart: running time (sec) versus number of training points (10,000; 100,000; 1,000,000; 2,000,000; 5,000,000) for Algorithms 1 to 4; y-axis from 0 to 1200]
Experiments
 Testing large regression dataset
[Chart: running time (sec) versus number of regression points (1,000; 5,000; 10,000; 20,000; 50,000) for Algorithms 1 to 4; y-axis from 0 to 1200]
Experiments
 Testing large dataset with increasing number of features
[Chart: running time (sec) for Algorithms 1 to 4; x-axis ticks 10, 20, 50, 100, 200, labeled "Number of regression points" on the slide; y-axis from 0 to 1800]
Experiments
 Cluster
[Chart: running time (sec) versus number of nodes (2, 4, 8) for Algorithms 1 to 4; y-axis from 0 to 2000]
Discussion
 Related work
 Many GWR libraries run on a single local machine
 spgwr (multiR on GRID)
 GPU-based implementations
 Our work
 First study of distributed GWR on Spark
 Easy deployment, with the advantages of Spark
 Scalable and works well on a cluster
Conclusion
 We have
 Proposed three approaches
 Implemented four algorithms based on Spark
 Evaluated our implementation
 Future work
 Improve performance by using Pipeline and Partitions
 Release as an open-source library
Large-Scale Geographically Weighted Regression on Spark

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 

Kürzlich hochgeladen (20)

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

Large-Scale Geographically Weighted Regression on Spark

  • 1. Hung Tien Tran, Hiep Tuan Nguyen, Viet-Trung Tran Hanoi University of Science and Technology
  • 2. Introduction: What is Geographically Weighted Regression (GWR)? What is our work? GWR + [image] = large-scale spatial data; improved performance; distributed. Source: http://desktop.arcgis.com
  • 3. Outline: Background; Problem; Scalable GWR on Spark; Experiments; Discussion; Conclusion
  • 4. Background: First Law of Geography (Waldo Tobler): "Everything is related to everything else, but near things are more related than distant things." GWR model: $y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)x_{1i} + \beta_{2i}(u)x_{2i} + \dots + \beta_{mi}(u)x_{mi}$. The weighted least-squares estimator takes the form $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$.
  • 5. Background: Kernel function; Gaussian function; Bandwidth (fixed bandwidth, adaptive bandwidth)
  • 6. Problem: Estimating a local model; Bandwidth selection; Model evaluation. Slide annotations: choose a kernel function; $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$; $O(n^3)$; which bandwidth is good? Source: http://rose.bris.ac.uk
  • 7. Problem: How to apply the model to large-scale data? Data points; Features; Regression points
  • 8. Large-Scale GWR on Spark: Why Spark? In-memory cluster-computing platform; supports parallel programming; applications are developed with high-level APIs; provides resilient distributed datasets and parallel operations; integrates with other components on Spark
  • 9. Large-Scale GWR on Spark: We propose three approaches to scaling GWR: Scaling Weighted Linear Regression; Parallel Multiple WLR models; Parallel Geographically Weighted Regression (combines the first two approaches)
  • 10. Scalable GWR on Spark: Naïve approach (Scaling Weighted Linear Regression). For each regPoint: compute weight; fit Weighted Linear Regression; summarize model. Parallelized steps: compute weight in parallel; compute WLR model in parallel
  • 11. Scalable GWR on Spark: Naïve approach
  • 12. Scalable GWR on Spark: Parallel Multiple WLR models. Diagram labels: regression dataset; training dataset; compute weight; WLR; compute multiple WLR models in parallel; summary
  • 13. Scalable GWR on Spark: Parallel Multiple WLR models
  • 14. Scalable GWR on Spark: Parallel Geographically Weighted Regression. Diagram labels: regression dataset (R); training dataset (T); combined dataset; distributed GWR computation
  • 15. Scalable GWR on Spark: Parallel Geographically Weighted Regression
  • 16. Scalable GWR on Spark: Parallel Geographically Weighted Regression
  • 17. Experiments: Environment. Cluster: 8 nodes on Amazon Web Services; 4 cores, Intel Xeon E5-2670 v2 2.5 GHz; 16 GB RAM, 2x40 GB SSD; Hadoop 2.7.2 and Spark 1.6.1. Dataset schema: x: double (nullable = false); y: double (nullable = false); label: double (nullable = false); features: vector (nullable = false)
  • 18. Experiments: Testing a large training dataset. [Chart: time (sec) vs. number of training points (10,000 to 5,000,000) for Algorithms 1 to 4]
  • 19. Experiments: Testing a large regression dataset. [Chart: time (sec) vs. number of regression points (1,000 to 50,000) for Algorithms 1 to 4]
  • 20. Experiments: Testing a large dataset with an increasing number of features. [Chart: time (sec) for Algorithms 1 to 4; x-axis ticks 10, 20, 50, 100, 200, labeled "Number of regression points" on the slide]
  • 21. Experiments: Cluster. [Chart: time (sec) vs. number of nodes (2-node, 4-node, 8-node) for Algorithms 1 to 4]
  • 22. Discussion: Related work: many GWR libraries run locally; spgwr (multiR on GRID); using GPU. Our work: first study of distributed GWR on Spark; easy deployment and the advantages of Spark; scalable and works well on a cluster
  • 23. Conclusion: We have proposed three approaches, implemented four algorithms based on Spark, and evaluated our implementation. Future work: improve performance by using Pipeline and Partitions; release as an open-source library
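
The local estimator from slide 4 and the per-regression-point loop from slide 10 can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation. It is a minimal pure-Python illustration, and it assumes the common Gaussian kernel form w = exp(-0.5 (d/h)^2) with a fixed bandwidth h, which the slides name but do not spell out. All function names (`gaussian_weight`, `solve_linear_system`, `fit_local_model`, `fit_gwr`) are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight (assumed form): exp(-0.5 * (d / h)^2)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_local_model(reg_point, coords, features, labels, bandwidth):
    """Estimate beta(u) = (X^T W(u) X)^-1 X^T W(u) Y at one regression point u."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for (cx, cy), row, y in zip(coords, rows, labels):
        d = math.hypot(cx - reg_point[0], cy - reg_point[1])
        w = gaussian_weight(d, bandwidth)
        for i in range(k):
            xtwy[i] += w * row[i] * y
            for j in range(k):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear_system(xtwx, xtwy)


def fit_gwr(reg_points, coords, features, labels, bandwidth):
    """Fit one weighted linear regression per regression point (sequentially)."""
    return [fit_local_model(u, coords, features, labels, bandwidth)
            for u in reg_points]
```

Calling `fit_gwr(reg_points, coords, features, labels, bandwidth)` returns one coefficient list (intercept first) per regression point. The local fits are independent of each other, so the loop in `fit_gwr` is the natural candidate for distribution across a cluster. The sketch runs it sequentially to stay self-contained.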

Editor's notes

  1. Scalability, performance, user-friendly APIs