Microsoft R Server on Spark
Purpose:
This lab demonstrates how to use Microsoft R Server on a Spark cluster. It outlines the steps to spin up the cluster in Azure, shows how to install RStudio with R Server, and walks through an example of using ScaleR to analyze data on the Spark cluster.
Prerequisites
1. Be sure to have your Azure subscription enabled.
2. You will need to have a Secure Shell (SSH) client installed to remotely connect to the
HDInsight cluster and run commands directly on the cluster. This is needed since the
cluster will be using a Linux OS. The recommended client is PuTTY. Use the following link
to download and install PuTTY: PuTTY Download
a. Optionally, you can create an SSH key to connect to your cluster. The following
steps will assume that you are using a password. The following links include more
information on how to create and use SSH keys with HDInsight:
Use SSH with Linux-based Hadoop on HDInsight from Windows
Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X
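If you choose the SSH key option, a key pair can be generated with OpenSSH. This is a minimal sketch; the key path /tmp/hdinsight_rsa is only an example location, and in practice you would store the key under ~/.ssh:

```shell
# Optional: generate an SSH key pair with OpenSSH instead of using a password.
# The key path is an example; use a permanent location such as ~/.ssh in practice.
rm -f /tmp/hdinsight_rsa /tmp/hdinsight_rsa.pub
ssh-keygen -t rsa -b 2048 -f /tmp/hdinsight_rsa -N "" -q
# The public half is what you upload on the cluster Credentials blade:
cat /tmp/hdinsight_rsa.pub
```

The private half stays on your machine and is loaded into your SSH client (for PuTTY, convert it with PuTTYgen first).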
Creating the R Server on Spark Cluster
1. In the Azure portal, select New > Data + Analytics > HDInsight
2. Enter a name in the Cluster Name field and select the appropriate Azure
subscription in the Subscription field.
3. Click Select Cluster Type. On the Cluster Type blade, select the following
options:
a. Cluster Type: R Server on Spark
b. Cluster Tier: Premium
Click Select to save the cluster type configuration.
4. Click Credentials to create the cluster login username and password and the SSH
username and password. This is also where you can upload a key instead of using
a username/password for SSH authentication.
5. Click the Data Source field. Create a new storage account and a default container
for the cluster to use.
6. Click the Pricing field. Here you will be able to specify the number of Worker
nodes, the size of the Worker nodes, the size of the Head nodes and the R server
node size (this is the edge node that you will connect to using SSH to run your R
code). For demo purposes, you can leave the default settings in place.
7. Optionally, you can select External Metastores for Hive and Oozie in the Optional
Configuration field if you have SQL Databases created to store Hive/Oozie job
metadata. For this demo, this option will remain blank.
8. Either create a new Resource group or select an existing one in the Resource
Group field.
9. Click Create to create the cluster.
Installing RStudio with R Server on HDInsight
The following steps assume that you have downloaded and installed PuTTY. Please refer
to the Prerequisites section at the top of this document for the link to download PuTTY.
1. Identify the edge node of the cluster. To find the name of the edge node, select
the recently created HDInsight cluster in the HDInsight Clusters blade. From
there, select Settings > Applications > R Server for HDInsight. The SSH
Endpoint is the name of the edge node for the cluster.
2. SSH into the edge node. Use the following steps to connect to the edge node:
a. To connect to the edge node, open PuTTY.
b. In the Category pane, select Session. Enter the SSH address of the
HDInsight server in the Host Name (or IP address) text box. This can be
the address of either the head node or the edge node; use the edge node
address, since that is where RStudio will be configured. Click Open to
connect to the cluster.
c. Log in with the SSH credentials that were created when the cluster was
created.
3. Once connected, become a root user on the cluster. Use the following command
in the SSH session:
sudo su -
4. Download the custom script to install RStudio. Use the following command in the
SSH session:
wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh
5. Change the permissions on the custom script file and run the script. Use the
following commands:
chmod 755 InstallRStudio.sh
./InstallRStudio.sh
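Mode 755 makes the script readable and executable by everyone but writable only by the owner. A quick way to see what this mode produces, using a scratch file rather than the real script:

```shell
# Demonstrate what mode 755 means on a scratch file.
touch /tmp/demo_script.sh
chmod 755 /tmp/demo_script.sh
stat -c '%a %A' /tmp/demo_script.sh   # prints: 755 -rwxr-xr-x
```

The owner gets rwx (7), while group and others get r-x (5 each), which is exactly what a shared install script needs.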
6. Create an SSH tunnel to the cluster by mapping localhost:8787 on the HDInsight
Cluster to the client machine. This can be done through PuTTY.
a. Open PuTTY, and enter your connection information.
b. In the Category pane, expand Connection, expand SSH, and select
Tunnels.
c. Enter 8787 as the Source port and localhost:8787 as the Destination.
Click Add and then click Open to open an SSH connection.
d. When prompted, log in to the server with your SSH credentials. This will
establish an SSH session and enable the tunnel.
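If you prefer a command-line client to PuTTY, the same tunnel is a single OpenSSH command. This is a sketch: CLUSTERNAME and SSHUSER are placeholders for your own values, and the -ed-ssh host name is the edge node SSH endpoint shown in the portal:

```shell
# Build the equivalent OpenSSH tunnel command. -L maps local port 8787
# to localhost:8787 on the edge node, matching the PuTTY Tunnels settings.
CLUSTERNAME="mycluster"   # placeholder: your cluster name
SSHUSER="sshuser"         # placeholder: your SSH username
TUNNEL="ssh -L 8787:localhost:8787 ${SSHUSER}@${CLUSTERNAME}-ed-ssh.azurehdinsight.net"
echo "${TUNNEL}"          # run this command to open the tunnel
```

Leaving that session open keeps the tunnel alive, just as the PuTTY window does.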
7. Open a web browser and enter the following URL based on the port entered for
the tunnel:
http://localhost:8787/
8. You will be prompted to enter the SSH username and password to connect to the
cluster.
9. The following command downloads a test script that executes R-based Spark
jobs on the cluster. Run this command from the PuTTY session:
wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/testhdi_spark.r
10. In RStudio, you will see the test script that was just downloaded in the lower right
pane. Double click the file to open it and click Run to run the code.
Use a compute context and simple statistics with ScaleR
A compute context allows you to control whether computation will be performed locally
on the edge node, or whether it will be distributed across the nodes in the HDInsight
cluster.
1. From the R console, use the following to load example data into the default
storage for HDInsight.
# Set the HDFS (WASB) location of example data
bigDataDirRoot <- "/example/data"
# create a local folder for storing data temporarily
source <- "/tmp/AirOnTimeCSV2012"
dir.create(source)
# Download the twelve monthly data files to the tmp folder
remoteDir <- "http://packages.revolutionanalytics.com/datasets/AirOnTimeCSV2012"
for (csvFile in sprintf("airOT2012%02d.csv", 1:12)) {
download.file(file.path(remoteDir, csvFile), file.path(source, csvFile))
}
# Set directory in bigDataDirRoot to load the data into
inputDir <- file.path(bigDataDirRoot,"AirOnTimeCSV2012")
# Make the directory
rxHadoopMakeDir(inputDir)
# Copy the data from source to input
rxHadoopCopyFromLocal(source, bigDataDirRoot)
2. Next, let's define the column metadata and two data sources so that we can work
with the data.
# Define the HDFS (WASB) file system
hdfsFS <- RxHdfsFileSystem()
# Create info list for the airline data
airlineColInfo <- list(
DAY_OF_WEEK = list(type = "factor"),
ORIGIN = list(type = "factor"),
DEST = list(type = "factor"),
DEP_TIME = list(type = "integer"),
ARR_DEL15 = list(type = "logical"))
# get all the column names
varNames <- names(airlineColInfo)
# Define the text data source in hdfs
airOnTimeData <- RxTextData(inputDir, colInfo = airlineColInfo, varsToKeep = varNames, fileSystem = hdfsFS)
# Define the text data source in the local file system
airOnTimeDataLocal <- RxTextData(source, colInfo = airlineColInfo, varsToKeep = varNames)
# formula to use
formula <- "ARR_DEL15 ~ ORIGIN + DAY_OF_WEEK + DEP_TIME + DEST"
3. Let's run a logistic regression over the data using the local compute context.
# Set a local compute context
rxSetComputeContext("local")
# Run a logistic regression
system.time(
modelLocal <- rxLogit(formula, data = airOnTimeDataLocal)
)
# Display a summary
summary(modelLocal)
4. Next, let's run the same logistic regression using the Spark context. The Spark
context will distribute the processing over all the worker nodes in the HDInsight
cluster.
# Define the Spark compute context
mySparkCluster <- RxSpark()
# Set the compute context
rxSetComputeContext(mySparkCluster)
# Run a logistic regression
system.time(
modelSpark <- rxLogit(formula, data = airOnTimeData)
)
# Display a summary
summary(modelSpark)
ScaleR Example with Linear Regression and Plots
This example shows different compute contexts, how to run a linear regression in
RevoScaleR, and how to create simple plots. It uses airline delay data for airports
across the United States.
#copy local file to HDFS
rxHadoopMakeDir("/share")
rxHadoopCopyFromLocal(system.file("SampleData/AirlineDemoSmall.csv",package="RevoScaleR"), "/share")
myNameNode <- "default"
myPort <- 0
# Location of the data
bigDataDirRoot <- "/share"
# define HDFS file system
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)
# specify the input file in HDFS to analyze
inputFile <- file.path(bigDataDirRoot, "AirlineDemoSmall.csv")
# create Factors for days of the week
colInfo <- list(DayOfWeek = list(type = "factor",
levels = c("Monday","Tuesday","Wednesday",
"Thursday","Friday","Saturday","Sunday")))
# define the data source
airDS <- RxTextData(file = inputFile, missingValueString = "M",
colInfo = colInfo, fileSystem = hdfsFS)
# First test the "local" compute context
rxSetComputeContext("local")
# Run a linear regression
system.time(
model <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model)
# define MapReduce compute context
myHadoopMRCluster <- RxHadoopMR(consoleOutput=TRUE,
nameNode=myNameNode,
port=myPort,
hadoopSwitches="-libjars /etc/hadoop/conf")
# set compute context
rxSetComputeContext(myHadoopMRCluster)
# Run a linear regression
system.time(
model1 <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model1)
# Plot arrival delay by day of week
rxLinePlot(ArrDelay~DayOfWeek, data = airDS)
# define Spark compute context
mySparkCluster <- RxSpark(consoleOutput=TRUE)
# set compute context
rxSetComputeContext(mySparkCluster)
# Run a linear regression
system.time(
model2 <- rxLinMod(ArrDelay~CRSDepTime+DayOfWeek, data = airDS)
)
# display a summary of model
summary(model2)
# Run 4 tasks via rxExec
rxExec( function() {Sys.info()["nodename"]}, timesToRun = 4 )
Wrap Up
This lab demonstrated how to use Microsoft R Server on a Spark cluster. For
more information, refer to the links in the References section.
References
1. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
2. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-install-r-studio/
3. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-linux-use-ssh-windows/#connect-to-a-linux-based-hdinsight-cluster
About this session
Microsoft R Server for distributed computing
The First NIDA Business Analytics and Data Sciences Contest/Conference
September 1-2, 2016, Navamindradhiraj Building, National Institute of Development Administration (NIDA)
- Introduction to Microsoft R Server
- What distributed computing is and what its benefits are
- How to configure distributed computing
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
กฤษฏิ์ คําตื้อ, Technical Evangelist, Microsoft (Thailand)
- Distributed computing and Big Data
- Analytics on R Server
- Demonstration and hands-on workshop
Computer Lab 2, 10th floor, Siam Borommaratchakumari Building
September 1, 2016, 9:00-12:30

Recipe to build open splice dds 6.3.xxx Hello World example over Qt 5.2
 
Content server installation guide
Content server installation guideContent server installation guide
Content server installation guide
 
Hands-on Lab: re-Modernize - Updating and Consolidating MySQL
Hands-on Lab: re-Modernize - Updating and Consolidating MySQLHands-on Lab: re-Modernize - Updating and Consolidating MySQL
Hands-on Lab: re-Modernize - Updating and Consolidating MySQL
 
Drupal Continuous Integration with Jenkins - Deploy
Drupal Continuous Integration with Jenkins - DeployDrupal Continuous Integration with Jenkins - Deploy
Drupal Continuous Integration with Jenkins - Deploy
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
 
Securing Windows Remote Desktop With Copssh
Securing Windows Remote Desktop With CopsshSecuring Windows Remote Desktop With Copssh
Securing Windows Remote Desktop With Copssh
 
Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]Cloud init and cloud provisioning [openstack summit vancouver]
Cloud init and cloud provisioning [openstack summit vancouver]
 
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
 

Mehr von BAINIDA

Mixed methods in social and behavioral sciences
Mixed methods in social and behavioral sciencesMixed methods in social and behavioral sciences
Mixed methods in social and behavioral sciencesBAINIDA
 
Advanced quantitative research methods in political science and pa
Advanced quantitative  research methods in political science and paAdvanced quantitative  research methods in political science and pa
Advanced quantitative research methods in political science and paBAINIDA
 
Latest thailand election2019report
Latest thailand election2019reportLatest thailand election2019report
Latest thailand election2019reportBAINIDA
 
Data science in medicine
Data science in medicineData science in medicine
Data science in medicineBAINIDA
 
Nursing data science
Nursing data scienceNursing data science
Nursing data scienceBAINIDA
 
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...BAINIDA
 
Statistics and big data for justice and fairness
Statistics and big data for justice and fairnessStatistics and big data for justice and fairness
Statistics and big data for justice and fairnessBAINIDA
 
Data science and big data for business and industrial application
Data science and big data  for business and industrial applicationData science and big data  for business and industrial application
Data science and big data for business and industrial applicationBAINIDA
 
Update trend: Free digital marketing metrics for start-up
Update trend: Free digital marketing metrics for start-upUpdate trend: Free digital marketing metrics for start-up
Update trend: Free digital marketing metrics for start-upBAINIDA
 
Advent of ds and stat adjustment
Advent of ds and stat adjustmentAdvent of ds and stat adjustment
Advent of ds and stat adjustmentBAINIDA
 
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร BAINIDA
 
Data visualization. map
Data visualization. map Data visualization. map
Data visualization. map BAINIDA
 
Dark data by Worapol Alex Pongpech
Dark data by Worapol Alex PongpechDark data by Worapol Alex Pongpech
Dark data by Worapol Alex PongpechBAINIDA
 
Deepcut Thai word Segmentation @ NIDA
Deepcut Thai word Segmentation @ NIDADeepcut Thai word Segmentation @ NIDA
Deepcut Thai word Segmentation @ NIDABAINIDA
 
Professionals and wanna be in Business Analytics and Data Science
Professionals and wanna be in Business Analytics and Data ScienceProfessionals and wanna be in Business Analytics and Data Science
Professionals and wanna be in Business Analytics and Data ScienceBAINIDA
 
Deep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr SanparitDeep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr SanparitBAINIDA
 
Visualizing for impact final
Visualizing for impact finalVisualizing for impact final
Visualizing for impact finalBAINIDA
 
Python programming workshop
Python programming workshopPython programming workshop
Python programming workshopBAINIDA
 
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...BAINIDA
 
Oracle Enterprise Performance Management Overview
Oracle Enterprise Performance Management OverviewOracle Enterprise Performance Management Overview
Oracle Enterprise Performance Management OverviewBAINIDA
 

Mehr von BAINIDA (20)

Mixed methods in social and behavioral sciences
Mixed methods in social and behavioral sciencesMixed methods in social and behavioral sciences
Mixed methods in social and behavioral sciences
 
Advanced quantitative research methods in political science and pa
Advanced quantitative  research methods in political science and paAdvanced quantitative  research methods in political science and pa
Advanced quantitative research methods in political science and pa
 
Latest thailand election2019report
Latest thailand election2019reportLatest thailand election2019report
Latest thailand election2019report
 
Data science in medicine
Data science in medicineData science in medicine
Data science in medicine
 
Nursing data science
Nursing data scienceNursing data science
Nursing data science
 
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...
Financial time series analysis with R@the 3rd NIDA BADS conference by Asst. p...
 
Statistics and big data for justice and fairness
Statistics and big data for justice and fairnessStatistics and big data for justice and fairness
Statistics and big data for justice and fairness
 
Data science and big data for business and industrial application
Data science and big data  for business and industrial applicationData science and big data  for business and industrial application
Data science and big data for business and industrial application
 
Update trend: Free digital marketing metrics for start-up
Update trend: Free digital marketing metrics for start-upUpdate trend: Free digital marketing metrics for start-up
Update trend: Free digital marketing metrics for start-up
 
Advent of ds and stat adjustment
Advent of ds and stat adjustmentAdvent of ds and stat adjustment
Advent of ds and stat adjustment
 
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร
เมื่อ Data Science เข้ามา สถิติศาสตร์จะปรับตัวอย่างไร
 
Data visualization. map
Data visualization. map Data visualization. map
Data visualization. map
 
Dark data by Worapol Alex Pongpech
Dark data by Worapol Alex PongpechDark data by Worapol Alex Pongpech
Dark data by Worapol Alex Pongpech
 
Deepcut Thai word Segmentation @ NIDA
Deepcut Thai word Segmentation @ NIDADeepcut Thai word Segmentation @ NIDA
Deepcut Thai word Segmentation @ NIDA
 
Professionals and wanna be in Business Analytics and Data Science
Professionals and wanna be in Business Analytics and Data ScienceProfessionals and wanna be in Business Analytics and Data Science
Professionals and wanna be in Business Analytics and Data Science
 
Deep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr SanparitDeep learning and image analytics using Python by Dr Sanparit
Deep learning and image analytics using Python by Dr Sanparit
 
Visualizing for impact final
Visualizing for impact finalVisualizing for impact final
Visualizing for impact final
 
Python programming workshop
Python programming workshopPython programming workshop
Python programming workshop
 
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...
แผนธุรกิจ ของทีมที่ได้รางวัลชนะเลิศ The First NIDA Business Analytics and Dat...
 
Oracle Enterprise Performance Management Overview
Oracle Enterprise Performance Management OverviewOracle Enterprise Performance Management Overview
Oracle Enterprise Performance Management Overview
 

Kürzlich hochgeladen

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 

Kürzlich hochgeladen (20)

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 

R server and spark

node size (this is the edge node that you will connect to using SSH to run your R code). For demo purposes, you can leave the default settings in place.
7. Optionally, you can select External Metastores for Hive and Oozie in the Optional Configuration field if you have SQL Databases created to store Hive/Oozie job metadata. For this demo, this option will remain blank.
8. Either create a new Resource group or select an existing one in the Resource Group field.
9. Click Create to create the cluster.

Installing RStudio with R Server on HDInsight
The following steps assume that you have downloaded and installed PuTTY. Please refer to the Pre-requisites section at the top of this document for the link to download PuTTY.
1. Identify the edge node of the cluster. To find the name of the edge node, select the recently created HDInsight cluster in the HDInsight Clusters blade. From there, select Settings > Applications > R Server for HDInsight. The SSH Endpoint is the name of the edge node for the cluster.
2. SSH into the edge node. Use the following steps to connect to the edge node:
a. To connect to the edge node, open PuTTY.
b. In the Category pane, select Session. Enter the SSH address of the HDInsight server in the Host Name (or IP address) text box. This address could be either the address of the head node or the address of the edge node. Use the address of the edge node to connect to the edge node and configure RStudio. Click Open to connect to the cluster.
c. Log in with the SSH credentials that were created when the cluster was created.
3. Once connected, become a root user on the cluster. Use the following command in the SSH session:

   sudo su -

4. Download the custom script to install RStudio. Use the following command in the SSH session:

   wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh

5. Change the permissions on the custom script file and run the script. Use the following commands:

   chmod 755 InstallRStudio.sh
   ./InstallRStudio.sh
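On Linux, macOS, or any client with OpenSSH installed, the PuTTY connection above can be replaced by a single ssh command. The user name and edge-node host below are hypothetical placeholders, not values from this lab; substitute the SSH credentials and the SSH Endpoint shown for your own cluster. A minimal sketch that assembles the command:

```shell
# Assemble the ssh command for the edge node (placeholder values --
# use your own SSH user and your cluster's SSH Endpoint).
SSH_USER="sshuser"
EDGE_NODE="mycluster-ed-ssh.azurehdinsight.net"
CONNECT_CMD="ssh ${SSH_USER}@${EDGE_NODE}"
echo "${CONNECT_CMD}"
```

Running the printed command opens the same session PuTTY would; from there, the sudo/wget/chmod steps are identical.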
6. Create an SSH tunnel to the cluster by mapping localhost:8787 on the HDInsight cluster to the client machine. This can be done through PuTTY.
a. Open PuTTY, and enter your connection information.
b. In the Category pane, expand Connection, expand SSH, and select Tunnels.
c. Enter 8787 as the Source port and localhost:8787 as the Destination. Click Add and then click Open to open an SSH connection.
d. When prompted, log in to the server with your SSH credentials. This will establish an SSH session and enable the tunnel.
7. Open a web browser and enter the following URL based on the port entered for the tunnel:

   http://localhost:8787/

8. You will be prompted to enter the SSH username and password to connect to the cluster.
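The PuTTY tunnel in steps 6 through 8 has a one-line OpenSSH equivalent using the -L port-forwarding flag. As before, the user and host names are hypothetical placeholders; a sketch that assembles the tunnel command:

```shell
# -L 8787:localhost:8787 forwards local port 8787 to port 8787 on the
# edge node, where RStudio Server listens. While the session is open,
# RStudio is reachable at http://localhost:8787/ on the client machine.
SSH_USER="sshuser"
EDGE_NODE="mycluster-ed-ssh.azurehdinsight.net"
TUNNEL_CMD="ssh -L 8787:localhost:8787 ${SSH_USER}@${EDGE_NODE}"
echo "${TUNNEL_CMD}"
```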
9. The following command will download a test script that executes R-based Spark jobs on the cluster. Run this command from the PuTTY session:

   wget http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/testhdi_spark.r

10. In RStudio, you will see the test script that was just downloaded in the lower right pane. Double click the file to open it and click Run to run the code.

Use a compute context and simple statistics with ScaleR
A compute context allows you to control whether computation will be performed locally on the edge node, or whether it will be distributed across the nodes in the HDInsight cluster.
1. From the R console, use the following to load example data into the default storage for HDInsight.

   # Set the HDFS (WASB) location of example data
   bigDataDirRoot <- "/example/data"

   # Create a local folder for storing data temporarily
   source <- "/tmp/AirOnTimeCSV2012"
   dir.create(source)

   # Download the twelve monthly files (airOT201201.csv through
   # airOT201212.csv) to the tmp folder
   remoteDir <- "http://packages.revolutionanalytics.com/datasets/AirOnTimeCSV2012"
   for (month in 1:12) {
     fileName <- sprintf("airOT2012%02d.csv", month)
     download.file(file.path(remoteDir, fileName), file.path(source, fileName))
   }

   # Set directory in bigDataDirRoot to load the data into
   inputDir <- file.path(bigDataDirRoot, "AirOnTimeCSV2012")

   # Make the directory
   rxHadoopMakeDir(inputDir)

   # Copy the data from source to input
   rxHadoopCopyFromLocal(source, bigDataDirRoot)

2. Next, let's create column information and define two data sources so that we can work with the data.

   # Define the HDFS (WASB) file system
   hdfsFS <- RxHdfsFileSystem()

   # Create a column info list for the airline data
   airlineColInfo <- list(
     DAY_OF_WEEK = list(type = "factor"),
     ORIGIN = list(type = "factor"),
     DEST = list(type = "factor"),
     DEP_TIME = list(type = "integer"),
     ARR_DEL15 = list(type = "logical"))

   # Get all the column names
   varNames <- names(airlineColInfo)

   # Define the text data source in HDFS
   airOnTimeData <- RxTextData(inputDir, colInfo = airlineColInfo,
                               varsToKeep = varNames, fileSystem = hdfsFS)

   # Define the text data source in the local system
   airOnTimeDataLocal <- RxTextData(source, colInfo = airlineColInfo,
                                    varsToKeep = varNames)

   # Formula to use
   formula <- "ARR_DEL15 ~ ORIGIN + DAY_OF_WEEK + DEP_TIME + DEST"

3. Let's run a logistic regression over the data using the local compute context.

   # Set a local compute context
   rxSetComputeContext("local")

   # Run a logistic regression
   system.time(
     modelLocal <- rxLogit(formula, data = airOnTimeDataLocal)
   )

   # Display a summary
   summary(modelLocal)
4. Next, let's run the same logistic regression using the Spark context. The Spark context will distribute the processing over all the worker nodes in the HDInsight cluster.

   # Define the Spark compute context
   mySparkCluster <- RxSpark()

   # Set the compute context
   rxSetComputeContext(mySparkCluster)

   # Run a logistic regression
   system.time(
     modelSpark <- rxLogit(formula, data = airOnTimeData)
   )

   # Display a summary
   summary(modelSpark)

ScaleR Example with Linear Regression and Plots
This example will show different compute contexts, how to do linear regression in RevoScaleR, and how to do some simple plots. It uses airline delay data for airports across the United States.

   # Copy a local file to HDFS
   rxHadoopMakeDir("/share")
   rxHadoopCopyFromLocal(system.file("SampleData/AirlineDemoSmall.csv",
                                     package = "RevoScaleR"), "/share")

   myNameNode <- "default"
   myPort <- 0

   # Location of the data
   bigDataDirRoot <- "/share"

   # Define the HDFS file system
   hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)

   # Specify the input file in HDFS to analyze
   inputFile <- file.path(bigDataDirRoot, "AirlineDemoSmall.csv")

   # Create factors for days of the week
   colInfo <- list(DayOfWeek = list(type = "factor",
                                    levels = c("Monday", "Tuesday", "Wednesday",
                                               "Thursday", "Friday",
                                               "Saturday", "Sunday")))

   # Define the data source
   airDS <- RxTextData(file = inputFile, missingValueString = "M",
                       colInfo = colInfo, fileSystem = hdfsFS)

   # First test the "local" compute context
   rxSetComputeContext("local")

   # Run a linear regression
   system.time(
     model <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek, data = airDS)
   )

   # Display a summary of the model
   summary(model)

   # Define a MapReduce compute context
   myHadoopMRCluster <- RxHadoopMR(consoleOutput = TRUE,
                                   nameNode = myNameNode, port = myPort,
                                   hadoopSwitches = "-libjars /etc/hadoop/conf")

   # Set the compute context
   rxSetComputeContext(myHadoopMRCluster)

   # Run a linear regression
   system.time(
     model1 <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek, data = airDS)
   )

   # Display a summary of the model
   summary(model1)

   rxLinePlot(ArrDelay ~ DayOfWeek, data = airDS)

   # Define a Spark compute context
   mySparkCluster <- RxSpark(consoleOutput = TRUE)

   # Set the compute context
   rxSetComputeContext(mySparkCluster)

   # Run a linear regression
   system.time(
     model2 <- rxLinMod(ArrDelay ~ CRSDepTime + DayOfWeek, data = airDS)
   )

   # Display a summary of the model
   summary(model2)

   # Run 4 tasks via rxExec
   rxExec(function() { Sys.info()["nodename"] }, timesToRun = 4)

Wrap Up
This lab was meant to demonstrate how to use Microsoft R Server on a Spark cluster. For more information, refer to the references listed in the References section.

References
1. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-r-server-get-started/
Microsoft R Server for Distributed Computing
The First NIDA Business Analytics and Data Sciences Contest/Conference
September 1-2, 2016, Navamindradhiraj Building, National Institute of Development Administration (NIDA)
- Introduction to Microsoft R Server
- How distributed computing works and what its benefits are
- How to configure Microsoft R Server for distributed computing
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
กฤษฏิ์ คําตื้อ, Technical Evangelist, Microsoft (Thailand)
- Distributed computing and Big Data
- Analytics on R Server
- Demonstration and hands-on workshop
Computer Lab 2, 10th floor, Siam Borommaratchakumari Building, September 1, 2016, 9:00-12:30