R at Microsoft
9. Historical black-box sensor records and maintenance events for many aircraft
Train and compare various models to predict maintenance events
Scoring rules to predict likely maintenance events from sensor data
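The three steps on this slide can be sketched as follows: historical records, then model comparison, then a scoring rule applied to new data. This is a minimal Python illustration, not the pipeline used in the original project. Several pieces are hypothetical stand-ins chosen to keep the example self-contained: the two sensor features (vibration and temperature), the ten labelled records, and both candidate models (a single-sensor threshold and a nearest-centroid classifier).

```python
# Minimal sketch: compare two candidate models on held-out records and
# keep the better one as the scoring rule. All data here is hypothetical.

# Hypothetical historical records: ((vibration, temperature), label),
# where label 1 means a maintenance event followed.
HISTORY = [
    ((0.20, 60.0), 0), ((0.30, 62.0), 0), ((0.25, 61.0), 0), ((0.40, 65.0), 0),
    ((0.90, 80.0), 1), ((0.80, 78.0), 1), ((0.85, 82.0), 1), ((0.70, 75.0), 1),
    ((0.35, 63.0), 0), ((0.75, 79.0), 1),
]


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def fit_threshold(rows):
    """Model A: flag an event when vibration exceeds a cut-off learned from rows."""
    def rule(t):
        return lambda x: int(x[0] > t)
    best_t = max(sorted(x[0] for x, _ in rows), key=lambda t: accuracy(rule(t), rows))
    return rule(best_t)


def fit_centroid(rows):
    """Model B: assign a reading to the class whose mean reading is nearest."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in rows if y == label]
        centroids[label] = tuple(sum(col) / len(points) for col in zip(*points))

    def predict(x):
        return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))
    return predict


# Train on alternate records, evaluate on the rest.
train, test = HISTORY[::2], HISTORY[1::2]
models = {"threshold": fit_threshold(train), "centroid": fit_centroid(train)}
scores = {name: accuracy(m, test) for name, m in models.items()}
best_name = max(scores, key=scores.get)
score_new = models[best_name]  # the chosen "scoring rule" for new sensor data

print(scores, best_name)
print(score_new((0.78, 77.0)), score_new((0.30, 61.5)))
```

Neither toy model is a recommendation. Real black-box data would call for stronger learners, feature scaling (here temperature dominates the centroid distance) and proper cross-validation.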
20. Cloud computing
2011 to 2016: 5x increase
Data science: universities filling a 300,000-person US talent gap
Big data: 90% of the data in the world today has been created in the last two years alone
Open source, including R, Python, Linux, Hadoop, Spark, …
21. Thank you!
David Smith
R Community Lead
Microsoft
@revodavid
davidsmi@microsoft.com
Revolutions blog
blog.revolutionanalytics.com
blog.revolutionanalytics.com/2016/02/xbox_usage_trends_r.html
www.microsoft.com/en-us/stories/88acres
powerbi.microsoft.com/en-us/industries/airline
blog.revolutionanalytics.com/2016/03/sql-server-2016-launch.html
studio.azureml.net
github.com/RevolutionAnalytics/AzureML
blog.revolutionanalytics.com/2016/03/scoring-r-models-with-excel.html
www.visualstudio.com/en-us/features/rtvs-vs.aspx
mran.microsoft.com/download
23. Microsoft R Server: big-data analytics and distributed computing on Linux, Hadoop and Teradata
SQL Server 2016: big-data analytics integrated with the SQL Server database (coming soon)
Power BI: computations and charts from R scripts in dashboards
Azure ML Studio: R scripts in cloud-based experiment workflows
Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)
HDInsight: R integrated with cloud-based Hadoop clusters
Cortana Analytics: cloud-based R APIs and virtual machines
https://www.microsoft.com/en-us/stories/88acres/
Energy prediction and load shaping for buildings
Energy costs for Microsoft’s 120-building main campus are very high, particularly because heating there is almost exclusively electric. About 10% of these costs are demand charges (a surcharge on peak usage), which become very pronounced in winter. To reduce them, we modeled building energy consumption to predict demand peaks using random forest and boosted-tree regression, as implemented in the randomForest and gbm packages (sometimes together with caret), and then piloted the models in our operations center.
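Predicting peaks matters because a demand charge is billed on the single highest demand reading in the billing period, not on total consumption. The Python sketch below makes that arithmetic concrete; the original modeling was done in R. Several inputs are hypothetical and are not Microsoft's actual figures: the tariff (`ENERGY_RATE`, `DEMAND_RATE`), the one-hour billing interval and the five-hour load profile.

```python
# Minimal sketch of energy vs. demand charges under a hypothetical tariff.

ENERGY_RATE = 0.08  # hypothetical $ per kWh consumed
DEMAND_RATE = 12.0  # hypothetical $ per kW of the highest hourly demand in the period


def bill(hourly_kw):
    """Return (energy_charge, demand_charge) for hourly average demands in kW.

    With one-hour intervals, each kW reading also equals the kWh used in that hour.
    """
    energy_charge = sum(hourly_kw) * ENERGY_RATE
    demand_charge = max(hourly_kw) * DEMAND_RATE
    return energy_charge, demand_charge


baseline = [500, 600, 1000, 700, 500]  # hypothetical load with a 1000 kW peak
flattened = [600, 700, 700, 700, 600]  # same 3300 kWh, with the peak spread into shoulder hours

for name, profile in (("baseline", baseline), ("flattened", flattened)):
    energy, demand = bill(profile)
    print(f"{name}: energy ${energy:.2f}, demand ${demand:.2f}")
```

Both profiles consume the same energy, so only the demand charge changes (1000 × 12 versus 700 × 12 under this toy tariff). The five-hour horizon exaggerates the demand charge's share compared with a real monthly bill.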
In a second phase, more advanced models were developed to flatten these peaks without manual intervention. Turning a predictive model into a command-and-control model like this was complex: capturing the physical reality required multiple cascaded models, which also used tree-based regression techniques. Optimization (to find the best control parameters) and simulation (to gauge the overall impact of an intervention) were used, and problems typical of dynamical systems (stabilization, non-convergence, etc.) had to be overcome; these will be addressed in the talk.
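The combination of simulation and optimization described above can be illustrated with a deliberately simplified toy in Python. The talk's actual models are not reproduced here. The sketch assumes a single hypothetical control parameter, a demand cap in kW. A simulated controller serves load up to the cap and defers any excess to later hours. A grid search then picks the cap that minimizes the demand charge plus a penalty for deferred heating. Several inputs are assumptions: the rates, the penalty and the deferral rule, plus the load forecast, which in a real system would come from upstream models such as cascaded regressions. Caps whose backlog never clears are rejected, a toy analogue of the non-convergence problems mentioned above.

```python
# Minimal sketch: simulate a peak-capping controller and grid-search its cap.
# Rates, penalty and the deferral rule are hypothetical.

DEMAND_RATE = 12.0    # hypothetical $ per kW of peak demand
DEFER_PENALTY = 0.50  # hypothetical $ per kWh still deferred at the end of each hour


def simulate(forecast_kw, cap):
    """Serve load up to `cap` each hour and carry any excess into the next hour.

    Returns (peak_kw, deferred_kwh_hours, backlog_left_at_end).
    """
    backlog = peak = deferred = 0.0
    for kw in forecast_kw:
        wanted = kw + backlog
        served = min(wanted, cap)
        backlog = wanted - served
        deferred += backlog  # each carried-over kWh is penalised once per hour it waits
        peak = max(peak, served)
    return peak, deferred, backlog


def best_cap(forecast_kw, candidate_caps):
    """Return (cap, cost) minimising demand charge plus deferral penalty.

    Caps that leave unserved load at the end of the horizon are rejected.
    """
    best = None
    for cap in candidate_caps:
        peak, deferred, left = simulate(forecast_kw, cap)
        if left > 1e-9:
            continue  # infeasible: the backlog never clears
        cost = peak * DEMAND_RATE + deferred * DEFER_PENALTY
        if best is None or cost < best[1]:
            best = (cap, cost)
    return best


forecast = [500, 600, 1000, 700, 500]  # hypothetical hourly forecast in kW
print(best_cap(forecast, range(500, 1001, 50)))
```

With these numbers, the cheapest option is the lowest cap that still clears the backlog. A larger `DEFER_PENALTY` would push the optimum toward higher caps.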
All development and modelling work was done in R and Shiny, first in RStudio and later in RTVS. The R code and ggplot2 plots were then deployed to various platforms, including Azure ML, Power BI and R Services for SQL Server.
Support for R
R packages: AzureML, checkpoint, doParallel, RHadoop, DeployR Open
Full support for Linux in Azure, and now SQL Server on Linux
Microsoft's own Linux distribution
.NET is open source