Automated Reports with RStudio Server
Automated KPI reporting with Shiny Server
Process Validation Documentation with Jupyter Notebook
Automated Machine Learning with Dataiku
10. • Docker on Ubuntu 16.04 Server
• From the docker window, run:
• sudo docker run -d -p 8787:8787 rocker/rstudio
• Then open the server in your browser, e.g. http://yourIP:8787, and you should be
greeted by the RStudio welcome screen.
Log in using:
• username: rstudio
• password: rstudio
RStudio Server - Install
17. R – Scraper – OpenMpi
• MPI (Message Passing Interface) is a specification for an API for passing
messages between different computers.
• Programming with MPI
• Difficult because the Rmpi package defines about 110 R functions
• Needs a parallel programming system to do the actual work in parallel
• The doMPI package acts as an adaptor to the Rmpi package, which in
turn is an R interface to an implementation of MPI
• Very easy to install Open MPI, and Rmpi on Debian / Ubuntu
• You can test with one computer
19. R – Scraper – Test doMpi
library(doMPI)
# start your cluster (20 workers)
cl <- startMPIcluster(count=20)
registerDoMPI(cl)
# scrape every row of mydataset in parallel and bind the results row by row
n_rows <- nrow(mydataset)
x <- foreach(i=seq_len(n_rows), .combine="rbind") %dopar% seocrawlerThread(mydataset,i)
# close your cluster
closeCluster(cl)
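Before starting an MPI cluster, the same loop can be tried on one computer with the sequential backend that ships with foreach. This is only a sketch: `mydataset` and `seocrawlerThread` below are hypothetical stand-ins (the real scraper is not shown in these slides), and the stub just measures the URL length instead of fetching the page.

```r
library(foreach)

# Sequential backend: %dopar% runs in the current R session, no MPI needed
registerDoSEQ()

# Hypothetical input: one URL per row
mydataset <- data.frame(
  url = c("https://example.com/a", "https://example.com/b"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the scraper: returns one row per URL
seocrawlerThread <- function(dataset, i) {
  data.frame(
    url = dataset$url[i],
    url_length = nchar(dataset$url[i]),
    stringsAsFactors = FALSE
  )
}

n_rows <- nrow(mydataset)
x <- foreach(i = seq_len(n_rows), .combine = "rbind") %dopar%
  seocrawlerThread(mydataset, i)
print(x)
```

Once the loop returns one row per input row, swapping `registerDoSEQ()` for `registerDoMPI(cl)` should be the only change needed to run it on the cluster.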
27. • dplyr
• readxl
• searchConsoleR
• googleAuthR
• googleAnalyticsR
R – Packages SEO
Thanks to Mark Edmondson
28. R – SearchConsoleR
library(googleAuthR)
library(searchConsoleR)
# get your client ID and client secret from the Google API Console
options("searchConsoleR.client_id" = "41078866233615q3i3uXXXX.apps.googleusercontent.com")
options("searchConsoleR.client_secret" = "GO0m0XXXXXXXXXX")
## change this to the website you want to download data for. Include http
website <- "https://data-seo.fr"
## Search Console data is reliably available after 3 days, so we download from then
## today - 3 days
start <- Sys.Date() - 3
## one day of data, but change it as needed
end <- Sys.Date() - 3
29. R – SearchConsoleR
## what to download, choose between date, query, page, device, country
download_dimensions <- c('date','query')
## what type of Google search, choose between 'web', 'video' or 'image'
type <- c('web')
## Authorize script with Search Console, using an account that has access to the website.
## The first time you will need to log in to Google, but the token should auto-refresh
## after that, so the script can be put in a cron job.
googleAuthR::gar_auth()
## first time stop here and wait for authorisation
## get the search analytics data
data <- search_analytics(siteURL = website, startDate = start, endDate = end,
                         dimensions = download_dimensions, searchType = type)
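For an automated report, each daily download has to be stored somewhere. A minimal sketch, using base R only, is to write one dated CSV per run. The helper name `save_daily_export`, the file naming scheme and the sample data frame are assumptions for illustration; the columns only mimic what a Search Console export typically looks like.

```r
# Hypothetical helper: write one day's export to a dated CSV and return its path
save_daily_export <- function(df, day, dir = tempdir()) {
  file_name <- paste0("search-console-", format(day, "%Y-%m-%d"), ".csv")
  path <- file.path(dir, file_name)
  write.csv(df, path, row.names = FALSE)
  path
}

# Made-up sample standing in for the data frame returned by the download
sample_data <- data.frame(
  date = as.Date(c("2017-06-01", "2017-06-01")),
  query = c("data seo", "internal pagerank"),
  clicks = c(12, 3),
  impressions = c(240, 95),
  stringsAsFactors = FALSE
)

path <- save_daily_export(sample_data, as.Date("2017-06-01"))
print(path)
```

In the real script the call would be something like `save_daily_export(data, start, dir = "exports")`, with a directory of your choice.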
31. • Table: Crontab Fields and Allowed Ranges (Linux Crontab Syntax)
• MIN Minute field 0 to 59
• HOUR Hour field 0 to 23
• DOM Day of month 1 to 31
• MON Month field 1 to 12
• DOW Day of week 0 to 6 (0 = Sunday)
• CMD Command Any command to be executed
• $ crontab -e
• Run the R script filePath.R at 23:15 every day of the year:
15 23 * * * Rscript filePath.R
R – CronTab – Method 1
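The field ranges in the table above can be checked in R before a line is pasted into the crontab. The sketch below is an illustration only: `check_field` and `cron_line` are hypothetical helpers, and they accept just a single number or `*` per field (no lists, ranges or step values).

```r
# Hypothetical helper: accept "*" or a single value inside the allowed range
check_field <- function(value, allowed, name) {
  if (!identical(value, "*") && !(value %in% allowed)) {
    stop(name, " must be '*' or between ", min(allowed), " and ", max(allowed))
  }
  invisible(TRUE)
}

# Hypothetical helper: build one crontab line from the five time fields and a command
cron_line <- function(cmd, minute = "*", hour = "*", dom = "*", mon = "*", dow = "*") {
  check_field(minute, 0:59, "MIN")
  check_field(hour, 0:23, "HOUR")
  check_field(dom, 1:31, "DOM")
  check_field(mon, 1:12, "MON")
  check_field(dow, 0:6, "DOW")
  paste(minute, hour, dom, mon, dow, cmd)
}

line <- cron_line("Rscript filePath.R", minute = 15, hour = 23)
print(line)
# [1] "15 23 * * * Rscript filePath.R"
```

An out-of-range value such as `hour = 24` stops with an error instead of producing a line that cron would reject.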
32. • R Package : https://github.com/bnosac/cronR
R – Cron – Method 2
library(cronR)
# build the shell command that runs the script with Rscript
cmd <- cron_rscript("filePath.R")
cron_add(cmd, frequency = 'hourly', id = 'job4', at = '00:20',
         days_of_week = c(1, 2))
cron_add(cmd, frequency = 'daily', id = 'job5', at = '14:20')
cron_add(cmd, frequency = 'daily', id = 'job6', at = '14:20',
         days_of_week = c(0, 3, 5))
36. Shiny Server – Where and How
• ShinyApps.io
• A local server
• Hosted on your server
37. • docker run --rm -p 3838:3838 \
    -v /srv/shinyapps/:/srv/shiny-server/ \
    -v /srv/shinylog/:/var/log/ \
    rocker/shiny
• If you have an app in /srv/shinyapps/appdir, you can run the app
by visiting http://yourIP:3838/appdir/.
Shiny Server - Install
38. Shiny – ui.R
library(shiny)

fluidPage(
  titlePanel("Compute your internal pagerank"),
  sidebarLayout(
    # left column: link, upload and download controls
    sidebarPanel(
      a("data-seo.com", href="https://data-seo.com"),
      tags$hr(),
      p('Step 1 : Export your outlinks data from ScreamingFrog'),
      fileInput('file1', 'Choose file to upload (e.g. all_outlinks.csv)',
        accept = c('text/csv'), multiple = FALSE
      ),
      tags$hr(),
      downloadButton('downloadData', 'Download CSV')
    ),
    # right column: caption and result table
    mainPanel(
      h3(textOutput("caption")),
      tags$hr(),
      tableOutput('contents')
    )
  )
)
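The ui.R above needs a matching server.R that fills the `caption`, `contents` and `downloadData` outputs from the uploaded `file1`. The slides do not show that file, so the following is only a sketch under assumptions: it reads the uploaded CSV, previews the first rows and offers the same data for download. The PageRank computation itself is left out, and the caption text and download file name are made up.

```r
library(shiny)

server <- function(input, output) {
  # Read the uploaded CSV; req() waits until a file has been chosen
  outlinks <- reactive({
    req(input$file1)
    read.csv(input$file1$datapath, stringsAsFactors = FALSE)
  })

  # Caption shown above the table (placeholder text)
  output$caption <- renderText("Preview of the uploaded outlinks")

  # First rows of the uploaded file
  output$contents <- renderTable(head(outlinks(), 20))

  # Download button: here it simply returns the uploaded data as CSV
  output$downloadData <- downloadHandler(
    filename = function() paste0("outlinks-", Sys.Date(), ".csv"),
    content = function(file) write.csv(outlinks(), file, row.names = FALSE)
  )
}
```

With ui.R and server.R in the same app directory, `shiny::runApp("appdir")` starts the app locally; on Shiny Server it is picked up from /srv/shiny-server/appdir.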
40. https://mark.shinyapps.io/GA-dashboard-demo
Code on Github: https://github.com/MarkEdmondson1234/ga-dashboard-demo
• Interactive trend graphs.
• Auto-updating Google Analytics data.
• Zoomable day-of-week heatmaps.
• Top Level Trends via Year on Year, Month on Month
and Last Month vs Month Last Year data modules.
• A MySQL connection for data blending your own data with GA data.
• An easy upload option to update a MySQL database.
• Analysis of the impact of marketing events via Google's CausalImpact.
• Detection of unusual time-points using Twitter's Anomaly Detection.
Shiny – Use case
52. $ adduser vincent sudo
$ sudo apt-get install default-jre
$ wget https://downloads.dataiku.com/public/studio/4.0.1/dataiku-dss-4.0.1.tar.gz
$ tar xzf dataiku-dss-4.0.1.tar.gz
$ cd dataiku-dss-4.0.1
>> install all prerequisites
$ sudo -i "/home/dataiku-dss-4.0.1/scripts/install/install-deps.sh" -without-java
>> install Dataiku
$ ./installer.sh -d DATA_DIR -p 11000
$ DATA_DIR/bin/dss start
>> then open http://<your server address>:11000
Dataiku- Install on Instance Cloud
53. Go to the DSS data dir
$ cd DATA_DIR
Stop DSS
$ ./bin/dss stop
Run the installation script
$ ./bin/dssadmin install-R-integration
$ ./bin/dss start
Dataiku- Install R
56. • Get all your featured snippets with Ranxplorer
• Get the SERP for each keyword with Ranxplorer
• Use homemade scraper to enrich data :
• 'Keyword' 'Domain' 'StatusCode' 'ContentType' 'LastModified' 'Location'
• 'Title' 'TitleLength' 'TitleDist' 'TitleIsQuestion'
• 'noSnippet' 'isJsonLD' 'isItemType' 'isItemProp'
• 'Wordcount' 'Size' 'ResponseTime'
• 'H1' 'H1Length' 'H1Dist' 'H1IsQuestion'
• 'H2' 'H2Length' 'H2Dist' 'H2IsQuestion'
• Use AML (automated machine learning) to find the most important features
Dataiku : Featured Snippet
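Some of the enrichment columns listed above can be derived with base R alone. The sketch below is an illustration, not the scraper used in the talk: `enrich_title` is a hypothetical helper, and it assumes that 'TitleDist' means the edit distance between the keyword and the title, which the slides do not define.

```r
# Hypothetical helper: derive title features for keyword/title pairs
enrich_title <- function(keyword, title) {
  # Assumption: TitleDist is the Levenshtein distance, ignoring case
  dist <- mapply(
    function(k, t) adist(k, t)[1, 1],
    tolower(keyword), tolower(title),
    USE.NAMES = FALSE
  )
  data.frame(
    Keyword = keyword,
    Title = title,
    TitleLength = nchar(title),
    TitleDist = dist,
    TitleIsQuestion = grepl("\\?\\s*$", title),  # title ends with "?"
    stringsAsFactors = FALSE
  )
}

features <- enrich_title(
  keyword = c("what is seo", "internal pagerank"),
  title = c("What is SEO?", "Compute your internal pagerank")
)
print(features)
```

The same pattern would apply to the H1 and H2 columns by passing the heading text instead of the title.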
64. Dataiku : My Plugins
• SEMrush
• SearchConsole
• Majestic
• Visiblis [ongoing]
A DSS plugin is a zip file.
Inside DSS, click the top right gear → Administration → Plugins → Store.
https://github.com/voltek62/Dataiku-SEO-Plugins
67. • Learn from the success of others with AML
• Use all methods at your disposal to show Google you are the
answer to the question (Title, H1, H2, …).
Dataiku : Results
70. • Yes, you can because :
• Great advertising
• Get customers for specific features and trainings
Open Source & SEO ?
• Showing your work
• Attract talent
• Teaching the next generation
71. • Automated Reports with RStudio Server
• Automated KPI reporting with Shiny Server
• Process Validation Documentation with Jupyter Notebook
• Automated Machine Learning with Dataiku
Take away
72. Now that machines can learn and adapt,
it is time to take advantage of the
opportunity to create new jobs.
Data-SEO, Data-Doctor, Data-Journalist …
R is a programming language dedicated to statistics and data science. The best-known implementation of the R language is the GNU R software.
HTTP response header: collect the contents of the header of an HTTP response
itoken: This function creates iterators over input objects, which are used to build vocabularies, corpora, or DTM and TCM matrices. The iterator is usually passed to the following functions: create_vocabulary, create_corpus, create_dtm, vectorizers, create_tcm.
create_vocabulary: This function collects unique terms and their corresponding statistics.
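The two text2vec functions described in these notes are typically chained: `itoken` builds the iterator and `create_vocabulary` consumes it. A minimal sketch, assuming the text2vec package is installed; the two titles are made-up sample input.

```r
library(text2vec)

# Made-up sample documents (page titles)
titles <- c(
  "How to compute internal PageRank",
  "What is a featured snippet?"
)

# Iterator over the documents: lowercase first, then split into words
it <- itoken(titles,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Unique terms with their term and document counts
vocab <- create_vocabulary(it)
print(vocab)
```

The resulting vocabulary can then be passed on to the vectorizers to build a document-term matrix.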
Email, …
Shiny is a toolkit from RStudio that makes creating web applications much easier (HTML, CSS, Java, JavaScript and jQuery).
Shiny is licensed GPLv3, and the source is available on GitHub.
Installs in one line
Two files: ui.R and server.R
Replace "crawler" with "scraper"
Benchmarking : AML can quickly present a lot of models using the same training set
Detecting Target Leakage: AML builds candidate models extremely fast in an automated way
Diagnostics: Diagnostics can be automatically generated such as learning curves, feature importances, etc.
Automation : Tasks like exploratory data analysis, pre-processing of data, model selection and putting models into production can be automated.