Data Organization & Big Data Architecture
Agenda
• Data Organization
• Big Data Architecture
• Recruitment
Data Organization
Line Of Business
HR Finance Sales Customers
Competitors Markets Products Supply
Traffic Acquisition
Communication Security Prospects
* If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
Use Line Of Business
LOB 1 (Customer): BI Team, Data Science Team
LOB 2 (Support): BI Team, Data Science Team
LOB 3 …: BI Team, Data Science Team
Data Office
Data Centralization
Datalake
Cleansing
Data Integration
Data Office
CRM
BI Team
Data Science Team
Data Analyst: Extracts
Customer Animation: Events, Actions
BUS: Product Analysis, Global Analysis
SUBS: Country Analysis
Digital: PAC, Ad hoc analysis
BIZDEV: Onsite, Partner
Traffic Acquisition: Campaigns, Text mining
Targeting Channel: Segmentation, Normalisation
In case you missed it on the previous slide, if you work in the data field, we are interested in your profile!
Data Maturity
Level 1: POC
Data are manually created or extracted once
Data are modified by one data scientist
Data are assessed by a data analyst and manually sent to a business analyst post control
Level 2: Manual
Data are manually created on a regular basis
Data are manually added to the enterprise model with an automated process
Data can be used by all data scientists, data analysts or business analysts
Level 3: Automatic
Data are created through a controlled business process
Data are automatically added to the enterprise model
Data can be used by all data scientists, data analysts or business analysts
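The three maturity levels can be read as a small classification rule. The sketch below is one possible encoding, not something shown in the slides: the `Dataset` fields (`recurring`, `controlled_process`, `in_enterprise_model`) and the function names are hypothetical, chosen only to mirror the wording of each level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    POC = 1        # created or extracted once, by hand
    MANUAL = 2     # created by hand on a regular basis
    AUTOMATIC = 3  # created through a controlled business process


@dataclass
class Dataset:
    # All field names are illustrative assumptions, not from the slides.
    name: str
    recurring: bool            # produced on a regular basis?
    controlled_process: bool   # produced by a controlled business process?
    in_enterprise_model: bool  # already added to the enterprise model?


def maturity_level(ds: Dataset) -> Maturity:
    """Map a dataset to one of the three levels described on the slide."""
    if ds.controlled_process and ds.in_enterprise_model:
        return Maturity.AUTOMATIC
    if ds.recurring and ds.in_enterprise_model:
        return Maturity.MANUAL
    return Maturity.POC


def shared_with_all_analysts(ds: Dataset) -> bool:
    """Levels 2 and 3 are usable by all data scientists and analysts."""
    return maturity_level(ds) >= Maturity.MANUAL
```

With this encoding, a one-off extract stays at `Maturity.POC` until it is both produced regularly and added to the enterprise model.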
Data Maturity Matrix
Customers Competitors Products
Advanced 5 Potential Strategy
4 Attrition New Product
3 Churn Rank
2 Adds Event
Basic 1 NIC Pricing …
Exploration: Code First / Industrialisation: Model First
Data Scientists, Data Analysts, Business Analysts
Data preparation: 80%
Machine Learning: 20%
Technical model
Analyse, Test, Validation
Data Analysis / Creation
Data Analysis
Data Committee
Data Management Team (Architect + Data Integrator)
DataViz
Model
Enterprise Model Building
Datamart and report building
Business Intelligence Team
DTM
Data Prepare: industrialise
Build Datamart and Dashboard
POC
Datastore 360
Expose
POC
Enterprise model
POC Mode
Level 2 & 3 mode
Data Lake Team
Tool / Infrastructure
Data Committee
Objectives
• Define the data that needs to be added to the enterprise data
• Define priorities and owners by subject
• Industrialise new data production: from Excel to a full business process
• Validate the enterprise model
– Common vocabulary
– Business and/or functional model
• Be informed of evolutions
Participants
• Data Scientist
• Data Analyst
• Business Analyst
• Data Management Team
Periodicity
• Every month
Datastore 360
• Gets all data from
– Front office applications
– Back office applications
– External data
• Stores data in a business-oriented model
• Responsible for historizing data when this makes sense for the business
– What data do we want to keep? What will I need in 20 years?
• Exposes data to all applications that require it
– Business Intelligence: reporting or datamart
– Front office applications
EDS 360
History: Client, Product, Activity, …
Current: Client, Product, Activity, …
Data Scientist
Data Analyst
Business Analyst
DataViz
User apps (CRM, Support
api
api
Direct read
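The split between a "Current" view and a "History" view can be illustrated with validity intervals: each change closes the previous version and opens a new one. This is only a minimal in-memory sketch under that assumption. The slides do not say how Datastore 360 historizes data, and `HistorizedStore`, `Version` and the key format are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Version:
    value: dict
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


@dataclass
class HistorizedStore:
    # Hypothetical structure: one ordered list of versions per business key.
    history: Dict[str, List[Version]] = field(default_factory=dict)

    def upsert(self, key: str, value: dict, as_of: date) -> None:
        versions = self.history.setdefault(key, [])
        if versions and versions[-1].value == value:
            return  # nothing changed, keep the open version
        if versions:
            versions[-1].valid_to = as_of  # close the previous version
        versions.append(Version(value, as_of))

    def current(self, key: str) -> Optional[dict]:
        """'Current' view: the latest version only."""
        versions = self.history.get(key)
        return versions[-1].value if versions else None

    def as_of(self, key: str, day: date) -> Optional[dict]:
        """'History' view: the version that was valid on a given day."""
        for v in self.history.get(key, []):
            if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
                return v.value
        return None


store = HistorizedStore()
store.upsert("client:42", {"country": "FR"}, date(2016, 1, 1))
store.upsert("client:42", {"country": "DE"}, date(2017, 1, 1))
print(store.current("client:42"))                   # latest version
print(store.as_of("client:42", date(2016, 6, 1)))   # version valid mid-2016
```

Keeping every closed version is what makes the "what will I need in 20 years?" question a storage decision rather than a modelling one: entities whose history has no business value can simply skip the `as_of` path and keep only the current record.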
Big Data Architecture
Context
~ 50 SQL replicas
~ 700 DB
~ 300K tables
~ 100TB
~ 500K events/s
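To get a feel for these orders of magnitude, the figures can be turned into daily and per-database numbers. The sketch below assumes the event rate is sustained around the clock and that tables are spread evenly across databases; neither assumption is stated in the slides, and the inputs are the approximate values shown above.

```python
# Approximate figures taken from the slide
EVENTS_PER_SECOND = 500_000
DATABASES = 700
TABLES = 300_000

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the rate is constant over a full day
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

# Assumption: tables are evenly distributed across databases
tables_per_db = TABLES / DATABASES

print(f"{events_per_day:,} events/day")
print(f"{tables_per_db:.0f} tables per database on average")
```

Under these assumptions the event stream is on the order of tens of billions of events per day, with a few hundred tables per database.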
Datalake Hardware view
Private network
OVH Dedicated server
OVH Public Cloud High scalability
Security
Performance
Reliability
Lille Grand Palais – 28 February 2017
Datalake software view
Pig
Flink
Spark
HDFS
HBase
Phoenix
Kafka (Queue)
Couchbase
Jobs
Job | Skills | Output
Data Analyst | Excel; Dataviz: Tableau, PowerBI | Data strategy
Data Scientist | Scala, Java, R, Python, Cube | Datasets, Flows, Patterns, Models
Data Integrator | Flink, HBase, Pig, Spark | Data preparation
Data Dev Ops | Kafka, HBase, Go, Apache Beam, … | Datalake
Thank you!
Join us: ovh.com/fr/careers

 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 

Meetup Data-science OVH

  • 1. Data Organization & Big Data Architecture
  • 2. Agenda: Data Organization; Big Data Architecture; Recruitment
  • 4. Line of Business: HR, Finance, Sales, Customers, Competitors, Markets, Products, Supply, Traffic Acquisition, Communication, Security, Prospects. * If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
  • 5. Use Line Of Business. LOB 1 (Customer): BI Team, Data Science Team. LOB 2 (Support): BI Team, Data Science Team. LOB 3 …: BI Team, Data Science Team
  • 6. Data Office. Data Centralization, Datalake, Cleansing, Data Integration. Data Office, CRM, BI Team, Data Science Team. Extracts; Data Analyst; Events, Actions; Customer Animation; Product Analysis, Global Analysis; BUS; Country Analysis; SUBS; PAC, Ad hoc Analysis; Digital; Onsite, Partner; BIZDEV; Campaigns, Text mining; Traffic Acquisition; Segmentation, Normalisation; Targeting; Channel. In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!
  • 7. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst after control.
  • 8. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst after control. Level 2 (Manual): data are manually created on a regular basis; data are manually added to the enterprise model with an automated process; data can be used by all data scientists, data analysts or business analysts.
  • 9. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst after control. Level 2 (Manual): data are manually created on a regular basis; data are manually added to the enterprise model with an automated process; data can be used by all data scientists, data analysts or business analysts. Level 3 (Automatic): data are created through a controlled business process; data are automatically added to the enterprise model; data can be used by all data scientists, data analysts or business analysts.
  • 10. Data Maturity Matrix. Columns: Customers, Competitors, Products. Levels from Advanced (5) to Basic (1). Level 5: Potential, Strategy. Level 4: Attrition, New Product. Level 3: Churn, Rank. Level 2: Adds, Event. Level 1: NIC, Pricing, …
  • 11. Exploration : Code First Industrialisation : Model first Data Scientists Data Analysts Business Analysts Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team Data Lake Team
  • 12. Data Lake Team Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data Scientists Data Analysts Business Analysts Technical model Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team
  • 13. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team Data Lake Team
  • 14. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Management Team ( Architect + Data Integrator ) DataViz Model Business Intelligence Team POC Expose POC POC Mode Data Lake Team
  • 15. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Committee Data Management Team ( Architect + Data Integrator ) DataViz Model Enterprise Model Building Datamart and report building Business Intelligence Team DTM Data Prepare: industrialise POC Datastore 360 Level 2 & 3 mode Expose POC Enterprise model POC Mode Data Lake Team
  • 16. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Committee Data Management Team ( Architect + Data Integrator ) DataViz Model Enterprise Model Building Datamart and report building Business Intelligence Team DTM Data Prepare: industrialise Build Datamart and Dashboard POC Datastore 360 Expose POC Enterprise model POC Mode Level 2 & 3 mode Data Lake Team
  • 17. Data Committee. Objectives: define the data that need to be added to enterprise data; define priorities and owners by subject; industrialise new data production, from Excel to a full business process; validate the enterprise model (common vocabulary, business and/or functional model); be informed of evolutions. Participants: Data Scientist, Data Analyst, Business Analyst, Data Management Team. Periodicity: every month.
  • 18. Datastore 360 (EDS 360). Gets all data from front office applications, back office applications and external data. Stores data in a business-oriented model. Responsible for keeping the history of data when this makes sense for the business: what data do we want to keep? What will I need in 20 years? Exposes data to all applications that require it: Business Intelligence (reporting or datamart) and front office applications. Diagram labels: History; Current; Client, Produit, Activity, …; Data Scientist, Data Analyst, Business Analyst; DataViz User; APPs (CRM, Support; api; api; Direct read
  • 20. Context: ~50 SQL replicas, ~700 DBs, ~300K tables, ~100 TB, ~500K events/s
  • 21. Datalake, hardware view: private network; OVH Dedicated server; OVH Public Cloud. High scalability, security, performance, reliability
  • 22. Lille Grand Palais, 28 February 2017
  • 24. Jobs (job: skills; output). Data Analyst: Excel, Dataviz (Tableau, PowerBI); output: data strategy. Data Scientist: Scala, Java, R, Python, Cube; output: datasets, flows, patterns, models. Data Integrator: Flink, HBase, Pig, Spark; output: data preparation. Data Dev Ops: Kafka, HBase, Go, Apache Beam, …; output: datalake
  • 25. Thank you! Join us: ovh.com/fr/careers
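
The three maturity levels on slides 7 to 9 can be read as a small decision rule. The sketch below is only an illustration of that reading: the `Dataset` fields and the `maturity` function are hypothetical names that do not appear in the slides, and the rule simplifies each level to one or two yes/no criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """Maturity levels as named on the slides."""
    POC = 1
    MANUAL = 2
    AUTOMATIC = 3


@dataclass(frozen=True)
class Dataset:
    # All fields are hypothetical; they paraphrase the slide criteria.
    name: str
    produced_regularly: bool   # created on a regular basis, not just once
    controlled_process: bool   # created through a controlled business process
    auto_loaded: bool          # automatically added to the enterprise model


def maturity(ds: Dataset) -> Maturity:
    """Return the highest level whose (simplified) criteria the dataset meets."""
    if ds.controlled_process and ds.auto_loaded:
        return Maturity.AUTOMATIC
    if ds.produced_regularly:
        return Maturity.MANUAL
    return Maturity.POC


if __name__ == "__main__":
    one_off = Dataset("one_off_extract", False, False, False)
    print(one_off.name, maturity(one_off).name)
```

A one-off extract falls through both checks and is classified as a POC; a dataset only reaches the Automatic level when both its creation and its loading are automated.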

Editor's notes

  1. A secured cluster accessible through a gateway. The computing layer is based on Public Cloud instances in order to scale quickly. Cold storage, on the other hand, is based on dedicated servers for higher performance. vRack technology for the dedicated network; Public Cloud for scalability.
  2. A secured cluster accessible through a gateway. The computing layer is based on Public Cloud instances in order to scale quickly. Cold storage, on the other hand, is based on dedicated servers for higher performance. vRack technology for the dedicated network; Public Cloud for scalability -> datanode.
  3. Hadoop ecosystem with HDFS for data storage, and HBase plus Phoenix for SQL support on columnar storage -> relational data storage layer. Couchbase for document data storage. Key-value data can be stored either in HDFS or in Couchbase, depending on their access rate. Processing is done by Spark / Flink / Pig. Each of these solutions has its strong points, but Spark and Flink may be abstracted behind an Apache Beam layer in upcoming versions.
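
Note 3 says that key-value data go to HDFS or Couchbase depending on their access rate, without giving the rule. The sketch below shows one possible form of such a rule. It is an assumption throughout: the direction (frequently read data to Couchbase, rarely read data to HDFS), the threshold value and the names `HOT_READS_PER_HOUR` and `choose_store` are all hypothetical and not taken from the talk.

```python
# Hypothetical threshold: reads per hour above which a key counts as "hot".
HOT_READS_PER_HOUR = 100.0


def choose_store(reads_per_hour: float,
                 threshold: float = HOT_READS_PER_HOUR) -> str:
    """Pick a storage backend name from an observed access rate.

    Assumption: hot keys are served from Couchbase, cold keys from HDFS.
    """
    if reads_per_hour < 0:
        raise ValueError("access rate cannot be negative")
    return "couchbase" if reads_per_hour >= threshold else "hdfs"


if __name__ == "__main__":
    for rate in (0.5, 100.0, 2500.0):
        print(rate, "->", choose_store(rate))
```

Keeping the threshold as a parameter makes it easy to tune per dataset once real access statistics are available.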