SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Data Mining With SQL Server
Nguyen To Hoan Phuc
Pham Huu Khanh
2
Objectives
• Understand What is Data Mining
• Learn the Data Mining Process
• How to work with Data Mining
3
Agenda
• Data vs. Information
• What is Data Mining?
• Technical Platform
• Data Mining Process Overview
• Key Concepts and Terminology
• DEMO: Data Mining Process in Detail Using DMX
4
Data vs. Information
5
6
7
8
What is Data Mining?
9
Data Mining
• Technologies for analysis of data and discovery of
(very) hidden patterns
• Uses a combination of statistics, probability analysis
and database technologies
• Fairly young (<20 years old) but clever algorithms
developed through database research
10
What does Data Mining Do?
Explores
Your Data
Finds
Patterns -
Trends
Performs
Predictions
11
Data Mining Tasks
• Classification
• Phân loại, xếp hạng
• Regression
• Hồi quy
• Segmentation
• Phân khúc
• Association
• Liên kết
• Sequence Analysis
• Phân tích chuỗi, dãy
12
The Technical Platform
13
 Data acquisition and
integration from
multiple sources
 Data transformation
and synthesis
 Knowledge and
pattern detection
through Data Mining
 Data enrichment with
logic rules and
hierarchical views
 Data presentation
and distribution
 Data publishing for
mass recipients
Integrate Analyze Report
SQL Server
We Need More Than Just Database Engine
14
DM – Part of Microsoft SQL Server
15
Server Mining Architecture
Analysis Services
Server
Mining Model
Data Mining Algorithm Data
Source
Excel/Visio/SSRS/Your App
OLE DB/ADOMD/XMLA
Deploy
BIDS
Excel
Visio
SSMS
App
Data
16
Data Mining Process Overview
17
Mining Model Mining ModelMining Model
Mining Process
DM EngineDM Engine
Training data
Data to be
predictedMining Model
With
predictions
18
Steps for Building a DM Model
1. Model Creation
• Define columns for cases: visually (BIDS), using DMX, or from PMML
2. Model Training
• Feed lots of data from a real database, or from a system log
Congratulations! We now have a model
3. Model Testing
• Test on sample data to check predictions.
• Testing data must be different from training
• If we get nonsense, adjust the algorithm, its parameters, model design, or even
data
4. Model Use (Exploration and Prediction)
• Use the model on new data to predict outcomes
19
Many Approaches
• Work the way you like:
• Database experts and SQL veterans:
• Write queries in DMX (similar to T-SQL)
• Everyone else:
• Use Business Intelligence Development Studio (BIDS) – rich GUI
included with SSAS
• Hosted in Visual Studio (included!)
• You don’t have to program – click-click instead
• Use Excel 2007 with Data Mining Add-Ins
• The “Data Mining” tab has everything you need
• “Table Analysis” tab is easier but simplified
20
Key Concepts and Terminology
21
Mining Structure
• Describes data to be mined
• Columns from a data source and their:
• Data Type
• Content Type
• Contains Mining Models
• Often we build several different models in one structure
• Holds training data, known as Cases (if required)
• Holds testing data, known as Holdout (in SQL 2008)
22
Mining Structure
23
Data Mining Model
• Container of patterns discovered by a Data Mining
Algorithm amongst the training Cases
• A table containing patterns
• Expressed by visualisers
• Specifies usage of columns already defined in the
Mining Structure
24
Cases: The Things We Study
• Case – set of columns (attributes) you want to analyse
• Age, Gender, Region, Annual Spending
• Case Key – unique ID of a case
25
DEMO
Data Mining Process in Detail
Using DMX
26
Data Mining Extensions
DMX
• “T-SQL” for Data Mining
• Easy! Like scripting for IT Pros
• Two types of statements:
• Data Definition
• CREATE, ALTER, EXPORT, IMPORT, DROP
• Data Manipulation
• INSERT INTO, SELECT, DELETE
27
DMX – Just Like T-SQL
CREATE MINING MODEL CreditRisk
(CustID LONG KEY,
Gender TEXT DISCRETE,
Income LONG CONTINUOUS,
Profession TEXT DISCRETE,
Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees
INSERT INTO CreditRisk
(CustId, Gender, Income, Profession,
Risk)
Select
CustomerID, Gender, Income,
Profession,Risk
From Customers
Select NewCustomers.CustomerID, CreditRisk.Risk,
PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
ON CreditRisk.Gender=NewCustomer.Gender
AND CreditRisk.Income=NewCustomer.Income
AND CreditRisk.Profession=NewCustomer.Profession
28
Demo’s Steps
1
• Create Mining Structure
2
• Create Mining Model
3
• Process Mining Model
4
• Test Model
5
• Execute Prediction
29
30

Weitere ähnliche Inhalte

Ähnlich wie Data Mining With SQL Server

For Beginners - Ado.net
For Beginners - Ado.netFor Beginners - Ado.net
For Beginners - Ado.netTarun Jain
 
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLIT
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLITSQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLIT
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLITchaitalidarode1
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Data Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLData Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLBasho Technologies
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
Best Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesBest Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesEmbarcadero Technologies
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
Ibm datastage online training in hyderabad
Ibm datastage online training in hyderabadIbm datastage online training in hyderabad
Ibm datastage online training in hyderabadGoLogica Technologies
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery MLDan Sullivan, Ph.D.
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Vladi Vexler
 
Data mining by example - building predictive model using microsoft decision t...
Data mining by example - building predictive model using microsoft decision t...Data mining by example - building predictive model using microsoft decision t...
Data mining by example - building predictive model using microsoft decision t...Shaoli Lu
 
Process.ppt
Process.pptProcess.ppt
Process.pptSK Chew
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsMaaz Anjum
 
Data mining by example forecasting and cross prediction using microsoft time ...
Data mining by example forecasting and cross prediction using microsoft time ...Data mining by example forecasting and cross prediction using microsoft time ...
Data mining by example forecasting and cross prediction using microsoft time ...Shaoli Lu
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxshumPanwar
 

Ähnlich wie Data Mining With SQL Server (20)

For Beginners - Ado.net
For Beginners - Ado.netFor Beginners - Ado.net
For Beginners - Ado.net
 
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLIT
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLITSQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLIT
SQL SCIPY STREAMLIT_Introduction to the basic of SQL SCIPY STREAMLIT
 
NoSql Brownbag
NoSql BrownbagNoSql Brownbag
NoSql Brownbag
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLData Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQL
 
Lecture2 (1).ppt
Lecture2 (1).pptLecture2 (1).ppt
Lecture2 (1).ppt
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Best Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesBest Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management Objectives
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
Ibm datastage online training in hyderabad
Ibm datastage online training in hyderabadIbm datastage online training in hyderabad
Ibm datastage online training in hyderabad
 
Data mining & column stores
Data mining & column storesData mining & column stores
Data mining & column stores
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery ML
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
 
dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
 
Data mining by example - building predictive model using microsoft decision t...
Data mining by example - building predictive model using microsoft decision t...Data mining by example - building predictive model using microsoft decision t...
Data mining by example - building predictive model using microsoft decision t...
 
Process.ppt
Process.pptProcess.ppt
Process.ppt
 
EM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM MetricsEM12c: Capacity Planning with OEM Metrics
EM12c: Capacity Planning with OEM Metrics
 
Data mining by example forecasting and cross prediction using microsoft time ...
Data mining by example forecasting and cross prediction using microsoft time ...Data mining by example forecasting and cross prediction using microsoft time ...
Data mining by example forecasting and cross prediction using microsoft time ...
 
Compilerpt
CompilerptCompilerpt
Compilerpt
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptx
 

Kürzlich hochgeladen

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 

Kürzlich hochgeladen (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

Data Mining With SQL Server

  • 1. Data Mining With SQL Server Nguyen To Hoan Phuc Pham Huu Khanh
  • 2. 2 Objectives • Understand What is Data Mining • Learn the Data Mining Process • How to work with Data Mining
  • 3. 3 Agenda • Data vs. Information • What is Data Mining? • Technical Platform • Data Mining Process Overview • Key Concepts and Terminology • DEMO: Data Mining Process in Detail Using DMX
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8 What is Data Mining?
  • 9. 9 Data Mining • Technologies for analysis of data and discovery of (very) hidden patterns • Uses a combination of statistics, probability analysis and database technologies • Fairly young (<20 years old) but clever algorithms developed through database research
  • 10. 10 What does Data Mining Do? Explores Your Data Finds Patterns - Trends Performs Predictions
  • 11. 11 Data Mining Tasks • Classification • Phân loại, xếp hạng • Regression • Hồi quy • Segmentation • Phân khúc • Association • Liên kết • Sequence Analysis • Phân tích chuỗi, dãy
  • 13. 13  Data acquisition and integration from multiple sources  Data transformation and synthesis  Knowledge and pattern detection through Data Mining  Data enrichment with logic rules and hierarchical views  Data presentation and distribution  Data publishing for mass recipients Integrate Analyze Report SQL Server We Need More Than Just Database Engine
  • 14. 14 DM – Part of Microsoft SQL Server
  • 15. 15 Server Mining Architecture Analysis Services Server Mining Model Data Mining Algorithm Data Source Excel/Visio/SSRS/Your App OLE DB/ADOMD/XMLA Deploy BIDS Excel Visio SSMS App Data
  • 17. 17 Mining Model Mining ModelMining Model Mining Process DM EngineDM Engine Training data Data to be predictedMining Model With predictions
  • 18. 18 Steps for Building a DM Model 1. Model Creation • Define columns for cases: visually (BIDS), using DMX, or from PMML 2. Model Training • Feed lots of data from a real database, or from a system log Congratulations! We now have a model 3. Model Testing • Test on sample data to check predictions. • Testing data must be different from training • If we get nonsense, adjust the algorithm, its parameters, model design, or even data 4. Model Use (Exploration and Prediction) • Use the model on new data to predict outcomes
  • 19. 19 Many Approaches • Work the way you like: • Database experts and SQL veterans: • Write queries in DMX (similar to T-SQL) • Everyone else: • Use Business Intelligence Development Studio (BIDS) – rich GUI included with SSAS • Hosted in Visual Studio (included!) • You don’t have to program – click-click instead • Use Excel 2007 with Data Mining Add-Ins • The “Data Mining” tab has everything you need • “Table Analysis” tab is easier but simplified
  • 20. 20 Key Concepts and Terminology
  • 21. 21 Mining Structure • Describes data to be mined • Columns from a data source and their: • Data Type • Content Type • Contains Mining Models • Often we build several different models in one structure • Holds training data, known as Cases (if required) • Holds testing data, known as Holdout (in SQL 2008)
  • 23. 23 Data Mining Model • Container of patterns discovered by a Data Mining Algorithm amongst the training Cases • A table containing patterns • Expressed by visualisers • Specifies usage of columns already defined in the Mining Structure
  • 24. 24 Cases: The Things We Study • Case – set of columns (attributes) you want to analyse • Age, Gender, Region, Annual Spending • Case Key – unique ID of a case
  • 25. 25 DEMO Data Mining Process in Detail Using DMX
  • 26. 26 Data Mining Extensions DMX • “T-SQL” for Data Mining • Easy! Like scripting for IT Pros • Two types of statements: • Data Definition • CREATE, ALTER, EXPORT, IMPORT, DROP • Data Manipulation • INSERT INTO, SELECT, DELETE
  • 27. 27 DMX – Just Like T-SQL CREATE MINING MODEL CreditRisk (CustID LONG KEY, Gender TEXT DISCRETE, Income LONG CONTINUOUS, Profession TEXT DISCRETE, Risk TEXT DISCRETE PREDICT) USING Microsoft_Decision_Trees INSERT INTO CreditRisk (CustId, Gender, Income, Profession, Risk) Select CustomerID, Gender, Income, Profession,Risk From Customers Select NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk) FROM CreditRisk PREDICTION JOIN NewCustomers ON CreditRisk.Gender=NewCustomer.Gender AND CreditRisk.Income=NewCustomer.Income AND CreditRisk.Profession=NewCustomer.Profession
  • 28. 28 Demo’s Steps 1 • Create Mining Structure 2 • Create Mining Model 3 • Process Mining Model 4 • Test Model 5 • Execute Prediction
  • 29. 29
  • 30. 30

Hinweis der Redaktion

  1. Vấnđề lưutrữ, kinh phí, quảnlý, bảoquảnPhátsinhngàycàngnhiềuKhó khănchoviệc “hiểu” Nhucầudoanhnghiệp: rúttríchđượcthông tin từ dữ liệu, hỗ trợ raquyếtđịnh, tăngtínhcạnhtranhtrênthị trường.
  2. Nhucầuvề Information: dự đoán, hỗ trợ raquyếtđịnh
  3. Statistics: thongkeProbability: xacsuatLa mot linhvuc con khatre, duocphattriencach day chua den 20 namnhungcacthuattoanduocphattrienkharo rang
  4. Giải thuật phân loại (Classification Algorithm) – dự đoán ra một hoặc nhiều giá trị biến rời rạc, dựa trên các thuộc tính khác của tập dữ liệu. Điển hình là giải thuật Cây Quyết Định – Microsoft Decision Trees Algorithm.Giải thuật đệ qui (Regression Algorithm) – dự đoán một hoặc nhiều biến giá trị liên tục, như lợi nhuận và giá trị thua lỗ, dựa trên các thuộc tính dữ liệu khác trong tập dữ liệu. Điển hình là giải thuật chuỗi thời gian – Microsoft Time Series Algorithm.Giải thuật phân đoạn (Segmentation Algorithm) – phân chia dữ liệu thành nhiều nhóm gồm các thành phần có thuộc tính tương tự nhau. Giải thuật điển hình là Microsoft Clustering Algorithm.Giải thuật tương quan (Assocication Algorithm) – tìm sự tương quan giữa các thuộc tính trong củng tập dữ liệu. Ứng dụng phổ biến nhất của giải thuật này là xây dựng các luật tương quan, phân tích giỏ hàng. Giải thuật điển hình loại giải thuật này là Microsoft Assocciation AlgorithmGiải thuật phân tích tuyến tính (Sequence Analysis Allgorithm) – tổng kết các chuỗi hoặc mảng dữ liệu trong tập dữ liệu. Điển hình cho loại giải thuật này là Microsoft Sequence Clustering Algorithm
  5. SSMS: SQL Server Management StudioCông cụ để tạoracác Mining Model. Cáccông cụ được Microsoft cungcấpgồm có: Business Inteligence Development Studio, Excel, Visio, SQL Server Management Studio. Saukhitạoracác Mining Model, cầnphảitriểnkhailênhệ thống Analysis Services (A.S). Analysis Service là nơivậnhành, quảnlý các Model.Lưu ý rằngcác Model saukhiđượctriểnkhailên A.S chỉ là các Model rỗng. Để có thể đưavàosử dụng, cầnphải qua một quá trìnhgọi là Training Model (hoặc Process Model). Vì thế cầnđếnthànhphầnthứ 3 đó là Data Source. Data source là nơichứadữ liệucầnthiếtchoviệc Training Model và cả quá trình Test Model. Vì thế cầnphải chia lượng Data thành 2 phầnriêngbiệtđể phục vụ cho 2 tác vụ trên.Thànhphầnthứ 4, đó là cácứngdụngkhaitháccác Mining Model đã đượcxâydựng. Cácứngdụng có thể là cácphầnmềmđược Microsoft cungcấpnhư Excel, Visio hoặcứngdụng do ngườidùngxâydựng. Cácứngdụngnàygởidữ liệucủamìnhxuống Analysis Service và nhậnphảnhồi là kết quả của quá trình Data Mining trở lại.
  6. Approaches: phươngpháptiếpcận
  7. Concepts:kháiniệmTerminology: thuậtngữ
  8. Mining Structures (Analysis Services - Data Mining)The mining structure defines the data from which mining models are built: it specifies the source data view, the number and type of columns, and an optional partition into training and testing sets. A single mining structure can support multiple mining models that share the same domain. The following diagram illustrates the relationship of the data mining structure to the data source, and to its constituent data mining models.
  9. The mining structure in the diagram is based on a data source that contains multiple tables, joined on the CustomerID field. One table contains information about customers, such as the geographical region, age, income and gender, while the related nested table contains multiple rows of additional information about each customer, such as products the customer has purchased. The diagram shows that multiple models can be built on one mining structure, and that the models can use different columns from the structure. Model 1    Uses CustomerID, Income, Age, Region, and filters the data on Region.Model 2    Uses CustomerID, Income, Age, Region and filters the data on Age.Model 3    Uses CustomerID, Age, Gender, and the nested table, with no filter.Because the models use different columns for input, and because two of the models additionally restrict the data that is used in the model by applying a filter, the models might have very different results even though they are based on the same data. Note that the CustomerID column is required in all models because it is the only available column that can be used as the case key.This section explains the basic architecture of data mining structures. For more information about how to create, manage, modify, or view data mining structures, see Managing Data Mining Structures and Models.
  10. A data mining model gets data from a mining structure and then analyzes that data by using a data mining algorithm. The mining structure and mining model are separate objects. The mining structure stores information that defines the data source. A mining model stores information derived from statistical processing of the data, such as the patterns found as a result of analysis. A mining model is empty until the data provided by the mining structure has been processed and analyzed. After a mining model has been processed, it contains metadata, results, and bindings back to the mining structure.