Frieda Brioschi - frieda.brioschi@gmail.com
Emma Tracanella - emma.tracanella@gmail.com
AROUND DATA SCIENCE
LESSON 5 - 2020/21
DESCRIBE YOUR PROJECT
Photo by William Iven on Unsplash
A COUPLE OF DIGRESSIONS
▸ storage issues
▸ http://blog.odsi.co.uk/wp-content/uploads/2013/08/History-of-computer-data-storage.png.jpg
▸ the rise of data centers
▸ computational power
▸ the Internet
MARGARET HAMILTON
DATA CENTER CLOUD (4,563 IN 2019)
https://www.digitalic.it/tecnologia/data-center-cloud-numeri-e-diffusione-nel-mondo-litalia-tra-i-paesi-europei-che-ne-ospita-di-piu
WHAT ARE BIG DATA
Photo by ev on Unsplash
DEFINITION
The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to
process using traditional methods. The concept of big data gained momentum in the early 2000s
when industry analyst Doug Laney articulated the definition of big data as the three V’s:
▸ Volume: Organizations collect data from a variety of sources, including business transactions,
smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it
would have been a problem.
▸ Velocity: With the growth of the Internet of Things, data streams into businesses at an
unprecedented speed and must be handled in a timely manner, often in near real time.
▸ Variety: Data comes in all types of formats, from structured, numeric data in traditional
databases to unstructured text documents, emails, videos, audio, stock ticker data and financial
transactions.
(ACCORDING TO SAS)
https://www.domo.com/learn/data-never-sleeps-8
https://www.visualcapitalist.com/big-data-keeps-getting-bigger/
CORRELATION
When two sets of data are strongly linked together, we say they have a high correlation.
▸ Correlation is positive when the values increase together, and
▸ Correlation is negative when one value decreases as the other increases
The correlation coefficient ranges from -1 to 1:
▸ 1 is a perfect positive correlation
▸ 0 is no correlation (the values don't seem linked at all)
▸ -1 is a perfect negative correlation
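As a minimal sketch, the coefficient described above can be computed as Pearson's r, the most common choice (the slide does not say which coefficient it means; Spearman's rank correlation is another). The helper name `pearson_r` is illustrative; from Python 3.10 the standard `statistics.correlation` function computes the same value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two sequences around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each sequence around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```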
CORRELATION
Correlation is one of the most widely used statistical concepts.
The term "correlation" refers to a mutual relationship or association between
quantities. Why is it such a useful metric?
▸ Correlation can help in predicting one quantity from another
▸ Correlation can (but often does not) indicate the presence of a causal
relationship
▸ Correlation is used as a basic quantity and foundation for many other
modeling techniques
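To illustrate the first point, a sketch assuming a linear relationship: the least-squares line that predicts y from x has slope r * s_y / s_x and passes through the two means. The data and the helper names are hypothetical. A strong r makes such predictions useful, but, as the second point warns, it says nothing about which quantity causes the other.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5

def fit_line(xs, ys):
    """Least-squares line y = a + b*x, with the slope expressed through r."""
    b = pearson_r(xs, ys) * stdev(ys) / stdev(xs)  # slope = r * s_y / s_x
    a = mean(ys) - b * mean(xs)                    # the line passes through the means
    return a, b

# Hypothetical data: y grows linearly with x
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
a, b = fit_line(xs, ys)
print(f"fitted line: y = {a:.2f} + {b:.2f} * x; prediction at x = 6: {a + b * 6:.2f}")
```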
https://wearesocial.com/blog/2020/07/digital-use-around-the-world-in-july-2020
LINKED DATA
LINKED DATA / LOD
Linked data is structured data which is interlinked with other data so it becomes
more useful through semantic queries. It builds upon standard Web technologies
but rather than using them to serve web pages only for human readers, it extends
them to share information in a way that can be read automatically by computers.
Part of the vision of linked data is for the Internet to become a global database.
Linked data may also be open data, in which case it is usually described as linked
open data (LOD).
▸ https://en.wikipedia.org/wiki/Linked_data
SCHEMA.ORG
http://schema.org/docs/full.html
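A sketch of what schema.org markup can look like on a web page, assuming the JSON-LD syntax (one of the formats schema.org supports, alongside Microdata and RDFa). The organisation, its URL, the Wikipedia link and the founder are hypothetical; `Organization`, `Person`, `name`, `url`, `sameAs` and `founder` are schema.org terms.

```python
import json

# A hypothetical organisation described with the schema.org vocabulary.
# "@context" and "@type" are JSON-LD keywords; the other keys are schema.org properties.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Data Lab",
    "url": "https://example.org",
    "sameAs": ["https://en.wikipedia.org/wiki/Example"],
    "founder": {"@type": "Person", "name": "Jane Doe"},
}

# Serialize and wrap it the way it would be embedded in an HTML page
markup = json.dumps(org, indent=2)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```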
GOOGLE KNOWLEDGE GRAPH
https://www.youtube.com/watch?v=mmQl6VGvX-c
WHY LINKED DATA MATTERS
Linked data is a method for publishing structured data using vocabularies like
schema.org that can be connected together and interpreted by machines. Using
linked data, statements encoded in triples can be spread across different
websites.
This enables data from different sources to be connected and queried.
▸ https://wordlift.io/blog/en/entity/linked-data/
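A toy illustration of the idea, assuming statements are stored as plain (subject, predicate, object) tuples. Real linked data uses full IRIs and RDF tooling (for example SPARQL), so the `ex:` and `schema:` prefixes, the two "sites" and all the entities below are hypothetical shorthand.

```python
# Statements published by two different (hypothetical) websites, as triples.
site_a = {
    ("ex:Alice", "schema:name", "Alice"),
    ("ex:Alice", "schema:worksFor", "ex:DataLab"),
}
site_b = {
    ("ex:DataLab", "schema:name", "Data Lab"),
    ("ex:DataLab", "schema:location", "Milan"),
}

def query(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None works as a wildcard."""
    return [
        (ts, tp, to) for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]

# Both sites use the same identifier for the lab, so merging is a set union.
graph = site_a | site_b

# Follow the link across sources: where is Alice's employer located?
locations = [
    place
    for _, _, employer in query(graph, s="ex:Alice", p="schema:worksFor")
    for _, _, place in query(graph, s=employer, p="schema:location")
]
print(locations)
```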
CLASSICAL DATA MINING
Photo by ev on Unsplash
CONTEXT
You don’t have to be a fancy statistician to do data mining, but you do
have to know something about what the data signifies and how the
business works.
Only when you understand the data and the problem that you need to
solve can data-mining processes help you to discover useful
information and put it to use.
NINE LAWS OF DATA MINING - 1
Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining”
to guide new data miners as they get down to work.
▸ 1. “Business Goals Law”
Business objectives are the origin of every data mining solution.
A data miner is someone who discovers useful information from data to support
specific business goals. Data mining isn’t defined by the tool you use.
▸ 2. “Business Knowledge Law”
Business knowledge is central to every step of the data mining process.
You don’t have to be a fancy statistician to do data mining, but you do have to
know something about what the data signifies and how the business works.
NINE LAWS OF DATA MINING - 2
▸ 3. “Data Preparation Law”
Data preparation is more than half of every data mining process.
Pretty much every data miner will spend more time on data preparation than on
analysis.
▸ 4. “No Free Lunch for the Data Miner”
The right model for a given application can only be discovered by experiment.
In data mining, models are selected through trial and error.
▸ 5. “Patterns”
There are always patterns in the data.
As a data miner, you explore data in search of useful patterns. Understanding patterns
in the data enables you to influence what happens in the future.
NINE LAWS OF DATA MINING - 3
▸ 6. “Insight Law”
Data mining amplifies perception in the business domain.
Data mining methods enable you to understand your business better than you
could have done without them.
▸ 7. “Prediction Law”
Prediction increases information locally by generalization.
Data mining helps us use what we know to make better predictions (or
estimates) of things we don’t know.
NINE LAWS OF DATA MINING - 4
▸ 8. “Value Law”
The value of data mining results is not determined by the accuracy or stability
of predictive models.
Accuracy and stability matter, but a model’s value comes from how its results
improve the decisions and actions of the business.
▸ 9. “Law of Change”
All patterns are subject to change.
Any model that gives you great predictions today may be useless tomorrow.
PHASES OF THE DATA MINING PROCESS
The Cross-Industry Standard Process for
Data Mining (CRISP-DM) is the dominant
data-mining process framework. It’s an
open standard; anyone may use it.
BUSINESS UNDERSTANDING
Get a clear understanding of the problem you’re out to solve, how it impacts your
organization, and your goals for addressing it.
Tasks in this phase include:
▸ Identifying your business goals
▸ Assessing your situation
▸ Defining your data mining goals
▸ Producing your project plan
DATA UNDERSTANDING
Review the data that you have, document it, identify data management and data quality
issues.
Tasks in this phase include:
▸ Gathering data
▸ Describing data
▸ Exploring data
▸ Verifying data quality
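A minimal sketch of the describing and verifying steps using only the Python standard library. The records, the field names and the 0 to 120 plausible age range are hypothetical assumptions chosen for illustration.

```python
from statistics import mean, median

# Hypothetical raw records, as they might arrive from an export
records = [
    {"customer": "A", "age": 34, "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.5},
    {"customer": "C", "age": 29, "spend": None},
    {"customer": "D", "age": 240, "spend": 95.0},  # implausible age
]

def describe(records, field):
    """Describe one field: how many values are present or missing, plus basic stats."""
    values = [r[field] for r in records if r[field] is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

def out_of_range(records, field, low, high):
    """Verify quality: list records whose value falls outside a plausible range."""
    return [r for r in records if r[field] is not None and not low <= r[field] <= high]

for field in ("age", "spend"):
    print(field, describe(records, field))
print("suspicious ages:", out_of_range(records, "age", 0, 120))
```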
DATA PREPARATION
Get your data ready to use for modeling.
Tasks in this phase include:
▸ Selecting data
▸ Cleaning data
▸ Constructing data
▸ Integrating data
▸ Formatting data
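The preparation tasks can be sketched on a hypothetical pair of inputs: raw order rows exported as text and a separate customer table. The field names, the rule of dropping rows without an amount and the "unknown" fallback city are assumptions for illustration, not part of CRISP-DM.

```python
# Hypothetical inputs: raw order rows (all text) and a separate customer table
orders = [
    {"order_id": "1", "customer": " a ", "amount": "120.0", "date": "2021-03-02", "note": "x"},
    {"order_id": "2", "customer": "B", "amount": "", "date": "2021-03-05", "note": ""},
    {"order_id": "3", "customer": "c", "amount": "45.5", "date": "2021-04-11", "note": "y"},
]
customers = {"A": "Milan", "B": "Rome", "C": "Turin"}

def prepare(orders, customers):
    prepared = []
    for row in orders:
        # Clean: drop rows with a missing amount
        if not row["amount"]:
            continue
        # Clean: normalise customer codes (strip spaces, upper case)
        code = row["customer"].strip().upper()
        prepared.append({
            # Select: keep only the fields the analysis needs ("note" is dropped)
            "order_id": int(row["order_id"]),       # format: text -> integer
            "customer": code,
            "city": customers.get(code, "unknown"),  # integrate: join with the customer table
            "amount": float(row["amount"]),          # format: text -> number
            "month": row["date"][:7],                # construct: derive "YYYY-MM" from an ISO date
        })
    return prepared

for row in prepare(orders, customers):
    print(row)
```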
MODELING
Use mathematical techniques to identify patterns within your data.
Tasks in this phase include:
▸ Selecting techniques
▸ Designing tests
▸ Building models
▸ Assessing models
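A sketch of the four modeling tasks on hypothetical, synthetic data: two techniques are selected (a constant baseline and a least-squares line), the test is designed as a 75/25 holdout split, both models are built on the training part, and each is assessed by its mean absolute error on the held-out part. Comparing candidates this way is also the trial and error of the "No Free Lunch" law; the split ratio and the error metric are assumptions, not requirements of CRISP-DM.

```python
import random
from statistics import mean

random.seed(0)
# Hypothetical data: y depends linearly on x, plus Gaussian noise
data = [(x, 3 + 2 * x + random.gauss(0, 1)) for x in range(40)]

# Design the test: shuffle, then hold out 25% of the points for assessment
random.shuffle(data)
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

def fit_mean(train):
    """Baseline technique: always predict the training mean."""
    m = mean(y for _, y in train)
    return lambda x: m

def fit_line(train):
    """Second technique: ordinary least-squares straight line."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    a = my - b * mx
    return lambda x: a + b * x

def mae(model, rows):
    """Assess a model by its mean absolute error on rows it was not trained on."""
    return mean(abs(model(x) - y) for x, y in rows)

scores = {name: mae(fit(train), test) for name, fit in [("mean", fit_mean), ("line", fit_line)]}
print(scores, "-> best:", min(scores, key=scores.get))
```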
EVALUATION
Review the patterns you have discovered and assess their potential for business
use.
Tasks in this phase include:
▸ Evaluating results
▸ Reviewing the process
▸ Determining the next steps
DEPLOYMENT
Put your discoveries to work in everyday business. 
Tasks in this phase include:
▸ Planning deployment (your methods for integrating data mining discoveries
into use)
▸ Reporting final results
▸ Reviewing final results
CLASSICAL DATA AGGREGATION
Photo by ev on Unsplash
DATA AGGREGATION
Data aggregation is the process where raw data is gathered and expressed in a summary
form for statistical analysis.
For example, raw data can be aggregated over a given time period to provide statistics. After
the data is aggregated and written to a view or report, you can analyze the aggregated data
to gain insights about particular resources or resource groups.
There are two types of data aggregation:
▸ Time aggregation - All data points for a single resource over a specified time period.
▸ Spatial aggregation - All data points for a group of resources over a specified
geographical area.
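A minimal sketch of both kinds of aggregation, assuming hypothetical readings tagged with a resource, a region and an hour, and using the sum as the aggregate (any other summary statistic could take its place).

```python
from collections import defaultdict

# Hypothetical readings: (resource, region, hour, value)
readings = [
    ("server-1", "EU", 0, 10), ("server-1", "EU", 0, 14), ("server-1", "EU", 1, 12),
    ("server-2", "EU", 0, 20), ("server-3", "US", 0, 7),
]

def time_aggregate(readings, resource):
    """Time aggregation: total of one resource's data points per hour."""
    totals = defaultdict(int)
    for res, _, hour, value in readings:
        if res == resource:
            totals[hour] += value
    return dict(totals)

def spatial_aggregate(readings, hour):
    """Spatial aggregation: total of all resources' data points per region, for one hour."""
    totals = defaultdict(int)
    for _, region, h, value in readings:
        if h == hour:
            totals[region] += value
    return dict(totals)

print(time_aggregate(readings, "server-1"))  # {0: 24, 1: 12}
print(spatial_aggregate(readings, 0))        # {'EU': 44, 'US': 7}
```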
SUMMARY STATISTICS
When data is aggregated, groups of observations are replaced with summary statistics based on those observations.
Summary statistics are used to communicate the largest amount of information as simply as possible.
▸ Mean
▸ Count
▸ Maximum
▸ Median
▸ Minimum
▸ Mode
▸ Range
▸ Sum
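As an illustration, the statistics above can be computed with Python's standard `statistics` module plus built-ins; each group of observations is replaced by its summary. The two groups and their values are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical observations, grouped by region
groups = {
    "north": [3, 7, 7, 2],
    "south": [9, 4, 4, 6],
}

def summarize(values):
    """Replace a group of observations with summary statistics."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

summary = {name: summarize(values) for name, values in groups.items()}
for name, stats in summary.items():
    print(name, stats)
```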
TABLES
Tables are the format in which most numerical data are initially stored and analysed and
are likely to be the means you use to organise data collected during experiments and
dissertation research.
Tables are an effective way of presenting data:
• when you wish to show how a single category of information varies when
measured at different points (in time or space).
• when the dataset contains relatively few numbers.
• when the precise value is crucial to your argument and a graph would not convey it.
BAR CHARTS
Bar charts are one of the most commonly
used types of graph and are used to display
and compare the number, frequency or other
measure for different discrete categories or
groups.
The bars can be drawn either vertically or
horizontally depending upon the number of
categories and length or complexity of the
category labels.
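The counting step behind a bar chart can be sketched without assuming any plotting library: tally each discrete category, then draw one horizontal bar per category, longest first. The survey answers are hypothetical.

```python
from collections import Counter

# Hypothetical survey answers, one per respondent
answers = ["bus", "car", "bike", "car", "bus", "car", "walk", "bike", "car"]
counts = Counter(answers)

def text_bar_chart(counts, symbol="#"):
    """Render a horizontal bar chart: one row per category, bar length = count."""
    width = max(len(label) for label in counts)
    return "\n".join(
        f"{label:<{width}} | {symbol * n} {n}" for label, n in counts.most_common()
    )

print(text_bar_chart(counts))
```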
HISTOGRAMS
Histograms are a special form of bar chart
where the data represent continuous rather
than discrete categories. Since a
continuous category may have a large
number of possible values, the data are
often grouped to reduce the number of data
points.
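A sketch of the grouping step, assuming integer-valued measurements (whole minutes) and equal-width bins that include their lower edge; empty bins are kept so that gaps in the distribution stay visible. The data are hypothetical.

```python
def histogram(values, bin_width):
    """Count integer values in equal-width bins [lo, lo + bin_width), keeping empty bins."""
    first = min(values) // bin_width
    last = max(values) // bin_width
    counts = {k * bin_width: 0 for k in range(first, last + 1)}
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    return counts

# Hypothetical commute times in whole minutes
times = [12, 25, 31, 8, 17, 22, 45, 38, 19, 27]
width = 10
for lo, n in histogram(times, width).items():
    print(f"[{lo:2d}, {lo + width:2d}) {'#' * n}")
```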
PIE CHARTS
Pie charts are a visual way of displaying how
the total data are distributed between different
categories. Pie charts should only be used for
displaying nominal data. They are generally
best for showing information grouped into a
small number of categories and are a
graphical way of displaying data that might
otherwise be presented as a simple table.
Pie chart of populations of English native speakers
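The geometry behind a pie chart is simple proportion: each category's slice gets the same fraction of the 360-degree circle as its share of the total. A minimal sketch with hypothetical category totals:

```python
def pie_slices(counts):
    """Turn category totals into (share in %, slice angle in degrees) pairs."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {name: (100 * v / total, 360 * v / total) for name, v in counts.items()}

slices = pie_slices({"A": 50, "B": 30, "C": 20})
for name, (pct, deg) in slices.items():
    print(f"{name}: {pct:.1f}% -> {deg:.0f} degrees")
```

The calculation only makes sense when the categories are parts of one whole, which is why pie charts suit a small number of nominal categories.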
LINE GRAPHS
Line graphs are usually used to show time
series data, that is, how one or more
variables vary over a continuous period of
time. Line graphs are particularly useful for
identifying patterns and trends in the data
such as seasonal effects, large changes and
turning points. As well as time series data,
line graphs can also be appropriate for
displaying data that are measured over other
continuous variables such as distance.
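Before plotting, a trend is often easier to see once short-term ups and downs are smoothed. A minimal sketch of a trailing moving average, one common smoothing technique (not one the slide prescribes); the window size and the monthly sales figures are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; the first window-1 points have no value (None)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the point that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

monthly_sales = [10, 12, 9, 14, 16, 13, 18, 20, 17]  # hypothetical
print(moving_average(monthly_sales, 3))
```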
WHAT IS DATA SCIENCE
Photo by ev on Unsplash
DEFINITION
Data Science is a blend of various tools, algorithms, and machine learning
principles with the goal of discovering hidden patterns in raw data and solving
analytically complicated problems.
APPLICATION OF DATA SCIENCE
EXPLAINING VS PREDICTING
By 2020, more than 80% of data was expected
to be unstructured. This data is
generated from different sources like
financial logs, text files, multimedia
forms, sensors, and instruments.
https://databasetown.com/introduction-to-data-science-a-beginners-guide/#What_is_Data_Science
The Data Scientist can handle raw data using the latest
technologies and techniques, perform the necessary analysis, and
present the acquired knowledge to colleagues in an informative way.
The Data Analyst works with R, Python and SQL; the role combines technical
and analytical knowledge.
The Data Architect integrates, centralizes, protects and maintains data
sources.
The Statistician can be seen as the pioneer of the data science field. Statisticians
often extract the information from the data and transform it into actionable
insights.
The Database Administrator ensures that the database is accessible to every
stakeholder in the organization and implements the security measures needed
to keep the stored data safe.
The Business Analyst is probably the least technical profile, but has a deep
understanding of the various business processes that are in place and often
acts as the intermediary between the business folks and the
technicians.
The Data and Analytics Manager steers the direction of the data science
team, combining strong, specialized skills in a range of technologies
(SQL, R, SAS, …) with the interpersonal skills required to lead
a team.
SOME EXAMPLES
PHOTO BY JAREDD CRAIG ON UNSPLASH
THE NY TIMES
https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html
How to Fix XML SyntaxError in Odoo the 17Celine George
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 

Kürzlich hochgeladen (20)

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 

Around Data Science (v. 2021 ITA)

  • 1. Frieda Brioschi - frieda.brioschi@gmail.com Emma Tracanella - emma.tracanella@gmail.com AROUND DATA SCIENCE LESSON 5 - 2020/21
  • 2. LESSON 5 DESCRIBE YOUR PROJECT Photo by William Iven on Unsplash
  • 3. LESSON 5 A COUPLE OF DIGRESSIONS ▸ storage issues ▸ http://blog.odsi.co.uk/wp-content/uploads/2013/08/History-of-computer-data-storage.png.jpg ▸ the rise of data centers ▸ computational power ▸ the Internet
  • 5. LESSON 5 DATA CENTER CLOUD (4,563 IN 2019) https://www.digitalic.it/tecnologia/data-center-cloud-numeri-e-diffusione-nel-mondo-litalia-tra-i-paesi-europei-che-ne-ospita-di-piu
  • 6. WHAT ARE BIG DATA Photo by ev on Unsplash
  • 7. LESSON 5 DEFINITION The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the definition of big data as the three V’s: ▸ Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing this much data would have been a problem. ▸ Velocity: With the growth of the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner, often in near-real time. ▸ Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, video, audio, stock ticker data and financial transactions.
  • 12. LESSON 5 CORRELATION When two sets of data are strongly linked together, we say they have a high correlation. ▸ Correlation is positive when the values increase together, and ▸ correlation is negative when one value decreases as the other increases. The correlation coefficient ranges from -1 to 1: ▸ 1 is a perfect positive correlation ▸ 0 is no correlation (the values don't seem linked at all) ▸ -1 is a perfect negative correlation
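A minimal sketch of how the coefficient on slide 12 can be computed, assuming "correlation" means Pearson's correlation coefficient r (the slide does not name a specific measure). The function name `pearson_r` and the sample values are hypothetical. r is the sum of the products of each pair's deviations from the means, divided by the square root of the product of the two sums of squared deviations.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its series mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: sum of co-deviations (proportional to the covariance)
    num = sum(a * b for a, b in zip(dx, dy))
    # Denominator: root of the product of the sums of squares (proportional to the std devs)
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant series")
    return num / den


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # values increase together: 1.0
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # one falls as the other rises: -1.0
print(pearson_r(xs, [3, 1, 4, 1, 3]))   # no linear link: approximately 0
```

Pearson's r only measures linear association, so a value near 0 does not rule out a curved relationship between the two series.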
  • 13. LESSON 5 CORRELATION Correlation is one of the most widely used statistical concepts. Since the term "correlation" refers to a mutual relationship or association between quantities, why is it a useful metric? ▸ Correlation can help in predicting one quantity from another ▸ Correlation can (but often does not) indicate the presence of a causal relationship ▸ Correlation is used as a basic quantity and foundation for many other modeling techniques
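The first use on slide 13, predicting one quantity from another, can be sketched with an ordinary least-squares line, assuming the relationship is roughly linear. `fit_line`, `predict` and the data are hypothetical names and values chosen for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Same co-deviation sum as in the correlation numerator
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical data: y grows by 2 for every unit increase in x
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)              # 2.0 1.0
print(predict(slope, intercept, 5))  # 11.0
```

Because the slope shares its numerator with r, it always has the same sign as the correlation. A good fit supports prediction only; as the slide says, it does not by itself show that one quantity causes the other.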
  • 19. LESSON 5 LINKED DATA / LOD Linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies, but rather than using them only to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database. Linked data may also be open data, in which case it is usually described as linked open data (LOD). ▸ https://en.wikipedia.org/wiki/Linked_data
  • 21. LESSON 5 GOOGLE KNOWLEDGE GRAPH https://www.youtube.com/watch?v=mmQl6VGvX-c
  • 22. LESSON 5 WHY LINKED DATA MATTERS Linked data is a method for publishing structured data using vocabularies like schema.org that can be connected together and interpreted by machines. Using linked data, statements encoded in triples can be spread across different websites. This enables data from different sources to be connected and queried. ▸ https://wordlift.io/blog/en/entity/linked-data/
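To make the triples on slide 22 concrete, here is a minimal sketch in plain Python (no RDF library), assuming each statement is a (subject, predicate, object) tuple. The `example.org` and `example.net` URIs and the two "sites" are hypothetical; the predicate names borrow from the schema.org vocabulary. Because both sources use the same identifier for the shared entity, their statements can be merged and queried together.

```python
# A triple is (subject, predicate, object); identifiers are URIs shared across sources.
Triple = tuple[str, str, str]

PERSON = "http://example.org/person/ada"   # hypothetical identifier
PLACE = "http://example.net/place/london"  # hypothetical identifier used by both sites

# Statements published by one (hypothetical) website
site_a: list[Triple] = [
    (PERSON, "http://schema.org/name", "Ada Lovelace"),
    (PERSON, "http://schema.org/birthPlace", PLACE),
]

# Statements published by another (hypothetical) website about the linked entity
site_b: list[Triple] = [
    (PLACE, "http://schema.org/name", "London"),
]


def objects(graph: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Merging is a simple union of statements, because the URIs already agree
graph = site_a + site_b

# Follow the link: the birthplace comes from site A, its name from site B
for place in objects(graph, PERSON, "http://schema.org/birthPlace"):
    print(objects(graph, place, "http://schema.org/name"))  # ['London']
```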
  • 24. LESSON 5 CONTEXT You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works. Only when you understand the data and the problem that you need to solve can data-mining processes help you to discover useful information and put it to use.
  • 25. LESSON 5 NINE LAWS OF DATA MINING - 1 Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining” to guide new data miners as they get down to work. ▸ 1 - “Business Goals Law”
 Business objectives are the origin of every data mining solution. A data miner is someone who discovers useful information from data to support specific business goals. Data mining isn’t defined by the tool you use. ▸ 2 - “Business Knowledge Law”
 Business knowledge is central to every step of the data mining process. You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works.
  • 26. LESSON 5 NINE LAWS OF DATA MINING - 2 ▸ 3 - “Data Preparation Law”
 Data preparation is more than half of every data mining process. Almost every data miner spends more time on data preparation than on analysis. ▸ 4 - “No Free Lunch for the Data Miner”
 The right model for a given application can only be discovered by experiment. In data mining, models are selected through trial and error. ▸ 5 - “Patterns”
 There are always patterns in the data. As a data miner, you explore data in search of useful patterns. Understanding patterns in the data enables you to influence what happens in the future.
  • 27. LESSON 5 NINE LAWS OF DATA MINING - 3 ▸ 6 - “Insight Law”
 Data mining amplifies perception in the business domain. Data mining methods enable you to understand your business better than you could without them. ▸ 7 - “Prediction Law”
 Prediction increases information locally by generalization. Data mining helps us use what we know to make better predictions (or estimates) of things we don’t know.
  • 28. LESSON 5 NINE LAWS OF DATA MINING - 4 ▸ 8 - “Value Law”
 The value of data mining results is not determined by the accuracy or stability of predictive models. Accuracy and stability figures are not the goal in themselves: what counts is that the model consistently produces predictions that are useful to the business. ▸ 9 - “Law of Change”
 All patterns are subject to change. Any model that gives you great predictions today may be useless tomorrow.
  • 29. LESSON 5 PHASES OF THE DATA MINING PROCESS The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It’s an open standard; anyone may use it.
  • 30. LESSON 5 BUSINESS UNDERSTANDING Get a clear understanding of the problem you’re out to solve, how it impacts your organization, and your goals for addressing it. Tasks in this phase include: ▸ Identifying your business goals ▸ Assessing your situation ▸ Defining your data mining goals ▸ Producing your project plan
  • 31. LESSON 5 DATA UNDERSTANDING Review the data that you have, document it, and identify data management and data quality issues. Tasks in this phase include: ▸ Gathering data ▸ Describing data ▸ Exploring data ▸ Verifying data quality
  • 32. LESSON 5 DATA PREPARATION Get your data ready to use for modeling. Tasks in this phase include: ▸ Selecting data ▸ Cleaning data ▸ Constructing data ▸ Integrating data ▸ Formatting data
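The five preparation tasks on slide 32 can be illustrated on a few hypothetical order records in plain Python. The field names, the second source (`customers`) and the cleaning rules are assumptions made for this sketch, not part of CRISP-DM itself.

```python
# Hypothetical raw order records (source 1) with messy values
orders = [
    {"customer_id": "c1", "amount": "120.50", "country": " it ", "note": "gift"},
    {"customer_id": "c2", "amount": "", "country": "FR", "note": ""},
    {"customer_id": "c3", "amount": "80", "country": "it", "note": ""},
]
# Hypothetical customer register (source 2), keyed by the same customer_id
customers = {"c1": "Rossi", "c2": "Dupont", "c3": "Bianchi"}


def prepare(orders, customers):
    """Apply the five data-preparation tasks and return rows ready for modeling."""
    rows = []
    for raw in orders:
        # Selecting data: keep only the fields the analysis needs (drops "note")
        rec = {k: raw[k] for k in ("customer_id", "amount", "country")}
        # Cleaning data: skip records with a missing amount, normalise text and types
        if not rec["amount"].strip():
            continue
        rec["amount"] = float(rec["amount"])
        rec["country"] = rec["country"].strip().upper()
        # Constructing data: derive a new attribute from existing ones
        rec["large_order"] = rec["amount"] >= 100
        # Integrating data: join with the second source on the shared key
        rec["surname"] = customers.get(rec["customer_id"])
        # Formatting data: emit the fields in the fixed order the model expects
        rows.append((rec["customer_id"], rec["surname"], rec["country"],
                     rec["amount"], rec["large_order"]))
    return rows


print(prepare(orders, customers))
# [('c1', 'Rossi', 'IT', 120.5, True), ('c3', 'Bianchi', 'IT', 80.0, False)]
```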
  • 33. LESSON 5 MODELING Use mathematical techniques to identify patterns within your data. Tasks in this phase include: ▸ Selecting techniques ▸ Designing tests ▸ Building models ▸ Assessing models
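A minimal sketch of the modeling tasks on slide 33, assuming a simple holdout test design and two hypothetical candidate techniques (a mean baseline and a least-squares line) compared by mean absolute error. The data, names and the 75/25 split are illustrative choices, not something CRISP-DM prescribes. Trying both candidates also mirrors the "No Free Lunch" law: the better model is found by experiment.

```python
from typing import Callable

Model = Callable[[float], float]

# Hypothetical observations (x, y)
data = [(1, 2.9), (2, 5.1), (3, 7.0), (4, 9.2), (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9)]

# Designing tests: keep the first 75% for training, hold out the rest for assessment
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]


def build_mean_model(rows) -> Model:
    """Baseline technique: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def build_linear_model(rows) -> Model:
    """Second technique: ordinary least-squares straight line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(model: Model, rows) -> float:
    """Assessing models: average absolute gap between prediction and actual value."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Building models on the training set and assessing them on the held-out set
for name, build in [("mean", build_mean_model), ("linear", build_linear_model)]:
    model = build(train)
    print(name, round(mean_absolute_error(model, test), 2))
```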
  • 34. LESSON 5 EVALUATION Review the patterns you have discovered and assess their potential for business use. Tasks in this phase include: ▸ Evaluating results ▸ Reviewing the process ▸ Determining the next steps
  • 35. LESSON 5 DEPLOYMENT Put your discoveries to work in everyday business. Tasks in this phase include: ▸ Planning deployment (your methods for integrating data mining discoveries into use) ▸ Reporting final results ▸ Reviewing final results
  • 37. LESSON 5 DATA AGGREGATION Data aggregation is the process in which raw data is gathered and expressed in summary form for statistical analysis. For example, raw data can be aggregated over a given time period to provide statistics. After the data is aggregated and written to a view or report, you can analyze the aggregated data to gain insights about particular resources or resource groups. There are two types of data aggregation: ▸ Time aggregation - all data points for a single resource over a specified time period. ▸ Spatial aggregation - all data points for a group of resources over a specified geographical area.
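A minimal sketch of the two aggregation types on slide 37, assuming hypothetical readings stored as (resource, region, day, value) tuples and using the sum as the summary statistic. The spatial example also restricts itself to a single day, which is an additional assumption.

```python
from collections import defaultdict

# Hypothetical readings: (resource, region, day, value)
readings = [
    ("sensor-1", "north", "2021-03-01", 10.0),
    ("sensor-1", "north", "2021-03-01", 14.0),
    ("sensor-1", "north", "2021-03-02", 9.0),
    ("sensor-2", "north", "2021-03-01", 20.0),
    ("sensor-3", "south", "2021-03-01", 5.0),
]


def time_aggregate(readings, resource):
    """Time aggregation: total per day for a single resource."""
    totals = defaultdict(float)
    for res, _region, day, value in readings:
        if res == resource:
            totals[day] += value
    return dict(totals)


def spatial_aggregate(readings, day):
    """Spatial aggregation: total per region over all resources, for one day."""
    totals = defaultdict(float)
    for _res, region, d, value in readings:
        if d == day:
            totals[region] += value
    return dict(totals)


print(time_aggregate(readings, "sensor-1"))       # {'2021-03-01': 24.0, '2021-03-02': 9.0}
print(spatial_aggregate(readings, "2021-03-01"))  # {'north': 44.0, 'south': 5.0}
```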
  • 38. LESSON 5 SUMMARY STATISTICS When data is aggregated, groups of observations are replaced with summary statistics based on those observations. Summary statistics are used to communicate the largest amount of information as simply as possible. ▸ Mean ▸ Count ▸ Maximum ▸ Median ▸ Minimum ▸ Mode ▸ Range ▸ Sum
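Python's standard `statistics` module covers several of the measures listed on slide 38, and built-ins cover the rest. A minimal sketch with hypothetical values:

```python
import statistics

# Hypothetical group of observations to be replaced by summary statistics
values = [3, 7, 7, 2, 9, 4]

summary = {
    "mean": statistics.mean(values),
    "count": len(values),
    "maximum": max(values),
    "median": statistics.median(values),  # average of the two middle values here
    "minimum": min(values),
    "mode": statistics.mode(values),      # most frequent value
    "range": max(values) - min(values),
    "sum": sum(values),
}
print(summary)
```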
  • 39. LESSON 5 TABLES Tables are the format in which most numerical data are initially stored and analysed, and they are likely to be the means you use to organise data collected during experiments and dissertation research. Tables are an effective way of presenting data: ▸ when you wish to show how a single category of information varies when measured at different points (in time or space) ▸ when the dataset contains relatively few numbers ▸ when the precise value is crucial to your argument and a graph would not convey it
  • 40. LESSON 5 BAR CHARTS Bar charts are one of the most commonly used types of graph and are used to display and compare the number, frequency or other measure for different discrete categories or groups. The bars can be drawn either vertically or horizontally, depending on the number of categories and on the length or complexity of the category labels.
  • 41. LESSON 5 HISTOGRAMS Histograms are a special form of bar chart in which the data represent continuous rather than discrete categories. Since a continuous variable may take a large number of possible values, the data are often grouped into intervals (bins) to reduce the number of data points.
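The grouping step behind a histogram (slide 41) can be sketched as counting values in equal-width, half-open bins. The bin width, starting point and height values (in cm) are hypothetical, and empty bins are simply omitted in this sketch.

```python
def histogram(values, bin_width, start=0.0):
    """Count how many values fall in each half-open bin [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        # Index of the bin this value belongs to, counted from `start`
        index = int((v - start) // bin_width)
        lo = start + index * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical heights in cm, grouped into 5 cm bins starting at 155
heights = [158.2, 161.5, 163.0, 167.9, 170.1, 171.4, 176.8]
for lo, n in histogram(heights, 5.0, start=155.0).items():
    print(f"{lo:.0f}-{lo + 5:.0f}: {'#' * n}")
```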
  • 42. LESSON 5 PIE CHARTS Pie charts are a visual way of displaying how the total data are distributed between different categories. Pie charts should only be used for displaying nominal data. They are generally best for showing information grouped into a small number of categories and are a graphical way of displaying data that might otherwise be presented as a simple table. (Figure: pie chart of populations of English native speakers)
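Each slice of a pie chart (slide 42) is sized by its category's share of the total, so its angle is that share times 360 degrees. A minimal sketch with hypothetical counts for three nominal categories:

```python
def pie_slices(counts: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each category to (share of the total in %, slice angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {
        category: (100 * value / total, 360 * value / total)
        for category, value in counts.items()
    }


# Hypothetical counts for three categories
for category, (pct, deg) in pie_slices({"A": 50, "B": 30, "C": 20}).items():
    print(f"{category}: {pct:.1f}% -> {deg:.1f} degrees")
```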
  • 43. LESSON 5 LINE GRAPHS Line graphs are usually used to show time series data, that is, how one or more variables vary over a continuous period of time. Line graphs are particularly useful for identifying patterns and trends in the data such as seasonal effects, large changes and turning points. As well as time series data, line graphs can also be appropriate for displaying data that are measured over other continuous variables such as distance.
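Turning points, one of the features line graphs help reveal (slide 43), can be located as the points where a series switches from rising to falling or vice versa. A minimal sketch with hypothetical monthly values; flat stretches are ignored in this simple version.

```python
def turning_points(series: list[float]) -> list[tuple[int, str]]:
    """Return (index, 'peak' | 'trough') wherever the direction of change reverses."""
    points = []
    for i in range(1, len(series) - 1):
        before = series[i] - series[i - 1]  # change into point i
        after = series[i + 1] - series[i]   # change out of point i
        if before > 0 and after < 0:
            points.append((i, "peak"))
        elif before < 0 and after > 0:
            points.append((i, "trough"))
    return points


monthly_sales = [12, 15, 19, 17, 14, 16, 21]  # hypothetical values
print(turning_points(monthly_sales))  # [(2, 'peak'), (4, 'trough')]
```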
  • 44. WHAT IS DATA SCIENCE Photo by ev on Unsplash
  • 45. LESSON 5 DEFINITION Data Science is a blend of various tools, algorithms, and machine learning principles, with the goal of discovering hidden patterns in raw data and solving analytically complex problems.
  • 46. LESSON 5 APPLICATION OF DATA SCIENCE
  • 48. LESSON 5 EXPLAINING VS PREDICTING By 2020, more than 80% of data was expected to be unstructured. This data is generated from different sources such as financial logs, text files, multimedia, sensors, and instruments.
  • 51. LESSON 5 The Data Scientist can handle raw data using the latest technologies and techniques, perform the necessary analysis, and present the resulting knowledge to colleagues in an informative way.
  • 52. LESSON 5 The Data Analyst works with R, Python and SQL; the role combines technical and analytical knowledge.
  • 53. LESSON 5 The Data Architect integrates, centralizes, protects and maintains data sources.
  • 54. LESSON 5 The Statistician can be seen as the pioneer of the data science field. It is often the statistician who extracts information from the data and transforms it into actionable insights.
  • 55. LESSON 5 The Database Administrator ensures that the database is accessible to every stakeholder in the organization and applies the necessary security measures to keep the stored data safe.
  • 56. LESSON 5 The Business Analyst is probably the least technical profile but has a deep understanding of the business processes in place, and often acts as the intermediary between business stakeholders and technical staff.
  • 57. LESSON 5 The Data and Analytics Manager steers the direction of the data science team, combining strong, specialized skills in a diverse set of technologies (SQL, R, SAS, … ) with the interpersonal skills needed to manage a team.
  • 59. LESSON 5 THE NY TIMES https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html