P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
Statistics
1. 2013/05/22
STATISTICS
X-Kit Textbook
Chapter 9
Precalculus Textbook
Appendix B: Concepts in Statistics
Par B.2
CONTENT
THE GOAL
Look at ways of summarising a large
amount of sample data in just one or two
key numbers.
Two important aspects of a set of data:
• The LOCATION
• The SPREAD
MEASURES OF CENTRAL TENDENCY
(LOCATION)
Arithmetic Mean (Average)
Mode (the highest point/frequency)
Median (the middle observation)
Number of fraudulent cheques received at a
bank each week for 30 weeks
Week      1  2  3  4  5  6  7  8  9 10
Cheques   5  3  8  3  3  1 10  4  6  8
Week     11 12 13 14 15 16 17 18 19 20
Cheques   3  5  4  7  6  6  9  3  4  5
Week     21 22 23 24 25 26 27 28 29 30
Cheques   7  9  4  5  8  6  4  4 10  4
ARITHMETIC MEAN
• $\bar{x} = \frac{164}{30} = 5.47$
• To calculate the MEAN, add all the data points
in our sample and divide by the number of
data points (sample size).
• The MEAN can be a value that doesn't
actually match any observation.
• The MEAN gives us useful information about
the location of our frequency distribution.
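The calculation above can be reproduced in a few lines. This is a minimal sketch; the variable names (`cheques`, `total`, `n`, `mean`) are illustrative and not part of the slides.

```python
# Weekly counts of fraudulent cheques for weeks 1 to 30 (from the table above).
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

total = sum(cheques)   # sum of all data points
n = len(cheques)       # sample size
mean = total / n       # arithmetic mean

print(total, n, round(mean, 2))  # 164 30 5.47
```

Note that 5.47 is not one of the observed weekly counts, which illustrates the second bullet.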
GRAPH
[Figure: frequency chart; horizontal axis 1 to 10, vertical axis Frequency (0 to 8)]
CALCULATE THE MEAN
Raw Data
• $\bar{x} = \frac{\sum x}{n}$
• $x$ is the data points
• $n$ is the number of observations
Frequency Table
• $\bar{x} = \frac{\sum x f}{n}$
• $x$ is the data points
• $n$ is the number of observations
• $f$ is the frequency
Frequency Table (Intervals)
• $\bar{x} = \frac{\sum x f}{n}$
• $x$ is the midpoints of the intervals
• $n$ is the number of observations
• $f$ is the frequency
CALCULATE THE MEAN - FREQUENCY TABLE:
NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct Values   Tally Marks   Frequency
1                 /             1
2                               0
3                 ////          5
4                 //// //       7
5                 ////          4
6                 ////          4
7                 //            2
8                 ///           3
9                 //            2
10                //            2
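As a sketch of the frequency-table formula $\bar{x} = \frac{\sum x f}{n}$, the table can be held as a mapping from each distinct value to its frequency. The dictionary name `freq` is an illustrative choice.

```python
# Frequency table: distinct value -> frequency (from the table above).
freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

n = sum(freq.values())                        # number of observations
total = sum(x * f for x, f in freq.items())   # sum of x * f
mean = total / n

print(n, total, round(mean, 2))  # 30 164 5.47
```

The result agrees with the mean computed from the raw data, as it should for ungrouped values.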
Truck Data: weights (in tonnes) of 20 fully
loaded trucks
Truck      1     2     3     4     5     6     7     8     9    10
Weight  4.54  3.81  4.29  5.16  2.51  4.63  4.75  3.98  5.04  2.80
Truck     11    12    13    14    15    16    17    18    19    20
Weight  2.52  5.88  2.95  3.59  3.87  4.17  3.30  5.48  4.26  3.53
CALCULATE THE MEAN - GROUPED
FREQUENCY TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
Class Intervals        Frequency   Midpoint
$2.5 \le x \le 3.0$    4           $(2.5 + 3.0) \div 2 = 2.75$
$3.0 < x \le 3.5$      1           3.25
$3.5 < x \le 4.0$      5           3.75
$4.0 < x \le 4.5$      3           4.25
$4.5 < x \le 5.0$      3           4.75
$5.0 < x \le 5.5$      3           5.25
$5.5 < x \le 6.0$      1           5.75
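A minimal sketch of the grouped mean, using the class midpoints in place of the individual weights. The tuple layout `(lower, upper, frequency)` and the variable names are assumptions made for illustration; the raw weights are included only to show how close the grouped estimate comes.

```python
# Grouped frequency table: (lower limit, upper limit, frequency).
classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
           (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]

n = sum(f for _, _, f in classes)
# Each class is represented by its midpoint (lower + upper) / 2.
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

# Raw truck weights, for comparison with the grouped estimate.
weights = [4.54, 3.81, 4.29, 5.16, 2.51, 4.63, 4.75, 3.98, 5.04, 2.80,
           2.52, 5.88, 2.95, 3.59, 3.87, 4.17, 3.30, 5.48, 4.26, 3.53]
raw_mean = sum(weights) / len(weights)

print(round(grouped_mean, 3), round(raw_mean, 3))  # 4.075 4.053
```

The grouped mean is an approximation because every weight in a class is treated as if it sat at the midpoint.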
MODE
• The mode is the value (or class interval) with the
HIGHEST FREQUENCY.
• There can be two or more modes in a set
of data; the mode is then not a
good measure of central tendency.
• MULTI-MODAL data have more than
one mode.
• UNI-MODAL data have only one
mode.
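One way to find every mode, and so to tell uni-modal from multi-modal data, is to count frequencies and keep all values that reach the highest count. The helper name `find_modes` is hypothetical, and the second data set is made up purely to show a multi-modal case.

```python
from collections import Counter

def find_modes(data):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

print(find_modes(cheques))          # [4]  -> uni-modal
print(find_modes([1, 1, 2, 3, 3]))  # [1, 3]  -> multi-modal (illustrative data)
```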
DON'T FALL INTO THE COMMON TRAP
• The median is NOT the middle of the range of
observations, for example
1, 1, 1, 1, 1, 3, 9
  - The median is 1 (the middle observation).
  - The middle of the range (from 1 to 9) is 5! Big
    difference!
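The trap can be checked directly. This sketch uses the standard-library `statistics.median`; the name `mid_range` is an illustrative label for the middle of the range.

```python
import statistics

data = [1, 1, 1, 1, 1, 3, 9]

median = statistics.median(data)           # the middle observation
mid_range = (min(data) + max(data)) / 2    # the middle of the range

print(median, mid_range)  # 1 5.0
```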
MEDIAN
Odd Number of Observations, for example 7
Median Position: $\frac{n+1}{2}$
Even Number of Observations, for example 30
Median Position: half-way between $\frac{n}{2}$ and $\left(\frac{n}{2} + 1\right)$
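The two position rules can be sketched as a small function. The function name `median_by_position` is hypothetical; note that the slide positions count from 1, whereas Python lists are indexed from 0.

```python
def median_by_position(data):
    """Median via the position rules: (n+1)/2 for odd n,
    half-way between positions n/2 and n/2 + 1 for even n."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position
        return ordered[position - 1]     # convert to 0-based index
    lower = ordered[n // 2 - 1]          # observation at position n/2
    upper = ordered[n // 2]              # observation at position n/2 + 1
    return (lower + upper) / 2

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

print(median_by_position([1, 1, 1, 1, 1, 3, 9]))  # 1   (n = 7, position 4)
print(median_by_position(cheques))                # 5.0 (n = 30, positions 15 and 16)
```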
FIND THE MEDIAN - FREQUENCY TABLE:
NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct Values   Frequency   Cumulative Frequency
1                 1           1
2                 0           1
3                 5           6
4                 7           13
5                 4           17
6                 4           21
7                 2           23
8                 3           26
9                 2           28
10                2           30
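A sketch of how the cumulative frequency column locates the median: the observation at a given position is the first distinct value whose cumulative frequency reaches that position. The helper `value_at` is hypothetical, and the last step assumes an even number of observations, as in this example.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 0, 5, 7, 4, 4, 2, 3, 2, 2]
cumulative = list(accumulate(freqs))   # running total of the frequencies

def value_at(position):
    """Return the observation at a 1-based position in the ordered data."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position exceeds the number of observations")

n = cumulative[-1]   # 30 observations (even)
median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(cumulative)  # [1, 1, 6, 13, 17, 21, 23, 26, 28, 30]
print(median)      # 5.0
```

Positions 15 and 16 both fall in the row with cumulative frequency 17, so both observations equal 5.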
FIND THE MEDIAN - GROUPED FREQUENCY
TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
Class Intervals        Frequency   Midpoint
$2.5 \le x \le 3.0$    4           $(2.5 + 3.0) \div 2 = 2.75$
$3.0 < x \le 3.5$      1           3.25
$3.5 < x \le 4.0$      5           3.75
$4.0 < x \le 4.5$      3           4.25
$4.5 < x \le 5.0$      3           4.75
$5.0 < x \le 5.5$      3           5.25
$5.5 < x \le 6.0$      1           5.75
FIND THE MEDIAN FROM A GROUPED
FREQUENCY TABLE
• Median (middle observation)?
• Find the class interval in which that
observation lies.
CALCULATIONS
Raw Data
Mean
Mode
Median
Frequency Table
(Ungrouped
Data)
Mean
Mode
Median
Frequency Table
(Grouped Data)
Mean
Mode
Median
HOW TO CHOOSE THE BEST MEASURE OF
LOCATION?
• When choosing the best measure of location, we
need to look at the SHAPE of the distribution.
• For nearly symmetric data, the mean is the best
choice.
• For very skewed (asymmetric) data, the mode or
median is better.
• The mean moves further along the tail than the
median because it is more sensitive to values far from
the centre.
SYMMETRIC histogram:
Mean = Median = Mode
A POSITIVELY SKEWED (skewed to the right)
histogram has a longer tail on the right side:
Mode < Median < Mean
A NEGATIVELY SKEWED (skewed to the left)
histogram has a longer tail on the left side:
Mean < Median < Mode
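The ordering rules above can be applied to the cheque data as a rough check of shape. This is only a sketch: comparing the three measures is a rule of thumb, not a formal test of skewness, and the variable `shape` is an illustrative label.

```python
import statistics

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

mean = statistics.mean(cheques)
median = statistics.median(cheques)
mode = statistics.mode(cheques)   # the data have a single mode

if mode < median < mean:
    shape = "positively skewed"
elif mean < median < mode:
    shape = "negatively skewed"
else:
    shape = "no clear ordering"

print(mode, median, round(mean, 2), shape)  # 4 5.0 5.47 positively skewed
```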
PROBLEM
• We can find two very different data sets (one
distribution very spread out and another very
concentrated) whose measures of central
tendency are EQUAL.
• To get a true idea of our sample, we also have to
MEASURE THE SPREAD OF A DISTRIBUTION,
also called its dispersion.
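A small illustration of the problem, using two made-up data sets (they are not from the slides) that share the same mean and median but differ greatly in spread.

```python
import statistics

# Hypothetical data sets chosen so that the centres coincide.
concentrated = [4, 5, 5, 5, 6]
spread_out = [1, 3, 5, 7, 9]

for name, data in [("concentrated", concentrated), ("spread out", spread_out)]:
    centre = (statistics.mean(data), statistics.median(data))
    data_range = max(data) - min(data)
    print(name, centre, data_range)
# concentrated (5, 5) 2
# spread out (5, 5) 8
```

The measures of location cannot tell the two sets apart; a measure of spread can.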
MEASURES OF SPREAD (DISPERSION)
Interquartile Range
Variance
Standard Deviation
MEASURING SPREAD
• Think of a distribution in terms of
percentages, a horizontal axis equally divided
into 100 percentiles.
• The 10th percentile marks the point below
which 10% of the observations fall, and
above which 90% of observations fall.
• The 50th percentile, below which 50% of the
observations lie, is the median.
WORKING WITH A PERCENTILE
• $p\%$ of the observations fall below the $p$th percentile.
$\text{Position} = \frac{p}{100}(n + 1)$
• Working with the example on fraudulent cheques:
1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6,
7, 7, 8, 8, 8, 9, 9, 10, 10
$\text{Position of } P_{50} = \frac{50}{100}(30 + 1) = 15.5$
• 15.5 tells us where to find our 50th percentile.
• 15 tells us which observation to go to, and 0.5 tells us how far to
move along the space between that observation and the next
highest one.
FORMULA
• $P_{50} = x_{15} + 0.5\,(x_{16} - x_{15})$
$P_p = x_k + d\,(x_{k+1} - x_k)$
• $P$ means percentile
• $p$ tells us which percentile
• $k$ is the whole number calculated from the
position
• $d$ is the decimal fraction calculated from the
position
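The formula can be sketched as a function that computes the position $\frac{p}{100}(n+1)$, splits it into $k$ and $d$, and interpolates. The function name `percentile` and the handling of positions that fall before the first or after the last observation are assumptions; the slides do not cover those edge cases.

```python
def percentile(data, p):
    """p-th percentile via position = p/100 * (n + 1) with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = p / 100 * (n + 1)
    k = int(position)      # whole-number part: which observation to go to
    d = position - k       # decimal part: how far to move towards the next one
    if k < 1:              # assumption: clamp to the smallest observation
        return ordered[0]
    if k >= n:             # assumption: clamp to the largest observation
        return ordered[-1]
    # x_k is ordered[k - 1] because Python indexes from 0.
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

print(percentile(cheques, 50))  # 5.0  (position 15.5: x_15 = 5, x_16 = 5)
print(percentile(cheques, 75))  # 7.25 (position 23.25: x_23 = 7, x_24 = 8)
```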
WORKING WITH PERCENTILES FROM UNGROUPED FREQUENCY DATA:
NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct Values   Frequency   Cumulative Frequency
1                 1           1
2                 0           1 + 0 = 1
3                 5           1 + 5 = 6
4                 7           6 + 7 = 13
5                 4           13 + 4 = 17
6                 4           17 + 4 = 21
7                 2           21 + 2 = 23
8                 3           23 + 3 = 26
9                 2           26 + 2 = 28
10                2           28 + 2 = 30
WORKING WITH PERCENTILES (AND
MEDIAN) FROM GROUPED DATA
• To identify the class interval $L < x \le U$ containing the
$p$th percentile:
$\text{Position} = \frac{p}{100}(n + 1)$
• The decimal fraction for grouped data is:
$d = \frac{\text{Position} - \text{Sum of class frequencies up to } L}{\text{Frequency of class } L < x \le U}$
• Calculate the $p$th percentile:
$P_p \approx L + d\,(U - L)$
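The three steps above can be sketched as one function over a grouped frequency table. The function name `grouped_percentile` and the `(lower, upper, frequency)` tuple layout are illustrative assumptions; the truck table from earlier in the slides is used as input.

```python
def grouped_percentile(classes, p):
    """Estimate the p-th percentile from (lower, upper, frequency) classes
    listed in increasing order, using P_p ~ L + d * (U - L)."""
    n = sum(f for _, _, f in classes)
    position = p / 100 * (n + 1)
    below = 0   # sum of class frequencies up to the current lower limit L
    for lower, upper, f in classes:
        if position <= below + f:
            d = (position - below) / f
            return lower + d * (upper - lower)
        below += f
    return classes[-1][1]   # assumption: clamp to the top limit if position > n

trucks = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
          (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]

print(round(grouped_percentile(trucks, 50), 5))  # 4.08333
```

The result is an estimate because it assumes the observations are spread evenly within each class.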
FIND THE MEDIAN - GROUPED FREQUENCY
TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
Class Intervals        Frequency   Cumulative Frequency
$2.5 \le x \le 3.0$    4           4
$3.0 < x \le 3.5$      1           5
$3.5 < x \le 4.0$      5           10
$4.0 < x \le 4.5$      3           13
$4.5 < x \le 5.0$      3           16
$5.0 < x \le 5.5$      3           19
$5.5 < x \le 6.0$      1           20
FIND THE MEDIAN - GROUPED FREQUENCY TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
• To identify the class interval $4.0 < x \le 4.5$ containing
the 50th percentile:
$\text{Position} = \frac{50}{100}(20 + 1) = 10.5$
• The decimal fraction for grouped data is:
$d = \frac{10.5 - 10}{3} = \frac{1}{6}$
• Calculate the $p$th percentile:
$P_{50} \approx 4.0 + d\,(4.5 - 4.0) = 4.08333$
MEASURING SPREAD
• If we measure the DIFFERENCE in value between
one percentile and another, this would give us an
idea of how widely our data is spread out.
• INTERQUARTILE RANGE (IQR) = 75th percentile − 25th percentile
• The bigger the IQR, the more spread out the data.
• The 75th percentile ≥ 25th percentile, therefore the
IQR ≥ 0.
• We tend to use the MEDIAN (as measure of
central tendency) together with the IQR.
FIND THE IQR - GROUPED FREQUENCY
TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
Class Intervals        Frequency   Cumulative Frequency
$2.5 \le x \le 3.0$    4           4
$3.0 < x \le 3.5$      1           5
$3.5 < x \le 4.0$      5           10
$4.0 < x \le 4.5$      3           13
$4.5 < x \le 5.0$      3           16
$5.0 < x \le 5.5$      3           19
$5.5 < x \le 6.0$      1           20
FIND THE IQR - GROUPED FREQUENCY TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
• To identify the class interval $4.5 < x \le 5.0$ containing
the 75th percentile:
$\text{Position} = \frac{75}{100}(20 + 1) = 15.75$
• The decimal fraction for grouped data is:
$d = \frac{15.75 - 13}{3} = 0.917$
• Calculate the $p$th percentile:
$P_{75} \approx 4.5 + d\,(5.0 - 4.5) = 4.958$
FIND THE IQR - GROUPED FREQUENCY TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
• To identify the class interval $3.5 < x \le 4.0$ containing
the 25th percentile:
$\text{Position} = \frac{25}{100}(20 + 1) = 5.25$
• The decimal fraction for grouped data is:
$d = \frac{5.25 - 5}{5} = 0.05$
• Calculate the $p$th percentile:
$P_{25} \approx 3.5 + d\,(4.0 - 3.5) = 3.525$
• IQR = 4.958 − 3.525 = 1.433
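For comparison with the grouped estimate of 1.433, the IQR can also be computed from the 20 raw truck weights. This sketch relies on the standard-library `statistics.quantiles`, whose default `method="exclusive"` uses the same $\frac{p}{100}(n+1)$ position rule as these slides; comparing against the raw data is an addition for illustration, not a step in the slides.

```python
import statistics

weights = [4.54, 3.81, 4.29, 5.16, 2.51, 4.63, 4.75, 3.98, 5.04, 2.80,
           2.52, 5.88, 2.95, 3.59, 3.87, 4.17, 3.30, 5.48, 4.26, 3.53]

# n=4 returns the three cut points: 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(weights, n=4, method="exclusive")
iqr = q3 - q1

print(round(q1, 4), round(q3, 4), round(iqr, 4))  # 3.3575 4.72 1.3625
```

The grouped figure differs slightly because it assumes the weights are evenly spread within each class interval.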
MEASURING SPREAD
• When we use the MEAN as our measure of central
tendency, we usually choose A MEASURE OF HOW FAR
THE DATA IS SPREAD OUT AROUND THE MEAN.
• Two measures of spread that are based on the mean are
the VARIANCE and the STANDARD DEVIATION.
• An advantage of standard deviation is that it is measured
in the same units as the original observations.
• The variance and standard deviation are closely related.
• The variance ($s^2$ or $\sigma^2$) is the square of the standard
deviation ($s$ or $\sigma$).
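The relationship between the two measures can be checked on the cheque data with Python's `statistics` module. The slides have not yet given a formula at this point, so the sketch shows both conventions as an assumption: `variance`/`stdev` divide by $n - 1$ (sample, usually written $s^2$ and $s$), while `pvariance`/`pstdev` divide by $n$ (population, usually written $\sigma^2$ and $\sigma$).

```python
import statistics

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

s2 = statistics.variance(cheques)       # sample variance (divides by n - 1)
s = statistics.stdev(cheques)           # sample standard deviation
sigma2 = statistics.pvariance(cheques)  # population variance (divides by n)
sigma = statistics.pstdev(cheques)      # population standard deviation

print(round(s2, 2), round(s, 2))          # 5.43 2.33
print(round(sigma2, 2), round(sigma, 2))  # 5.25 2.29
```

In both conventions the standard deviation is the square root of the variance and is expressed in cheques per week, the same unit as the observations.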