In the name of God, the Most Gracious, the Most Merciful
An Introduction to Statistical Tools and SPSS
used in Social Research
Presented by
Professor Dr. Md. Nazrul Islam Mondal
Visiting Scholar
ERASMUS+ Fellowship Program
Department of Sociology
Middle East Technical University
Ankara, TURKEY
E-mail: nazrulupm@gmail.com
Outline
i. Data presentation
ii. Central tendency
iii. Skewness and kurtosis
iv. Measures of dispersion
v. Correlation
vi. Regression
DATA PRESENTATION
• Statistics:
Statistics is a branch of scientific methodology.
Data: Data is a collection of facts or information from
which conclusions may be drawn.
• Stages of statistical investigation:
– Collection,
– Organization,
– Presentation,
– Analysis, and
– Interpretation.
• Population and sample
The essential purposes of statistics are to:
i. describe the numerical properties of populations, and
ii. draw inferences about the population from samples.
• Population: It is the entire category under
consideration. Its size is usually denoted by N.
• Sample: A sample is a representative part of
the population. It is a subset or portion of the
population. Its size is denoted by n.
• Parameter: It is a characteristic or measure
obtained from a population.
• Statistic: It is a characteristic or measure obtained
from a sample.
• Descriptive statistics: They provide simple
summaries about the sample and the measures.
• Statistical inference: Drawing conclusions about
population parameters from a sample.
It is used to:
i. test hypotheses,
ii. determine relationships between variables, and
iii. make predictions.
• Variables: Measurable characteristics are
called variables.
• Types:
i. Qualitative, and
ii. Quantitative
• Random Variable: A variable whose values are
determined by chance.
• Constant: A constant is a particular type of
variable, which does not vary from one
member of a group to another.
• Discrete variables: Usually obtained by
counting; the values are integers.
• Examples: number of children, number of students in METU, etc.
• Continuous variables: Usually obtained by
measurement; the values are real numbers.
• Examples: height, weight, etc.
• Data types (by method of collection)
– Primary data: Primary data come mainly from
direct field operations.
– Secondary data: Secondary data are usually
obtained from already published or unpublished
documents.
• Types of data
i. categorical, and
ii. numerical.
• Categorical data types:
i. nominal (gender, religion, room numbers, etc.),
ii. ordinal (education level, book chapters, health status, etc.)
• Numerical data types:
i. interval scale (age groups, dates, etc.)
ii. ratio scale (BMI, CWR, etc.)
• Frequency: The number of times a certain
value or class of values occurs.
• Frequency distribution: A tabular
arrangement of data by classes together with
the corresponding class frequencies is called
frequency distribution.
• Data types:
i. Ungrouped data (without frequency),
ii. Grouped data (with frequency).
• Data presentation by tables and graphs:
– frequency distribution,
– bar diagram,
– histogram,
– multiple bar diagram,
– box plot,
– frequency polygon,
– other graphs.
Examples
The students in the Department of Sociology, METU, were involved in the following activities:

Activity              Number of students
Play Sports           45
Talk on Phone         53
Visit With Friends    99
Earn Money            44
Chat Online           66
School Clubs          22
Watch TV              37
Total                 n = 366

Bar diagram
[Figure: bar diagram of the number of students in each activity, vertical axis from 0 to 120]
Note: A bar diagram can be drawn vertically or horizontally
to show comparisons among categories.
Histogram: A graphical representation of how many times different, mutually
exclusive events are observed in an experiment.
The following data represent the ages of a group of 20 people:
36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55
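The class frequencies behind a histogram of these ages can be sketched as follows. This is a minimal illustration, not the slide's own procedure: the class width of 10 years and the function name `class_frequencies` are assumptions made for the example.

```python
from collections import Counter

# Ages taken from the two rows of data above.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

WIDTH = 10  # assumed class width; the slides do not state one


def class_frequencies(values, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    # Keep empty classes so the histogram shows gaps in the data.
    return {lower: counts.get(lower, 0)
            for lower in range(lowest, highest + width, width)}


frequencies = class_frequencies(ages, WIDTH)
for lower, count in frequencies.items():
    # One '#' per person gives a rough text histogram.
    print(f"{lower}-{lower + WIDTH - 1}: {'#' * count} ({count})")
```

Each printed row corresponds to one bar of the histogram, and the bar heights are the class frequencies.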
Combined bar diagram

            1995   1996   1997
Climbing     21     34     30
Caving       10     12     21
Walking      75     85    100
Sailing      36     36     40

[Figure: combined bar diagram of the table above, with the activities on the horizontal axis, one series per year (1995, 1996, 1997), and a vertical axis from 0 to 300]
Multiple bar diagram: Data on several variables for different places
or time points may be presented by a multiple bar diagram.

Grades     1st year   2nd year   3rd year   4th year
Grade A        5          7          9         10
Grade B       15         18         15         12
Grade C       20         15         10          8

[Figure: multiple bar diagram of the table above, with years on the horizontal axis, one bar each for Grade A, Grade B, and Grade C, and a vertical axis from 0 to 25]
Pie chart: Different components of the data may be exhibited by splitting a circle. The
angle at the center of the circle is divided in proportion to the relative magnitude of
the components, so that each component gets its own sector. The division can also be
expressed in percentages. Usually the components are distinguished by different colors.

Grades     1st year
Grade A        5
Grade B       15
Grade C       20
Total         40

[Figure: pie chart titled "Social Science" with sectors Grade A 12%, Grade B 38%, Grade C 50%]
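The proportional division of the circle can be sketched in a few lines. The function name `pie_sectors` is hypothetical, and the exact percentages it returns (12.5% and 37.5%) differ slightly from the whole-number labels on the chart.

```python
# First-year grade frequencies from the table above.
grades = {"Grade A": 5, "Grade B": 15, "Grade C": 20}


def pie_sectors(frequencies):
    """Return {label: (percentage, central angle in degrees)}."""
    total = sum(frequencies.values())
    return {label: (100 * f / total, 360 * f / total)
            for label, f in frequencies.items()}


sectors = pie_sectors(grades)
for label, (percent, angle) in sectors.items():
    print(f"{label}: {percent:.1f}% of the circle, {angle:.0f} degrees")
```

The angles add up to 360 degrees and the percentages to 100, which is a quick check on any hand-drawn pie chart.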
Frequency polygon: A frequency polygon gives an idea of the shape of
the data distribution. The two end points of a frequency polygon always lie on
the x-axis.
Box plot: The box plot (box-and-whisker diagram) displays the five-number
summary: minimum, first quartile, median, third quartile, and maximum.
Scatter plot: A plot of the data values on a coordinate system. A
scatter plot (or scatter diagram) is used to show the relationship
between two variables. The independent variable is graphed
along the x-axis and the dependent variable along the y-axis.
Cumulative frequency polygon or ogive: A graph showing the cumulative
frequency less than any upper class boundary plotted against the upper class
boundary is called a cumulative frequency polygon or ogive.
CENTRAL TENDENCY
• Central Tendency: It is a single value that
attempts to describe a set of data by identifying
the central position within that set of data.
• Measures are:
–Mean,
–Median,
–Mode,
–Quartiles,
–Deciles, and
–Percentiles.
Mean types:
• Arithmetic mean (AM) (simple mean, weighted mean),
• Geometric mean (GM),
• Harmonic mean (HM).
Calculation methods (for ungrouped data)

Simple mean (sample mean):
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Weighted mean (i.e., the mean of the means of the observations):
$$\bar{x}_w = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

Geometric mean (GM), only for nonzero and positive numbers:
$$GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$

Harmonic mean (HM), only for nonzero numbers:
$$HM = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$
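• Examples
Qu. Find i) AM, ii) GM, and iii) HM of the numbers 8, 3, 5, 12, and 10.
Answers:
i) $AM = \bar{x} = \dfrac{8 + 3 + 5 + 12 + 10}{5} = \dfrac{38}{5} = 7.6$
ii) $GM = (8 \times 3 \times 5 \times 12 \times 10)^{1/5} = 14400^{1/5} \approx 6.79$
iii) $HM = \dfrac{5}{\frac{1}{8} + \frac{1}{3} + \frac{1}{5} + \frac{1}{12} + \frac{1}{10}} = \dfrac{5}{101/120} = \dfrac{600}{101} \approx 5.94$

Qu. Suppose a bus ran on average at 5 km/hour for 3 hours, 4 km/hour for 4 hours, and 3 km/hour for 2 hours. Find the mean speed of that bus.
Answer: $\bar{x}_w = \dfrac{(5 \times 3) + (4 \times 4) + (3 \times 2)}{3 + 4 + 2} = \dfrac{37}{9} \approx 4.11$ km/hour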
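The two worked questions can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are chosen for the illustration, and `geometric_mean` assumes Python 3.8 or later.

```python
from statistics import geometric_mean, harmonic_mean, mean

# First question: AM, GM and HM of five numbers.
numbers = [8, 3, 5, 12, 10]
am = mean(numbers)
gm = geometric_mean(numbers)   # requires Python 3.8+
hm = harmonic_mean(numbers)

# Second question: speeds weighted by the hours travelled at each speed.
speeds = [5, 4, 3]
hours = [3, 4, 2]
weighted_mean = sum(s * h for s, h in zip(speeds, hours)) / sum(hours)

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
print(f"Weighted mean speed = {weighted_mean:.2f} km/hour")
```

For any set of positive numbers the three means satisfy HM ≤ GM ≤ AM, which the example values also show.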
• Median: The median is the middle score for a set of
data that has been arranged in order of magnitude.
• Note: First arrange the data in order of magnitude.
$$Me = \left(\frac{n+1}{2}\right)\text{th term, when } n \text{ is an odd number,}$$
$$Me = \frac{\left(\frac{n}{2}\right)\text{th term} + \text{next term}}{2}, \text{ when } n \text{ is an even number.}$$
Example: The median of 0, 1, 2, 3, 7 is the $\frac{5+1}{2}$th term = 3rd term = 2.
Example: The median of 0, 1, 2, 3, 7, 9 is $\dfrac{\frac{6}{2}\text{th term} + \text{next term}}{2} = \dfrac{2 + 3}{2} = 2.5$.
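The odd and even rules for the median translate directly into code. The sketch below uses a hypothetical helper named `median_by_rule` so that it is not confused with the library function `statistics.median`, which gives the same results.

```python
def median_by_rule(values):
    """Median via the (n+1)/2-th term (n odd) or the mean of the
    n/2-th term and the next term (n even)."""
    ordered = sorted(values)          # arrange in order of magnitude first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Terms are counted from 1 on the slide, list indices from 0.
        return ordered[(n + 1) // 2 - 1]
    half = n // 2
    return (ordered[half - 1] + ordered[half]) / 2


print(median_by_rule([0, 1, 2, 3, 7]))      # odd n
print(median_by_rule([9, 0, 3, 1, 7, 2]))   # even n, unsorted input
```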
• Mode (Mo) = Most frequent value in the observations.
– Example: 1, 3, 4, 5, 9, 3. Mode = 3
• Unimodal: A distribution having only one mode is called
unimodal.
• Bimodal: A distribution having two modes is called
bimodal.
– Example: 1, 3, 4, 5, 9, 3, 4. Modes = 3 and 4
• Empirical relation: Mode = mean - 3(mean - median)
• Midrange: The mean of the highest and lowest values,
(Max + Min) / 2.
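A short sketch of the mode and midrange, using the two example data sets above. `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it covers both the unimodal and the bimodal case; the `midrange` helper is a name introduced here for illustration.

```python
from statistics import multimode

unimodal = [1, 3, 4, 5, 9, 3]
bimodal = [1, 3, 4, 5, 9, 3, 4]

modes_one = multimode(unimodal)   # one most-frequent value
modes_two = multimode(bimodal)    # two values tie for most frequent


def midrange(values):
    """Mean of the highest and lowest values."""
    return (max(values) + min(values)) / 2


print(modes_one, modes_two, midrange(unimodal))
```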
• QUANTILES: Quantiles divide the total frequency into a number of equal parts.
Types:
i. Quartiles,
ii. Deciles, and
iii. Percentiles.
• Quartiles: Quartiles divide the total frequency into four equal parts.
• Types:
i. 1st quartile, Q1,
ii. 2nd quartile, Q2, and
iii. 3rd quartile, Q3.
• The 2nd quartile is identical with the median,
• the 1st quartile is the value at or below which one-fourth (25%) of all items in the
series lie, and
• the 3rd quartile is the value at or below which three-fourths (75%) of the items lie.
Quartiles calculation methods
(for ungrouped data)
First arrange the data in order of magnitude.
$$Q_i = \left\{\frac{i(n+1)}{4}\right\}\text{th term}, \; i = 1, 2, 3, \text{ when } n \text{ is an odd number,}$$
$$Q_i = \frac{\left(\frac{i\,n}{4}\right)\text{th term} + \text{next term}}{2}, \; i = 1, 2, 3, \text{ when } n \text{ is an even number.}$$
Here n is the number of observations.
Deciles: Deciles divide the total frequency into ten equal parts. There
are nine deciles: 1st decile, D1; 2nd decile, D2; …; 9th decile, D9.
Calculation methods
$$D_j = \left\{\frac{j(n+1)}{10}\right\}\text{th term}, \; j = 1, 2, 3, \ldots, 9, \text{ when } n \text{ is an odd number,}$$
$$D_j = \frac{\left(\frac{j\,n}{10}\right)\text{th term} + \text{next term}}{2}, \; j = 1, 2, 3, \ldots, 9, \text{ when } n \text{ is an even number.}$$
Percentiles: Percentiles divide the total frequency into 100 equal parts.
There are 99 percentiles: 1st percentile, P1; 2nd
percentile, P2; …; 99th percentile, P99.
Calculation methods
$$P_k = \left\{\frac{k(n+1)}{100}\right\}\text{th term}, \; k = 1, 2, 3, \ldots, 99, \text{ when } n \text{ is an odd number,}$$
$$P_k = \frac{\left(\frac{k\,n}{100}\right)\text{th term} + \text{next term}}{2}, \; k = 1, 2, 3, \ldots, 99, \text{ when } n \text{ is an even number.}$$
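• Examples:
Find i) the 3rd quartile, ii) the 7th decile, and iii) the 60th percentile of the following numbers: 5, 0, 1, 2, -6, 7, -4.
Here n = 7, which is an odd number. Arranged in order of magnitude, the numbers are -6, -4, 0, 1, 2, 5, 7.
Now,
$$Q_i = \left\{\frac{i(n+1)}{4}\right\}\text{th term}, \; i = 1, 2, 3, \text{ when } n \text{ is an odd number, so}$$
$$Q_3 = \frac{3(7+1)}{4}\text{th term} = 6\text{th term} = 5.$$
$$D_j = \left\{\frac{j(n+1)}{10}\right\}\text{th term}, \; j = 1, 2, 3, \ldots, 9, \text{ when } n \text{ is an odd number, so}$$
$$D_7 = \frac{7(7+1)}{10}\text{th term} = 5.6\text{th term} = 5\text{th term} + 0.6\,(6\text{th term} - 5\text{th term}) = 2 + 0.6\,(5 - 2) = 3.8.$$
$$P_k = \left\{\frac{k(n+1)}{100}\right\}\text{th term}, \; k = 1, 2, 3, \ldots, 99, \text{ when } n \text{ is an odd number, so}$$
$$P_{60} = \frac{60(7+1)}{100}\text{th term} = 4.8\text{th term} = 4\text{th term} + 0.8\,(5\text{th term} - 4\text{th term}) = 1 + 0.8\,(2 - 1) = 1.8.$$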
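The worked example can be reproduced with one small function. This sketch covers only the (n+1)-based rule for odd n with linear interpolation, as used in the example; the function name `quantile` and its `parts` argument are assumptions for the illustration, and the separate even-n rule from the slides is not implemented.

```python
import math


def quantile(values, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts.

    Takes the k(n+1)/parts-th ordered term (counting from 1) and
    interpolates linearly when that position is fractional.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts
    lower = math.floor(position)
    fraction = position - lower
    # Clamp so that extreme quantiles of small samples stay inside the data.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [5, 0, 1, 2, -6, 7, -4]
q3 = quantile(data, 3, 4)       # 3rd quartile
d7 = quantile(data, 7, 10)      # 7th decile
p60 = quantile(data, 60, 100)   # 60th percentile
print(q3, round(d7, 2), round(p60, 2))
```

Quartiles, deciles, and percentiles differ only in the value passed as `parts`, which is why a single function is enough.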
For grouped data:
Arithmetic mean (AM), simple mean,
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}, \text{ where } \sum f_i = N \text{ is the total number of observations.}$$
Again, for class-interval grouped data, the mean is
$$\bar{x} = \frac{\sum f_i X_i}{\sum f_i},$$
where $X_i$ are the mid-values of the classes.
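• Geometric mean (GM), only for nonzero and positive numbers:
$$GM = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}\right)^{1/n}$$
• Harmonic mean (HM), only for nonzero numbers:
$$HM = \frac{n}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_k}{x_k}} = \frac{n}{\sum \dfrac{f_i}{x_i}}$$
Here $n = \sum f_i$ is the total number of observations.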
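Example:
Qu. Find i) AM, ii) GM, and iii) HM of the numbers 5, 8, 6, and 2, which occur with
frequencies 3, 2, 4, and 1, respectively.
Answers:
i) $AM = \bar{x} = \dfrac{(3 \times 5) + (2 \times 8) + (4 \times 6) + (1 \times 2)}{3 + 2 + 4 + 1} = \dfrac{57}{10} = 5.7$
ii) $GM = (5^3 \times 8^2 \times 6^4 \times 2^1)^{1/10} = (125 \times 64 \times 1296 \times 2)^{1/10} \approx 5.39$
iii) $HM = \dfrac{3 + 2 + 4 + 1}{\frac{3}{5} + \frac{2}{8} + \frac{4}{6} + \frac{1}{2}} = \dfrac{10}{121/60} = \dfrac{600}{121} \approx 4.96$

Qu. Suppose a bus ran on average at 5 km/hour for 3 hours, 4 km/hour for 4 hours,
and 3 km/hour for 2 hours. Find the mean speed of that bus.
Answer: $\bar{x} = \dfrac{(5 \times 3) + (4 \times 4) + (3 \times 2)}{3 + 4 + 2} = \dfrac{37}{9} \approx 4.11$ km/hour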
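The grouped-data formulas for AM, GM, and HM can be checked against the first question above with a few lines of Python. This is a sketch only; the variable names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math

values = [5, 8, 6, 2]
frequencies = [3, 2, 4, 1]
n = sum(frequencies)   # total number of observations

# AM: sum of f*x divided by the total frequency.
am = sum(f * x for x, f in zip(values, frequencies)) / n

# GM: n-th root of the product of x**f.
gm = math.prod(x ** f for x, f in zip(values, frequencies)) ** (1 / n)

# HM: total frequency divided by the sum of f/x.
hm = n / sum(f / x for x, f in zip(values, frequencies))

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
```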
Median
$$Me = L + \frac{\frac{N}{2} - f_p}{f} \times c,$$
where N = total number of observations,
L = lower limit of the median class,
$f_p$ = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
c = class interval of the median class.
Mode
$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c,$$
where L = lower limit of the modal class,
$\Delta_1$ = difference between the modal frequency and the frequency of the class preceding the modal class,
$\Delta_2$ = difference between the modal frequency and the frequency of the class following the modal class,
c = class interval of the modal class.
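The grouped median and mode formulas can be sketched as below. The frequency table is hypothetical (the slides give no grouped example here), all classes are assumed to have the same width, and the function names are introduced only for this illustration.

```python
# Hypothetical frequency table: (lower class limit, frequency), equal widths.
classes = [(0, 5), (10, 8), (20, 12), (30, 9), (40, 6)]
width = 10


def grouped_median(classes, width):
    """Me = L + (N/2 - f_p) / f * c for the class containing the N/2-th item."""
    total = sum(f for _, f in classes)
    if total == 0:
        raise ValueError("empty frequency table")
    cumulative = 0   # cumulative frequency before the current class (f_p)
    for lower, f in classes:
        if cumulative + f >= total / 2:
            return lower + (total / 2 - cumulative) / f * width
        cumulative += f


def grouped_mode(classes, width):
    """Mo = L + d1 / (d1 + d2) * c for the class with the highest frequency."""
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))                 # first modal class
    before = freqs[m - 1] if m > 0 else 0       # preceding class frequency
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return classes[m][0] + d1 / (d1 + d2) * width


me = grouped_median(classes, width)
mo = grouped_mode(classes, width)
print(round(me, 2), round(mo, 2))
```

In this table the median class and the modal class are both 20 to 30, so both results fall inside that interval.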
Quartiles
$$Q_i = L + \frac{\frac{iN}{4} - f_p}{f} \times c,$$
where $Q_i$ = i-th quartile, i = 1, 2, 3,
N = total number of observations,
L = lower limit of the i-th quartile class,
$f_p$ = cumulative frequency of the class preceding the quartile class,
f = frequency of the quartile class,
c = class interval of the quartile class.
Deciles
$$D_j = L + \frac{\frac{jN}{10} - f_p}{f} \times c,$$
where $D_j$ = j-th decile, j = 1, 2, 3, …, 9,
N = total number of observations,
L = lower limit of the j-th decile class,
$f_p$ = cumulative frequency of the class preceding the decile class,
f = frequency of the decile class,
c = class interval of the decile class.
Percentiles
$$P_k = L + \frac{\frac{kN}{100} - f_p}{f} \times c,$$
where $P_k$ = k-th percentile, k = 1, 2, 3, …, 99,
N = total number of observations,
L = lower limit of the k-th percentile class,
$f_p$ = cumulative frequency of the class preceding the percentile class,
f = frequency of the percentile class,
c = class interval of the percentile class.
SKEWNESS AND KURTOSIS
Moments
• The r-th (raw) moment of a variable x for ungrouped
data,
$$\mu'_r = \frac{\sum x_i^r}{N},$$
• for grouped data,
$$\mu'_r = \frac{\sum f_i x_i^r}{N}, \text{ where } \sum f_i = N \text{ is the total number of observations.}$$
• The r-th moment of a variable x for
ungrouped data about the AM is given by
$$\mu_r = \frac{\sum (x_i - \bar{x})^r}{N},$$
• and for grouped data,
$$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}.$$
• Symmetrical curve: A frequency curve is said to be
symmetrical if it can be folded along a vertical line at
centre so that the two halves of the figures coincide.
In a symmetrical distribution, the values of mean,
median and mode coincide.
• Skewness: Skewness means lack of symmetry
of a curve. It indicates whether the curve is
turned more to one side than to the other.
• Measures of skewness
• Karl Pearson's coefficient of skewness:
$$Sk_p = \frac{\bar{x} - Mo}{\sigma} = \frac{3(\bar{x} - Me)}{\sigma},$$
where $\sigma$ is the standard deviation.
Figures of skewness
[Figure: three frequency curves]
– Skp < 0: Mean < median < mode
– Skp = 0: Mean = median = mode
– Skp > 0: Mean > median > mode
• Measure of skewness based on moments,
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}},$$
• if $\gamma_1 > 0$, then the distribution is positively skewed;
• if $\gamma_1 < 0$, then the distribution is negatively skewed;
• if $\gamma_1 = 0$, then the distribution is symmetric.
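• Kurtosis: The sharpness of the peak of a
frequency-distribution curve.
• Measures of kurtosis
Moment coefficient of kurtosis,
$$\beta_2 = \frac{\mu_4}{\mu_2^2}.$$
• If $\beta_2 > 3$, the curve is leptokurtic (more peaked than the normal curve);
• if $\beta_2 = 3$, the curve is mesokurtic (normal);
• if $\beta_2 < 3$, the curve is platykurtic (flatter than the normal curve).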
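The moment-based measures can be computed directly from the central moments about the AM. The sketch below uses a small hypothetical data set and assumes the usual definitions, with skewness taken as the third central moment over the second raised to 3/2 and kurtosis as the fourth central moment over the square of the second.

```python
def central_moment(values, r):
    """r-th moment about the arithmetic mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

mu2 = central_moment(data, 2)     # variance
mu3 = central_moment(data, 3)
mu4 = central_moment(data, 4)

skewness = mu3 / mu2 ** 1.5       # sign follows the third central moment
kurtosis = mu4 / mu2 ** 2         # compare with 3 (the normal curve)

print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}")
```

For these values the skewness is positive (a longer right tail) and the kurtosis is slightly below 3.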
MEASURES OF DISPERSION
Measures of dispersion describe how spread
out a set of data is.
• Types of measures:
i. Absolute measures, and ii. Relative measures.
• Absolute measures:
i. Range,
ii. Mean deviation,
iii. Quartile deviation,
iv. Standard deviation,
v. Variance, and
vi. Standard error.
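• Calculation methods, examples:
* Mean deviation, $MD(\bar{x}) = \dfrac{\sum f_i\,|x_i - \bar{x}|}{n}$, for grouped data.
* Quartile deviation, $QD = \dfrac{Q_3 - Q_1}{2}$; inter-quartile range, $IQR = Q_3 - Q_1$.
* Standard deviation, $SD = \sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{N}}$ for ungrouped data, and $\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{N}}$ for grouped data, where N is the total number of observations and $\bar{x}$ is the AM.
* Variance $= var(x) = \sigma^2$.
* Standard error of the mean, $SE(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, where n is the number of observations.
* Coefficient of variation, $CV = \dfrac{\sigma}{\bar{x}} \times 100$.
* Covariance, $Cov(x, y) = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$. It is a measure for two sets of data.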
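A minimal sketch of the absolute and relative measures of dispersion for ungrouped data follows. The data set is hypothetical, and the population forms (dividing by N) are assumed; `pstdev` and `pvariance` are the standard library's population versions.

```python
import math
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)
x_bar = mean(data)

value_range = max(data) - min(data)
mean_deviation = sum(abs(x - x_bar) for x in data) / n
sd = pstdev(data)                  # divides by N, not N - 1
variance = pvariance(data)
standard_error = sd / math.sqrt(n)
cv = sd / x_bar * 100              # coefficient of variation, in percent

print(f"range = {value_range}, MD = {mean_deviation}, SD = {sd}")
print(f"variance = {variance}, SE = {standard_error:.4f}, CV = {cv:.1f}%")
```

The coefficient of variation is the only relative measure here; being unit-free, it allows the spread of data sets measured in different units to be compared.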
CORRELATION ANALYSIS
• Correlation: It is a statistical measure of the
(linear) relationship between two variables.
• Pearson's correlation coefficient, r: It is a statistic or parameter
that measures the strength and direction of the relationship
between two variables.
The value of r denotes the strength of the
association, with $-1 \le r \le 1$, as illustrated below.

|r|             Strength of association
0               no relation
0 to 0.25       weak
0.25 to 0.75    intermediate
0.75 to 1       strong
1               perfect correlation

Positive values of r indicate a direct relationship and negative values an indirect relationship.
Calculation
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}\right]}} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}},$$
where $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$.
$r_{ij}$ is the simple or zero-order correlation coefficient. In partial correlation,
$r_{ij.k}$ is called the first-order correlation coefficient, and so on.
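Both forms of the formula for r, from deviations and from raw sums, can be sketched and compared as follows. The paired data are hypothetical and the function names are chosen for the illustration.

```python
import math


def pearson_r(x, y):
    """r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)


def pearson_r_raw_sums(x, y):
    """The same r from raw sums, convenient for hand calculation."""
    n = len(x)
    numerator = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return numerator / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(pearson_r_raw_sums(x, y), 4))
```

The two functions agree up to rounding error; the raw-sum form avoids computing the means first.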
Probable error (PE) of r
• It helps in determining the accuracy and
reliability of the value of r.
$$r^* = PE(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$$
• Upper limit of r = r + PE, lower limit of r = r - PE.
• If r* < r, then r is significant; otherwise it is
insignificant.
• Significance test of r
$H_0: \rho = 0$; there is no significant correlation.
$H_1: \rho \ne 0$; there is a significant correlation.
Formula for the t-test for r (compared with the table t-value):
$$t = r\sqrt{\frac{n-2}{1-r^2}}, \text{ with df} = (n-2).$$
If $t_{calculated} > t_{tabulated}$, then $H_0$ is rejected.
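The probable error and the t statistic for r can be sketched as below. The values of r and n are hypothetical, and the tabulated value 3.182 is the usual two-sided 5% critical value of t with 3 degrees of freedom, hard-coded here as an assumption in place of a table lookup.

```python
import math


def probable_error(r, n):
    """PE of r for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def t_statistic(r, n):
    """t for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.7746, 5        # hypothetical sample correlation and sample size
pe = probable_error(r, n)
t = t_statistic(r, n)

T_TABULATED = 3.182     # assumed table value: df = 3, two-sided 5% level
reject_h0 = abs(t) > T_TABULATED

print(f"PE = {pe:.4f}, limits = ({r - pe:.4f}, {r + pe:.4f})")
print(f"t = {t:.4f}, reject H0: {reject_h0}")
```

With only five pairs, even a fairly large r does not reach the tabulated value, so the null hypothesis is not rejected in this illustration.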
Partial Correlation
• It is a measure of association between two
variables, while controlling the effect of one or
more additional variables.
• Partial correlation coefficient: The correlation
coefficient of x1 and x2, holding the effect of x3
constant, is
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}.$$
It is a first-order correlation.
Multiple correlation coefficient
• It shows the correlation among more than
two variables and it is denoted by R.
• Suppose that there are 3 variables, x1
(dependent) and x2, x3 (independent); then
the multiple correlation coefficient is R1.23,
and it is determined by
$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}.$$
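Given the three zero-order correlations, the first-order partial correlation and the multiple correlation coefficient for three variables follow from simple arithmetic. The correlation values below are hypothetical and the function names are introduced only for this sketch.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation of x1 and x2 with the effect of x3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_correlation(r12, r13, r23):
    """R for x1 (dependent) on x2 and x3 (independent)."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


# Hypothetical zero-order correlations.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_correlation(r12, r13, r23)
big_r = multiple_correlation(r12, r13, r23)
print(f"partial r = {r12_3:.4f}, multiple R = {big_r:.4f}")
```

The multiple correlation coefficient is never smaller than the largest zero-order correlation with the dependent variable, which the example values respect.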
Coefficient of determination, R²
• It measures the proportion of the variation in
Y explained by X.
• It ranges from 0 to 1.0 (or 0% to 100%).
• R² is equal to r² for the simple regression
model.
$$\text{Coefficient of determination, } R^2 = \frac{\text{Explained variance}}{\text{Total variance}}.$$
$$\text{Coefficient of non-determination, } k^2 = 1 - R^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}.$$
REGRESSION ANALYSIS
• Regression: It is a statistical process for estimating the
relationships between a dependent variable (DV) and one or more independent
variables (IVs).
• It can be used to infer causal relationships between the DV and IVs.
• Regression line: The best-fit line.
Types
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Stepwise Regression
• Ridge Regression
• Lasso Regression
• ElasticNet Regression
Linear regression and its types
• Linear regression is a common statistical
data analysis technique. It is used to
determine the extent to which there is
a linear relationship between a DV and one or
more IVs.
• Types:
i. Simple linear regression (one IV), and
ii. Multiple linear regression (two or more IVs).
Simple linear regression
• It allows us to summarize and study relationships
between two continuous (quantitative) variables.
• A regression equation takes the form
y = a + bx + c, where a is the intercept of the line, b is the
regression coefficient (slope), and c is the regression
residual (with mean 0).
• Y: referred to as the DV, response variable, or
predicted variable.
• X: referred to as the IV, explanatory variable,
factor, carrier, covariate, regressor, or predictor
variable.
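• Calculation, example
$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad E(\varepsilon_i) = 0.$$
Here $\alpha$, $\beta$, and $\varepsilon_i$ play the roles of a, b, and c in the form y = a + bx + c.
$$\hat{y} = \hat{\alpha} + \hat{\beta}x \text{ is the regression line, where}$$
$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{SP(XY)}{SS(X)},$$
with $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$, and
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}.$$
Then $\hat{y} = \hat{\alpha} + \hat{\beta}x$.
• The estimates $\hat{\alpha}$ and $\hat{\beta}$ are called the least
squares estimates of $\alpha$ and $\beta$ because they are
the solution to the least squares method.
• The fitted line is called the least squares
regression line.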
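The least squares estimates for simple linear regression can be sketched as below: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept follows from the two means. The data and the function name `fit_simple_regression` are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    slope = sp_xy / ss_x
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [1, 2, 3, 4, 5]   # hypothetical IV values
y = [2, 4, 5, 4, 5]   # hypothetical DV values

intercept, slope = fit_simple_regression(x, y)
fitted = [intercept + slope * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
```

A useful check on any least squares fit with an intercept is that the residuals sum to zero, apart from rounding error.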
Multiple regression
In some cases the DV is influenced by several IVs. The method of
estimating the average rate of change in the DV for changes in two or
more IVs is known as multiple regression.
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,$$
where $\beta_0$ is the intercept,
$\beta_j$ (j = 1, 2, 3, ..., k) are the coefficients of partial regression of $x_j$, and
$\varepsilon_i$ is the random error term.
$\beta_j = 0$ means that there is no linear relationship between y and $x_j$.
Interpreting parameter values (model coefficients)
• Intercept, $\beta_0$: the value of y when all predictors
are 0.
• $\beta_j$: the expected change in y per unit
increment in $x_j$ when all other predictors in the
model are held at a constant value.
Estimating model parameters of multiple regression
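Assume a random sample of n observations $(y_i, x_{i1}, x_{i2}, \ldots, x_{ik})$, i = 1, 2, ..., n. The
estimates of the parameters for the best predicting equation,
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik},$$
are found by choosing the values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ which minimize the expression
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \cdots - \hat{\beta}_k x_{ik}\right)^2.$$
Take the partial derivatives of the SSE function with respect to $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ and equate
each equation to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the
equations for the parameter estimates:
$$\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}$$
$$\sum_{i=1}^{n} x_{i1} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik}$$
$$\vdots$$
$$\sum_{i=1}^{n} x_{ik} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2$$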
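The normal equations can be built and solved without any external library. The sketch below is an illustration under stated assumptions: the data are hypothetical and were generated from y = 1 + 2·x1 + 3·x2 so that the exact answer is known, and the function names are invented for the example. In practice a statistics package would be used instead.

```python
def solve_linear_system(a, b):
    """Solve a·z = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(size):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def fit_multiple_regression(predictors, y):
    """predictors: list of columns x1..xk; returns [b0, b1, ..., bk]."""
    n = len(y)
    # A column of ones stands for the intercept.
    columns = [[1.0] * n] + [[float(v) for v in col] for col in predictors]
    # Left side of the normal equations: sums of cross-products of columns.
    a = [[sum(p * q for p, q in zip(cp, cq)) for cq in columns]
         for cp in columns]
    # Right side: sums of products of each column with y.
    b = [sum(p * v for p, v in zip(cp, y)) for cp in columns]
    return solve_linear_system(a, b)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]   # generated as 1 + 2*x1 + 3*x2, so the fit is exact

coefficients = fit_multiple_regression([x1, x2], y)
print([round(c, 6) for c in coefficients])
```

The solver raises an error when the system is singular, which is what happens when one predictor is an exact linear combination of the others.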
Multicollinearity
• Multicollinearity occurs when the predictors (x1, x2, ..., xk) are highly correlated with one another.
• It leads to
– numerical instability in the estimates of the regression parameters, and
– regression coefficients that no longer have simple interpretations in the additive model.
• Ways to detect multicollinearity
– Scatterplots of the predictor variables.
– Correlation matrix for the predictor variables – the higher these correlations the worse the
problem.
– Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually
signal a substantial amount of collinearity.
• What can be done
– Regression estimates are still OK, but the resulting confidence/prediction intervals are very
wide.
– Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated
variables.)
– More advanced solutions: principal components analysis; ridge regression.
Stepwise regression
• It is an automated tool used in the exploratory
stages of model building to identify a useful
subset of predictors. The process systematically
adds the most significant variable or removes
the least significant variable during each step.
Generative AI for Social Good at Open Data Science East 2024
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

An Introduction to Statistics

  • 1. In the name of Allah, the Most Gracious, the Most Merciful. An Introduction to Statistical Tools and SPSS used in Social Research. Presented by Professor Dr. Md. Nazrul Islam Mondal, Visiting Scholar, ERASMUS+ Fellowship Program, Department of Sociology, Middle East Technical University, Ankara, TURKEY. E-mail: nazrulupm@gmail.com
  • 2. Outline: i. Data presentation; ii. Central tendency; iii. Skewness and kurtosis; iv. Measures of dispersion; v. Correlation; vi. Regression.
  • 3. DATA PRESENTATION • Statistics: Statistics is a branch of scientific methodology. • Data: Data are a collection of facts or information from which conclusions may be drawn. • Stages of statistical investigation: collection, organization, presentation, analysis, and interpretation.
  • 4. • Population and sample. Essential purposes of statistics: i. to describe the numerical properties of populations, and ii. to draw inferences about the population from samples. • Population: The entire category under consideration. Its size is usually denoted by N. • Sample: A representative part (a subset or portion) of the population. Its size is denoted by n.
  • 5. • Parameter: A characteristic or measure obtained from a population. • Statistic: A characteristic or measure obtained from a sample. • Descriptive statistics: They provide simple summaries about the sample and the measures. • Statistical inference: Drawing conclusions about population parameters from a sample. It is used to: i. test hypotheses, ii. determine relationships between variables, and iii. make predictions.
  • 6. • Variables: Measurable characteristics are called variables. • Types: i. Qualitative, and ii. Quantitative. • Random variable: A variable whose values are determined by chance. • Constant: A constant is a particular type of variable that does not vary from one member of a group to another.
  • 7. • Discrete variables: Usually obtained by counting; they take integer values. • Examples: number of children, number of students in METU, etc. • Continuous variables: Usually obtained by measurement; they take real-number values. • Examples: height, weight, etc. • Data types (by way of collection) – Primary data: Primary data come mainly from direct field operations. – Secondary data: Secondary data are usually obtained from already published or unpublished documents.
  • 8. • Types of data: i. categorical, and ii. numerical. • Categorical data types: i. nominal (gender, religion, room numbers, etc.), ii. ordinal (education level, book chapters, health status, etc.). • Numerical data types: i. interval scale (age groups, dates, etc.), ii. ratio scale (BMI, CWR, etc.).
  • 9. • Frequency: The number of times a certain value or class of values occurs. • Frequency distribution: A tabular arrangement of data by classes, together with the corresponding class frequencies, is called a frequency distribution.
  • 10. • Data types: i. Ungrouped data (without frequency), ii. Grouped data (with frequency). • Data presentation by tables and graphs: – frequency distributions, – bar diagram, – histogram, – multiple bar diagram, – box plot, – frequency polygon, – other graphs.
  • 11. Examples: The students in the Department of Sociology, METU, were involved in the following activities (activity: number of students): Play Sports: 45; Talk on Phone: 53; Visit With Friends: 99; Earn Money: 44; Chat Online: 66; School Clubs: 22; Watch TV: 37; total n = 366.
  • 12. Bar diagram of the student activities (vertical axis: number of students, 0 to 120): Play Sports 45, Talk on Phone 53, Visit With Friends 99, Earn Money 44, Chat Online 66, School Clubs 22, Watch TV 37. Note: A bar diagram can be presented vertically or horizontally to show comparisons among categories.
  • 13. Histogram: A graphical representation of how many times different, mutually exclusive events are observed in an experiment. Example: the following data represent the ages of a group of people: 36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55.
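As a rough illustration of how the ages above could be grouped into histogram classes, here is a minimal sketch. The class width of 10 years and the helper name `frequency_table` are assumptions made for this example; the slide does not state which classes were used.

```python
from collections import Counter

# Ages listed on the histogram slide.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

CLASS_WIDTH = 10  # assumed class width; the slide does not state one


def frequency_table(values, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    # Keep empty classes so that gaps in the histogram stay visible.
    return [(lower, counts.get(lower, 0))
            for lower in range(lowest, highest + width, width)]


table = frequency_table(ages, CLASS_WIDTH)
for lower, freq in table:
    # One '#' per observation gives a crude text histogram.
    print(f"{lower}-{lower + CLASS_WIDTH - 1}: {'#' * freq} ({freq})")
```

The class frequencies are the bar heights of the histogram; a different class width would give a differently shaped picture of the same data.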
  • 14. Combined bar diagram (activities by year; vertical axis 0 to 300). Data (1995, 1996, 1997): Climbing: 21, 34, 30; Caving: 10, 12, 21; Walking: 75, 85, 100; Sailing: 36, 36, 40.
  • 15. Multiple bar diagram: Data on several variables for different places or time points may be presented by a multiple bar diagram. Example, grades by year (1st year, 2nd year, 3rd year, 4th year): Grade A: 5, 7, 9, 10; Grade B: 15, 18, 15, 12; Grade C: 20, 15, 10, 8. (Chart: grouped bars for Grade A, Grade B and Grade C in each year; vertical axis 0 to 25.)
  • 16. Pie chart: Different components of data may be exhibited by splitting a circle. The angle at the center of the circle is divided in proportion to the components, and the circle is split accordingly. The division can also be expressed in percentages according to the relative magnitude of the components. Usually the components are marked by different colors. Example, 1st-year grades: Grade A: 5; Grade B: 15; Grade C: 20; Total: 40. (Chart "Social Science", rounded labels: Grade A 12%, Grade B 38%, Grade C 50%.)
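The proportional division of the circle described above can be sketched as follows, using the 1st-year grade counts from the slide. The function name `pie_slices` is made up for this illustration.

```python
# 1st-year grade counts from the pie chart slide.
grades = {"Grade A": 5, "Grade B": 15, "Grade C": 20}


def pie_slices(counts):
    """Return {label: (percentage, central angle in degrees)}."""
    total = sum(counts.values())
    return {label: (100 * c / total, 360 * c / total)
            for label, c in counts.items()}


slices = pie_slices(grades)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the circle, {angle:.0f} degrees")
```

The exact shares are 12.5%, 37.5% and 50%; the chart labels on the slide show the first two rounded to whole percentages.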
  • 17. Frequency polygon: A frequency polygon gives an idea of the shape of the data distribution. The two end points of a frequency polygon always lie on the x-axis.
  • 18. Box plot: The box plot (box-and-whisker diagram) displays the five-number summary: minimum, first quartile, median, third quartile, and maximum.
  • 19. Scatter plot: A plot of the data values on a coordinate system. A scatter plot (or scatter diagram) is used to show the relationship between two variables. The independent variable is graphed along the x-axis and the dependent variable along the y-axis.
  • 20. Cumulative frequency polygon or ogive: A graph showing the cumulative frequency less than any upper class boundary, plotted against the upper class boundary, is called a cumulative frequency polygon or ogive.
  • 21. CENTRAL TENDENCY • Central tendency: A single value that attempts to describe a set of data by identifying the central position within that set of data. • Measures are: – Mean, – Median, – Mode, – Quartiles, – Deciles, and – Percentiles.
  • 24. • Median: The median is the middle score of a set of data that has been arranged in order of magnitude. • Note: First arrange the data in order of magnitude. • When n is an odd number: $Me = \left(\frac{n+1}{2}\right)$-th term. Example: the median of 0, 1, 2, 3, 7 is the $\frac{5+1}{2}$ = 3rd term = 2. • When n is an even number: $Me = \frac{\left(\frac{n}{2}\right)\text{-th term} + \text{next term}}{2}$. Example: the median of 0, 1, 2, 3, 7, 9 is $\frac{\text{3rd term} + \text{4th term}}{2} = \frac{2 + 3}{2} = 2.5$.
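The odd and even cases of the median rule can be sketched in a few lines. This is only an illustration of the rule on the two examples from the slide; Python's standard library offers `statistics.median` for real use.

```python
def median(values):
    """Median by the positional rule: middle term, or mean of the two middle terms."""
    ordered = sorted(values)          # first arrange the data by magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)-th term, counting from 1
    # n even: mean of the (n / 2)-th term and the next term
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([0, 1, 2, 3, 7]))        # odd n
print(median([0, 1, 2, 3, 7, 9]))     # even n
```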
  • 25. • Mode (Mo): The most frequent number among the observations. – Example: 1, 3, 4, 5, 9, 3. Mode = 3. • Unimodal: A distribution having only one mode is called unimodal. • Bimodal: A distribution having two modes is called bimodal. Example: 1, 3, 4, 5, 9, 3, 4. Modes = 3 and 4. • Empirical relation (for moderately skewed distributions): Mode ≈ mean - 3(mean - median). • Midrange: The mean of the highest and lowest values, (Max + Min) / 2.
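A small sketch of the mode and midrange definitions, applied to the two example data sets from the slide. The helper names `modes` and `midrange` are chosen for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Mean of the highest and lowest values."""
    return (max(values) + min(values)) / 2


print(modes([1, 3, 4, 5, 9, 3]))       # one mode: unimodal
print(modes([1, 3, 4, 5, 9, 3, 4]))    # two modes: bimodal
print(midrange([1, 3, 4, 5, 9, 3]))
```

Note that when every value occurs exactly once, this helper returns all of them, which is usually read as "no mode".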
  • 26. • QUANTILES: Quantiles divide the total frequency into a number of equal parts. Types: i. Quartiles, ii. Deciles, and iii. Percentiles. • Quartiles: They divide the total frequency into four equal parts. • Types: i. 1st quartile, Q1, ii. 2nd quartile, Q2, and iii. 3rd quartile, Q3. • The 2nd quartile is identical with the median, • the 1st quartile is the value at or below which one-fourth (25%) of all items in the series lie, and • the 3rd quartile is the value at or below which three-fourths (75%) of the items lie.
  • 27. Quartiles calculation methods (for ungrouped data). First arrange the data in order of magnitude. Here n is the number of observations. • When n is an odd number: $Q_i = \frac{i(n+1)}{4}$-th term, $i = 1, 2, 3$. • When n is an even number: $Q_i = \frac{\left(\frac{i \cdot n}{4}\right)\text{-th term} + \text{next term}}{2}$, $i = 1, 2, 3$.
  • 28. Deciles: Deciles divide the total frequency into ten equal parts. There are nine deciles: 1st decile, 2nd decile, …, 9th decile. Calculation methods: • When n is an odd number: $D_j = \frac{j(n+1)}{10}$-th term, $j = 1, 2, 3, \ldots, 9$. • When n is an even number: $D_j = \frac{\left(\frac{j \cdot n}{10}\right)\text{-th term} + \text{next term}}{2}$, $j = 1, 2, 3, \ldots, 9$.
  • 29. Percentiles: Percentiles divide the total frequency into 100 equal parts. There are 99 percentiles: 1st percentile, P1; 2nd percentile, P2; …; 99th percentile, P99. Calculation methods: • When n is an odd number: $P_k = \frac{k(n+1)}{100}$-th term, $k = 1, 2, 3, \ldots, 99$. • When n is an even number: $P_k = \frac{\left(\frac{k \cdot n}{100}\right)\text{-th term} + \text{next term}}{2}$, $k = 1, 2, 3, \ldots, 99$.
  • 30. • Examples: Find i) the 3rd quartile, ii) the 7th decile, and iii) the 60th percentile of the following numbers: 5, 0, 1, 2, -6, 7, -4. Here n = 7, an odd number, and the ordered data are -6, -4, 0, 1, 2, 5, 7. i) $Q_i = \frac{i(n+1)}{4}$-th term, so $Q_3 = \frac{3(7+1)}{4}$-th term = 6th term = 5. ii) $D_j = \frac{j(n+1)}{10}$-th term, so $D_7 = \frac{7(7+1)}{10}$-th term = 5.6th term = 5th term + 0.6 (6th term - 5th term) = 2 + 0.6 (5 - 2) = 3.8. iii) $P_k = \frac{k(n+1)}{100}$-th term, so $P_{60} = \frac{60(7+1)}{100}$-th term = 4.8th term = 4th term + 0.8 (5th term - 4th term) = 1 + 0.8 (2 - 1) = 1.8.
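The worked example can be checked with a short sketch of the position rule, reading the slide's data as 5, 0, 1, 2, -6, 7, -4. The function name `position_quantile` is an assumption for this illustration, and it applies the (n + 1) position rule with linear interpolation between neighbouring terms, as the example does for odd n; it is not a general-purpose quantile routine.

```python
from fractions import Fraction


def position_quantile(values, k, parts):
    """k-th of `parts` quantiles: the (k(n + 1) / parts)-th ordered term."""
    ordered = sorted(values)
    n = len(ordered)
    pos = Fraction(k * (n + 1), parts)      # 1-based, possibly fractional position
    whole = pos.numerator // pos.denominator
    frac = pos - whole
    if whole < 1:                           # position falls before the first term
        return float(ordered[0])
    if whole >= n:                          # position falls at or after the last term
        return float(ordered[-1])
    lower, upper = ordered[whole - 1], ordered[whole]
    # Interpolate between the two neighbouring ordered terms.
    return float(lower + frac * (upper - lower))


data = [5, 0, 1, 2, -6, 7, -4]
q3 = position_quantile(data, 3, 4)       # 3rd quartile
d7 = position_quantile(data, 7, 10)      # 7th decile
p60 = position_quantile(data, 60, 100)   # 60th percentile
print(q3, d7, p60)
```

Statistical packages implement several different quantile conventions, so results from software can differ slightly from this hand rule.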
  • 31. For grouped data: Arithmetic mean (AM), simple mean: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = N$ is the total number of observations. For class-interval grouped data, the mean is $\bar{x} = \frac{\sum f_i X_i}{\sum f_i}$, where $X_i$ are the mid-values of the classes.
  • 32. • Geometric mean (GM): $GM = \left(x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}\right)^{1/n}$, only for non-zero and positive numbers. • Harmonic mean (HM): $HM = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_k}{x_k}} = \frac{n}{\sum \frac{f_i}{x_i}}$, only for non-zero numbers. Here $n = \sum f_i$.
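To make the three grouped-data means concrete, here is a minimal sketch. The values and frequencies are hypothetical (they do not come from the slides), and the geometric mean is computed through logarithms, which is equivalent to the product formula for positive values.

```python
import math


def grouped_means(values, freqs):
    """Return (AM, GM, HM) for values x_i with frequencies f_i."""
    if any(x <= 0 for x in values):
        raise ValueError("GM and HM here require strictly positive values")
    n = sum(freqs)                                   # total number of observations
    am = sum(f * x for x, f in zip(values, freqs)) / n
    # (x1^f1 * ... * xk^fk)^(1/n) written as exp(sum(f_i * ln x_i) / n)
    gm = math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / n)
    hm = n / sum(f / x for x, f in zip(values, freqs))
    return am, gm, hm


values = [1, 2, 4]   # hypothetical x_i
freqs = [1, 2, 1]    # hypothetical f_i
am, gm, hm = grouped_means(values, freqs)
print(am, gm, hm)
```

For positive data the three means always satisfy HM ≤ GM ≤ AM, which gives a quick sanity check on hand calculations.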
  • 39. SKEWNESS AND KURTOSIS. Moments • The r-th moment of a variable x for ungrouped data: $\mu'_r = \frac{\sum x_i^r}{N}$, • for grouped data: $\mu'_r = \frac{\sum f_i x_i^r}{N}$, where $\sum f_i = N$ is the total number of observations.
  • 40. • The r-th moment of a variable x about the AM for ungrouped data is given by $\mu_r = \frac{\sum (x_i - \bar{x})^r}{N}$, • and for grouped data by $\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}$.
  • 41. • Symmetrical curve: A frequency curve is said to be symmetrical if it can be folded along a vertical line at the centre so that the two halves of the figure coincide. In a symmetrical distribution, the values of the mean, median and mode coincide.
  • 42. • Skewness: Skewness means lack of symmetry of a curve. It indicates whether the curve is turned more to one side than to the other. • Measures of skewness • Karl Pearson's coefficient of skewness: $Sk_p = \frac{\bar{x} - Mo}{\sigma} = \frac{3(\bar{x} - Me)}{\sigma}$.
  • 44. • Measure of skewness based on moments: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\sigma^3}$. • If $\gamma_1 > 0$, the distribution is positively skewed; • if $\gamma_1 < 0$, the distribution is negatively skewed; • if $\gamma_1 = 0$, the distribution is symmetric.
  • 45. • Kurtosis: Kurtosis is the sharpness of the peak of a frequency-distribution curve. • Measures of kurtosis: moment coefficient of kurtosis, $\beta_2 = \frac{\mu_4}{\mu_2^2}$.
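The moment-based measures of skewness and kurtosis can be sketched for ungrouped data as below. The two small data sets are hypothetical, chosen so that one is symmetric and the other has a long right tail; the function names are also assumptions for this illustration.

```python
def central_moment(values, r):
    """r-th moment about the arithmetic mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def moment_skewness(values):
    """gamma_1 = mu_3 / mu_2^(3/2), which equals mu_3 / sigma^3."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def moment_kurtosis(values):
    """beta_2 = mu_4 / mu_2^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]       # hypothetical, symmetric about 3
right_tail = [1, 1, 1, 1, 6]      # hypothetical, long tail to the right

print(moment_skewness(symmetric), moment_kurtosis(symmetric))
print(moment_skewness(right_tail))
```

The symmetric sample gives a skewness of zero, while the sample with one large value gives a positive coefficient, matching the sign rule on the skewness slide.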
  • 47. MEASURES OF DISPERSION. Measures of dispersion describe how spread out a set of data is. • Types of measures: i. Absolute measures, and ii. Relative measures. • Absolute measures: i. Range, ii. Mean deviation, iii. Quartile deviation, iv. Standard deviation, v. Variance, and vi. Standard error.
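  • 48. • Calculation methods: * Mean deviation, $MD(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{n}$, for grouped data. * Quartile deviation, $QD = \frac{Q_3 - Q_1}{2}$. * Interquartile range, $IQR = Q_3 - Q_1$. * Standard deviation, $SD = \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$ for ungrouped data and $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$ for grouped data, where N is the total number of observations and $\bar{x}$ is the AM. * Variance $= var(x) = \sigma^2$. * Standard error of the mean, $SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$, where n is the number of observations. * Coefficient of variation, $CV = \frac{\sigma}{\bar{x}} \times 100$. * Covariance, $Cov(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$. It is a measure for two sets of data.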
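A minimal sketch of several of these dispersion measures for ungrouped data follows (every frequency taken as 1). The sample `x` is hypothetical and was chosen so that the results are round numbers; the function names are assumptions for this example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def covariance(xs, ys):
    """Population covariance (divides by n); covariance(x, x) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def population_sd(values):
    return math.sqrt(covariance(values, values))


def standard_error(values):
    return population_sd(values) / math.sqrt(len(values))


def coefficient_of_variation(values):
    return 100 * population_sd(values) / mean(values)


x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample with mean 5
print(mean_deviation(x), population_sd(x), covariance(x, x))
print(standard_error(x), coefficient_of_variation(x))
```

Because the coefficient of variation is expressed relative to the mean, it can be compared across variables measured in different units, which the absolute measures cannot.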
  • 49. CORRELATION ANALYSIS • Correlation: A statistical measurement of the (linear) relationship between two variables. • Pearson's correlation coefficient, r: A statistic or parameter which measures the strength and direction of the relationship between two variables.
  • 50. The value of r denotes the strength of the association, with $-1 \le r \le 1$. Negative values indicate an indirect relationship and positive values a direct one: r = -1 or r = 1, perfect correlation; |r| between 0.75 and 1, strong; |r| between 0.25 and 0.75, intermediate; |r| between 0 and 0.25, weak; r = 0, no relation.
  • 51. Calculation: $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right]}} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$, where $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$. $r_{ij}$ is the simple or zero-order correlation coefficient. In partial correlation, $r_{ij.k}$ is called the first-order correlation coefficient, and so on.
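The deviation form and the raw-sum form of the correlation formula can be compared numerically with a short sketch. The paired data are hypothetical and the function names are chosen for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Deviation form: sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


def pearson_r_raw(xs, ys):
    """Raw-sum form of the same coefficient."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    num = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    den = math.sqrt((sum(x * x for x in xs) - sx ** 2 / n)
                    * (sum(y * y for y in ys) - sy ** 2 / n))
    return num / den


xs = [1, 2, 3, 4, 5]   # hypothetical paired observations
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(r, pearson_r_raw(xs, ys))
```

Both forms give the same value; the raw-sum form is convenient for hand calculation, while the deviation form is less prone to rounding problems with large numbers.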
  • 52. Probable error (PE) of r • It helps in determining the accuracy and reliability of the value of r: $r^* = PE(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$. • Upper limit of r = r + PE, lower limit of r = r - PE. • If r* < r, then r is significant; otherwise it is insignificant.
  • 53. • Significance test of r: $H_0: \rho = 0$; there is no significant correlation. $H_1: \rho \ne 0$; there is a significant correlation. Formula for the t-test for r (compare with the table t-value): $t = r\sqrt{\frac{n - 2}{1 - r^2}}$, with df = (n - 2). If $|t_{calculated}| > t_{tabulated}$, then $H_0$ is rejected.
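As a sketch of the two checks on r, the probable error and the t statistic can be computed as follows. The values r = 0.6 and n = 18 are hypothetical, and the critical t-value still has to be looked up in a t table (or obtained from statistical software) for the returned degrees of freedom.

```python
import math


def probable_error(r, n):
    """PE(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def t_statistic(r, n):
    """Return (t, degrees of freedom) for testing H0: rho = 0."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


r, n = 0.6, 18                      # hypothetical correlation and sample size
pe = probable_error(r, n)
t, df = t_statistic(r, n)
print(f"PE = {pe:.4f}, limits = ({r - pe:.4f}, {r + pe:.4f})")
print(f"t = {t:.3f} with {df} degrees of freedom")
```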
  • 54. Partial correlation • It is a measure of association between two variables, while controlling the effect of one or more additional variables. • Partial correlation coefficient: The correlation coefficient of x1 and x2, holding the effect of x3 constant, is $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$. It is a first-order correlation.
  • 55. Multiple correlation coefficient • It shows the correlation among more than two variables and is denoted by R. • Suppose that there are 3 variables, x1 (dependent) and x2, x3 (independent); then the multiple correlation coefficient is $R_{1.23}$ and it is determined by $R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$.
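Both three-variable formulas need only the three pairwise correlations, as the following sketch shows. The input correlations (0.6, 0.5, 0.4) are hypothetical and the function names are assumptions for this illustration.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r_12.3 (x3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r_squared(r12, r13, r23):
    """Squared multiple correlation R^2_1.23 of x1 on x2 and x3."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)


# Hypothetical pairwise correlations among x1, x2 and x3.
r12, r13, r23 = 0.6, 0.5, 0.4
r12_3 = partial_r(r12, r13, r23)
big_r2 = multiple_r_squared(r12, r13, r23)
print(f"r12.3 = {r12_3:.4f}, R^2 = {big_r2:.4f}, R = {math.sqrt(big_r2):.4f}")
```

As a sanity check, when x3 is uncorrelated with both x1 and x2, the partial correlation reduces to r12 and the squared multiple correlation to r12².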
  • 56. Coefficient of determination, R² • It measures the proportion of the variation in Y explained by X. • It ranges from 0 to 1.0 (or 0% to 100%). • R² is equal to r² for the simple regression model. • Coefficient of determination: $R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$. • Coefficient of non-determination: $k^2 = 1 - R^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}$.
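The split of the total variation into explained and unexplained parts can be sketched as below. The data are hypothetical, and the fitted values are taken from the line ŷ = 2.2 + 0.6x, which is the least-squares line for these made-up points; both are assumptions for the illustration.

```python
def determination(observed, fitted):
    """Return (R^2, k^2) from observed values and fitted values of a regression."""
    mean_y = sum(observed) / len(observed)
    total = sum((y - mean_y) ** 2 for y in observed)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    unexplained = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return explained / total, unexplained / total


xs = [1, 2, 3, 4, 5]                    # hypothetical predictor values
ys = [2, 4, 5, 4, 5]                    # hypothetical responses
fitted = [2.2 + 0.6 * x for x in xs]    # least-squares line for these data

r_squared, k_squared = determination(ys, fitted)
print(r_squared, k_squared)
```

For these data Pearson's r is 6/√60, so r² = 0.6 agrees with R², as stated for the simple regression model. The two parts sum to 1 only when the fitted values come from a least-squares fit with an intercept.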
  • 57. REGRESSION ANALYSIS • Regression: A statistical process for estimating the relationships between a dependent variable (DV) and one or more independent variables (IVs). • With an appropriate study design, it can be used to help infer causal relationships between the DV and IVs. • Regression line: The best-fit line.
  • 58. Types • Linear Regression • Logistic Regression • Polynomial Regression • Stepwise Regression • Ridge Regression • Lasso Regression • ElasticNet Regression
  • 59. Linear regression and its types • Linear regression is a common statistical data analysis technique. It is used to determine the extent to which there is a linear relationship between a DV and one or more IVs. • Types: i. Simple linear regression (one IV), and ii. Multiple linear regression (two or more IVs).
  • 60. Simple linear regression • It allows us to summarize and study relationships between two continuous (quantitative) variables. • A regression equation takes the form y = a + bx + c, where a is the intercept of the line, b is the slope coefficient, and c is the regression residual (with mean 0). • Y is referred to as the DV, response variable or predicted variable. • X is referred to as the IV, explanatory variable, factor, carrier, covariate, regressor or predictor variable.
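  • 61. • Calculation: The regression line is $y = \alpha + \beta x + \varepsilon$, or $y_i = \alpha + \beta x_i + \varepsilon_i$. Summing over the n observations and dividing by n, with $\sum \varepsilon_i = 0$, gives $\frac{\sum y_i}{n} = \frac{n\alpha}{n} + \beta \frac{\sum x_i}{n}$, i.e. $\bar{y} = \alpha + \beta \bar{x}$, so $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$. The slope estimate is $\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} = \frac{SP(XY)}{SS(X)}$, where $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$. Then $\hat{y} = \hat{\alpha} + \hat{\beta} x$.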
  • 62. • The estimates $\hat{\alpha}$ and $\hat{\beta}$ are called the least squares estimates of $\alpha$ and $\beta$ because they are the solution to the least squares method. • The fitted line is called the least squares regression line.
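A minimal sketch of the least squares estimates for a simple regression line follows, using the sum-of-products over sum-of-squares form of the slope. The data are hypothetical and the function name `least_squares_line` is an assumption for this example.

```python
def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the fitted line y = alpha + beta * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    beta = sp_xy / ss_x                 # SP(XY) / SS(X)
    alpha = mean_y - beta * mean_x      # the line passes through (mean_x, mean_y)
    return alpha, beta


xs = [0, 1, 2, 3, 4]    # hypothetical IV values
ys = [1, 3, 4, 6, 6]    # hypothetical DV values
alpha_hat, beta_hat = least_squares_line(xs, ys)
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]
print(alpha_hat, beta_hat, sum(residuals))
```

The residuals of a least squares line with an intercept sum to zero (up to rounding), which mirrors the condition that the residual term has mean 0.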
  • 63. Multiple regression: In some cases the DV is influenced by several IVs. The method of estimating the average rate of change in the DV per unit change in two or more IVs is known as multiple regression. The model is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$, where $\beta_0$ is the intercept, $\beta_j$ ($j = 1, 2, 3, \ldots, k$) are the coefficients of partial regression of $x_j$, and $\varepsilon_i$ is the random error term. $\beta_j = 0$ means that there is no linear relationship between y and $x_j$.
  • 64. Interpreting parameter values (model coefficients) • Intercept, $\beta_0$: the value of y when all predictors are 0. • $\beta_j$: describes the expected change in y per unit increment in $x_j$ when all other predictors in the model are held at a constant value.
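  • 65. Estimating model parameters of multiple regression. Assume a random sample of n observations $(y_i, x_{i1}, x_{i2}, \ldots, x_{ik})$, $i = 1, 2, \ldots, n$. The best predicting equation $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}$ is found by choosing the values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ which minimize the expression $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \cdots - \hat{\beta}_k x_{ik})^2$. Take the partial derivatives of the SSE function with respect to $\beta_j$, $j = 0, 1, \ldots, k$, and equate each to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the parameter estimates: $\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}$; $\sum_{i=1}^{n} x_{i1} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik}$; …; $\sum_{i=1}^{n} x_{ik} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2$.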
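The system of k+1 normal equations can be built and solved directly, as in the following sketch. It uses plain Python with Gaussian elimination rather than a statistics package; the data are hypothetical and were generated exactly from y = 1 + 2·x1 + 3·x2, so the estimates should recover those coefficients. The function names are assumptions for this illustration.

```python
def normal_equations(rows, ys):
    """Build the coefficient matrix and right-hand side of the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    p = len(design[0])
    lhs = [[sum(d[a] * d[b] for d in design) for b in range(p)] for a in range(p)]
    rhs = [sum(d[a] * y for d, y in zip(design, ys)) for a in range(p)]
    return lhs, rhs


def solve(matrix, vector):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    m = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        known = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - known) / m[r][r]
    return beta


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no error term.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]
betas = solve(*normal_equations(rows, ys))
print([round(b, 6) for b in betas])
```

The error raised for a singular system is what perfect collinearity among predictors looks like numerically: the normal equations then have no unique solution.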
  • 66. Multicollinearity • It occurs when the predictors (x1, x2, ..., xk) are statistically highly correlated. • It leads to: – Numerical instability in the estimates of the regression parameters. – Regression coefficients that no longer have simple interpretations in the additive model. • Ways to detect multicollinearity: – Scatterplots of the predictor variables. – Correlation matrix for the predictor variables; the higher these correlations, the worse the problem. – Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity. • What can be done: – Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide. – Choose explanatory variables wisely (e.g. consider omitting one of two highly correlated variables). – More advanced solutions: principal components analysis; ridge regression.
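To show where the "larger than 10" rule of thumb comes from, here is a small sketch based on the standard definition VIF_j = 1 / (1 - R_j²), where R_j² is the coefficient of determination from regressing predictor x_j on the other predictors. That definition is not spelled out on the slide, and the input values below are hypothetical. With only two predictors, R_j² is simply the squared correlation between them.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R_j^2) for one predictor regressed on the others."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1 / (1 - r_squared)


# Hypothetical correlations between two predictors.
for r in (0.0, 0.5, 0.9, 0.95):
    vif = variance_inflation_factor(r ** 2)
    flag = "substantial collinearity" if vif > 10 else "ok"
    print(f"r = {r:.2f}: VIF = {vif:.2f} ({flag})")
```

With two predictors, the VIF passes 10 only once their correlation exceeds roughly 0.95.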
  • 67. Stepwise regression • It is an automated tool used in the exploratory stages of model building to identify a useful subset of predictors. The process systematically adds the most significant variable or removes the least significant variable during each step.