INTRODUCTION TO
STATISTICS
Definition of Statistics
◉ The term statistics refers to a set of mathematical procedures
for organizing, summarizing, and interpreting information.
◉ Statistical procedures help ensure that the information or
observations are presented and interpreted in an accurate and
informative way. In somewhat grandiose terms, statistics help
researchers bring order out of chaos. In addition, statistics
provide researchers with a set of standardized techniques that
are recognized and understood throughout the scientific
community.
Population and Sample
◉ A population is the set of all the
individuals of interest in a
particular study.
◉ As you can well imagine, a
population can be quite large,
for example, the entire set of
women on the planet Earth. A
researcher might be more
specific, limiting the population
for study to women who are
registered voters in the United
States.
◉ A sample is a set of individuals
selected from a population, usually
intended to represent the
population in a research study.
◉ Just as we saw with populations,
samples can vary in size. For
example, one study might examine
a sample of only 10 students in a
graduate program and another
study might use a sample of more
than 10,000 people who take a
specific cholesterol medication.
Variable and Data
◉ A variable is a characteristic or
condition that changes or has
different values for different
individuals.
◉ Once again, variables can be
characteristics that differ from one
individual to another, such as
height, weight, gender, or
personality. Also, variables can be
environmental conditions that
change such as temperature, time of
day, or the size of the room in which
the research is being conducted.
◉ Data (plural) are measurements or
observations.
◉ A data set is a collection of
measurements or observations.
◉ A datum (singular) is a single
measurement or observation and is
commonly called a score or raw
score.
Parameters and Statistics
◉ A parameter is a value, usually a
numerical value, that describes a
population.
◉ A parameter is usually derived
from measurements of the
individuals in the population.
◉ For example, we want to
know the average length of a
butterfly. This is a parameter
because it states something
about the entire population of
butterflies.
◉ A statistic is a value, usually a
numerical value, that describes a
sample.
◉ A statistic is usually derived from
measurements of the individuals in
the sample.
◉ For example, the parameter may
be the average height of 25-year-
old men in North America. The
heights of the members of a sample
of 100 such men are measured; the
average of those 100 numbers is a
statistic.
Descriptive and Inferential Statistical Methods
◉ Descriptive statistics are statistical
procedures used to summarize,
organize, and simplify data.
◉ Descriptive statistics are techniques
that take raw scores and organize
or summarize them in a form that is
more manageable. Often the scores
are organized in a table or a graph
so that it is possible to see the
entire set of scores. Another
common technique is to summarize
a set of scores by computing an
average.
◉ Inferential statistics consist of
techniques that allow us to study
samples and then make
generalizations about the
populations from which they were
selected.
◉ Because populations are typically
very large, it usually is not possible
to measure everyone in the
population. Therefore, a sample is
selected to represent the
population.
◉ Sampling error is the naturally occurring discrepancy, or error, that
exists between a sample statistic and the corresponding population
parameter.
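To make the parameter/statistic distinction and the idea of sampling error concrete, here is a minimal sketch. The population and sample values are hypothetical (they do not come from the slides); the point is only that the sample mean usually differs somewhat from the population mean.

```python
from statistics import mean

# Hypothetical population of 8 scores (illustrative values, not from the slides).
population = [2, 4, 4, 5, 6, 7, 8, 12]

# One possible sample of 4 individuals selected from that population.
sample = [4, 5, 7, 12]

mu = mean(population)    # parameter: describes the whole population
M = mean(sample)         # statistic: describes the sample only
sampling_error = M - mu  # discrepancy between the statistic and the parameter

print(f"parameter = {mu}, statistic = {M}, sampling error = {sampling_error}")
```

A different sample from the same population would generally give a different statistic, and therefore a different sampling error.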
Constructs and Operational Definitions
◉ Constructs are internal attributes
or characteristics that cannot be
directly observed but are useful
for describing and explaining
behaviour.
◉ Constructs exist at a higher
level of abstraction than
concepts. Justice, Beauty,
Happiness, and Health are all
constructs.
◉ An operational definition identifies
a measurement procedure (a set of
operations) for measuring an
external behaviour and uses the
resulting measurements as a
definition and a measurement of a
hypothetical construct.
◉ Note that an operational definition
has two components. First, it
describes a set of operations for
measuring a construct. Second, it
defines the construct in terms of the
resulting measurements.
Discrete and Continuous Variables
◉ A discrete variable consists of separate,
indivisible categories. No values can
exist between two neighbouring
categories.
◉ Discrete variables are commonly
restricted to whole, countable
numbers—for example, the number of
children in a family or the number of
students attending class. A discrete
variable may also consist of
observations that differ qualitatively. For
example, people can be classified by
gender (male or female) or by occupation
(nurse, teacher, lawyer, etc.).
◉ For a continuous variable, there are an
infinite number of possible values that
fall between any two observed values. A
continuous variable is divisible into an
infinite number of fractional parts.
◉ For example, two people who both
claim to weigh 150 pounds are probably
not exactly the same weight. However,
they are both around 150 pounds. One
person may actually weigh 149.6 and
the other 150.3. Thus, a score of 150 is
not a specific point on the scale but
instead is an interval.
Scale of Measurement
Nominal Scale and Ordinal Scale
◉ A nominal scale consists of a set
of categories that have different
names.
◉ Measurements on a nominal scale
label and categorize
observations, but do not make
any quantitative distinctions
between observations. For example, the
rooms or offices in a building may be
identified by numbers, but those numbers
serve only as labels.
◉ An ordinal scale consists of a set of
categories that are organized in an
ordered sequence. Measurements
on an ordinal scale rank
observations in terms of size or
magnitude.
◉ Often, an ordinal scale consists of a
series of ranks (first, second, third,
and so on) like the order of finish in
a horse race. Occasionally, the
categories are identified by verbal
labels like small, medium, and large
drink sizes at a fast-food restaurant.
Interval Scale and Ratio Scale
◉ An interval scale consists of
ordered categories that are all
intervals of exactly the same size.
◉ Equal differences between
numbers on the scale reflect equal
differences in magnitude.
However, the zero point on an
interval scale is arbitrary and
does not indicate a zero amount
of the variable being measured.
◉ A ratio scale is an interval scale with
the additional feature of an absolute
zero point. With a ratio scale, ratios
of numbers do reflect ratios of
magnitude.
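◉ For example, temperature in degrees
Fahrenheit is an interval scale: you know
that a measurement of 80° is higher than
a measurement of 60°, and that it is exactly
20° higher. Because 0° Fahrenheit is an
arbitrary zero point, however, 80° is not
twice as hot as 40°; ratio comparisons
require a ratio scale such as height or weight.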
Shape of Frequency Distribution
◉ In a symmetrical distribution, it is possible to draw a vertical line
through the middle so that one side of the distribution is a mirror image
of the other.
◉ In a skewed distribution, the scores tend to pile up toward one end of
the scale and taper off gradually at the other end.
◉ The section where the scores taper off toward one end of a distribution is
called the tail of the distribution.
◉ A skewed distribution with the tail on the right-hand side is positively
skewed because the tail points toward the positive (above-zero) end of
the X-axis. If the tail points to the left, the distribution is negatively
skewed.
Introduction to Measures of Central Tendency
Mean
The mean for a
distribution is the sum of
the scores divided by the
number of scores
Median
If the scores in a distribution
are listed in order from
smallest to largest, the
median is the midpoint of
the list. More specifically, the
median is the point on the
measurement scale below
which 50% of the scores in
the distribution are located.
Mode
In a frequency distribution,
the mode is the score or
category that has the
greatest frequency.
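The three measures of central tendency can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical, and the mode is taken here to be the most frequently occurring score.

```python
from statistics import mean, median, mode

# Hypothetical set of 7 scores, already listed from smallest to largest.
scores = [3, 5, 5, 6, 8, 9, 13]

mean_score = mean(scores)      # sum of the scores divided by the number of scores
median_score = median(scores)  # midpoint of the ordered list
mode_score = mode(scores)      # most frequently occurring score

print(f"mean = {mean_score}, median = {median_score}, mode = {mode_score}")
```

Here the single large score (13) pulls the mean above the median, which is the typical pattern in a positively skewed distribution.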
Exercise related to Central Tendency
Introduction to Variability
◉ Variability provides a quantitative measure of the differences between
scores in a distribution and describes the degree to which the scores are
spread out or clustered together.
◉ The range is the distance covered by the scores in a distribution, from
the smallest score to the largest score.
◉ Deviation is distance from the mean:
Deviation score = X - μ
◉ SS, or sum of squares, is the sum of the squared deviation scores.
◉ Variance equals the mean of the squared deviations. Variance is the
average squared distance from the mean.
◉ Standard deviation is the square root of the variance and provides a
measure of the standard, or average, distance from the mean.
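The chain from deviations to SS to variance to standard deviation can be traced step by step. The sketch below treats a hypothetical set of five scores as a complete population, so it divides SS by N; the variable names are illustrative only.

```python
import math

# Hypothetical population of N = 5 scores.
scores = [1, 9, 5, 8, 7]
N = len(scores)

mu = sum(scores) / N                       # population mean
score_range = max(scores) - min(scores)    # largest score minus smallest score
deviations = [x - mu for x in scores]      # deviation score = X - mu
SS = sum(d ** 2 for d in deviations)       # sum of squared deviations
variance = SS / N                          # mean of the squared deviations
standard_deviation = math.sqrt(variance)   # square root of the variance

print(f"mu = {mu}, range = {score_range}, SS = {SS}")
print(f"variance = {variance}, standard deviation = {standard_deviation:.2f}")
```

The deviations always sum to zero, which is why they are squared before being averaged.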
Exercise related to Variability
Introduction to z-score
◉ The z-score definition is adequate for transforming back and forth from X
values to z-scores as long as the arithmetic is easy to do in your head.
◉ Z-scores are often used in academic settings to analyze how well a
student's score compares to the mean score on a given exam. For example,
suppose the scores on a certain college entrance exam are roughly normally
distributed with a mean of 82 and a standard deviation of 5. A student who
scores 87 is then exactly one standard deviation above the mean, which
corresponds to z = +1.00.
◉ For more complicated values, it is best to have an equation to help structure
the calculations. Fortunately, the relationship between X values and z-scores is
easily expressed in a formula. The formula for transforming scores into
z-scores is:
z = (X - μ) / σ
◉ The numerator of the equation, X – μ, is a deviation score.
◉ It measures the distance in points between X and μ and
indicates whether X is located above or below the mean.
◉ The deviation score is then divided by σ because we want the z-
score to measure distance in terms of standard deviation units.
◉ The formula performs exactly the same arithmetic that is used
with the z-score definition, and it provides a structured equation
to organize the calculations when the numbers are more
difficult.
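The transformation described above (deviation score divided by σ) can be sketched in a few lines. The mean of 82 and standard deviation of 5 come from the exam example in the slides; the individual score of 90 and the function names are assumptions made for illustration.

```python
def z_score(x, mu, sigma):
    """Convert a raw score X into a z-score: (X - mu) / sigma."""
    return (x - mu) / sigma


def raw_score(z, mu, sigma):
    """Convert a z-score back into a raw score: mu + z * sigma."""
    return mu + z * sigma


mu, sigma = 82, 5   # exam mean and standard deviation from the slides
x = 90              # hypothetical student's score

z = z_score(x, mu, sigma)
print(f"X = {x} corresponds to z = {z:+.2f}")
print(f"z = {z:+.2f} corresponds to X = {raw_score(z, mu, sigma):.1f}")
```

A positive z-score places the score above the mean and a negative z-score places it below; the magnitude counts standard deviation units.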
Exercise related to z-score
Introduction to Hypothesis Testing
◉ A hypothesis test is a statistical method that uses sample data to
evaluate a hypothesis about a population.
◉ The Four Steps of a Hypothesis Test
STEP 1
State the hypothesis. As the name implies, the process of hypothesis
testing begins by stating a hypothesis about the unknown
population. Actually, we state two opposing hypotheses. Notice that
both hypotheses are stated in terms of population parameters.
◉ The first and most important of the two hypotheses is called the
null hypothesis. The null hypothesis states that the treatment
has no effect. The null hypothesis is identified by the symbol H0.
The null hypothesis (H0) states that in the general population there
is no change, no difference, or no relationship. In the context of an
experiment, H0 predicts that the independent variable (treatment)
has no effect on the dependent variable (scores) for the population.
◉ The second hypothesis is simply the opposite of the null
hypothesis, and it is called the scientific, or alternative,
hypothesis (H1).
The alternative hypothesis (H1) states that there is a change, a
difference, or a relationship for the general population. In the
context of an experiment, H1 predicts that the independent
variable (treatment) does have an effect on the dependent
variable.
Type I & Type II Error
◉ A Type I error occurs when
a researcher rejects a null
hypothesis that is actually
true. In a typical research
situation, a Type I error
means the researcher
concludes that a treatment
does have an effect when in
fact it has no effect.
◉ A Type II error occurs when
a researcher fails to reject a
null hypothesis that is really
false. In a typical research
situation, a Type II error
means that the hypothesis
test has failed to detect a
real treatment effect.
STEP 2
Set the criteria for a decision. Eventually the researcher will use the data from
the sample to evaluate the credibility of the null hypothesis. The data will either
provide support for the null hypothesis or tend to refute the null hypothesis.
The Alpha Level: To find the boundaries that separate the high-probability
samples from the low-probability samples, we must define exactly what is meant
by “low” probability and “high” probability. This is accomplished by selecting a
specific probability value, which is known as the level of significance, or the alpha
level, for the hypothesis test. The alpha (α) value is a small probability that is
used to identify the low-probability samples. By convention, commonly used
alpha levels are α = .05 (5%), α = .01 (1%), and α = .001 (0.1%).
◉ The extremely unlikely values, as defined by the alpha level,
make up what is called the critical region.
◉ The alpha level, or the level of significance, is a probability
value that is used to define the concept of “very unlikely” in a
hypothesis test.
◉ The critical region is composed of the extreme sample values
that are very unlikely (as defined by the alpha level) to be
obtained if the null hypothesis is true. The boundaries for the
critical region are determined by the alpha level. If sample data
fall in the critical region, the null hypothesis is rejected.
◉ The Boundaries for the Critical Region: To determine the exact
location for the boundaries that define the critical region, we use the
alpha-level probability and the unit normal table.
◉ In most cases, the distribution of sample means is normal, and the
unit normal table provides the precise z-score location for the critical
region boundaries.
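Instead of reading the boundaries from a printed table, they can be computed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes a two-tailed test by default (alpha split evenly between the two tails), which the slides do not state explicitly; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Return the positive z-score boundary of the critical region."""
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf gives the z-score below which the stated proportion falls.
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01, 0.001):
    z_crit = critical_z(alpha)
    print(f"alpha = {alpha}: reject H0 if z < -{z_crit:.2f} or z > +{z_crit:.2f}")
```

Smaller alpha levels push the boundaries farther from zero, so more extreme sample data are needed to reject the null hypothesis.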
◉ Degrees of freedom describe the number of scores in a sample
that are independent and free to vary. Because the sample
mean places a restriction on the value of one score in the
sample, there are n – 1 degrees of freedom for a sample with n
scores.
◉ For a sample of n scores, the degrees of freedom, or df, for the
sample variance are defined as df = n - 1. The degrees of
freedom determine the number of scores in the sample that are
independent and free to vary.
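A short sketch can show both where the n - 1 comes from and how it is used. The sample values are hypothetical; the code checks that the last deviation is fixed once the others are known, then divides SS by df to get the sample variance and compares the result with `statistics.variance`, which uses the same n - 1 divisor.

```python
from statistics import variance

# Hypothetical sample of n = 7 scores.
sample = [1, 6, 4, 3, 8, 7, 6]
n = len(sample)

M = sum(sample) / n                    # sample mean
deviations = [x - M for x in sample]   # deviations from the sample mean

# Deviations must sum to zero, so the last one is determined by the others:
# only n - 1 of them are free to vary.
forced_last = -sum(deviations[:-1])

SS = sum(d ** 2 for d in deviations)   # sum of squared deviations
df = n - 1                             # degrees of freedom
sample_variance = SS / df              # SS divided by df

print(f"last deviation = {deviations[-1]}, forced value = {forced_last}")
print(f"SS = {SS}, df = {df}, sample variance = {sample_variance}")
print(f"statistics.variance gives {variance(sample)}")
```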
The Unit Normal Table
◉ The graph shows proportions for only a few selected z-score values. A more
complete listing of z-scores and proportions is provided in the unit normal table.
◉ This table lists proportions of the normal distribution for a full range of possible z-
score values.
A normal distribution following a z-score
transformation
STEP 3
Collect data and compute sample statistics
◉ The data are as given, so all that remains is to compute the statistic.
STEP 4
Make a decision
◉ The sample data are located in the critical region. By definition, a
sample value in the critical region is very unlikely to occur if the null
hypothesis is true. Therefore, we conclude that the sample is not
consistent with H0 and our decision is to reject the null hypothesis.
Remember, the null hypothesis states that there is no treatment
effect, so rejecting H0 means we are concluding that the treatment
did have an effect.
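The four steps can be strung together in a small sketch of a z test for a sample mean. It assumes the population standard deviation σ is known, a two-tailed test, and the usual standard error σ / √n; none of the numbers come from the slides, and `z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 when sigma is known."""
    standard_error = sigma / math.sqrt(n)          # expected distance between M and mu
    z = (sample_mean - mu0) / standard_error       # Step 3: compute the test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Step 2: boundary of the critical region
    reject_h0 = abs(z) > z_crit                    # Step 4: is z in the critical region?
    return z, z_crit, reject_h0


# Step 1 (hypothetical): H0 says the treated population mean is still 82.
z, z_crit, reject_h0 = z_test(sample_mean=84.5, mu0=82, sigma=5, n=25)

decision = "reject H0" if reject_h0 else "fail to reject H0"
print(f"z = {z:.2f}, critical z = ±{z_crit:.2f}: {decision}")
```

With these hypothetical numbers the sample mean lies 2.5 standard errors above the hypothesized mean, which falls in the critical region for α = .05.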
Introduction to the t Statistic
◉ The t statistic is used to test hypotheses about an unknown
population mean, μ, when the value of σ is unknown. The
formula for the t statistic has the same structure as the z-
score formula, except that the t statistic uses the estimated
standard error in the denominator.
◉ The estimated standard error (SM) is used as an estimate of the
real standard error σM when the value of σ is unknown. It is
computed from the sample variance or sample standard
deviation and provides an estimate of the standard distance
between a sample mean M and the population mean μ.
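The transcript does not reproduce the formula itself, so the sketch below assumes the standard single-sample form: estimated standard error = √(s² / n) and t = (M - μ) / estimated standard error, with df = n - 1. The sample and the hypothesized mean are hypothetical, and `one_sample_t` is an illustrative name.

```python
import math


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 when sigma is unknown."""
    n = len(sample)
    M = sum(sample) / n                      # sample mean
    SS = sum((x - M) ** 2 for x in sample)   # sum of squared deviations
    s2 = SS / (n - 1)                        # sample variance (df = n - 1)
    estimated_se = math.sqrt(s2 / n)         # estimated standard error of M
    t = (M - mu0) / estimated_se
    return t, n - 1


# Hypothetical sample of n = 9 scores; H0 states that mu = 10.
sample = [10, 10, 10, 10, 13, 16, 16, 16, 16]
t, df = one_sample_t(sample, mu0=10)
print(f"t({df}) = {t:.2f}")
```

The resulting t is compared with a critical value from the t distribution for the same df; for df = 8 and α = .05, two-tailed, the tabled value is 2.306.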
Exercise related to t test
The t Test for Two Independent Samples
◉ A research design that uses a separate group of participants
for each treatment condition (or for each population) is
called an independent-measures research design or a
between-subjects design.
◉ The Estimated Standard Error: In each of the t-score formulas, the standard error
in the denominator measures how accurately the sample statistic represents the
population parameter. In the single-sample t formula, the standard error measures
the amount of error expected for a sample mean and is represented by the symbol
SM. For the independent measures t formula, the standard error measures the
amount of error that is expected when you use a sample mean difference (M1 − M2)
to represent a population mean difference (μ1 − μ2). The standard error for the
sample mean difference is represented by the symbol S(M1 − M2).
◉ Pooled Variance
◉ One method for correcting the bias in the standard error is to combine the two
sample variances into a single value called the pooled variance. The pooled
variance is obtained by averaging or “pooling” the two sample variances using
a procedure that allows the bigger sample to carry more weight in determining
the final value.
◉ For the independent-measures t statistic, there are two SS values and two df
values (one from each sample). The values from the two samples are combined
to compute what is called the pooled variance.
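As a sketch of how the two SS values and two df values combine, the code below assumes the standard formulas: pooled variance = (SS1 + SS2) / (df1 + df2), estimated standard error = √(pooled variance / n1 + pooled variance / n2), and t = (M1 - M2) / estimated standard error under a null hypothesis of no mean difference. The two samples are hypothetical and the function names are illustrative.

```python
import math


def sum_of_squares(sample):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


def independent_t(sample1, sample2):
    """Return (t, df, pooled_variance) for H0: mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooling by SS and df lets the larger sample carry more weight.
    pooled_variance = (sum_of_squares(sample1) + sum_of_squares(sample2)) / (df1 + df2)
    estimated_se = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    t = (m1 - m2) / estimated_se
    return t, df1 + df2, pooled_variance


# Hypothetical scores for two separate groups of participants.
treatment_1 = [1, 9, 5, 8, 7]
treatment_2 = [5, 13, 9, 12, 11]

t, df, pooled_variance = independent_t(treatment_1, treatment_2)
print(f"pooled variance = {pooled_variance}, t({df}) = {t:.2f}")
```

When the two samples are the same size, as here, the pooled variance is simply the average of the two sample variances.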
Exercise related to independent sample t test
Introduction to Repeated-Measures Designs
◉ A repeated-measures design, or a within-subject design, is one in
which the dependent variable is measured two or more times for each
individual in a single sample. The same group of subjects is used in all of
the treatment conditions.
◉ In a repeated-measures design or a matched-subjects design comparing
two treatment conditions, the data consist of two sets of scores, which
are grouped into sets of two, corresponding to the two scores obtained
for each individual or each matched pair of subjects.
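Because each individual contributes a pair of scores, the usual analysis works on the difference scores D = second score - first score. The sketch below assumes the standard repeated-measures t statistic, t = M_D / √(s² / n), computed from the D values under a null hypothesis of zero mean difference. The before and after scores are hypothetical.

```python
import math

# Hypothetical scores for the same 4 individuals under two conditions.
before = [10, 8, 12, 9]
after = [9, 11, 15, 12]

# One difference score per individual (or per matched pair).
D = [a - b for a, b in zip(after, before)]
n = len(D)

M_D = sum(D) / n                       # mean difference
SS_D = sum((d - M_D) ** 2 for d in D)  # sum of squares of the difference scores
s2_D = SS_D / (n - 1)                  # variance of the difference scores
estimated_se = math.sqrt(s2_D / n)     # estimated standard error of M_D
t = M_D / estimated_se                 # H0: the population mean difference is 0
df = n - 1

print(f"D = {D}, mean difference = {M_D}")
print(f"t({df}) = {t:.2f}")
```

Once the difference scores are computed, the calculation is the same as a single-sample t test applied to the D values.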
Exercise related to dependent sample t test
Analysis of Variance ANOVA
◉ Analysis of variance (ANOVA) is a hypothesis-testing procedure that is used to
evaluate mean differences between two or more treatments (or populations).
◉ As with all inferential procedures, ANOVA uses sample data as the basis for drawing
general conclusions about populations.
◉ It may appear that ANOVA and t tests are simply two different ways of doing exactly
the same job: testing for mean differences. In some respects, this is true—both tests
use sample data to test hypotheses about population means.
◉ However, ANOVA has a tremendous advantage over t tests. Specifically, t tests are
limited to situations in which there are only two treatments to compare.
◉ The major advantage of ANOVA is that it can be used to compare two or more
treatments.
◉ There really are no differences between the populations (or
treatments). The observed differences between the sample means are
caused by random, unsystematic factors (sampling error) that
differentiate one sample from another.
◉ The populations (or treatments) really do have different means, and
these population mean differences are responsible for causing
systematic differences between the sample means.
◉ In analysis of variance, the variable (independent or quasi-
independent) that designates the groups being compared is called a
factor.
◉ The individual conditions or values that make up a factor are called
the levels of the factor.
◉ The distribution of F-Ratios
◉ For ANOVA, we expect F near 1.00 if H0 is true. An F-ratio that is
much larger than 1.00 is an indication that H0 is not true. In the F
distribution, we need to separate those values that are
reasonably near 1.00 from the values that are significantly greater
than 1.00.
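The transcript does not include the F-ratio formula, so the following sketch assumes the standard one-way, independent-measures form: F = MS between / MS within, where each MS is an SS divided by its df. The three treatment groups are hypothetical and `one_way_anova` is an illustrative name.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way independent-measures ANOVA."""
    k = len(groups)                              # number of treatment conditions
    N = sum(len(g) for g in groups)              # total number of scores
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]

    # Between-treatments variability: how far the group means are from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-treatments variability: how far scores are from their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical scores for three treatment conditions.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
F, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {F:.2f}")
```

An F-ratio this far above 1.00 suggests that the differences between the sample means are larger than would be expected from sampling error alone; the formal decision still requires a critical value from the F distribution for the same df.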
Exercise related to ANOVA
The Pearson Correlation
◉ The Pearson correlation measures the degree and the direction of the linear
relationship between two variables.
◉ The Pearson correlation for a sample is identified by the letter r. The
corresponding correlation for the entire population is identified by the Greek
letter rho (ρ), which is the Greek equivalent of the letter r.
◉ The sum of products of deviations, or SP, is similar to SS (the sum of
squared deviations), which is used to measure variability for a single
variable. Here, SP is used to measure the amount of co-variability
between two variables.
◉ In general, the squared correlation (r²) measures the gain in accuracy that
is obtained from using the correlation for prediction. The squared
correlation measures the proportion of variability in the data that is
explained by the relationship between X and Y. It is sometimes called the
coefficient of determination.
◉ The value r² is called the coefficient of determination because it
measures the proportion of variability in one variable that can be
determined from the relationship with the other variable. A correlation
of r = 0.80 (or –0.80), for example, means that r² = 0.64 (or 64%) of the
variability in the Y scores can be predicted from the relationship with X.
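The transcript omits the formula for r, so this sketch assumes the standard computational form r = SP / √(SSx · SSy), where SP is the sum of the products of the X and Y deviations. The paired scores are hypothetical and `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from SP, SSx, and SSy."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # SP: sum of products of the X and Y deviations (co-variability).
    SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    SSx = sum((x - mean_x) ** 2 for x in xs)   # variability of X alone
    SSy = sum((y - mean_y) ** 2 for y in ys)   # variability of Y alone
    return SP / math.sqrt(SSx * SSy)


# Hypothetical paired scores for 5 individuals.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

r = pearson_r(X, Y)
r_squared = r ** 2   # coefficient of determination

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

The sign of r gives the direction of the linear relationship, and r squared gives the proportion of variability in one variable that can be predicted from the other.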
Exercise related to Correlation
For further detail consult this book

Weitere ähnliche Inhalte

Ähnlich wie Introduction to Statistics Presentation.pptx

Probability and statistics(exercise answers)
Probability and statistics(exercise answers)Probability and statistics(exercise answers)
Probability and statistics(exercise answers)
Fatima Bianca Gueco
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
trixiacruz
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Aiden Yeh
 
Research methodology Chapter 6
Research methodology Chapter 6Research methodology Chapter 6
Research methodology Chapter 6
Pulchowk Campus
 

Ähnlich wie Introduction to Statistics Presentation.pptx (20)

Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Probability and statistics(exercise answers)
Probability and statistics(exercise answers)Probability and statistics(exercise answers)
Probability and statistics(exercise answers)
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Statistics1(finals)
Statistics1(finals)Statistics1(finals)
Statistics1(finals)
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Finals Stat 1
Finals Stat 1Finals Stat 1
Finals Stat 1
 
ststs nw.pptx
ststs nw.pptxststs nw.pptx
ststs nw.pptx
 
Quantitative Research Design.pptx
Quantitative Research Design.pptxQuantitative Research Design.pptx
Quantitative Research Design.pptx
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Research methodology Chapter 6
Research methodology Chapter 6Research methodology Chapter 6
Research methodology Chapter 6
 
Statistics.pptx
Statistics.pptxStatistics.pptx
Statistics.pptx
 
Understanding statistics in research
Understanding statistics in researchUnderstanding statistics in research
Understanding statistics in research
 
INTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptxINTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptx
 
Chapter_1_Lecture.pptx
Chapter_1_Lecture.pptxChapter_1_Lecture.pptx
Chapter_1_Lecture.pptx
 
Analyzing quantitative data
Analyzing quantitative dataAnalyzing quantitative data
Analyzing quantitative data
 
Mr 4. quantitative research design and methods
Mr 4. quantitative research design and methodsMr 4. quantitative research design and methods
Mr 4. quantitative research design and methods
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Data Display and Summary
Data Display and SummaryData Display and Summary
Data Display and Summary
 
Measurement and evaluation
Measurement and evaluationMeasurement and evaluation
Measurement and evaluation
 

Mehr von Aniqa Zai

Mehr von Aniqa Zai (13)

Philosophical Perspective of Socrates
Philosophical Perspective of SocratesPhilosophical Perspective of Socrates
Philosophical Perspective of Socrates
 
7 Habits of Highly Effective People (Habit 5)
7 Habits of Highly Effective People (Habit 5)7 Habits of Highly Effective People (Habit 5)
7 Habits of Highly Effective People (Habit 5)
 
Curriculum reforms in Pakistan (1947-2020)
Curriculum reforms in Pakistan (1947-2020)Curriculum reforms in Pakistan (1947-2020)
Curriculum reforms in Pakistan (1947-2020)
 
Essentialism
EssentialismEssentialism
Essentialism
 
Cognitive view of learning
Cognitive view of learningCognitive view of learning
Cognitive view of learning
 
Teacher Education for Sustainable Development
Teacher Education for Sustainable DevelopmentTeacher Education for Sustainable Development
Teacher Education for Sustainable Development
 
Montessori model
Montessori modelMontessori model
Montessori model
 
21st Century Skills, Technology and Education
21st Century Skills, Technology and Education21st Century Skills, Technology and Education
21st Century Skills, Technology and Education
 
Fear of faliure
Fear of faliureFear of faliure
Fear of faliure
 
School leadership and management
School leadership and managementSchool leadership and management
School leadership and management
 
Classroom management
Classroom managementClassroom management
Classroom management
 
The internet
The internetThe internet
The internet
 
Stress management
Stress managementStress management
Stress management
 

Kürzlich hochgeladen

obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
pwgnohujw
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
yulianti213969
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
aqpto5bt
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
dq9vz1isj
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
23050636
 
Go paperless and transform your procurement process with the Hive Collaborati...
Go paperless and transform your procurement process with the Hive Collaborati...Go paperless and transform your procurement process with the Hive Collaborati...
Go paperless and transform your procurement process with the Hive Collaborati...
LitoGarin1
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
ppy8zfkfm
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 

Kürzlich hochgeladen (20)

SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
Introduction to Statistics Presentation.pptx

  • 2. Definition of Statistics ◉ The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting information. ◉ Statistical procedures help ensure that the information or observations are presented and interpreted in an accurate and informative way. In somewhat grandiose terms, statistics help researchers bring order out of chaos. In addition, statistics provide researchers with a set of standardized techniques that are recognized and understood throughout the scientific community. 2
  • 3. Population and Sample ◉ A population is the set of all the individuals of interest in a particular study. ◉ As you can well imagine, a population can be quite large, for example, the entire set of women on the planet Earth. A researcher might be more specific, limiting the population for study to women who are registered voters in the United States. ◉ A sample is a set of individuals selected from a population, usually intended to represent the population in a research study. ◉ Just as we saw with populations, samples can vary in size. For example, one study might examine a sample of only 10 students in a graduate program and another study might use a sample of more than 10,000 people who take a specific cholesterol medication. 3
  • 5. Variable and Data ◉ A variable is a characteristic or condition that changes or has different values for different individuals. ◉ Once again, variables can be characteristics that differ from one individual to another, such as height, weight, gender, or personality. Also, variables can be environmental conditions that change such as temperature, time of day, or the size of the room in which the research is being conducted. ◉ Data (plural) are measurements or observations. ◉ A data set is a collection of measurements or observations. ◉ A datum (singular) is a single measurement or observation and is commonly called a score or raw score. 5
  • 6. Parameters and Statistics ◉ A parameter is a value, usually a numerical value, that describes a population. ◉ A parameter is usually derived from measurements of the individuals in the population. ◉ For example, we want to know the average length of a butterfly. This is a parameter because it states something about the entire population of butterflies. ◉ A statistic is a value, usually a numerical value, that describes a sample. ◉ A statistic is usually derived from measurements of the individuals in the sample. ◉ For example, the parameter may be the average height of 25-year-old men in North America. The heights of the members of a sample of 100 such men are measured; the average of those 100 numbers is a statistic. 6
  • 7. Descriptive and Inferential Statistical Methods ◉ Descriptive statistics are statistical procedures used to summarize, organize, and simplify data. ◉ Descriptive statistics are techniques that take raw scores and organize or summarize them in a form that is more manageable. Often the scores are organized in a table or a graph so that it is possible to see the entire set of scores. Another common technique is to summarize a set of scores by computing an average. ◉ Inferential statistics consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected. ◉ Because populations are typically very large, it usually is not possible to measure everyone in the population. Therefore, a sample is selected to represent the population. 7
  • 8. ◉ Sampling error is the naturally occurring discrepancy, or error, that exists between a sample statistic and the corresponding population parameter. 8
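The distinction between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. The population values below are hypothetical, and the sample is a fixed subset (every third score) chosen only so the result is reproducible; a real study would sample at random.

```python
from statistics import mean

# Hypothetical population of 10 scores (illustrative values, not from the slides).
population = [2, 4, 4, 5, 6, 7, 8, 9, 10, 15]

# Fixed subset used as the "sample" so the example is reproducible.
sample = population[::3]  # [2, 5, 8, 15]

parameter = mean(population)            # describes the whole population
statistic = mean(sample)                # describes only the sample
sampling_error = statistic - parameter  # discrepancy between the two

print(parameter, statistic, sampling_error)
```

The sample mean does not match the population mean exactly, and that gap is the sampling error described above.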
  • 9. Constructs and Operational Definitions ◉ Constructs are internal attributes or characteristics that cannot be directly observed but are useful for describing and explaining behaviour. ◉ Constructs exist at a higher level of abstraction than concepts. Justice, Beauty, Happiness, and Health are all constructs. ◉ An operational definition identifies a measurement procedure (a set of operations) for measuring an external behaviour and uses the resulting measurements as a definition and a measurement of a hypothetical construct. ◉ Note that an operational definition has two components. First, it describes a set of operations for measuring a construct. Second, it defines the construct in terms of the resulting measurements. 9
  • 10. Discrete and Continuous Variables ◉ A discrete variable consists of separate, indivisible categories. No values can exist between two neighbouring categories. ◉ Discrete variables are commonly restricted to whole, countable numbers (for example, the number of children in a family or the number of students attending class). A discrete variable may also consist of observations that differ qualitatively. For example, people can be classified by gender (male or female) or by occupation (nurse, teacher, lawyer, etc.). ◉ For a continuous variable, there are an infinite number of possible values that fall between any two observed values. A continuous variable is divisible into an infinite number of fractional parts. ◉ For example, two people who both claim to weigh 150 pounds are probably not exactly the same weight. However, they are both around 150 pounds. One person may actually weigh 149.6 and the other 150.3. Thus, a score of 150 is not a specific point on the scale but instead is an interval. 10
  • 12. Nominal Scale and Ordinal Scale ◉ A nominal scale consists of a set of categories that have different names. ◉ Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations. For example, the rooms or offices in a building may be identified by numbers, but those numbers are simply names and carry no quantitative information. ◉ An ordinal scale consists of a set of categories that are organized in an ordered sequence. Measurements on an ordinal scale rank observations in terms of size or magnitude. ◉ Often, an ordinal scale consists of a series of ranks (first, second, third, and so on) like the order of finish in a horse race. Occasionally, the categories are identified by verbal labels like small, medium, and large drink sizes at a fast-food restaurant. 12
  • 13. Interval Scale and Ratio Scale ◉ An interval scale consists of ordered categories that are all intervals of exactly the same size. ◉ Equal differences between numbers on the scale reflect equal differences in magnitude. However, the zero point on an interval scale is arbitrary and does not indicate a zero amount of the variable being measured. ◉ For example, you know that a measurement of 80° Fahrenheit is higher than a measure of 60°, and you know that it is exactly 20° higher. Because 0° Fahrenheit does not mean "no temperature", ratios of Fahrenheit values are not meaningful. ◉ A ratio scale is an interval scale with the additional feature of an absolute zero point. With a ratio scale, ratios of numbers do reflect ratios of magnitude. 13
  • 14. Shape of Frequency Distribution ◉ In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is a mirror image of the other. ◉ In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end. ◉ The section where the scores taper off toward one end of a distribution is called the tail of the distribution. ◉ A skewed distribution with the tail on the right-hand side is positively skewed because the tail points toward the positive (above-zero) end of the X-axis. If the tail points to the left, the distribution is negatively skewed. 14
  • 17. Introduction to Measures of Central Tendency Mean The mean for a distribution is the sum of the scores divided by the number of scores. Median If the scores in a distribution are listed in order from smallest to largest, the median is the midpoint of the list. More specifically, the median is the point on the measurement scale below which 50% of the scores in the distribution are located. Mode The mode is the score or category that has the greatest frequency in the distribution. 17
  • 18. Exercise related to Central Tendency 18
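As a small worked illustration of the three measures, the sketch below uses Python's standard `statistics` module. The scores are hypothetical and are not taken from the slides' exercise.

```python
from statistics import mean, median, mode

# Hypothetical set of six scores (illustrative only).
scores = [3, 5, 5, 6, 8, 9]

m = mean(scores)      # sum of the scores divided by the number of scores
mdn = median(scores)  # midpoint of the ordered list (average of 5 and 6)
mo = mode(scores)     # most frequent score

print(m, mdn, mo)
```

With an even number of scores, the median falls halfway between the two middle values, so it need not be one of the observed scores.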
  • 19. Introduction to Variability ◉ Variability provides a quantitative measure of the differences between scores in a distribution and describes the degree to which the scores are spread out or clustered together. ◉ The range is the distance covered by the scores in a distribution, from the smallest score to the largest score. ◉ Deviation is the distance from the mean: Deviation score = X − μ 19
  • 20. ◉ SS, or sum of squares, is the sum of the squared deviation scores. ◉ Variance equals the mean of the squared deviations. Variance is the average squared distance from the mean. ◉ Standard deviation is the square root of the variance and provides a measure of the standard, or average, distance from the mean. 20
  • 22. Exercise related to Variability 22
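The definitions of deviation, SS, variance, and standard deviation can be traced step by step. This is a minimal sketch with hypothetical scores, treating them as a whole population (dividing SS by N); for a sample, the variance would divide SS by n − 1 instead.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical population of four scores (illustrative only).
scores = [1, 0, 6, 1]

mu = sum(scores) / len(scores)          # population mean
deviations = [x - mu for x in scores]   # X - mu for each score
ss = sum(d ** 2 for d in deviations)    # sum of squared deviations
variance = ss / len(scores)             # mean squared deviation
std_dev = sqrt(variance)                # standard distance from the mean
score_range = max(scores) - min(scores) # largest score minus smallest score

# Cross-check against the standard library's population variance.
assert pvariance(scores) == variance
print(ss, variance, round(std_dev, 3), score_range)
```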
  • 23. Introduction to z-score ◉ The z-score definition is adequate for transforming back and forth from X values to z-scores as long as the arithmetic is easy to do in your head. ◉ Z-scores are often used in academic settings to analyze how well a student's score compares to the mean score on a given exam. For example, suppose the scores on a certain college entrance exam are roughly normally distributed with a mean of 82 and a standard deviation of 5. ◉ For more complicated values, it is best to have an equation to help structure the calculations. Fortunately, the relationship between X values and z-scores is easily expressed in a formula. The formula for transforming scores into z-scores is z = (X − μ) / σ. 23
  • 24. ◉ The numerator of the equation, X − μ, is a deviation score. ◉ It measures the distance in points between X and μ and indicates whether X is located above or below the mean. ◉ The deviation score is then divided by σ because we want the z-score to measure distance in terms of standard deviation units. ◉ The formula performs exactly the same arithmetic that is used with the z-score definition, and it provides a structured equation to organize the calculations when the numbers are more difficult. 24
  • 26. Exercise related to z-score 26
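The z-score transformation can be sketched as a pair of small functions. The mean of 82 and standard deviation of 5 come from the entrance-exam example above; the individual score of 92 and the function names are assumptions made for illustration.

```python
def z_score(x, mu, sigma):
    """Convert a raw score X into a z-score: z = (X - mu) / sigma."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Convert a z-score back into a raw score: X = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 82, 5          # exam mean and standard deviation from the slide
z = z_score(92, mu, sigma) # hypothetical student score of 92
x = raw_score(-1.0, mu, sigma)

print(z, x)
```

A positive z places the score above the mean and a negative z places it below, with the magnitude counting standard deviations.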
  • 27. Introduction to Hypothesis Testing ◉ A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis about a population. ◉ The Four Steps of a Hypothesis Test STEP 1 State the hypothesis. As the name implies, the process of hypothesis testing begins by stating a hypothesis about the unknown population. Actually, we state two opposing hypotheses. Notice that both hypotheses are stated in terms of population parameters. 27
  • 28. ◉ The first and most important of the two hypotheses is called the null hypothesis. The null hypothesis states that the treatment has no effect. The null hypothesis is identified by the symbol H0. The null hypothesis (H0) states that in the general population there is no change, no difference, or no relationship. In the context of an experiment, H0 predicts that the independent variable (treatment) has no effect on the dependent variable (scores) for the population. 28
  • 29. ◉ The second hypothesis is simply the opposite of the null hypothesis, and it is called the scientific, or alternative, hypothesis (H1). The alternative hypothesis (H1) states that there is a change, a difference, or a relationship for the general population. In the context of an experiment, H1 predicts that the independent variable (treatment) does have an effect on the dependent variable. 29
  • 33. Type I & Type II Error 33
  • 34. ◉ A Type I error occurs when a researcher rejects a null hypothesis that is actually true. In a typical research situation, a Type I error means the researcher concludes that a treatment does have an effect when in fact it has no effect. ◉ A Type II error occurs when a researcher fails to reject a null hypothesis that is really false. In a typical research situation, a Type II error means that the hypothesis test has failed to detect a real treatment effect. 34
  • 35. STEP 2 Set the criteria for a decision. Eventually the researcher will use the data from the sample to evaluate the credibility of the null hypothesis. The data will either provide support for the null hypothesis or tend to refute the null hypothesis. The Alpha Level To find the boundaries that separate the high-probability samples from the low-probability samples, we must define exactly what is meant by “low” probability and “high” probability. This is accomplished by selecting a specific probability value, which is known as the level of significance, or the alpha level, for the hypothesis test. The alpha (α) value is a small probability that is used to identify the low-probability samples. By convention, commonly used alpha levels are α = .05 (5%), α = .01 (1%), and α = .001 (0.1%). 35
  • 36. ◉ The extremely unlikely values, as defined by the alpha level, make up what is called the critical region. ◉ The alpha level, or the level of significance, is a probability value that is used to define the concept of “very unlikely” in a hypothesis test. ◉ The critical region is composed of the extreme sample values that are very unlikely (as defined by the alpha level) to be obtained if the null hypothesis is true. The boundaries for the critical region are determined by the alpha level. If sample data fall in the critical region, the null hypothesis is rejected. 36
  • 37. ◉ The Boundaries for the Critical Region To determine the exact location for the boundaries that define the critical region, we use the alpha-level probability and the unit normal table. ◉ In most cases, the distribution of sample means is normal, and the unit normal table provides the precise z-score location for the critical region boundaries. 37
  • 38. ◉ Degrees of freedom describe the number of scores in a sample that are independent and free to vary. Because the sample mean places a restriction on the value of one score in the sample, there are n − 1 degrees of freedom for a sample with n scores. ◉ For a sample of n scores, the degrees of freedom, or df, for the sample variance are defined as df = n − 1. The degrees of freedom determine the number of scores in the sample that are independent and free to vary. 38
  • 39. The Unit Normal Table ◉ The graph shows proportions for only a few selected z-score values. A more complete listing of z-scores and proportions is provided in the unit normal table. ◉ This table lists proportions of the normal distribution for a full range of possible z-score values. [Figure: A normal distribution following a z-score transformation] 39
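In place of a printed unit normal table, the critical-region boundaries can be looked up numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_z` and the default of a two-tailed test are assumptions, since the slides do not say how alpha is split between the tails.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Return the z boundary of the critical region for a given alpha level."""
    # For a two-tailed test, alpha is split evenly between the two tails.
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

# Conventional alpha levels listed in the slides.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha), 3))
```

For α = .05 this gives boundaries of roughly ±1.96, which matches the value usually read from the table.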
  • 42. STEP 3 Collect data and compute sample statistics ◉ The data are as given, so all that remains is to compute the statistic. 42
  • 43. STEP 4 Make a decision ◉ The sample data are located in the critical region. By definition, a sample value in the critical region is very unlikely to occur if the null hypothesis is true. Therefore, we conclude that the sample is not consistent with H0 and our decision is to reject the null hypothesis. Remember, the null hypothesis states that there is no treatment effect, so rejecting H0 means we are concluding that the treatment did have an effect. 43
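The four steps can be put together in a short z-test sketch. All of the numbers (a null-hypothesis mean of 80, σ = 20, a sample of n = 16 with M = 92) are hypothetical, and a two-tailed test at α = .05 is assumed.

```python
from math import sqrt
from statistics import NormalDist

# STEP 1: state the hypotheses (H0: mu = 80, H1: mu != 80). Hypothetical values.
mu_0, sigma = 80, 20

# STEP 2: set the criteria. Two-tailed test with alpha = .05 (assumed).
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# STEP 3: collect data and compute the sample statistic.
n, sample_mean = 16, 92
sigma_m = sigma / sqrt(n)            # standard error of the sample mean
z = (sample_mean - mu_0) / sigma_m

# STEP 4: make a decision. Reject H0 if z falls in the critical region.
reject_h0 = abs(z) > z_crit
print(z, round(z_crit, 2), reject_h0)
```

Here the sample mean lies 2.4 standard errors above the hypothesized mean, beyond the boundary of the critical region, so H0 is rejected.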
  • 44. Introduction to the t Statistic ◉ The t statistic is used to test hypotheses about an unknown population mean, μ, when the value of σ is unknown. The formula for the t statistic has the same structure as the z-score formula, except that the t statistic uses the estimated standard error in the denominator. 44
  • 45. ◉ The estimated standard error (s_M) is used as an estimate of the real standard error σ_M when the value of σ is unknown. It is computed from the sample variance or sample standard deviation and provides an estimate of the standard distance between a sample mean M and the population mean μ. 45
  • 46. Exercise related to t test 46
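A single-sample t statistic can be computed directly from the definitions above: the sample variance uses df = n − 1, and the estimated standard error replaces σ_M. The scores and the hypothesized mean of 8 are hypothetical.

```python
from math import sqrt

# Hypothetical sample of n = 9 scores and a hypothesized population mean.
scores = [6, 6, 8, 10, 10, 10, 12, 14, 14]
mu_0 = 8

n = len(scores)
m = sum(scores) / n                    # sample mean M
ss = sum((x - m) ** 2 for x in scores) # sum of squared deviations
df = n - 1                             # degrees of freedom
s2 = ss / df                           # sample variance
s_m = sqrt(s2 / n)                     # estimated standard error
t = (m - mu_0) / s_m                   # t statistic

print(m, ss, s2, s_m, t, df)
```

The resulting t would then be compared with the critical value from a t distribution table for df = 8.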
  • 48. The t Test for Two Independent Samples ◉ A research design that uses a separate group of participants for each treatment condition (or for each population) is called an independent-measures research design or a between-subjects design. 48
  • 49. ◉ The Estimated Standard Error In each of the t-score formulas, the standard error in the denominator measures how accurately the sample statistic represents the population parameter. In the single-sample t formula, the standard error measures the amount of error expected for a sample mean and is represented by the symbol s_M. For the independent-measures t formula, the standard error measures the amount of error that is expected when you use a sample mean difference (M1 − M2) to represent a population mean difference (μ1 − μ2). The standard error for the sample mean difference is represented by the symbol s_(M1 − M2). 49
  • 50. ◉ Pooled Variance ◉ When the two samples are not the same size, the standard error is biased if the two sample variances are treated equally. One method for correcting the bias is to combine the two sample variances into a single value called the pooled variance. The pooled variance is obtained by averaging or “pooling” the two sample variances using a procedure that allows the bigger sample to carry more weight in determining the final value. ◉ For the independent-measures t statistic, there are two SS values and two df values (one from each sample). The values from the two samples are combined to compute what is called the pooled variance. 50
  • 54. Exercise related to independent sample t test 54
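The pooled variance and the independent-measures t statistic can be sketched as follows. The formula used, pooled variance = (SS1 + SS2) / (df1 + df2), is the standard textbook form and is assumed here because the slide text does not spell it out; both samples are hypothetical.

```python
from math import sqrt

def mean_and_ss(scores):
    """Return the sample mean and the sum of squared deviations."""
    m = sum(scores) / len(scores)
    return m, sum((x - m) ** 2 for x in scores)

# Hypothetical samples of different sizes (n1 = 9, n2 = 5).
group1 = [6, 6, 8, 10, 10, 10, 12, 14, 14]
group2 = [3, 4, 7, 7, 9]

m1, ss1 = mean_and_ss(group1)
m2, ss2 = mean_and_ss(group2)
df1, df2 = len(group1) - 1, len(group2) - 1

# Pooling weights each sample by its df, so the larger sample counts more.
pooled_var = (ss1 + ss2) / (df1 + df2)

# Estimated standard error of the mean difference M1 - M2.
s_diff = sqrt(pooled_var / len(group1) + pooled_var / len(group2))

# H0 assumes mu1 - mu2 = 0.
t = (m1 - m2) / s_diff
df = df1 + df2
print(pooled_var, round(s_diff, 3), round(t, 3), df)
```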
  • 57. Introduction to Repeated-Measures Designs ◉ A repeated-measures design, or a within-subject design, is one in which the dependent variable is measured two or more times for each individual in a single sample. The same group of subjects is used in all of the treatment conditions. ◉ In a repeated-measures design or a matched-subjects design comparing two treatment conditions, the data consist of two sets of scores, which are grouped into sets of two, corresponding to the two scores obtained for each individual or each matched pair of subjects. 57
  • 58. Exercise related to dependent sample t test 58
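For a repeated-measures design, the usual approach is to reduce each pair of scores to a difference score and run a single-sample t test on those differences with H0: μ_D = 0. The slides do not show this formula, so the sketch below assumes that standard approach; the before/after scores are hypothetical.

```python
from math import sqrt

# Hypothetical scores for the same four participants in two conditions.
before = [10, 12, 9, 11]
after = [12, 16, 12, 14]

# One difference score per participant.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
m_d = sum(diffs) / n                      # mean difference
ss_d = sum((d - m_d) ** 2 for d in diffs) # SS of the difference scores
df = n - 1
s_md = sqrt((ss_d / df) / n)              # estimated standard error of M_D
t = (m_d - 0) / s_md                      # H0: mu_D = 0

print(diffs, m_d, round(t, 3), df)
```

Because each participant serves as their own comparison, only the variability of the difference scores enters the standard error.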
  • 61. Analysis of Variance ANOVA ◉ Analysis of variance (ANOVA) is a hypothesis-testing procedure that is used to evaluate mean differences between two or more treatments (or populations). ◉ As with all inferential procedures, ANOVA uses sample data as the basis for drawing general conclusions about populations. ◉ It may appear that ANOVA and t tests are simply two different ways of doing exactly the same job: testing for mean differences. In some respects, this is true—both tests use sample data to test hypotheses about population means. ◉ However, ANOVA has a tremendous advantage over t tests. Specifically, t tests are limited to situations in which there are only two treatments to compare. ◉ The major advantage of ANOVA is that it can be used to compare two or more treatments. 61
  • 62. ◉ When the sample means differ, there are two possible explanations. ◉ First, there really are no differences between the populations (or treatments). The observed differences between the sample means are caused by random, unsystematic factors (sampling error) that differentiate one sample from another. ◉ Second, the populations (or treatments) really do have different means, and these population mean differences are responsible for causing systematic differences between the sample means. 62
  • 63. ◉ In analysis of variance, the variable (independent or quasi-independent) that designates the groups being compared is called a factor. ◉ The individual conditions or values that make up a factor are called the levels of the factor. 63
  • 68. ◉ The distribution of F-Ratios ◉ For ANOVA, we expect F near 1.00 if H0 is true. An F-ratio that is much larger than 1.00 is an indication that H0 is not true. In the F distribution, we need to separate those values that are reasonably near 1.00 from the values that are significantly greater than 1.00. 68
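To see where an F-ratio comes from, the sketch below computes a one-way ANOVA by hand as F = MS_between / MS_within. This is the standard one-factor, independent-measures form, assumed here because the slide formulas are not in the text; the three treatment groups are hypothetical.

```python
# Hypothetical scores for three treatment conditions (levels of one factor).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Between-treatments variability: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
df_between = len(groups) - 1

# Within-treatments variability: spread of scores around their own group mean.
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)
df_within = len(all_scores) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(ms_between, ms_within, f_ratio)
```

An F-ratio this far above 1.00 suggests that the differences between group means are larger than sampling error alone would produce; the formal decision still requires the critical value from an F table.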
  • 75. The Pearson Correlation ◉ The Pearson correlation measures the degree and the direction of the linear relationship between two variables. ◉ The Pearson correlation for a sample is identified by the letter r. The corresponding correlation for the entire population is identified by the Greek letter rho (ρ), which is the Greek equivalent of the letter r. 75
  • 77. ◉ The Pearson correlation is computed from the sum of products of deviations, or SP. This new value is similar to SS (the sum of squared deviations), which is used to measure variability for a single variable. Now, we use SP to measure the amount of co-variability between two variables. ◉ In general, the squared correlation (r²) measures the gain in accuracy that is obtained from using the correlation for prediction. The squared correlation measures the proportion of variability in the data that is explained by the relationship between X and Y. It is sometimes called the coefficient of determination. 77
  • 78. ◉ The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable. A correlation of r = 0.80 (or −0.80), for example, means that r² = 0.64 (or 64%) of the variability in the Y scores can be predicted from the relationship with X. 78
  • 79. Exercise related to Correlation 79
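The Pearson correlation can be computed from SP and the two SS values using the standard formula r = SP / sqrt(SS_X · SS_Y). That formula is assumed here because it does not appear in the slide text, and the five X, Y pairs are illustrative.

```python
from math import sqrt

# Illustrative paired scores (five individuals, two variables).
x = [0, 10, 4, 8, 8]
y = [2, 6, 2, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# SP: sum of products of deviations (co-variability of X and Y).
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# SS for each variable separately.
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)

r = sp / sqrt(ss_x * ss_y)  # Pearson correlation
r_squared = r ** 2          # coefficient of determination

print(sp, ss_x, ss_y, r, r_squared)
```

The sign of SP gives the direction of the relationship, and r² gives the proportion of variability in Y that can be predicted from X.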
  • 82. For further detail consult this book 82