Research Methods in Health
Chapter 7. Statistical Methods 1
Young Moon Chae, Ph.D.
Graduate School of Public Health
Yonsei University, Korea
ymchae@yuhs.ac
Topics
• What is Biostatistics?
• Biostatistics in Public Health Research
• Descriptive statistics
• Inferential statistics
• Power of a test
• The t-test
Concepts
Biostatistics
• Biostatistics is the development and application of statistics to research in
health-related fields.
Statistics
• Common perceptions of statistics: numbers, tables, figures, polls, rates, etc.
• These are “descriptions of the world”
• Analysis of data
Biostatistics in Public Health Research
Methodological research:
• new statistical techniques
• high-speed computing
• geographical patterns of disease
• clinical trials
• longitudinal analysis
• data analysis in epidemiological studies
Errors in Statistical Methods
• Research design
-Improper control group in a case-control design
-Selection bias (the sample does not represent the study population)
-Sample size too small
• Statistical methods
-Parametric statistics for small samples
-Independent-samples t-test applied to related (paired) samples
-T-test or ANOVA for samples that do not meet the assumptions
(normality, equal variances, independence)
-Repeated t-tests for multiple comparisons
-Linear regression for a nominal dependent variable
-Regression with multicollinearity
-Chi-square test with expected cell counts less than 5
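To see why repeated t-tests for multiple comparisons appear in this list, the sketch below computes the probability of at least one false rejection when every pair of groups is compared at α = 0.05. It assumes the tests are independent, which real pairwise comparisons are not (they share groups), so the numbers are an approximation of the inflation, not an exact rate. The function names are illustrative.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one type I error among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def pairwise_tests(groups: int) -> int:
    """Number of t-tests needed to compare every pair of groups."""
    return groups * (groups - 1) // 2


if __name__ == "__main__":
    for g in (2, 3, 4, 5):
        k = pairwise_tests(g)
        print(f"{g} groups -> {k} t-tests -> familywise error {familywise_error(0.05, k):.3f}")
```

With five groups, ten t-tests push the chance of at least one false positive to roughly 40 percent, which is why ANOVA (with appropriate follow-up comparisons) is preferred.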
Overview of statistical methods
Descriptive vs. Inferential Statistics
• The mean and standard deviation can be used in two ways.
-One way is to describe the distribution of data.
-The other way is to infer something about a population (is the population
mean 25? 20?). This is a statistical test.
• Because the sampling distribution of the mean is approximately normal for
large samples (Central Limit Theorem), we can use the normal distribution to
show how close the parameter is likely to be to the sample mean and to make
decisions about treatments.
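The Central Limit Theorem claim can be checked with a small simulation. This is a sketch under illustrative assumptions: the population is exponential with mean 1 (strongly right-skewed, skewness 2), each sample has n = 50, and there are 5000 replications. If the theorem applies, the sample means should center on 1 and be far less skewed than the population.

```python
import random
import statistics


def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Moment-based sample skewness (no small-sample correction)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5


# Right-skewed population: exponential with mean 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, reps=5000)
print(f"mean of sample means:     {statistics.fmean(means):.3f}")
print(f"skewness of sample means: {skewness(means):.3f}")
```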
Descriptive Statistics
• Descriptive statistics
-Mean, median, mode
-Variance, standard deviation, range, interquartile range, quartile deviation
-Skewness, kurtosis
• Frequency tables, bar charts and pie charts, histograms, stem-and-leaf
displays
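Most of the descriptive statistics listed above can be computed with Python's standard `statistics` module. The scores below are hypothetical. Note two choices that affect the numbers: `variance` and `stdev` use the sample (n − 1) divisor, and `quantiles` defaults to the "exclusive" method, so other software may report a slightly different interquartile range. Skewness and kurtosis are computed here from moments without small-sample corrections.

```python
import statistics

# Hypothetical scores, for illustration only.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9, 12]

mean = statistics.fmean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)          # more than one value => multimodal
variance = statistics.variance(scores)        # sample variance (n - 1 divisor)
sd = statistics.stdev(scores)
data_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
iqr = q3 - q1

# Moment-based skewness and excess kurtosis (no small-sample correction).
m2 = statistics.fmean((x - mean) ** 2 for x in scores)
m3 = statistics.fmean((x - mean) ** 3 for x in scores)
m4 = statistics.fmean((x - mean) ** 4 for x in scores)
skew = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"variance={variance:.2f} sd={sd:.2f} range={data_range} IQR={iqr}")
print(f"skewness={skew:.2f} excess kurtosis={excess_kurtosis:.2f}")
```

The long right tail created by the score of 12 shows up as a positive skewness and a mean above the median.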
Variables have distributions
• A variable is something that changes or has different values (e.g., anger).
• A distribution is a collection of measures, usually across people.
• Distributions of numbers can be summarized with numbers (called statistics
when computed from a sample, or parameters when they describe a population).
Central Tendency
Central Tendency refers to the Middle of the Distribution
Middle of the Distribution
Common Statistics
• Mode
-Most common score
• Median
-Separates the top 50 percent from the bottom 50 percent
• Mean
-Arithmetic mean or average
Mode
• The most frequently occurring score. A distribution can be bimodal or
multimodal. Example: the modal public health student is female.
Median
• Score that separates top 50% from bottom 50%
• With an even number of scores, the median is halfway between the two middle scores.
-1 2 3 4 | 5 6 7 8: median is 4.5
• With an odd number of scores, the median is the middle score.
-1 2 3 4 5 6 7: median is 4
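The median rule above translates directly into a short function. This is a plain sketch for illustration; in practice `statistics.median` from the standard library applies the same rule.

```python
def median(values):
    """Middle score for odd n; halfway between the two middle scores for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
print(median([1, 2, 3, 4, 5, 6, 7]))     # 4
```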
Mean
• Sum of scores divided by the number of scores. The population mean is
$\mu$ (mu) and the sample mean is $\bar{X}$.
• We calculate the sample mean by: $\bar{X} = \frac{\sum X}{N}$
• We calculate the population mean by: $\mu = \frac{\sum X}{N}$
Comparison of statistics
• Mode
-Good for nominal variables
-Good if you need to know the most frequent observation
-Quick and easy
• Median
-Good for “bad” (skewed) distributions
-Often used with distributions of money
• Mean
-Used for inference as well as description; best estimator of the parameter
-Based on all data in the distribution
-Generally preferred except for “bad” distributions.
-Most commonly used statistic for central tendency.
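The advice to prefer the median for skewed distributions such as money can be illustrated with a hypothetical set of incomes (in thousands), where one very large value creates a long right tail. The numbers are invented for illustration.

```python
import statistics

# Hypothetical annual incomes (thousands); the 400 creates a long right tail.
incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 400]

print("mean:  ", statistics.fmean(incomes))    # pulled upward by the one large value
print("median:", statistics.median(incomes))   # stays with the typical income
```

The mean (72.8) is roughly twice what most people in the list earn, while the median (37.0) still describes a typical value.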
Effects of Distribution Shape
Distribution Shapes
• Normal
• Center
• Spread
• Shoulders
• Skew
Normal
Central Tendency
Variability (spread)
Central tendency and Variability
Skew
The tail!
Kurtosis - shoulders
Inferential Statistics
• Estimation: This includes point and interval estimation of certain
characteristics in the population(s).
• Testing hypotheses about population parameter(s) based on the
information contained in the sample(s).
Estimation of Parameters
• Point Estimation
• Interval Estimation (Confidence Intervals)
• Bound on the error of estimation
• The width of a (symmetric) confidence interval is twice the bound on the
error of estimation.
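A minimal sketch of interval estimation for a mean, assuming the population SD σ is known (the z-interval case). The numbers (sample mean 75, σ = 10, n = 25) are illustrative and match the later tail examples in this chapter. The bound on the error is z·σ/√n, and the interval width is twice the bound.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    bound = z * sigma / math.sqrt(n)                # bound on the error of estimation
    return xbar - bound, xbar + bound, bound


low, high, bound = z_interval(xbar=75, sigma=10, n=25)
print(f"95% CI: ({low:.2f}, {high:.2f}); bound = {bound:.2f}; width = {high - low:.2f}")
```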
Sampling Distribution
• A sampling distribution is the distribution of a statistic (not of raw data) over all
possible samples. It is the same as the distribution over an infinite number of trials.
• Notion of trials, experiments, replications
• Coin toss example (5 flips, # heads)
• Repeated estimation of the mean
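The coin toss example can be worked out exactly by enumerating all 2⁵ = 32 equally likely sequences of five fair flips and counting heads. The result is the sampling distribution of the statistic "number of heads", not of the raw flips. A fair coin is assumed.

```python
from fractions import Fraction
from itertools import product

# Enumerate every equally likely sequence of 5 fair coin flips.
counts = {}
for flips in product("HT", repeat=5):
    heads = flips.count("H")
    counts[heads] = counts.get(heads, 0) + 1

total = sum(counts.values())
distribution = {k: Fraction(v, total) for k, v in sorted(counts.items())}
for k, p in distribution.items():
    print(f"{k} heads: {p}")
```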
Mean of Sampling Distribution
• Statisticians have worked out the properties of sampling
distributions.
• The middle and spread of a sampling distribution are known.
• If the mean of the sampling distribution equals the parameter, the statistic
is unbiased (otherwise, it is biased). The sample mean is
unbiased.
• The best estimate of $\mu$ is $\bar{X}$.
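Unbiasedness of the sample mean can be verified exactly on a tiny hypothetical population: list every possible sample of size 2 (drawn with replacement, so all 16 ordered samples are equally likely), compute each sample mean, and average them. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 9]                      # hypothetical tiny population
mu = Fraction(sum(population), len(population))

n = 2
# Every ordered sample of size n drawn with replacement is equally likely.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_means = sum(sample_means) / len(sample_means)

print(mu, mean_of_means)  # 21/4 21/4
```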
SD of Sampling Distribution
• The standard deviation of the sampling distribution is the standard error.
For the mean, it indicates the typical distance of the statistic from the
parameter.
[Figure: raw data and means of samples of N = 50, plotted on a height-in-inches axis (50 to 80); the spread of the sample means is the standard error of the mean.]
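For the mean, the standard error equals σ/√N. The sketch below checks this by simulation under illustrative assumptions: a normal population of heights with μ = 66 and σ = 4 inches (invented values), samples of N = 50, and 5000 replications. The SD of the simulated sample means should be close to 4/√50 ≈ 0.566.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma, n, reps = 66.0, 4.0, 50, 5000  # hypothetical height population (inches)

means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

print(f"SD of sample means (simulated): {statistics.stdev(means):.4f}")
print(f"sigma / sqrt(n) (theory):       {sigma / math.sqrt(n):.4f}")
```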
Factors Influencing the Bound on the Error of Estimation
• Narrow confidence intervals are preferred.
• As the sample size increases, the bound on the error of estimation
decreases.
• As the confidence level increases, the bound on the error of estimation
increases.
• Plan the sample size in advance to achieve the desired bound on the error
and level of confidence.
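Sample size planning follows from the z-based bound B = z·σ/√n: solving for n gives n = (z·σ/B)², rounded up. This sketch assumes σ is known (or has a reliable planning value); the inputs σ = 10 and B = 2 are illustrative. It also shows the two trends stated above: a higher confidence level or a smaller bound requires a larger sample.

```python
import math
from statistics import NormalDist


def required_n(sigma, bound, confidence=0.95):
    """Smallest n whose z-based bound z*sigma/sqrt(n) is at most `bound`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / bound) ** 2)


for conf in (0.90, 0.95, 0.99):
    print(f"confidence {conf:.2f}: n = {required_n(sigma=10, bound=2, confidence=conf)}")
```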
Decision Making Under Uncertainty
• You have to make decisions even when you are unsure: school, marriage,
therapy, jobs, and so on.
• Statistics provides an approach to decision making under uncertainty. It
resembles betting: choose the option you would bet on, maximizing
expected utility (subjective value).
• The approach comes from agronomy, where researchers were trying to decide
which crop strain to plant.
Statistical Hypotheses
• When making decisions, we often need assumptions or guesses about the
populations involved. Such statements about a population, or about its probability
distribution, are called statistical hypotheses. They are tested against data and
either rejected or not rejected.
• A statistical hypothesis is a predictive statement, usually expressed as a null
hypothesis and an alternative hypothesis.
Hypothesis Testing
• The researcher predicts, before the experiment, that the results will agree with the
theory and cannot be accounted for by chance variation in sampling.
• Hypothesis tests are procedures that enable the researcher to decide whether to reject
the hypothesis, i.e., whether the observed samples differ significantly from the expected results.
Statistical Hypotheses
• Statements about characteristics of populations, denoted H:
-H: normal distribution
-H: N(28, 13), i.e., $\mu = 28$, $\sigma = 13$
• The hypothesis actually tested is called the null hypothesis, H0
-E.g., $H_0: \mu = 100$
• The other hypothesis, assumed true if the null is false, is the alternative
hypothesis, H1
-E.g., $H_1: \mu \neq 100$
Testing Statistical Hypotheses - steps
• State the null and alternative hypotheses.
• Assume whatever is required to specify the sampling distribution of the
statistic (e.g., SD, normal distribution).
• Find the rejection region of the sampling distribution: the region of values that
are unlikely if the null is true.
• Collect sample data and determine whether the statistic falls inside or outside the
rejection region. If it falls in the rejection region, the result is said to be
statistically significant.
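The four steps can be sketched as a two-tailed z-test for a mean. The sketch assumes the population SD σ is known and the sampling distribution of the mean is normal (step 2); the example values (μ0 = 75, σ = 10, n = 25, α = 0.05) are illustrative.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 vs H1: mu != mu0, sigma assumed known."""
    # Step 2: under H0 the sample mean is N(mu0, sigma / sqrt(n)).
    se = sigma / math.sqrt(n)
    # Step 3: the rejection region is |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: compute the statistic and check whether it falls in the region.
    z = (xbar - mu0) / se
    return z, abs(z) > z_crit


print(z_test_mean(xbar=80, mu0=75, sigma=10, n=25))  # z = 2.5, significant
print(z_test_mean(xbar=77, mu0=75, sigma=10, n=25))  # z = 1.0, not significant
```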
The level of significance (α)
• α is known as the nominal level of significance.
• If p-value < α, we reject the null hypothesis in favor of the alternative
hypothesis.
• The p-value is also known as the observed level of significance.
• α must be set in advance (usually 5%).
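A short sketch of the p-value rule for a two-tailed z-test: the p-value is the probability, under H0, of a z statistic at least as extreme as the observed one, and H0 is rejected when p < α. The z values used are illustrative.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Observed significance level (p-value) for a two-tailed z-test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # fixed before looking at the data
for z in (1.0, 2.5):
    p = two_sided_p(z)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z}: p = {p:.4f} -> {decision}")
```

At z = 1.96 the p-value equals 0.05, which is why 1.96 is the two-tailed critical value at the 5 percent level.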
Type I and Type II errors
• Type I error is committed when a true null hypothesis is rejected.
• α is the probability of committing a type I error.
• Type II error is committed when a false null hypothesis is not rejected.
• β is the probability of committing a type II error.
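The meaning of α can be checked by simulation: when H0 is true, a test at α = 0.05 should reject in about 5 percent of repeated studies, and each of those rejections is a type I error. The sketch assumes a normal population with σ = 10, n = 25 and H0: μ = 75 (illustrative values). Because the population is normal, the sample mean is exactly N(μ, σ/√n), so each simulated study draws its sample mean directly.

```python
import math
import random


def rejection_rate(true_mu, mu0=75.0, sigma=10.0, n=25, z_crit=1.96,
                   reps=20000, seed=2):
    """Fraction of simulated studies in which a two-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        # Normal population: the sample mean is exactly N(true_mu, se).
        xbar = rng.gauss(true_mu, se)
        if abs(xbar - mu0) / se > z_crit:
            rejections += 1
    return rejections / reps


# H0 is true here (true_mu == mu0), so every rejection is a type I error.
print(rejection_rate(true_mu=75.0))
```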
Power of a test
• The power of a test is the probability that a false null hypothesis is
rejected.
• Power = 1 − β, where β is the probability of committing a type II error.
• More powerful tests are preferred. At the design stage, one should
identify the desired level of power for the given situation.
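Power can be estimated the same way as α: simulate many studies in which H0 is false and count how often it is rejected. This sketch assumes a one-tailed test of H0: μ = 75 against H1: μ > 75 with σ = 10, n = 25, critical value 1.645, and a true mean of 80 (all illustrative). The analytic power for these settings is about 0.80, so β is about 0.20.

```python
import math
import random


def simulated_power(true_mu, mu0=75.0, sigma=10.0, n=25, z_crit=1.645,
                    reps=20000, seed=3):
    """Estimate the power of a one-tailed z-test (H1: mu > mu0) by simulation."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    # Normal population: the sample mean is exactly N(true_mu, se).
    hits = sum((rng.gauss(true_mu, se) - mu0) / se > z_crit for _ in range(reps))
    return hits / reps


power = simulated_power(true_mu=80.0)
print(f"power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```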
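Decisions
• Sample decision vs. population condition:
-Accept (do not reject) the null when the null is true: right
-Accept (do not reject) the null when the null is false: beta (type II error)
-Reject the null when the null is true: alpha (type I error)
-Reject the null when the null is false: correct rejection (power)
• Fire alarm analogy (no fire = null true, fire = null false):
-Alarm silent, no fire: right, but…
-Alarm silent, fire: beta
-Alarm on, no fire: alpha
-Alarm on, fire: correct rejection
Three named probabilities:
alpha, beta, and power.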
Power of a test (1 − β):
• The value 1 − β indicates how well the test is working: a value near 1 means the test
works well (it rejects H0 when H0 is false), and a value near 0 means it works poorly
(it fails to reject H0 when H0 is false).
• It indicates how well a given test minimizes the probability of a type II error (β),
i.e., avoids wrong decisions. Hypothesis testing cannot be foolproof: sometimes a test
does not reject an H0 that is false (type II error). We want β to be as small as
possible, or equivalently 1 − β to be as large as possible.
• Operating characteristic (OC) function, L = 1 − power: shows the conditional probability of
accepting H0 for all values of the population parameter for a given sample size,
whether or not the decision happens to be a correct one.
• OC curve: graphs showing the probabilities of type II error (β) under various
hypotheses
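Points on an OC curve can be computed directly. The sketch assumes a two-tailed z-test of H0: μ = 75 with σ = 10, n = 25 and α = 0.05 (illustrative values). For each possible true mean, L(μ) is the probability that the test does not reject H0; at μ = 75 this is 1 − α = 0.95, and elsewhere it is β, so 1 − L is the power.

```python
import math
from statistics import NormalDist


def oc_value(true_mu, mu0=75.0, sigma=10.0, n=25, alpha=0.05):
    """L(mu): probability that a two-tailed z-test does NOT reject H0: mu = mu0."""
    nd = NormalDist()
    se = sigma / math.sqrt(n)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (true_mu - mu0) / se  # where the z statistic is centred when mu = true_mu
    return nd.cdf(z_crit - shift) - nd.cdf(-z_crit - shift)


for mu in (70, 72, 74, 75, 76, 78, 80):
    L = oc_value(mu)
    print(f"mu = {mu}: L = {L:.3f}, power = {1 - L:.3f}")
```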
Factors influencing the Power
• The power of a test is influenced by the magnitude of the difference
between the null hypothesis and the true parameter.
• The power of a test can be improved by increasing the sample size.
• The power of a test can also be improved by increasing α. (This is an
artificial way, because it also increases the probability of a type I error.)
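The three factors can be seen in the standard power formula for a one-tailed z-test, power = 1 − Φ(z₁₋α − δ√n/σ), where δ is the distance between the true mean and the null value. This sketch assumes σ is known; the baseline values (δ = 5, σ = 10, n = 25, α = 0.05) are illustrative, and each line changes one factor at a time.

```python
import math
from statistics import NormalDist


def power_one_tailed(delta, sigma, n, alpha=0.05):
    """Power of a one-tailed z-test (H1: mu > mu0) when the true mean is mu0 + delta."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf(z_alpha - delta * math.sqrt(n) / sigma)


base = dict(delta=5, sigma=10, n=25, alpha=0.05)
print("baseline     ", round(power_one_tailed(**base), 3))
print("larger effect", round(power_one_tailed(**{**base, "delta": 8}), 3))
print("larger sample", round(power_one_tailed(**{**base, "n": 50}), 3))
print("larger alpha ", round(power_one_tailed(**{**base, "alpha": 0.10}), 3))
```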
One Tail or Two Tails
The rejection region can fit into 1 or 2 tails of the
sampling distribution of means. The RR is determined by
the alternative hypothesis.
• Two tails: $H_0: \mu = \text{value}$; $H_a: \mu \neq \text{value}$
• One tail: $H_0: \mu = \text{value}$; $H_a: \mu > \text{value}$ or $H_a: \mu < \text{value}$
Tails illustrated
[Figure: standard normal sampling distribution (Z from −3 to 3) showing the likely outcome if the null is true. Two tails ($H_a: \mu \neq \text{value}$): reject if Z < −1.96 or Z > 1.96, otherwise do not reject. One tail ($H_a: \mu > \text{value}$): reject if Z > 1.65, otherwise do not reject.]
Note 1.96 vs. 1.65.
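The two critical values come from the standard normal quantile function: the two-tailed test puts α/2 in each tail, while the one-tailed test puts all of α in one tail. The slides round the one-tailed value 1.645 to 1.65.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, SD 1

two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in the upper tail

print(f"two-tailed: +/-{two_tailed:.3f}")
print(f"one-tailed: {one_tailed:.3f}")
```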
Example of 2 tails
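• Suppose: $H_0: \mu = 75$; $H_a: \mu \neq 75$; $\sigma = 10$, $N = 25$
• Then: $75 \pm 1.96 \frac{10}{\sqrt{25}} = 75 \pm 3.92$, i.e., 71.08 to 78.92
[Figure: sampling distribution of $\bar{X}$ centered at 75 (likely outcome if the null is true); do not reject between 71.08 and 78.92, reject in either tail.]
Note: the 5 percent is split between the two tails.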
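The arithmetic of this two-tailed example (H0: μ = 75, σ = 10, N = 25, critical value 1.96) can be reproduced in a few lines; the cutoffs mark the edges of the "do not reject" region for the sample mean.

```python
import math

mu0, sigma, n, z_crit = 75, 10, 25, 1.96  # values from the two-tailed example
bound = z_crit * sigma / math.sqrt(n)      # 1.96 * 10 / 5 = 3.92
lower, upper = mu0 - bound, mu0 + bound
print(f"Do not reject H0 if {lower:.2f} <= sample mean <= {upper:.2f}")
```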
Example of 1 tail
• Suppose: $H_0: \mu = 75$; $H_a: \mu > 75$; $\sigma = 10$, $N = 25$
• Then: $75 + 1.65 \frac{10}{\sqrt{25}} = 78.3$
[Figure: sampling distribution of means (density 0 to 0.20, $\bar{X}$ from 70 to 80), likely outcome if the null is true; do not reject below 78.3, reject above it.]
Note: all 5 percent is in the upper tail.
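Comparing this one-tailed example with the two-tailed one (same H0: μ = 75, σ = 10, N = 25) shows the practical effect of 1.65 vs. 1.96. The observed sample mean of 78.5 is hypothetical: it lies beyond the one-tailed cutoff of 78.3 but inside the two-tailed region of 71.08 to 78.92, so the two tests reach different decisions.

```python
import math

mu0, sigma, n = 75, 10, 25
se = sigma / math.sqrt(n)        # standard error = 2
upper_one = mu0 + 1.65 * se      # one-tailed cutoff: 78.3
upper_two = mu0 + 1.96 * se      # two-tailed upper cutoff: 78.92
lower_two = mu0 - 1.96 * se      # two-tailed lower cutoff: 71.08

xbar = 78.5                      # hypothetical observed sample mean
reject_one = xbar > upper_one
reject_two = xbar > upper_two or xbar < lower_two
print(f"one-tailed (Ha: mu > 75):  reject = {reject_one}")
print(f"two-tailed (Ha: mu != 75): reject = {reject_two}")
```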
Parametric or Standard Tests
• Require measurements on at least an interval scale
• Assume certain properties of the parent population, such as:
-i) observations come from a normal population
-ii) the sample is a large random sample
-iii) assumptions about population parameters such as the mean and variance hold
• Where these assumptions cannot be met, non-parametric tests are used. Because they
do not assume a particular distributional model, these tests are also called
distribution-free tests.
Parametric tests
Z-test
• Based on the normal probability distribution; also used for binomial data in large
samples.
• Used for testing a mean, a variance, the difference between two independent samples,
medians, modes, correlation coefficients, etc.
T-test
• Based on the t distribution; used mainly for small samples.
• Used for testing the difference between the means of two samples, simple and
partial correlation coefficients, etc.
(cont.)
F-test
• Used in ANOVA, for testing the significance of multiple correlation
coefficients, and for comparing the variances of two independent samples.
χ² test
• Based on the chi-square distribution
• Used for comparing a sample variance to a theoretical population variance
Some Important Parametric Tests
(cont.)
t-test and Mann-Whitney test
The t-test
Inferences about Population Means
The t Distribution
• We use t when the population variance is unknown (the usual case)
and the sample size is small (N < 100, the usual case).
• The t distribution is a short, fat-tailed relative of the normal. The shape of t
depends on its degrees of freedom (df). As df becomes infinitely large, t becomes normal.
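The "short, fat" shape can be checked numerically from the t density, f(x) = Γ((v+1)/2) / (√(vπ)·Γ(v/2)) · (1 + x²/v)^(−(v+1)/2). The sketch compares the peak (x = 0) and a tail point (x = 3) with the standard normal. It uses `math.lgamma` to avoid overflow of the gamma function at large df.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for df in (2, 5, 30, 1000):
    print(f"df={df:4d}  peak={t_pdf(0, df):.4f}  density at 3={t_pdf(3, df):.5f}")
print(f"normal    peak={normal_pdf(0):.4f}  density at 3={normal_pdf(3):.5f}")
```

Small df gives a lower peak and heavier tails; by df = 1000 the t density is essentially the normal density.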
Assumptions
• The t-test assumes normality.
• The two groups are independent.
• Homogeneity of variance, which can be tested with an F-test.
• As long as the samples in each group are large and of nearly equal size, the t-test
is robust, that is, it still performs well even when the assumptions are not met.
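A minimal sketch of the independent two-sample t statistic under these assumptions. It uses the pooled-variance formula, which relies on homogeneity of variance; Welch's version of the test relaxes that assumption and is not shown here. The data are hypothetical, and the p-value step (looking up t with n₁ + n₂ − 2 df in a t table) is left out.

```python
import math
import statistics


def pooled_t(a, b):
    """Independent two-sample t statistic assuming equal variances (pooled SD)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, n1 + n2 - 2  # statistic and degrees of freedom


group_a = [1, 2, 3, 4, 5]  # hypothetical data
group_b = [3, 4, 5, 6, 7]
print(pooled_t(group_a, group_b))  # (-2.0, 8)
```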
Normality Assumption
• We assume normal distributions to derive sampling distributions and hence
p-values.
• Violations of normality affect tests of means. If normality does not hold, use
non-parametric statistics or a data transformation.
• Normality can be tested with the Kolmogorov-Smirnov test.
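The Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical CDF of the sample and the CDF of the hypothesized distribution. This sketch computes only D against a fully specified normal distribution; the p-value still has to come from KS tables or a statistics package. If the mean and SD are estimated from the same data, the standard KS critical values are too conservative (the Lilliefors correction addresses this). The sample and the N(7, 1.8) reference distribution are hypothetical.

```python
from statistics import NormalDist


def ks_statistic(data, dist):
    """One-sample Kolmogorov-Smirnov D: max gap between the empirical CDF and dist.cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


sample = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 8.1, 8.8, 10.5]  # hypothetical data
print(f"D = {ks_statistic(sample, NormalDist(mu=7, sigma=1.8)):.3f}")
```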
The F Distribution (1)
• The F distribution is the ratio of two variance estimates:
$F = \frac{\text{est. } \sigma_1^2}{\text{est. } \sigma_2^2} = \frac{s_1^2}{s_2^2}$
• It is also the ratio of two chi-squares, each divided by its degrees of freedom:
$F = \frac{\chi^2_{(v_1)} / v_1}{\chi^2_{(v_2)} / v_2}$
In our applications, v2 will be larger than v1 and v2 will
be larger than 2. In such a case, the mean of the F
distribution (expected value) is
$v_2 / (v_2 - 2)$.
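The chi-square definition of F can be simulated directly. A chi-square with v degrees of freedom is a gamma distribution with shape v/2 and scale 2, which Python's `random.gammavariate(alpha, beta)` provides. With v1 = v2 = 15 (the df used in the next example) the simulated mean should be close to v2/(v2 − 2) = 15/13 ≈ 1.154. The number of replications and the seed are arbitrary choices.

```python
import random
import statistics


def simulate_f(v1, v2, reps=20000, seed=4):
    """Draw F values as (chi2_v1 / v1) / (chi2_v2 / v2)."""
    rng = random.Random(seed)

    def chi2(v):
        # Chi-square with v df = gamma with shape v/2 and scale 2.
        return rng.gammavariate(v / 2, 2)

    return [(chi2(v1) / v1) / (chi2(v2) / v2) for _ in range(reps)]


v1, v2 = 15, 15
draws = simulate_f(v1, v2)
print(f"simulated mean {statistics.fmean(draws):.3f} vs v2/(v2-2) = {v2 / (v2 - 2):.3f}")
```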
Testing Hypotheses about 2 Variances
• Suppose $H_0: \sigma_1^2 \le \sigma_2^2$; $H_1: \sigma_1^2 > \sigma_2^2$
-Note 1-tailed.
• We find $N_1 = 16$, $s_1^2 = 5.8$; $N_2 = 16$, $s_2^2 = 1.7$
• Then df1 = df2 = 15, and $F = \frac{s_1^2}{s_2^2} = \frac{5.8}{1.7} = 3.41$
Going to the F table with 15 and 15 df, we find
that for alpha = .05 (1-tailed), the critical value
is 2.40. Therefore the result is significant.
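The variance-ratio example (N1 = N2 = 16, s1² = 5.8, s2² = 1.7) reduces to a few lines of arithmetic. The critical value 2.40 is the tabled value quoted above for 15 and 15 df at α = .05 (one-tailed); it is hard-coded here rather than computed.

```python
n1, n2 = 16, 16
s1_sq, s2_sq = 5.8, 1.7

df1, df2 = n1 - 1, n2 - 1     # 15 and 15
f_stat = s1_sq / s2_sq        # larger variance in the numerator for H1: var1 > var2
f_crit = 2.40                 # from an F table: 15 and 15 df, alpha = .05, one-tailed

print(f"F({df1}, {df2}) = {f_stat:.2f}; significant at .05: {f_stat > f_crit}")
```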
Application of F Distribution
• The F distribution is used in many statistical tests
-Test for equality of variances.
-Tests for differences in means in ANOVA.
-Tests for regression models (slopes relating one continuous variable to
another like SAT and GPA).
Weitere ähnliche Inhalte

Was ist angesagt?

Inferential statistics
Inferential statisticsInferential statistics
Inferential statisticsMaria Theresa
 
Confidence intervals
Confidence intervalsConfidence intervals
Confidence intervalsTanay Tandon
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributionsjasondroesch
 
INFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONINFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONJohn Labrador
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimationHanaaBayomy
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...nszakir
 
Chi square test
Chi square testChi square test
Chi square testNayna Azad
 
Non parametric test
Non parametric testNon parametric test
Non parametric testNeetathakur3
 
L16 rm (systematic review and meta-analysis)-samer
L16 rm (systematic review and meta-analysis)-samerL16 rm (systematic review and meta-analysis)-samer
L16 rm (systematic review and meta-analysis)-samerDr Ghaiath Hussein
 
Inferential statistics
Inferential statisticsInferential statistics
Inferential statisticsAshok Kulkarni
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxACSRM
 
Hypothesis testing an introduction
Hypothesis testing an introductionHypothesis testing an introduction
Hypothesis testing an introductionGeetika Gulyani
 

Was ist angesagt? (20)

Inferential statistics
Inferential statisticsInferential statistics
Inferential statistics
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
Confidence intervals
Confidence intervalsConfidence intervals
Confidence intervals
 
Testing Hypothesis
Testing HypothesisTesting Hypothesis
Testing Hypothesis
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
INFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONINFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTION
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimation
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
 
Chi square test
Chi square testChi square test
Chi square test
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 
Calculating p value
Calculating p valueCalculating p value
Calculating p value
 
Biostatistics lec 1
Biostatistics lec 1Biostatistics lec 1
Biostatistics lec 1
 
Non parametric test
Non parametric testNon parametric test
Non parametric test
 
Chi square mahmoud
Chi square mahmoudChi square mahmoud
Chi square mahmoud
 
L16 rm (systematic review and meta-analysis)-samer
L16 rm (systematic review and meta-analysis)-samerL16 rm (systematic review and meta-analysis)-samer
L16 rm (systematic review and meta-analysis)-samer
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Student t-test
Student t-testStudent t-test
Student t-test
 
Inferential statistics
Inferential statisticsInferential statistics
Inferential statistics
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptx
 
Hypothesis testing an introduction
Hypothesis testing an introductionHypothesis testing an introduction
Hypothesis testing an introduction
 

Ähnlich wie Research method ch07 statistical methods 1

7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdfezaldeen2013
 
Tests of significance Periodontology
Tests of significance PeriodontologyTests of significance Periodontology
Tests of significance PeriodontologySaiLakshmi128
 
Application of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxApplication of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxHalim AS
 
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdf
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdfBASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdf
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdfAdamu Mohammad
 
TREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptxTREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptxCarmela857185
 
De-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsDe-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsGillian Byrne
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchShinjan Patra
 
ststs nw.pptx
ststs nw.pptxststs nw.pptx
ststs nw.pptxMrymNb
 
Parametric tests seminar
Parametric tests seminarParametric tests seminar
Parametric tests seminardrdeepika87
 
Chi square test final
Chi square test finalChi square test final
Chi square test finalHar Jindal
 
Class 5 Hypothesis & Normal Disdribution.pptx
Class 5 Hypothesis & Normal Disdribution.pptxClass 5 Hypothesis & Normal Disdribution.pptx
Class 5 Hypothesis & Normal Disdribution.pptxCallplanetsDeveloper
 
Epidemiology Chapter 5.pptx
Epidemiology Chapter 5.pptxEpidemiology Chapter 5.pptx
Epidemiology Chapter 5.pptxAdugnaWari
 
COM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxCOM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxAkinsolaAyomidotun
 
Topic 7 stat inference
Topic 7 stat inferenceTopic 7 stat inference
Topic 7 stat inferenceSizwan Ahammed
 
Stat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental scienceStat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental scienceKhulna University
 
Test of significance in Statistics
Test of significance in StatisticsTest of significance in Statistics
Test of significance in StatisticsVikash Keshri
 

Ähnlich wie Research method ch07 statistical methods 1 (20)

Ds 2251 -_hypothesis test
Ds 2251 -_hypothesis testDs 2251 -_hypothesis test
Ds 2251 -_hypothesis test
 
7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf7- Quantitative Research- Part 3.pdf
7- Quantitative Research- Part 3.pdf
 
Tests of significance Periodontology
Tests of significance PeriodontologyTests of significance Periodontology
Tests of significance Periodontology
 
Application of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptxApplication of statistical tests in Biomedical Research .pptx
Application of statistical tests in Biomedical Research .pptx
 
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdf
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdfBASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdf
BASIC STATISTICS AND THEIR INTERPRETATION AND USE IN EPIDEMIOLOGY 050822.pdf
 
TREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptxTREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptx
 
De-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsDe-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statistics
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical research
 
ststs nw.pptx
ststs nw.pptxststs nw.pptx
ststs nw.pptx
 
Testing of hypothesis and Goodness of fit
Testing of hypothesis and Goodness of fitTesting of hypothesis and Goodness of fit
Testing of hypothesis and Goodness of fit
 
Parametric tests seminar
Parametric tests seminarParametric tests seminar
Parametric tests seminar
 
Chi square test final
Chi square test finalChi square test final
Chi square test final
 
Class 5 Hypothesis & Normal Disdribution.pptx
Class 5 Hypothesis & Normal Disdribution.pptxClass 5 Hypothesis & Normal Disdribution.pptx
Class 5 Hypothesis & Normal Disdribution.pptx
 
Epidemiology Chapter 5.pptx
Epidemiology Chapter 5.pptxEpidemiology Chapter 5.pptx
Epidemiology Chapter 5.pptx
 
COM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxCOM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptx
 
Basics of Statistics.pptx
Basics of Statistics.pptxBasics of Statistics.pptx
Basics of Statistics.pptx
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Topic 7 stat inference
Topic 7 stat inferenceTopic 7 stat inference
Topic 7 stat inference
 
Stat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental scienceStat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental science
 
Test of significance in Statistics
Test of significance in StatisticsTest of significance in Statistics
Test of significance in Statistics
 

Mehr von naranbatn

эрүүл мэндийн шинжлэх ухаан
эрүүл мэндийн шинжлэх ухаанэрүүл мэндийн шинжлэх ухаан
эрүүл мэндийн шинжлэх ухаанnaranbatn
 
Instructions to authors
Instructions to authorsInstructions to authors
Instructions to authorsnaranbatn
 
Төгсөлтийн сургалтыг зохицуулах журам
Төгсөлтийн сургалтыг зохицуулах журамТөгсөлтийн сургалтыг зохицуулах журам
Төгсөлтийн сургалтыг зохицуулах журамnaranbatn
 
хичээлийн хуваарь 2011-2012 1-р улирал
хичээлийн хуваарь 2011-2012 1-р улиралхичээлийн хуваарь 2011-2012 1-р улирал
хичээлийн хуваарь 2011-2012 1-р улиралnaranbatn
 
д.амарсайхан захирал
д.амарсайхан захиралд.амарсайхан захирал
д.амарсайхан захиралnaranbatn
 
ц.лхагвасүрэн захирал
ц.лхагвасүрэн захиралц.лхагвасүрэн захирал
ц.лхагвасүрэн захиралnaranbatn
 
Self eval report english final for printing, 28.09.2011last for printing
Self eval report english final for printing, 28.09.2011last for printingSelf eval report english final for printing, 28.09.2011last for printing
Self eval report english final for printing, 28.09.2011last for printingnaranbatn
 
Бүрдүүлэх материал
Бүрдүүлэх материалБүрдүүлэх материал
Бүрдүүлэх материалnaranbatn
 
мэргэжлийн индекс
мэргэжлийн индексмэргэжлийн индекс
мэргэжлийн индексnaranbatn
 
Health sciences university of mongolia
Health sciences university of mongoliaHealth sciences university of mongolia
Health sciences university of mongolianaranbatn
 
Магистрын ганцаарчилсан сургалтын төлөвлөгөө
Магистрын ганцаарчилсан сургалтын төлөвлөгөөМагистрын ганцаарчилсан сургалтын төлөвлөгөө
Магистрын ганцаарчилсан сургалтын төлөвлөгөөnaranbatn
 
Germany summer school-2010
Germany summer school-2010Germany summer school-2010
Germany summer school-2010naranbatn
 
Uni sannio courses
Uni sannio coursesUni sannio courses
Uni sannio coursesnaranbatn
 
Germany summer school-2010
Germany summer school-2010Germany summer school-2010
Germany summer school-2010naranbatn
 
Germany international semester
Germany international semesterGermany international semester
Germany international semesternaranbatn
 
Uni sannio courses_faculty_economics
Uni sannio courses_faculty_economicsUni sannio courses_faculty_economics
Uni sannio courses_faculty_economicsnaranbatn
 
Ull ph dproposals
Ull ph dproposalsUll ph dproposals
Ull ph dproposalsnaranbatn
 
Ull ph dproposals
Ull ph dproposalsUll ph dproposals
Ull ph dproposalsnaranbatn
 
Su pd ph-dproposals
Su pd ph-dproposalsSu pd ph-dproposals
Su pd ph-dproposalsnaranbatn
 
Sannio ph dproposals_v4
Sannio ph dproposals_v4Sannio ph dproposals_v4
Sannio ph dproposals_v4naranbatn
 

Mehr von naranbatn (20)

эрүүл мэндийн шинжлэх ухаан
эрүүл мэндийн шинжлэх ухаанэрүүл мэндийн шинжлэх ухаан
эрүүл мэндийн шинжлэх ухаан
 
Instructions to authors
Instructions to authorsInstructions to authors
Instructions to authors
 
Төгсөлтийн сургалтыг зохицуулах журам
Төгсөлтийн сургалтыг зохицуулах журамТөгсөлтийн сургалтыг зохицуулах журам
Төгсөлтийн сургалтыг зохицуулах журам
 
хичээлийн хуваарь 2011-2012 1-р улирал
хичээлийн хуваарь 2011-2012 1-р улиралхичээлийн хуваарь 2011-2012 1-р улирал
хичээлийн хуваарь 2011-2012 1-р улирал
 
д.амарсайхан захирал
д.амарсайхан захиралд.амарсайхан захирал
д.амарсайхан захирал
 
ц.лхагвасүрэн захирал
ц.лхагвасүрэн захиралц.лхагвасүрэн захирал
ц.лхагвасүрэн захирал
 
Self eval report english final for printing, 28.09.2011last for printing
Self eval report english final for printing, 28.09.2011last for printingSelf eval report english final for printing, 28.09.2011last for printing
Self eval report english final for printing, 28.09.2011last for printing
 
Бүрдүүлэх материал
Бүрдүүлэх материалБүрдүүлэх материал
Бүрдүүлэх материал
 
мэргэжлийн индекс
мэргэжлийн индексмэргэжлийн индекс
мэргэжлийн индекс
 
Health sciences university of mongolia
Health sciences university of mongoliaHealth sciences university of mongolia
Health sciences university of mongolia
 
Магистрын ганцаарчилсан сургалтын төлөвлөгөө
Магистрын ганцаарчилсан сургалтын төлөвлөгөөМагистрын ганцаарчилсан сургалтын төлөвлөгөө
Магистрын ганцаарчилсан сургалтын төлөвлөгөө
 
Germany summer school-2010
Germany summer school-2010Germany summer school-2010
Germany summer school-2010
 
Uni sannio courses
Uni sannio coursesUni sannio courses
Uni sannio courses
 
Germany summer school-2010
Germany summer school-2010Germany summer school-2010
Germany summer school-2010
 
Germany international semester
Germany international semesterGermany international semester
Germany international semester
 
Uni sannio courses_faculty_economics
Uni sannio courses_faculty_economicsUni sannio courses_faculty_economics
Uni sannio courses_faculty_economics
 
Ull ph dproposals
Ull ph dproposalsUll ph dproposals
Ull ph dproposals
 
Ull ph dproposals
Ull ph dproposalsUll ph dproposals
Ull ph dproposals
 
Su pd ph-dproposals
Su pd ph-dproposalsSu pd ph-dproposals
Su pd ph-dproposals
 
Sannio ph dproposals_v4
Sannio ph dproposals_v4Sannio ph dproposals_v4
Sannio ph dproposals_v4
 

Research method ch07 statistical methods 1

  • 1. 1 Research Methods in Health Chapter 7. Statistical Methods 1 Young Moon Chae, Ph.D. Graduate School of Public Health Yonsei University, Korea ymchae@yuhs.ac
  • 2. 2 Topics • What is Biostatistics? • Biostatistics in Public Health Research • Descriptive statistics • Inference statistics • Power of test • T-test
  • 3. 3 Concepts Biostatistics • Biostatistics is the development and application of statistics to research in health-related fields. Statistics • Common perceptions of statistics: numbers, tables, figures, polls, rates, etc. • These are “descriptions of the world” • Analysis of data
  • 4. 4 Biostatistics in Public Health Research Methodological research: • new statistical techniques • high speed of computing • geographical patterns of disease • clinical trials • longitudinal analysis • data analysis in epidemiology studies
  • 5. 5 Errors in Statistical methods • Research design -Improper control group in case-control design -Selection bias (sample does not represent study population) -Too small sample size • Statistical methods -Parametric statistics for small sample -T-test for the related sample -T-test or ANOVA for the samples that do not meet assumptions (normality, equal variances, independence) -T-test for multiple comparison -Regression for nominal dependent variable -Regression with multi-colinearity -Chi-square test with cell size less than 5
  • 7. 7 Descriptive vs. Inferential Statistics • The mean and standard deviation can be used in 2 ways. -One way is to describe the distribution of data -The other way is to infer something about a population (is the population mean 25? 20?). A statistical test! • Because the sampling distribution of the mean is normally distributed (Central Limit Theorem), we can use the normal to show how close the parameter is likely to be to the sample mean and to make decisions about treatments.
  • 8. 8 Descriptive Statistics • Descriptive Statistics -Mean, median, mode -Variance, standard deviation, range, Interquartile range, quartile range, -Skewness, Kurtosis • Frequency tables, Bar charts and pie charts, Histograms, Stem-and-Leaf display
  • 9. 9 Variables have distributions • A variable is something that changes or has different values (e.g., anger). • A distribution is a collection of measures, usually across people. • Distributions of numbers can be summarized with numbers (called statistics or parameters).
  • 10. 10 Central Tendency Central Tendency refers to the Middle of the Distribution
  • 11. 11 Middle of the Distribution Common Statistics • Mode -Most common score • Median -Top from bottom 50 percent • Mean -Arithmetic mean or average
  • 12. 12 Mode • The most frequently occurring score. Can have bimodal and multimodal distributions. Modal public health student is female.
  • 13. 13 Median • Score that separates top 50% from bottom 50% • Even number of scores, median is half way between two middle scores. -1 2 3 4 | 5 6 7 8 – Median is 4.5 • Odd number of scores, median is the middle number -1 2 3 4 5 6 7 – Median is 4
  • 14. Mean • Sum of the scores divided by the number of scores. The population mean is $\mu$ (mu) and the sample mean is $\bar{X}$ (X-bar). • We calculate the sample mean by: $\bar{X} = \frac{\sum X}{N}$ • We calculate the population mean by: $\mu = \frac{\sum X}{N}$
  • 15. Comparison of Statistics • Mode -Good for nominal variables -Good if you need to know the most frequent observation -Quick and easy • Median -Good for “bad” (skewed) distributions -Often used with distributions of money (e.g., income) • Mean -Used for inference as well as description; an unbiased estimator of the population mean -Based on all data in the distribution -Generally preferred except for “bad” (skewed) distributions -The most commonly used statistic for central tendency
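This small Python illustration uses hypothetical data. One very high income pulls the mean far above the median, which is why the median is preferred for skewed money distributions. The mode is the only one of the three that applies to a nominal variable such as sex.

```python
import statistics

# Hypothetical annual incomes (thousands); one very high earner skews the distribution
incomes = [30, 32, 35, 38, 40, 41, 45, 300]
income_mean = statistics.fmean(incomes)     # pulled upward by the outlier
income_median = statistics.median(incomes)  # stays near the typical income

# Hypothetical nominal variable: only the mode is meaningful here
sexes = ["female", "male", "female", "female", "male"]
modal_sex = statistics.mode(sexes)

print(f"mean = {income_mean:.3f}, median = {income_median}, mode = {modal_sex}")
```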
  • 17. Distribution Shapes • Normal • Center • Spread • Shoulders • Skew
  • 23. Inferential Statistics • Estimation: point and interval estimation of characteristics of the population(s). • Hypothesis testing: testing hypotheses about population parameter(s) based on the information contained in the sample(s).
  • 24. Estimation of Parameters • Point estimation • Interval estimation (confidence intervals) • Bound on the error of estimation • The width of a confidence interval is directly related to the bound on the error. A symmetric interval is the estimate ± the bound, so its width is twice the bound.
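This minimal Python sketch performs interval estimation for a mean. It assumes the population SD σ is known, so the normal (z) distribution applies. The function name and the numbers are hypothetical. The returned bound is the bound on the error of estimation, and the interval width is twice that bound.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, bound) for a z interval with known population SD sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    bound = z * sigma / math.sqrt(n)                     # bound on the error of estimation
    return sample_mean - bound, sample_mean + bound, bound


lower, upper, bound = z_confidence_interval(sample_mean=50, sigma=8, n=64)
print(f"95% CI: {lower:.2f} to {upper:.2f} (bound = {bound:.2f}, width = {upper - lower:.2f})")
```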
  • 25. Sampling Distribution • A sampling distribution is the distribution of a statistic (not of raw data) over all possible samples. Equivalently, it is the distribution of the statistic over an infinite number of trials. • Notion of trials, experiments, replications • Coin toss example (5 flips, number of heads) • Repeated estimation of the mean
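The coin toss example can be made concrete in Python. The exact sampling distribution of the number of heads in 5 fair flips is binomial, $P(k) = \binom{5}{k} / 2^5$. The simulation repeats the 5-flip experiment many times to show that the relative frequencies approach those probabilities. The function names, trial count, and seed are arbitrary choices for illustration.

```python
import random
from fractions import Fraction
from math import comb


def heads_distribution(n_flips=5):
    """Exact sampling distribution of the number of heads in n_flips fair-coin tosses."""
    total = 2 ** n_flips
    return {k: Fraction(comb(n_flips, k), total) for k in range(n_flips + 1)}


def simulate_heads(n_flips=5, trials=100_000, seed=1):
    """Approximate the same distribution by repeating the n_flips experiment many times."""
    rng = random.Random(seed)
    counts = [0] * (n_flips + 1)
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        counts[heads] += 1
    return [c / trials for c in counts]


exact = heads_distribution()
simulated = simulate_heads()
for k, p in exact.items():
    print(f"{k} heads: exact {float(p):.4f}, simulated {simulated[k]:.4f}")
```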
  • 26. Mean of Sampling Distribution • Statisticians have worked out the properties of sampling distributions. • The middle and spread of the sampling distribution are known. • If the mean of the sampling distribution equals the parameter, the statistic is unbiased (otherwise, it is biased). The sample mean is unbiased: $\mu_{\bar{X}} = \mu$. • The best estimate of $\mu$ is $\bar{X}$.
  • 27. SD of Sampling Distribution • The standard deviation of the sampling distribution is the standard error. For the mean, it indicates the typical distance of the statistic from the parameter, and it equals $\sigma_{\bar{X}} = \sigma / \sqrt{N}$. [Figure: raw data vs. means of samples of N = 50, height in inches; the spread of the means is the standard error of the mean.]
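This minimal Python simulation illustrates the standard error. It assumes a skewed exponential population with mean 1 and SD 1, which is a hypothetical choice and not the height data in the figure. The SD of many sample means (N = 50) should come out close to $\sigma/\sqrt{N} \approx 0.141$. The mean of those means should be close to the population mean, even though the population itself is far from normal.

```python
import math
import random
import statistics


def sample_means(n=50, replications=2000, seed=42):
    """Draw many samples of size n from a skewed (exponential) population; return their means."""
    rng = random.Random(seed)
    # Exponential population with rate 1: population mean = 1, population SD = 1
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(replications)]


means = sample_means()
mean_of_means = statistics.fmean(means)
observed_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(50)  # sigma / sqrt(N)
print(f"mean of means = {mean_of_means:.3f} (population mean 1.000)")
print(f"SD of means = {observed_se:.4f} vs sigma/sqrt(N) = {theoretical_se:.4f}")
```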
  • 28. Factors Influencing the Bound on the Error of Estimation • Narrow confidence intervals are preferred. • As the sample size increases, the bound on the error of estimation decreases. • As the confidence level increases, the bound on the error of estimation increases. • Plan the sample size in advance to achieve the desired bound on the error and level of confidence.
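Planning the sample size follows from solving $B = z\,\sigma/\sqrt{n}$ for n, which gives $n = (z\,\sigma / B)^2$ rounded up. This is a minimal Python sketch that assumes a known (or well-guessed) σ and a z-based interval for a mean. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n whose z interval for a mean has a bound on the error no larger than `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / bound) ** 2)  # from bound = z * sigma / sqrt(n)


print(sample_size_for_mean(sigma=10, bound=2))                   # 97 at 95% confidence
print(sample_size_for_mean(sigma=10, bound=2, confidence=0.99))  # 166: higher confidence needs more data
print(sample_size_for_mean(sigma=10, bound=1))                   # 385: a tighter bound needs more data
```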
  • 29. Decision Making Under Uncertainty • You have to make decisions even when you are unsure: school, marriage, therapy, jobs, and so on. • Statistics provides an approach to decision making under uncertainty. In effect, you choose the way you would bet, maximizing expected utility (subjective value). • The approach comes from agronomy, where researchers were trying to decide which crop strain to plant.
  • 30. Statistical Hypotheses • When making decisions, we form assumptions or guesses about populations, or statements about their probability distributions. These are called statistical hypotheses, and they are to be supported or refuted by the data. • A hypothesis is a predictive statement, usually put in the form of a null hypothesis and an alternative hypothesis. Hypothesis Testing • Before the experiment, the researcher bets that the results will agree with the theory and cannot be accounted for by the chance variation involved in sampling. • Hypothesis tests are procedures that let the researcher decide whether to accept or reject a hypothesis, i.e., whether the observed samples differ significantly from the expected results.
  • 31. Statistical Hypotheses • Statements about characteristics of populations, denoted H: -H: normal distribution -H: N(28, 13), i.e., $\mu = 28$, $\sigma = 13$ • The hypothesis actually tested is called the null hypothesis, H0 -E.g., $H_0: \mu = 100$ • The other hypothesis, assumed true if the null is false, is the alternative hypothesis, H1 -E.g., $H_1: \mu \neq 100$
  • 32. Testing Statistical Hypotheses: Steps • State the null and alternative hypotheses. • Assume whatever is required to specify the sampling distribution of the statistic (e.g., SD, normal distribution). • Find the rejection region of the sampling distribution: the values that are unlikely if the null is true. • Collect sample data and determine whether the statistic falls inside or outside the rejection region. If the statistic falls in the rejection region, the result is said to be statistically significant.
  • 33. The Level of Significance (α) • α is known as the nominal level of significance. • If the p-value < α, we reject the null hypothesis in favor of the alternative hypothesis. • The p-value is also known as the observed level of significance. • α must be set in advance (usually 5%).
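This minimal Python sketch applies the testing steps to a two-sided one-sample z-test with a known σ. The hypotheses match the 2-tail example (H0: μ = 75, σ = 10, N = 25), but the sample mean of 79 is hypothetical. The decision compares the p-value with an α fixed in advance.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test of H0: mu = mu0 when sigma is known; returns (z, p_value)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # probability in both tails beyond |z|
    return z, p_value


alpha = 0.05  # chosen before looking at the data
z, p = one_sample_z_test(sample_mean=79, mu0=75, sigma=10, n=25)  # hypothetical sample mean
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```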
  • 34. Type I and Type II Errors • A Type I error is committed when a true null hypothesis is rejected. • α is the probability of committing a Type I error. • A Type II error is committed when a false null hypothesis is not rejected. • β is the probability of committing a Type II error.
  • 35. Power of a Test • The power of a test is the probability that a false null hypothesis is rejected. • Power = 1 − β, where β is the probability of committing a Type II error. • More powerful tests are preferred. At the design stage, one should identify the level of power desired in the given situation.
  • 36. Decisions • Decision table (sample decision vs. population condition): -Null true: accept null = right decision; reject null = alpha (Type I error) -Null false: accept null = beta (Type II error); reject null = correct rejection (power) • Fire alarm analogy: -No fire: alarm silent = right, but…; alarm on = alpha -Fire: alarm silent = beta; alarm on = correct rejection • Three named probabilities: alpha, beta, and power.
  • 37. Power of a Test (1 − β) • The value 1 − β indicates how well the test is working. A value near 1 means it works well (it rejects H0 when H0 is not true). A value near 0 means it works poorly (it fails to reject H0 when H0 is not true). • It indicates how well the test minimizes the probability of a Type II error (β), i.e., avoids wrong decisions. Hypothesis testing cannot be foolproof: sometimes a test fails to reject an H0 that is false (Type II error). We would like β to be as small as possible, i.e., 1 − β to be as large as possible. • Operating characteristic (OC) function: L = 1 − H, where H is the power function. It shows the conditional probability of accepting H0 for all values of the population parameter for a given sample size, whether or not that decision happens to be correct. • OC curve: a graph showing the probability of a Type II error (β) under various hypotheses.
  • 38. Factors Influencing the Power • The power of a test is influenced by the magnitude of the difference between the null hypothesis and the true parameter: the larger the difference, the higher the power. • The power of a test can be improved by increasing the sample size. • The power of a test can also be improved by increasing α (a very artificial way, since it raises the probability of a Type I error).
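The factors above can be checked numerically. This minimal Python sketch computes the power of a two-sided one-sample z-test, assuming σ is known. With a true mean μ, the z statistic is normal with mean $(\mu - \mu_0)/(\sigma/\sqrt{n})$, so power is the probability that it lands in either rejection region. The numbers are hypothetical. The last line prints β = 1 − power, the value an OC curve plots for that true mean.

```python
import math
from statistics import NormalDist


def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test of H0: mu = mu0 when the true mean is mu_true."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # mean of the z statistic under the true mean
    # Probability that the statistic falls in either rejection region
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


base = z_test_power(mu0=75, mu_true=78, sigma=10, n=25)
print(f"baseline power          = {base:.3f}")
print(f"larger difference (80)  = {z_test_power(75, 80, 10, 25):.3f}")
print(f"larger sample (n = 100) = {z_test_power(75, 78, 10, 100):.3f}")
print(f"larger alpha (0.10)     = {z_test_power(75, 78, 10, 25, alpha=0.10):.3f}")
print(f"beta (OC value) at 78   = {1 - base:.3f}")
```

When the true mean equals μ0, the "power" reduces to α, the Type I error rate.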
  • 39. One Tail or Two Tails • The rejection region can fit into one or two tails of the sampling distribution of means. The rejection region is determined by the alternative hypothesis. • Two tails: $H_0: \mu = \text{value}$; $H_a: \mu \neq \text{value}$ • One tail: $H_0: \mu = \text{value}$; $H_a: \mu > \text{value}$ or $H_a: \mu < \text{value}$
  • 40. Tails Illustrated • [Figure: sampling distribution under the null ("likely outcome if null is true"), z from −3 to 3. Two tails, $H_a: \mu \neq \text{value}$: reject if z < −1.96 or z > 1.96, otherwise don't reject. One tail, $H_a: \mu > \text{value}$: reject if z > 1.65, otherwise don't reject.] • Note 1.96 vs. 1.65.
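  • 41. Example of 2 Tails • Suppose: $H_0: \mu = 75$; $H_a: \mu \neq 75$; $\sigma = 10$, $N = 25$ • Then: $75 \pm 1.96 \cdot \frac{10}{\sqrt{25}} = 75 \pm 3.92$, i.e., 71.08 to 78.92. Don't reject H0 if $\bar{X}$ falls between 71.08 and 78.92; reject if it falls outside. • Note that the 5 percent is split into two tails (2.5 percent each).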
  • 42. Example of 1 Tail • Suppose: $H_0: \mu = 75$; $H_a: \mu > 75$; $\sigma = 10$, $N = 25$ • Then: $75 + 1.65 \cdot \frac{10}{\sqrt{25}} = 78.3$ • [Figure: sampling distribution of means ($\bar{X}$ from 70 to 80), likely outcome if the null is true; don't reject below 78.3, reject above 78.3.] • Note that all 5 percent is in the upper tail.
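The two worked examples can be reproduced with a short Python sketch. It uses the slides' rounded critical values (1.96 and 1.65) as defaults. `statistics.NormalDist().inv_cdf(0.975)` and `inv_cdf(0.95)` would give the unrounded values of about 1.960 and 1.645. The function names are illustrative.

```python
import math


def two_tailed_region(mu0, sigma, n, z_crit=1.96):
    """Bounds of the 'don't reject' region for H_a: mu != mu0 (5% split across two tails)."""
    se = sigma / math.sqrt(n)
    return mu0 - z_crit * se, mu0 + z_crit * se


def one_tailed_cutoff(mu0, sigma, n, z_crit=1.65):
    """Upper cutoff for H_a: mu > mu0 (all 5% in the upper tail)."""
    se = sigma / math.sqrt(n)
    return mu0 + z_crit * se


low, high = two_tailed_region(mu0=75, sigma=10, n=25)
cutoff = one_tailed_cutoff(mu0=75, sigma=10, n=25)
print(f"two tails: reject if mean < {low:.2f} or > {high:.2f}")
print(f"one tail:  reject if mean > {cutoff:.2f}")
```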
  • 43. Parametric or Standard Tests • Require measurements on at least an interval scale. • Assume certain properties of the parent population, such as -i) observations come from a normal population -ii) a large random sample -iii) population parameters such as the mean and variance are well defined • Where these assumptions cannot be met, non-parametric tests are used. Because they assume no particular model of the population, they are also called distribution-free tests.
  • 44. Parametric Tests: Z-test • Based on the normal probability distribution; also used with the binomial distribution for large samples. • Used for testing a mean, a variance, the difference between two independent samples, the median, the mode, correlation coefficients, etc. T-test • Based on the t distribution; used mainly with small samples. • Used for testing the difference between the means of two samples, simple and partial correlation coefficients, etc.
  • 45. Parametric Tests (cont.): F-test • Used in the context of ANOVA, for testing the significance of multiple correlation coefficients, and for comparing the variances of two independent samples. χ² Test • Based on the chi-square distribution • Used for comparing a sample variance to a theoretical population variance
  • 49. The t-test: Inferences about Population Means
  • 50. The t Distribution • We use t when the population variance is unknown and must be estimated from the sample (the usual case), and the sample size is small (N < 100, the usual case). • The t distribution is a shorter, fatter-tailed relative of the normal. The shape of t depends on its degrees of freedom (df); as N (and hence df) becomes infinitely large, t becomes normal.
  • 51. Assumptions • The t-test assumes normality. • The two groups are independent. • Homogeneity of variance, which can be tested using the F-test. • As long as the samples in each group are large and nearly equal in size, the t-test is robust: it still performs well even when the assumptions are not met.
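This minimal Python sketch computes the independent-samples t statistic with a pooled variance. That is the version that relies on the homogeneity-of-variance assumption above. Welch's version drops that assumption and is not shown. The scores are hypothetical. The standard library has no t distribution, so the statistic is compared with a t table: for df = 8, the two-tailed critical value at α = .05 is 2.306.

```python
import math
import statistics


def pooled_t_test(group1, group2):
    """Independent-samples t statistic with pooled variance (assumes equal population variances)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / se_diff
    return t, n1 + n2 - 2


t, df = pooled_t_test([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])  # hypothetical scores
print(f"t = {t:.2f}, df = {df}")
```

Here t = 4.00 exceeds 2.306, so the difference between the two group means would be significant at α = .05.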
  • 52. Normality Assumption • We assume normal distributions to derive the sampling distributions and thus the p-values. • Violations of normality have implications for testing means. When normality is violated, use non-parametric statistics or transform the data. • Normality can be tested using the Kolmogorov-Smirnov test.
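This minimal Python sketch computes the one-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between the empirical CDF of the data and a hypothesized normal CDF. It assumes the normal's mean and SD are specified in advance. When they are estimated from the same data, the ordinary K-S table is too conservative and the Lilliefors variant is normally used. Here D is compared with a K-S table rather than converted to a p-value, and the data are hypothetical.

```python
from statistics import NormalDist


def ks_statistic(data, dist):
    """One-sample Kolmogorov-Smirnov D: largest gap between the empirical CDF and dist.cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the model CDF with the empirical CDF just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical data tested against a fully specified normal N(0, 1)
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
print(f"D = {ks_statistic(sample, NormalDist(0, 1)):.3f}")
```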
  • 53. The F Distribution (1) • The F distribution is the ratio of two variance estimates: $F = \frac{s_1^2}{s_2^2} = \frac{\text{est. } \sigma_1^2}{\text{est. } \sigma_2^2}$ • It is also the ratio of two chi-squares, each divided by its degrees of freedom: $F = \frac{\chi^2_{v_1} / v_1}{\chi^2_{v_2} / v_2}$ • In our applications, $v_2$ will be larger than $v_1$, and $v_2$ will be larger than 2. Whenever $v_2 > 2$, the mean (expected value) of the F distribution is $v_2 / (v_2 - 2)$.
  • 54. Testing Hypotheses about 2 Variances • Suppose $H_0: \sigma_1^2 \le \sigma_2^2$; $H_1: \sigma_1^2 > \sigma_2^2$ (note: 1-tailed). • We find $N_1 = 16$, $s_1^2 = 5.8$; $N_2 = 16$, $s_2^2 = 1.7$. • Then $F = \frac{s_1^2}{s_2^2} = \frac{5.8}{1.7} = 3.41$, with df1 = df2 = 15. Going to the F table with 15 and 15 df, we find that for α = .05 (1-tailed) the critical value is 2.40. Therefore the result is significant.
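The slide's computation can be written as a short Python sketch. The standard library has no F distribution, so the critical value from the F table (2.40 for 15 and 15 df, α = .05, one-tailed) is passed in. If SciPy is available, `scipy.stats.f.ppf(0.95, 15, 15)` should return approximately the same value. The function name is illustrative.

```python
def variance_ratio_test(s1_sq, n1, s2_sq, n2, f_critical):
    """One-tailed F test of H0: sigma1^2 <= sigma2^2 against H1: sigma1^2 > sigma2^2."""
    f = s1_sq / s2_sq  # ratio of the two sample variances
    df1, df2 = n1 - 1, n2 - 1
    return f, (df1, df2), f > f_critical


# Values from the slide; 2.40 is the tabled critical F(15, 15) for alpha = .05, one-tailed
f, dfs, significant = variance_ratio_test(5.8, 16, 1.7, 16, f_critical=2.40)
print(f"F = {f:.2f} with df = {dfs}; significant: {significant}")
```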
  • 55. Application of F Distribution • The F distribution is used in many statistical tests: -Tests for equality of variances -Tests for differences in means in ANOVA -Tests for regression models (slopes relating one continuous variable to another, such as SAT and GPA)