What’s Significant? Hypothesis Testing, Effect Size, Confidence Intervals, & the p-Value Fallacy

Patrick B. Barlow, The University of Tennessee
On the Agenda…

•   Recap of causation
•   The basics of hypothesis testing
     – From research question to testable hypothesis
•   Effect size
     – What is it?
     – What can impact effect size?
•   Confidence Intervals
     – What are they?
     – How do you interpret?
     – What are the implications for interpreting statistical findings?
•   Statistical significance & p-values
     – What counts as “statistically significant”?
     – Weaknesses of the p-value
     – The p-value fallacy
•   Putting it all Together
Recap: Bradford Hill Criteria

•   Strength of causal inference is affected by a number of different factors:
     – Strength of association
     – Consistency
     – Specificity
     – Temporal relationship
     – Biological gradient
     – Plausibility
     – Coherence
     – Experiment (reversibility)
     – Analogy (consideration of alternate explanations)
From research question to testable hypothesis
Statistical significance & p-values


THE BASICS OF HYPOTHESIS TESTING
The Basics of Hypothesis Testing

In statistics, hypothesis testing forms the basis for the majority of
inferential statistical tests.
• Three basic components:
     –   Null hypothesis (H0)
     –   Alternative/research hypothesis (H1)
     –   Error
           •   Type I
           •   Type II


•   Hypothesis testing was originally conceived as a way to control error over many repeated trials (the "long run"), rather than to establish the absolute “truth” in a single study.
     –   Goodman likened hypothesis testing to “a system of justice that is not concerned with which individual defendant is found guilty or innocent…but tries instead to control the overall number of incorrect verdicts.”
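To make the long-run idea concrete, here is a minimal Python sketch (not part of the original slides) that simulates thousands of hypothetical studies in which the null hypothesis is true and counts how often p < .05 anyway. It assumes normally distributed data with a known SD of 1, so a two-sample z-test stands in for the t-test you would normally use; the function names are made up for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(x, y, sigma):
    """Two-sided p-value for a difference in means, assuming a known common SD (sigma)."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def long_run_type_i_rate(n_studies=10_000, n_per_group=30, alpha=0.05, seed=1):
    """Simulate many studies in which H0 is true and count how often it is wrongly rejected."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_studies):
        # Both groups come from the same population, so any "effect" is pure chance.
        x = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_z_pvalue(x, y, sigma=1.0) < alpha:
            false_alarms += 1
    return false_alarms / n_studies


rate = long_run_type_i_rate()
print(f"Share of null studies declared significant: {rate:.3f}")
```

Over many studies the false-alarm rate settles near alpha, which is the "overall number of incorrect verdicts" the procedure controls; it says nothing about whether any single verdict is correct.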
The Basics of Hypothesis Testing

Null Hypothesis (H0)
• Almost always the statement that no difference or relationship exists between the variables of interest.
• Example: A study looking at deep vein thrombosis (DVT) & the risk of pulmonary embolism (PE)
   – The null hypothesis would be…
   – “Having DVT does not increase one’s risk for developing a PE.”

Alternative Hypothesis (H1)
• The statement that you will be trying to “prove” by conducting your inferential statistics.
• It is almost always the statement that a difference or relationship does exist between the variables of interest.
• What would be an alternative hypothesis for our example?
   – “Having DVT increases the risk of developing a PE.”
The Basics of Hypothesis Testing

The two types of error we can make in statistical testing are Type I & Type II error. Both pose serious risks to the integrity of your conclusions if ignored.

•    Type I error: concluding that a statistically significant relationship exists when in fact it does not
      –     “Alpha”, “False positive”, “False alarm”, “Red herring”, etc.
      –     Origin of “p < .05” as the threshold for statistical significance (alpha is usually set at .05).

•    Type II error: failing to detect a relationship that does in fact exist
      –     “Beta”, “Miss”, “False negative”
      –     Statistical power (1 – beta) & Type II error

The probabilities of committing the two errors are interdependent (for a fixed sample size, lowering alpha raises beta), so the researcher/analyst must consider which error would be more costly to their study.
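The interdependence between the two errors can be sketched numerically. This hypothetical Python example (not from the slides) assumes a two-sided, two-sample z-test (normal approximation) with a standardized effect size d = 0.5 and 50 participants per group, and shows that tightening alpha raises beta when nothing else changes.

```python
from statistics import NormalDist


def type_ii_error(d, n_per_group, alpha):
    """Approximate beta for a two-sided, two-sample z-test with standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected z statistic when H1 is true
    # Ignores the negligible chance of rejecting H0 in the wrong tail.
    return z.cdf(z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(d=0.5, n_per_group=50, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.2f}, power = {1 - beta:.2f}")
```

A stricter alpha guards against false alarms but makes misses more likely, which is why the "which error is more costly?" question has to be answered for each study.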
Your Turn

Instructions
In groups of 2-3, work together to brainstorm at least two research questions/topics, & answer each of the following questions:

Questions (for each research topic)
1. What is your research question?
2. What would you propose to use as a research design?
3. What would be the null hypothesis?
4. What are two possible alternative/research hypotheses that could be tested?
5. Considering the relationship between Type I & II error, which would be more costly/serious to commit if conducting your particular study?

Be prepared to discuss your answers!
What is it?
How do we interpret effect sizes?
How does effect size relate to issues of statistical power, sample size, and error?


EFFECT SIZE
What is it?
Generally speaking, the effect size represents the magnitude or strength of the relationship between two variables.

•   The proportion of variance in the dependent variable (DV) explained by your independent variable (IV).
      •   Example…
•   The difference in the mean on your DV among levels of your IV.
      •   Example…
•   The difference in the proportion of patients with an outcome in the exposed vs. the unexposed groups of your IV.
Two types:
1. Unstandardized effect sizes
2. Standardized effect sizes
How do we interpret unstandardized effect sizes?

Interpreted in the same metric as your variables.

Example:
In a fitness study looking at differences between the sexes, men (M=26.0, SD=3.0) reported significantly higher average BMI than women (M=23.0, SD=2.5), p = .02.

What is the unstandardized effect size?

[Figure: Average BMI Between Men & Women Following Physical Fitness Intervention. Y-axis: Average BMI; x-axis: Pre Intervention, Post Intervention; series: Men, Women; annotation: Mean difference = 3.0 kg/m².]
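The BMI example can be worked as a short Python sketch. The mean difference is the unstandardized effect size from the slide; Cohen's d is added here only as a preview of a standardized effect size and is not part of the original example. Because the slide does not report group sizes, the pooled SD below assumes equal numbers of men and women; the function names are illustrative.

```python
def mean_difference(m1, m2):
    """Unstandardized effect size: reported in the outcome's own units."""
    return m1 - m2


def cohens_d(m1, sd1, m2, sd2):
    """Standardized mean difference using a simple pooled SD (assumes equal group sizes)."""
    pooled_sd = ((sd1 ** 2 + sd2 ** 2) / 2) ** 0.5
    return (m1 - m2) / pooled_sd


diff = mean_difference(26.0, 23.0)
d = cohens_d(26.0, 3.0, 23.0, 2.5)
print(f"Mean difference: {diff:.1f} kg/m^2")
print(f"Cohen's d (equal-n assumption): {d:.2f}")
```

This gives a mean difference of 3.0 kg/m² (men higher) and d of about 1.09, i.e. a difference of roughly one pooled standard deviation.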
Your Turn

In pairs, calculate & interpret (in sentence format) the unstandardized effect size. Be ready to share your interpretations.
1.   Patients admitted to “academic” hospital clinics (M=.50, SD=.40) had lower average 90-day readmissions than patients seen by non-academic clinics (M=1.5, SD=.75), p = .02.
2.   A researcher looks at differences in the number of side effects patients had on three different drugs (A, B, and C). Comparison of Drug “A” to Drug “B” shows average side effects of 4 (SD=2.5) and 7 (SD=4.8), respectively, p = .04.
3.   An article shows a difference in the average number of COPD-related readmissions before (M=1.5, SD=2.0) and after (M=.05, SD=.90) a patient education intervention, p = .08.
4.   An article shows a difference in the average number of COPD-related readmissions before (M=1.5, SD=2.0), after (M=.05, SD=.90), and six months following (M=0.80, SD=3.0) a patient education intervention, p = .12.
How do we interpret standardized effect sizes?

Two of the most common standardized effect sizes are Risk / Odds Ratios and Pearson r / R².
Interpreting ORs and RRs
• Odds/Risk ratio ABOVE 1.0 = Your exposure INCREASES the odds/risk of the event occurring
   – For OR/RRs between 1.00 and 1.99, the odds/risk is increased by (OR – 1) × 100%.
   – For OR/RRs of 2.00 or higher, the odds/risk is OR times as high, but you could still use (OR – 1) × 100%.
• Example:
   – Smoking is found to increase the odds of breast cancer (OR = 1.25). What is the increase in odds?
       • Smokers have 25% higher odds of breast cancer than non-smokers.
   – Smoking is found to increase the risk of developing lung cancer (RR = 4.8). What is the increase in risk?
       • Smokers are 4.8 times as likely to develop lung cancer as non-smokers.
Interpreting ORs and RRs
• Odds/Risk ratio BELOW 1.0 = Your exposure DECREASES the odds/risk of the event occurring
  – The odds/risk is decreased by (1 – OR) × 100%
  – Often called a PROTECTIVE effect

• Example:
  – Introduction of the new guidelines for pacemaker/ICD interrogation produced an OR for device interrogation of OR = .30 versus the old guidelines. What is the reduction in odds?
     • (1 – OR) × 100% = (1 – .30) × 100% = 70% reduction in odds.
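The rules on the last two slides can be written down as a small Python helper. This is a hypothetical sketch, not a standard library function: it turns an OR or RR into a plain-language percent change using (OR – 1) × 100% above 1.0 and (1 – OR) × 100% below 1.0, and adds the "times as high" phrasing for ratios of 2.00 or more.

```python
def describe_ratio(ratio, measure="odds"):
    """Turn an OR or RR into a plain-language percent change, following the slides' rules."""
    if ratio <= 0:
        raise ValueError("Odds and risk ratios must be positive")
    if ratio > 1:
        pct = (ratio - 1) * 100
        text = f"{pct:.0f}% higher {measure} in the exposed group"
        if ratio >= 2:
            text += f" ({ratio:g} times the {measure})"
        return text
    if ratio < 1:
        pct = (1 - ratio) * 100
        return f"{pct:.0f}% lower {measure} in the exposed group (protective)"
    return f"no difference in {measure}"


print(describe_ratio(1.25))          # smoking and breast cancer example
print(describe_ratio(4.8, "risk"))   # smoking and lung cancer example
print(describe_ratio(0.30))          # new pacemaker/ICD guidelines example
```

For the slide examples this yields "25% higher odds…", "380% higher risk… (4.8 times the risk)", and "70% lower odds… (protective)". The helper only rephrases the ratio; it does not say whether the ratio is statistically significant, which is what the confidence interval is for.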
Your Turn

Instructions
Feel free to make up your own examples or just use, “Odds/Risk of having disease if you have the exposure of interest.” What does the OR/RR say about the strength of relationship?

Practice
1.   OR = 3.00
2.   OR = .39
3.   RR = 1.50
4.   OR = 1.00
5.   RR = .22
6.   RR = 18.99
7.   OR = .78
8.   RR = 6.30
Interpreting r / R²

Pearson r
• Provides the strength and direction of a linear relationship between exactly two continuous, quantitative variables.
• Can vary between -1 (perfect negative) and 1 (perfect positive)
• Most correlational studies only report r

R²
• Literally calculated as the square of an r statistic.
• Also known as the coefficient of determination
• Provides the proportion of shared variance between your IV and DV
   – What’s the range?
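As a sketch of how r and R² relate, the Python below computes Pearson's r from its definition (covariance divided by the product of the standard deviations) and then squares it. The exercise-hours and resting-heart-rate values are made up purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation for two equal-length lists of quantitative values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, made up for illustration only
hours_exercise = [1, 2, 3, 4, 5, 6]
resting_hr = [80, 78, 75, 74, 70, 69]

r = pearson_r(hours_exercise, resting_hr)
r_squared = r ** 2  # coefficient of determination: share of variance in common
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

The sign of r carries the direction (negative here), while R² drops the sign and describes only the share of variance the two variables have in common.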
How do we interpret effect sizes?
How does effect size relate to issues of statistical power, sample size, and error?
Effect size vs. statistical power, sample size, and error
• As effect size increases, statistical power also increases (holding sample size and alpha constant). This means that (1) you need a smaller sample size to detect the effect, and (2) you have a lower chance of making a Type II error (i.e., a “miss”).

    So, when possible, measure for a large effect size!
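The link between effect size and sample size can be sketched with the usual normal-approximation formula for comparing two means, n per group = 2 × ((z for alpha/2 + z for power) / d)². This is an illustrative Python sketch assuming a two-sided test, alpha = .05 and 80% power; exact t-test calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = z.inv_cdf(power)           # quantile for the desired power (1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under these assumptions a small effect (d = 0.2) needs about 393 participants per group, a medium effect (d = 0.5) about 63, and a large effect (d = 0.8) about 25.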
An OR/RR is only as important as the confidence interval that comes with it!




What are they?
How do you interpret?
How do they affect our conclusions?


CONFIDENCE INTERVALS
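What are they?
• Confidence intervals indicate, as the name suggests, how much confidence we can place in a particular sample estimate (i.e., its precision).
• They provide a range of plausible values for the true population parameter (e.g. mean, proportion, etc.).
   – Strictly, “95% confidence” means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value.
• Usually set at 95%
• They are calculated by using:
   • The point estimate for your sample (e.g. the sample mean)
   • The standard error of that estimate (SE)
   • A critical value (e.g. from the t distribution), which depends on the confidence level and the degrees of freedom for the sample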
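The three ingredients combine as point estimate ± critical value × SE. The Python sketch below applies this to the men's BMI from the earlier fitness example; the sample size of 30 is an assumption (the slide does not give one), and 2.045 is the two-sided 95% t critical value for df = 29. For large samples the critical value approaches 1.96.

```python
from math import sqrt


def mean_ci(mean, sd, n, crit):
    """CI for a population mean: point estimate +/- critical value x standard error."""
    se = sd / sqrt(n)  # standard error of the mean
    margin = crit * se
    return mean - margin, mean + margin


# Hypothetical: n = 30 is assumed; 2.045 is the two-sided 95% t critical value for df = 29.
low, high = mean_ci(mean=26.0, sd=3.0, n=30, crit=2.045)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from about 24.88 to 27.12 kg/m². Quadrupling n would halve the standard error and therefore roughly halve the width of the interval.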
What are they? OR / RR example

 95% confidence intervals are added to any OR/RR calculation to indicate the precision of the estimate.
 • Size Matters!
      – Wide CI = weaker inference
      – Narrow CI = stronger inference
      – CI crosses over 1.0 = non-significant (at alpha = .05)
 • Any 95% CI can instantly tell us about:
      1. Sample size (narrow intervals generally come from larger samples)
      2. Precision of the estimate
      3. Statistical significance
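Here is one way to see "Size Matters!" in code: a Python sketch of the Woolf (log-based, Wald-type) 95% CI for an odds ratio from a 2×2 table. The counts are hypothetical, and Woolf's method is only one of several ways to compute an OR confidence interval; the second table has the same proportions with four times the sample size.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """OR and Woolf (log-scale Wald) CI from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all counts must be > 0).
    """
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # 1.96 for a 95% CI
    return or_, exp(log(or_) - z * se_log), exp(log(or_) + z * se_log)


# Hypothetical counts, made up for illustration
small = odds_ratio_ci(40, 60, 20, 80)
large = odds_ratio_ci(160, 240, 80, 320)  # same proportions, 4x the sample
for label, (or_, lo, hi) in (("n = 200", small), ("n = 800", large)):
    crosses_one = lo <= 1 <= hi
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f} - {hi:.2f}), crosses 1.0: {crosses_one}")
```

Both tables give the same OR (about 2.67), but the larger study produces a noticeably narrower interval, i.e. a more precise estimate and a stronger inference.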
Interpreting 95% Confidence Intervals

95% CI of an Odds or Risk Ratio
• What you read…
   – OR = 4.5 (95% CI = 2.8 – 6.1)
• What you interpret…
   – Lower bound: OR = 2.8
   – Upper bound: OR = 6.1
• How you interpret…
   – “We are 95% confident that the true odds ratio for disease in exposed vs. unexposed lies between 2.8 and 6.1.”

Your Turn
Interpret these 95% CIs
1.   OR 2.4 (95% CI 1.7 - 3.3)
2.   OR 6.7 (95% CI 1.4 - 107.2)
3.   OR 1.2 (95% CI .147 - 1.97)
4.   OR .37 (95% CI .22 - .56)
5.   OR .57 (95% CI .12 - .99)
6.   OR .78 (95% CI .36 - 1.65)
What counts as “statistically significant”?
Weaknesses of the p-value
The p-value fallacy


STATISTICAL SIGNIFICANCE
What counts as “statistically significant”?
• To be considered statistically significant, the probability of obtaining a value of the test statistic (e.g. t, z, F, or χ²) at least as extreme as the one observed, assuming the null hypothesis is true, must be smaller than the probability of committing a Type I error that you are willing to accept.

• In other words, the probability (p) must be less than (<) the alpha value you have chosen (usually .05).
   – So, in most cases we conclude that a relationship is statistically significant if the test returns p < .05.
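The decision rule can be sketched in Python for the simplest case, a z statistic, where the two-sided p-value is the probability of a result at least this far from zero in either direction if H0 is true. Using z rather than t, F, or χ² is an assumption made to keep the example to the standard library; the observed z = 2.10 is made up.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Probability of a z statistic at least this extreme (either direction) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p, alpha=0.05):
    """Decision rule from the slide: reject H0 only when p is strictly below alpha."""
    return p < alpha


p = two_sided_p_from_z(2.10)  # hypothetical observed test statistic
print(f"p = {p:.3f}, significant at alpha = .05: {is_significant(p)}")
```

Here p is about .036, so the result would be called statistically significant at alpha = .05. Note that with a strict "<" rule, a result of exactly p = .05 is not significant.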
Interpretation & Practice
• If a statistically significant relationship is found, then we conclude that the observed relationship is unlikely to have arisen by chance alone.
• Which of the following are statistically significant results?
  1.   t(34) = 5.89, p = .002
  2.   F(3, 285) = 1.09, p = .101
  3.   χ²(4) = 18.78, p = .04
  4.   t(68) = 4.25, p = .05
Weaknesses of p-values

• Not truly compatible with hypothesis testing
    – Absence of evidence vs. evidence of absence

• Never meant to be the sole indicator of significance
    – Average knowledge of statistical interpretation in evidence-based
      professions

• No consideration of effect size

• What influences p-values?
    –   Sample size
    –   Chance
    –   Effect size
    –   Statistical power
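To illustrate how sample size alone moves the p-value, this Python sketch holds the observed standardized effect size fixed at d = 0.30 and changes only the number of participants per group. It assumes a two-sample z-test with known SD, chosen for simplicity rather than because it is the test a given study would use.

```python
from statistics import NormalDist


def p_for_effect(d, n_per_group):
    """Two-sided p-value for an observed standardized mean difference d (known-SD z-test)."""
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (20, 50, 200):
    print(f"d = 0.30, n = {n} per group -> p = {p_for_effect(0.30, n):.4f}")
```

The effect is identical in every row, yet it is "non-significant" with 20 or 50 per group (p of about .34 and .13) and highly "significant" with 200 per group (p of about .003). This is why a p-value by itself says little about the size of an effect.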
The “p-value fallacy”

P-values have become the “have your cake and eat
it too” of the statistical world.

• You get the supposed accuracy of a single study
  (short term) while being able to simultaneously
  avoid errors in the long term.

• Comes from misinterpretation of p-values as
  absolute indicators of the strength of a
  relationship. That is, seeing p = .03 as more
  significant than p = .04.
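One way to see why p = .03 is not meaningfully "more significant" than p = .04 is to rerun the same study many times. This hypothetical Python sketch repeats one study design 20 times (true effect d = 0.5, 30 per group, known SD of 1, z-test) and collects the p-values; the seed and design values are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def replicate_p_values(true_d=0.5, n_per_group=30, n_reps=20, seed=7):
    """Run the same hypothetical study many times and collect its two-sided p-values."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # SD assumed known and equal to 1 in both groups
    p_values = []
    for _ in range(n_reps):
        treated = [rng.gauss(true_d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        p_values.append(2 * (1 - NormalDist().cdf(abs(z))))
    return p_values


ps = replicate_p_values()
print("p-values:", ", ".join(f"{p:.3f}" for p in sorted(ps)))
```

Even though the true effect never changes, the p-values typically scatter from well below .05 to well above it, so small differences between individual p-values mostly reflect sampling noise rather than differences in the strength of the relationship.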
How to use multiple sources to become a better consumer of epidemiologic evidence

PUTTING IT ALL TOGETHER
Going beyond the p-value

• Measures of effect size provide a far more vivid description of the magnitude of the relationship.
   – An OR of 4.30 is stronger than an OR of 1.50.
   – A mean difference of 35 pts is larger than a mean difference of 20 pts.
   – 65% of the variance is more than 20% of the variance.

• The 95% CI provides far more information on the precision of the inference.
   – Which is more precise?
      • OR = 2.5 (95% CI = 1.2 – 10.0) vs. OR = 2.5 (95% CI = 1.2 – 3.1)
When reading an article…

Always consider:
1. What is the research question? Have the
   researchers used the correct null &
   alternative hypotheses?
2. How large is the…
  − Sample? Subgroup? Etc.
  − Effect size? (standardized or unstandardized)
  − Confidence interval?
3. Finally, what is the p-value?
Just because a finding is not significant does not mean that it is not meaningful. You should always consider the effect size and context of the research when making a decision about whether or not any finding is clinically relevant.

Comparing Research Designs
 

What's Significant? Hypothesis Testing, Effect Size, Confidence Intervals, & the p-Value Fallacy

  • 1. What’s Significant? Hypothesis Testing, Effect Size, Confidence Intervals, & the p-Value Fallacy Patrick B. Barlow, The University of Tennessee
  • 2. On the Agenda… • Recap of causation • The basics of hypothesis testing – From research question to testable hypothesis • Effect size – What is it? – What can impact effect size? • Confidence Intervals – What are they? – How do you interpret? – What are the implications for interpreting statistical findings? • Statistical significance & p-values – What counts as “statistically significant”? – Weaknesses of the p-value – The p-value fallacy • Putting it all Together
  • 3. Recap: Bradford Hill Criteria • Strength of causal inference is affected by a number of different factors: – Strength of association – Consistency – Specificity – Temporal relationship – Biological gradient – Plausibility – Coherence – Experiment (reversibility) – Analogy (consideration of alternate explanations)
  • 4. From research question to testable hypothesis Statistical significance & p-values THE BASICS OF HYPOTHESIS TESTING
  • 5. The Basics of Hypothesis Testing In statistics, hypothesis testing forms the basis for the majority of inferential statistical tests. • Three basic components: – Null hypothesis (H0) – Alternative/research hypothesis (H1) – Error • Type I • Type II • Hypothesis testing was originally conceived as a way to minimize error over infinite trials rather than to establish the absolute “truth” in a single scenario. – Goodman likened hypothesis testing to “a system of justice that is not concerned with which individual defendant is found guilty or innocent…but tries instead to control the overall number of incorrect verdicts.”
  • 6. The Basics of Hypothesis Testing Null Hypothesis (H0) • Almost always the statement that no difference or relationship exists between the variables of interest. • Example: A study looking at deep vein thrombosis (DVT) & the risk of pulmonary embolism (PE) – The null hypothesis would be… – “Having DVT does not increase one’s risk for developing a PE.” Alternative Hypothesis (H1) • The statement that you will be trying to “prove” by conducting your inferential statistics (strictly, a test weighs the evidence against H0; it never proves H1). • It is almost always the statement that a difference or relationship does exist between the variables of interest. • What would be an alternative hypothesis for our example? – “Having DVT increases the risk of developing a PE.”
  • 7. The Basics of Hypothesis Testing The two most common errors we encounter in statistical testing are Type I & Type II error. Both of these errors pose serious risks to the integrity of your conclusions if ignored. • Type I error: concluding that a statistically significant relationship exists when in fact it does not – “Alpha”, “False positive”, “False alarm”, “Red herring”, etc. – The Type I error rate we agree to accept (α, conventionally .05) is the origin of “p < .05” as statistically significant. • Type II error: failing to detect a relationship that in fact exists – “Beta”, “Miss”, “False negative” – Statistical power (1 − β) is the probability of avoiding a Type II error. The probabilities of committing the two errors are interdependent (for a fixed sample size, lowering α raises β), so the researcher/analyst must consider which error would be more costly to their study.
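The α/β trade-off on the slide above can be made concrete with a small calculation. This is a minimal sketch, not the presenter's method: it assumes a two-sided, two-sample z-test with a known standard deviation, a hypothetical true effect of half a standard deviation (d = 0.5, i.e. mean difference / SD), and a hypothetical 30 participants per group. The function name `type_ii_error` is illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def type_ii_error(alpha: float, d: float, n_per_group: int) -> float:
    """Approximate beta (Type II error rate) for a two-sided, two-sample z-test.

    d is the assumed true standardized mean difference (mean difference / SD).
    The tiny chance of rejecting H0 in the wrong direction is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # rejection cutoff under H0
    expected_z = d * (n_per_group / 2) ** 0.5    # mean of the z statistic under H1
    power = 1 - STD_NORMAL.cdf(z_crit - expected_z)
    return 1 - power

# Hypothetical scenario: d = 0.5, 30 participants per group.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, d=0.5, n_per_group=30)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.2f}")
```

With the sample size held fixed, each step down in α produces a larger β, which is why the choice of α should reflect which error would be more costly.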
  • 8. Your Turn Instructions: In groups of 2-3, work together to brainstorm at least two research questions/topics, & answer each of the following questions. Be prepared to discuss your answers! Questions (for each research topic): 1. What is your research question? 2. What would you propose to use as a research design? 3. What would be the null hypothesis? 4. What are two possible alternative/research hypotheses that could be tested? 5. Considering the relationship between Type I & II error, which would be more costly/serious to commit if conducting your particular study?
  • 9. What is it? How do we interpret effect sizes? How does effect size relate to issues of statistical power, sample size, and error? EFFECT SIZE
  • 10. What is it? Generally speaking, the effect size represents the magnitude or strength of the relationship between two variables. • The proportion of variance in the DV explained by your IV. • Example… • The difference in the mean on your DV among levels of your IV. • Example… • The difference in proportion of patients with an outcome in the exposed vs. the unexposed groups of your IV. Two types 1. Unstandardized Effect Sizes: 2. Standardized Effect Sizes:
  • 11. How do we interpret unstandardized effect sizes? Interpreted in the same metric as your variables. Example: In a fitness study looking at differences between the sexes, men (M=26.0, SD=3.0) reported significantly higher average BMI than women (M=23.0, SD=2.5), p = .02. What is the unstandardized effect size? [Figure: Average BMI Between Men & Women Following Physical Fitness Intervention; y-axis: Average BMI; x-axis: Pre Intervention, Post Intervention; series: Men, Women; annotation: Mean difference = 3.0 kg/m²]
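For a difference in means, the unstandardized effect size is simply the difference between the group means, reported in the outcome's own units. A minimal sketch using the BMI means from the example above (the helper name `mean_difference` is hypothetical):

```python
def mean_difference(mean_a: float, mean_b: float) -> float:
    """Unstandardized effect size: difference in group means, in the DV's own units."""
    return mean_a - mean_b

men_bmi, women_bmi = 26.0, 23.0  # group means from the BMI example (kg/m^2)
diff = mean_difference(men_bmi, women_bmi)
print(f"Men's average BMI was {diff:.1f} kg/m^2 higher than women's.")
```

Because the result stays in kg/m², it can be read directly against clinical benchmarks, which is the main appeal of an unstandardized effect size.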
  • 12. Your Turn In pairs, calculate & interpret (in sentence format) the unstandardized effect size. Be ready to share your interpretations. 1. Patients admitted to “academic” hospital clinics (M=.50, SD=.40) had lower average 90-day readmissions than patients seen by non-academic clinics (M=1.5, SD=.75), p = .02. 2. A researcher looks at differences in number of side effects patients had on three different drugs (A, B, and C). Comparison of Drug “A” to Drug “B” shows average side effects to be 4 (SD=2.5) and 7 (SD=4.8), respectively, p = .04. 3. An article shows a difference in average number of COPD-related readmissions before (M=1.5, SD=2.0) and after (M=.05, SD=.90) a patient education intervention, p = .08. 4. An article shows differences in average number of COPD-related readmissions before (M=1.5, SD=2.0), after (M=.05, SD=.90), and six months following (M=0.80, SD=3.0) a patient education intervention, p = .12.
  • 13. How do we interpret standardized effect sizes? Two of the most common standardized effect sizes are Risk/Odds Ratios and Pearson r/R².
  • 14. Interpreting ORs and RRs • Odds/Risk ratio ABOVE 1.0 = Your exposure INCREASES the odds/risk of the event occurring – For OR/RRs between 1.00 and 1.99, the odds/risk is increased by (OR − 1) × 100%. – For OR/RRs of 2.00 or higher, the odds/risk is OR times as high, although you could still use (OR − 1) × 100%. • Example: – Smoking is found to increase your odds of breast cancer (OR = 1.25). What is the increase in odds? • Your odds of breast cancer are 25% higher if you are a smoker. – Smoking is found to increase your risk of developing lung cancer (RR = 4.8). What is the increase in risk? • You are 4.8 times as likely to develop lung cancer if you are a smoker vs. a non-smoker.
  • 15. Interpreting ORs and RRs • Odds/Risk ratio BELOW 1.0 = Your exposure DECREASES the odds/risk of the event occurring – The odds/risk is decreased by (1 − OR) × 100% – Often called a PROTECTIVE effect • Example: – Introduction of the new guidelines for pacemaker/ICD interrogation produced an OR for device interrogation of OR = .30 versus the old guidelines. What is the reduction in odds? • (1 − OR) × 100 = (1 − .30) × 100 = 70% reduction in odds.
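The two interpretation rules on the preceding slides (above 1.0: increase of (OR − 1) × 100%; below 1.0: decrease of (1 − OR) × 100%) can be written as one small helper. This is an illustrative sketch; the function name `interpret_ratio` and its wording are hypothetical, and the example values are the ones used on the slides.

```python
def interpret_ratio(ratio: float, measure: str = "odds") -> str:
    """Turn an odds ratio or risk ratio into a plain-language percentage change."""
    if ratio <= 0:
        raise ValueError("odds and risk ratios must be positive")
    if ratio > 1:
        pct = (ratio - 1) * 100          # increase: (OR - 1) x 100%
        return f"{pct:.0f}% increase in {measure} ({ratio:g} times the {measure})"
    if ratio < 1:
        pct = (1 - ratio) * 100          # decrease: (1 - OR) x 100%, a protective effect
        return f"{pct:.0f}% decrease in {measure} (protective)"
    return f"no difference in {measure}"

print(interpret_ratio(1.25))          # smoking and breast cancer (OR)
print(interpret_ratio(4.8, "risk"))   # smoking and lung cancer (RR)
print(interpret_ratio(0.30))          # new interrogation guidelines (OR)
```

Passing the measure name explicitly keeps "odds" and "risk" from being conflated in the output sentence.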
  • 16. Your Turn Instructions: Feel free to make up your own examples or just use, “Odds/Risk of having disease if you have the exposure of interest.” What does the OR/RR say about the strength of relationship? Practice: 1. OR = 3.00 2. OR = .39 3. RR = 1.50 4. OR = 1.00 5. RR = .22 6. RR = 18.99 7. OR = .78 8. RR = 6.30
  • 17. Interpreting r / R² Pearson r • Provides the strength of a linear relationship between exactly two continuous, quantitative variables. • Can vary between -1 (perfect negative) and 1 (perfect positive). • Most correlational studies only report r. R² • Literally calculated as the square of an r statistic. • Also known as the coefficient of determination. • Provides the proportion of shared variance between your IV and DV. – What’s the range?
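To see how r and R² relate, the sketch below computes Pearson r from its definition (covariance divided by the product of the standard deviations) and then squares it. The five data pairs are made up purely for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: two continuous variables measured on five people.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination: proportion of shared variance
print(f"r = {r:.3f}, R^2 = {r_squared:.2f}")
```

Because R² is a square, it discards the sign of r: a perfect negative and a perfect positive correlation both give R² = 1.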
  • 18. How do we interpret effect sizes?
  • 19. How does effect size relate to issues of statistical power, sample size, and error? Effect size vs. statistical power, sample size, and error • Holding sample size and α constant, statistical power increases as effect size increases. This means that (1) you need a smaller sample size to detect a larger effect, and (2) you have a lower chance of making a Type II error (i.e., a “miss”). So, when possible, plan your study around a large effect size!
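The link between effect size and sample size can be shown with the standard normal-approximation formula for comparing two means: n per group ≈ 2 × ((z for α/2 + z for power) / d)². This is a sketch under stated assumptions: a two-sided two-sample z-test, α = .05, 80% power, and effect sizes expressed as a standardized mean difference d (mean difference / SD), which the slides do not define. The d values below are illustrative only, and `n_per_group` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group to detect standardized difference d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Halving the effect size roughly quadruples the required sample, because d enters the formula squared.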
  • 20. An OR/RR is only as important as the confidence interval that comes with it! What are they? How do you interpret? How do they affect our conclusions? CONFIDENCE INTERVALS
  • 21. What are they? • Confidence intervals, as the name suggests, express how much confidence we can place in a particular sample estimate. • They provide the range of values within which we are confident the true population parameter (e.g., mean, proportion, etc.) lies. • The confidence level is usually set at 95%. • They are calculated using: • The point estimate for your sample (e.g., a mean or proportion) • The standard error (SE) of that estimate • A critical value from the relevant distribution (e.g., the t distribution), which depends on the degrees of freedom for the sample
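The three ingredients listed above combine as: point estimate ± critical value × SE. The sketch below assumes a large sample, so it uses the normal critical value (about 1.96 for 95%) from Python's standard library; for small samples a t critical value with n − 1 degrees of freedom would replace it. The sample sizes are hypothetical because the BMI example does not report them, and `mean_ci` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Large-sample CI for a mean: point estimate +/- critical value * SE."""
    se = sd / sqrt(n)                                   # standard error of the mean
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)    # about 1.96 for a 95% CI
    return mean - crit * se, mean + crit * se

# Hypothetical: the men's BMI figures (M = 26.0, SD = 3.0) at two sample sizes.
for n in (25, 100):
    low, high = mean_ci(26.0, 3.0, n)
    print(f"n = {n:>3}: 95% CI {low:.2f} to {high:.2f}")
```

Quadrupling the sample size halves the SE and therefore halves the width of the interval.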
  • 22. What are they? OR / RR example 95% confidence intervals are added to any OR/RR calculation to show the precision of the estimate. • Size Matters! – Wide CI = weaker inference – Narrow CI = stronger inference – CI includes 1.0 = not statistically significant (at α = .05) • Any 95% CI can instantly tell us about: 1. Sample size (larger samples give narrower intervals) 2. Precision of estimation 3. Statistical significance
  • 23. Interpreting 95% Confidence Intervals 95% CI of an Odds or Risk Ratio • What you read… – OR = 4.5 (95% CI = 2.8 – 6.1) • What you interpret… – Lower bound: OR = 2.8 – Upper bound: OR = 6.1 • How you interpret… – “We are 95% confident that the true odds ratio of disease for exposed vs. unexposed lies between 2.8 and 6.1.” Your Turn: Interpret these 95% CIs 1. OR 2.4 (95% CI 1.7 - 3.3) 2. OR 6.7 (95% CI 1.4 - 107.2) 3. OR 1.2 (95% CI .147 - 1.97) 4. OR .37 (95% CI .22 - .56) 5. OR .57 (95% CI .12 - .99) 6. OR .78 (95% CI .36 – 1.65)
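The slides do not say how the interval around an OR is computed. One common approach (swapped in here as an assumption, not necessarily what any cited study used) is Woolf's log method: build the interval on the log-odds scale from a 2×2 table, then exponentiate. The counts below are hypothetical, and `odds_ratio_ci` and `excludes_null` are illustrative names.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-based) confidence interval from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # SE on the log-odds scale
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = exp(log(odds_ratio) - z * se_log_or)
    high = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

def excludes_null(low: float, high: float, null: float = 1.0) -> bool:
    """True when the whole interval lies on one side of the null value (1.0 for OR/RR)."""
    return high < null or low > null

or_, low, high = odds_ratio_ci(40, 60, 20, 80)   # hypothetical counts
print(f"OR = {or_:.2f} (95% CI {low:.2f} - {high:.2f}); excludes 1.0: {excludes_null(low, high)}")
print(excludes_null(2.8, 6.1))                   # the worked example on the slide
```

The interval is not symmetric around the OR because it is symmetric on the log scale, which is also why very wide intervals such as 1.4 to 107.2 can occur with small cell counts.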
  • 24. What counts as “statistically significant”? Weaknesses of the p-value The p-value fallacy STATISTICAL SIGNIFICANCE
  • 25. What counts as “Statistically significant?” • To be considered statistically significant, the probability of obtaining a value of the test statistic (e.g., t, z, F, or χ²) at least as extreme as the one observed, assuming the null hypothesis is true, must be smaller than the probability you have accepted for committing a Type I error. • In other words, the probability (p) must be less than (<) the value you have chosen for alpha (usually .05). – So, in most cases we conclude that a relationship is statistically significant if the test returns p < .05.
  • 26. Interpretation & Practice • If a statistically significant relationship is found, then we conclude that the observed relationship is too large to be plausibly explained by chance alone. • Which of the following are statistically significant results? 1. t(34) = 5.89, p = .002 2. F(3, 285) = 1.09, p = .101 3. χ²(4) = 18.78, p = .04 4. t(68) = 4.25, p = .05
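The decision rule above reduces to a strict comparison of p with α. The sketch shows that rule plus, for the z statistic only, how a two-sided p-value is obtained from the test statistic; p-values for t, F, or χ² need their own distributions (for example from a statistics library), which Python's standard library does not provide. Both function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_from_z(z: float) -> float:
    """Probability of a z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(p: float, alpha: float = 0.05) -> bool:
    """Significant only if p is strictly below alpha."""
    return p < alpha

print(round(two_sided_p_from_z(1.96), 3))

# Reported p-values from the practice items on the slide.
for p in (0.002, 0.101, 0.04, 0.05):
    print(p, is_significant(p))
```

Note the strict inequality: under the “p < .05” convention, a result reported as exactly p = .05 does not meet the threshold.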
  • 27. Weakness of p-values • Not truly compatible with hypothesis testing – Absence of evidence vs. evidence of absence • Never meant to be the sole indicator of significance – Average knowledge of statistical interpretation in evidence-based professions • No consideration of effect size • What influences p-values? – Sample size – Chance – Effect size – Statistical power
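The first influence listed above, sample size, is easy to demonstrate: hold the effect fixed and let only n change. This sketch assumes a two-sided, two-sample z-test with a known common SD, and uses a hypothetical mean difference of 1.0 kg/m² with SD 3.0; `two_sample_z_p` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sd * sqrt(2 / n_per_group)      # standard error of the difference in means
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same hypothetical effect (1.0 kg/m^2, SD 3.0) at three sample sizes.
for n in (10, 40, 160):
    print(f"n = {n:>3} per group: p = {two_sample_z_p(1.0, 3.0, n):.3f}")
```

The effect size never changes, yet the p-value moves from clearly non-significant to well below .05, which is why a p-value alone says little about the magnitude of a relationship.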
  • 28. The “p-value fallacy” P-values have become the “have your cake and eat it too” of the statistical world. • You get the supposed accuracy of a single study (short term) while simultaneously being able to avoid errors in the long term. • The fallacy comes from misinterpreting p-values as absolute indicators of the strength of a relationship, that is, seeing p = .03 as more significant than p = .04.
  • 29. How to use multiple sources to become a better consumer of Epidemiologic Evidence PUTTING IT ALL TOGETHER
  • 30. Going beyond the p-value • Measures of effect size provide a far more vivid description of the magnitude of the relationship. – An OR of 4.30 is stronger than an OR of 1.50. – A mean difference of 35 pts is larger than a mean difference of 20 pts. – 65% of the variance is more than 20% of the variance. • The 95% CI provides far more information on the precision of the estimate. – Which is more precise? • OR = 2.5 (95% CI = 1.2 – 10.0) vs. OR = 2.5 (95% CI = 1.2 – 3.1)
  • 31. When reading an article… Always consider: 1. What is the research question? Have the researchers used the correct null & alternative hypotheses? 2. How large is the… − Sample? Subgroup? Etc. − Effect size? (standardized or unstandardized) − Confidence interval? 3. Finally, what is the p-value?
  • 32. Just because a finding is not significant does not mean that it is not meaningful. You should always consider the effect size and context of the research when making a decision about whether or not any finding is clinically relevant.

Editor's Notes

  1. Alternatively, the second example could be interpreted as: “Smoking increases your risk of lung cancer by 380% vs. non-smoking”
  2. Insert literature examples.