Basic Biostatistics in Medical Research

      Lecture 1: Basic Concepts

                Leah J. Welty, PhD
    Biostatistics Collaboration Center (BCC)
      Department of Preventive Medicine
        NU Feinberg School of Medicine
Objectives
• Assist participants in interpreting statistics
  published in medical literature

• Highlight different statistical methodology
  for investigators conducting own research

• Facilitate communication between medical
  investigators and biostatisticians
10/1/2009   Biostatistics Collaboration Center (BCC)   Lecture 1: Basic Concepts   2
Lecture 1: Basic Concepts

       Data Types
   Summary Statistics
  One Sample Inference
Data Types

• Categorical (Qualitative)
     – subjects sorted into categories
            • diabetic/non-diabetic
            • blood type: A/B/AB/O


• Numerical (Quantitative)
     – measurements or counts
            • weight in lbs
            • # of children
Categorical Data

• Nominal
     – categories unordered
            • blood type: A/B/AB/O
     – called binary or dichotomous if 2 categories
            • gender: M/F


• Ordinal
     – categories ordered
            • cancer stage I, II, III, IV
            • non-smoker/ex-smoker/smoker
Numerical Data
• Discrete
     – limited # possible values
            • # of children: 0, 1, 2, …


• Continuous
     – range of values
            • weight in lbs: >0



Determining Data Types
• Ordinal (Categorical) vs Discrete (Numerical)
• Ordinal
   – Cancer Stage I, II, III, IV
   – Stage II ≠ 2 times Stage I
   – Categories could also be A, B, C, D
• Discrete
   – # of children: 0, 1, 2, …
   – 4 children = 2 times 2 children
Determining Data Types


• Type of data determines which types of
  analyses are appropriate

• Continuous vs. Categorical




Describing Data
• General summary
     – condense information to manageable form
     – numerically
            • ‘center’ of data values
            • spread of data; variability
     – graphically/visually


• What analyses are / are not appropriate

Summarizing Categorical Variables
• Compute #, fraction, or % in each category

• Display in table

                          SubjNo                 Smoker
                            1                    Never
                            2                    Current
                            3                    Never
                            …
                           2738                  Former
                           2739                  Former


                     Never          Former         Current         Total
Smoking Status       621 (23%)      776 (28%)      1342 (49%)      2739
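
A minimal Python sketch of this tabulation, assuming each subject's smoking status is available as a list of strings. The list here is rebuilt from the counts in the table, not read from the real subject file, and the helper name `summarize_categories` is made up for illustration.

```python
from collections import Counter

def summarize_categories(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Rebuild a status list from the counts in the table (illustrative only).
smoker = ["Never"] * 621 + ["Former"] * 776 + ["Current"] * 1342

summary = summarize_categories(smoker)
for category in ("Never", "Former", "Current"):
    count, percent = summary[category]
    print(f"{category:<8} {count:>5} ({percent:.0f}%)")
print(f"{'Total':<8} {len(smoker):>5}")
```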




Summarizing Numerical Variables

• Numeric Summaries
     – center of data
            • mean
            • median
     – spread/variability
            • standard deviation
            • quartiles; interquartile range (IQR)
• Graphical Summaries
     – histogram, boxplot
Numeric Summaries: Center
• Mean
     – average value
     – not robust to outlying values


• Median
     – 50th percentile or quantile
     – the data point “in the middle”
     – ½ observations below, ½ observations above

Numeric Summaries: Center
•     Hospital stays: 3, 5, 7, 8, 8, 9, 10, 12, 35

•     Mean
     (3 + 5 + 7 + … + 35) / 9 = 10.78 days
     Only two patients stay > 10.78 days!
•     Median
     Middle (5th of 9) value of 3 5 7 8 8 9 10 12 35 = 8 days
     Same even if longest stay 100 days!
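
The comparison can be checked with Python's standard `statistics` module. This is a small sketch using the hospital-stay values from the slide; the variable names are illustrative.

```python
import statistics

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]   # hospital stays in days
stays_100 = stays[:-1] + [100]            # same data, longest stay changed to 100

for label, data in (("original", stays), ("longest stay = 100", stays_100)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f} days, median = {median} days")
```

The mean moves when the longest stay changes, while the median stays at the middle value.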
Numeric Summaries: Spread
• Standard deviation (s, sd)
     – reported with the mean
     – based on average distance of observations from mean
        sd = √[ Σ (obs – mean)² / (n – 1) ]

        Hospital stays: 3, 5, 7, 8, 8, 9, 10, 12, 35
                       sd = 9.46 days

     – not robust (sd = 30.9 if longest stay 100 days)
     – for roughly bell-shaped data, ~95% of obs w/in 2 sd of mean
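
A short sketch of the sd formula applied to the hospital-stay data, written out step by step and cross-checked against `statistics.stdev`, which uses the same n – 1 divisor. The function name `sample_sd` is illustrative.

```python
import math
import statistics

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]

def sample_sd(data):
    """Sample standard deviation with the n - 1 divisor, as on the slide."""
    mean = sum(data) / len(data)
    squared_dev = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_dev / (len(data) - 1))

sd = sample_sd(stays)
sd_100 = sample_sd(stays[:-1] + [100])
print(f"sd = {sd:.2f} days; with longest stay 100: sd = {sd_100:.1f} days")

# statistics.stdev uses the same n - 1 formula
assert math.isclose(sd, statistics.stdev(stays))
```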
Numeric Summaries: Spread
• Interquartile Range (IQR), Quartiles
     – reported with median (+min, max, quartiles)
     – IQR = distance between 75th & 25th %iles

            3    5      7       8      8       9       10   12       35

     – 25th %ile = 7, 75th %ile = 10, so IQR = 10 – 7 = 3 days
     – range of middle 50% of data
     – same even if longest stay 100 days!
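
A sketch of the quartile calculation using `statistics.quantiles`. Software packages use several slightly different quartile conventions; the "inclusive" method is assumed here because it reproduces the 7 and 10 from the slide, and other conventions can give somewhat different values for small samples.

```python
import statistics

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]

def iqr(data):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q3, q3 - q1

q1, q3, width = iqr(stays)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {width} days")

# Changing the longest stay to 100 days leaves the quartiles untouched.
print(iqr(stays[:-1] + [100]))
```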
Numeric Summaries


   Center        Spread           Properties
   Mean          Std dev (sd)     Common & mathematically convenient
   Median        IQR              Robust to outliers




• How to choose which to use?
Graphical Summaries
• Histogram
     – divide data into intervals
     – compute # or fraction of observations in each interval
     – good for many observations
     [Figure: histogram of hypothetical cholesterol levels for n = 100 subjects; x-axis: Cholesterol (mg/100mL), from 160 to 240; y-axis: Frequency, from 0 to 20]
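
The binning step behind a histogram can be sketched in a few lines. The cholesterol values below are made up for illustration (they are not the data behind the slide), and `histogram_counts` is a hypothetical helper name.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count observations per interval [start + k*width, start + (k+1)*width)."""
    bins = Counter((v - start) // width for v in values)
    return {start + k * width: bins[k] for k in sorted(bins)}

# Made-up cholesterol values (mg/100mL), for illustration only.
cholesterol = [162, 171, 178, 183, 185, 191, 194, 196, 199,
               203, 207, 212, 218, 226, 239]

for left_edge, count in histogram_counts(cholesterol, start=160, width=20).items():
    print(f"{left_edge}-{left_edge + 20}: {'#' * count} ({count})")
```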
Graphical Summaries: Histogram
 • Both means = 2, sd = 1.9, n = 1000
 • Measurement A: median = 2, IQR = 2.8
 • Measurement B: median = 1.4, IQR = 2.1
 [Figure: side-by-side histograms of Measurement A (values from about -4 to 8) and Measurement B (values from 0 to about 12)]
Graphical Summaries: Boxplots
• Graphical extension of median, IQR

• Useful for comparing two variables

• Help identify ‘outliers’




Graphical Summaries: Boxplots
     [Figure: boxplot of Cholesterol Levels (mg/100mL), labeled from top to bottom: Maximum, Q3 (75th %ile), Median, Q1 (25th %ile), Minimum; the box from Q1 to Q3 spans the IQR]


Graphical Summaries: Boxplots
     [Figure: side-by-side boxplots of Var 1 (Measurement A) and Var 2 (Measurement B). Labels: Maximum; “outliers”, all greater than Q3 + 1.5 * IQR; upper whisker ends at the largest obs less than Q3 + 1.5 * IQR; IQR; Minimum]
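
A sketch of the Q3 + 1.5 * IQR rule, applied to the hospital-stay data from the earlier slides. The lower fence, Q1 – 1.5 * IQR, is the usual mirror-image convention and is an assumption here, since the slide only labels the upper side. The "inclusive" quartile method is also an assumption, and `boxplot_fences` is a hypothetical helper name.

```python
import statistics

def boxplot_fences(data):
    """Return (lower_fence, upper_fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]  # hospital stays from the earlier slides
lower, upper = boxplot_fences(stays)
outliers = [x for x in stays if x < lower or x > upper]
upper_whisker = max(x for x in stays if x <= upper)
print(f"fences: ({lower}, {upper}); outliers: {outliers}; "
      f"upper whisker ends at {upper_whisker}")
```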
Data Type & Data Summaries
• Type determines appropriate summary

• Characteristics of data determined in
  summary determine appropriate analyses
     – more than just mean, sd
     – e.g. should not use t-distribution methods for strongly skewed data
     – median (IQR) appropriate if skewed, outliers


Statistical Inference
• Estimation of quantity of interest
     – estimate itself
     – quantify how good an estimate it is

• Hypothesis testing
     – Is there evidence to suggest that the estimate
       is different from another value?

• Today: proportion, mean, median
Statistical Inference
• Take results obtained in a (random) sample as
  best estimate of what is true for the population

• Estimate may differ from the population value by
  chance, but it should be close

• How good is the estimate?

• If we took more samples, and got even more
  estimates, how would they vary?
Statistical Inference
• Ex: proportion of persons in a population who
  have health insurance, sample size n = 978

        Sample 1                797/978 = 0.81

     We conclude that the estimated % of people with health
      insurance is 81%.

     How variable is our estimate?


Sampling Variability
• Need to know the sampling distribution for
  the quantity of interest

• One option: take lots of samples, and
  make a histogram

            Sample 2 782/978 = 0.80
            Sample 3 812/978 = 0.83
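
The "take lots of samples" idea can be imitated by simulation. This is a rough sketch that assumes a population in which the true proportion insured is 0.81; that value, the number of simulated samples, and the function name are all chosen only for illustration.

```python
import random
import statistics

def simulate_sample_proportions(true_prop, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_prop for _ in range(n)) / n
        for _ in range(n_samples)
    ]

# true_prop = 0.81 is an assumed population value, chosen only for illustration.
props = simulate_sample_proportions(true_prop=0.81, n=978, n_samples=1000)
print(f"mean of sample proportions: {statistics.mean(props):.3f}")
print(f"sd of sample proportions:   {statistics.stdev(props):.4f}")
```

A histogram of `props` would approximate the sampling distribution, and its standard deviation is the quantity the standard error estimates from a single sample.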
Sampling Variability
• Not practical to keep repeating a study!

• Statistical theory

     – the sampling distributions for means and proportions
       often look “normal”, bell-shaped

     – from a single sample, can calculate the standard error
       (variability) of our estimated mean or proportion

Standard Error
• Standard error (SE) measures the variability of
  your sample statistic (e.g. a mean or proportion)

• small SE means estimate more precise

• SE is not the same as SD!

• SD measures the variability of the sample data;
  SE measures the variability of the statistic
Standard Error vs Standard Deviation
• Averages less variable than the individual observations

• Averages over larger n less variable than over smaller n

• Ex: heights

     – most people from 5’0” – 6’2”
     – average height for sample of 100 people ~ between 5’5” – 5’7”


• SE ≤ SD!


Inference for Sample Proportions
• Formula for SE of sample proportion

        SE = √[ prop ⋅ (1 − prop) / n ]

• For health insurance example
     – n = 978, prop = 0.81
     – SE = √[ (0.81) * (0.19) / 978 ] ≈ 0.0125, or about 0.01
     – The standard error of the sample proportion is about 0.01.
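
The same arithmetic as a small Python sketch (the function name is illustrative):

```python
import math

def se_proportion(prop, n):
    """Standard error of a sample proportion: sqrt(prop * (1 - prop) / n)."""
    return math.sqrt(prop * (1 - prop) / n)

se = se_proportion(0.81, 978)
print(f"SE = {se:.4f}")  # about 0.0125, which rounds to 0.01
```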

Sampling Distribution of Sample Proportion
   • Sample proportions are approximately normally distributed if (prop) * (1-prop) * n > 10
   • if ≤ 10, then need to use exact binomial (not covered here)
   • ~ 95% of sample proportions within ~2 SEs of true proportion

   [Figure: normal curve centered at the true proportion, with true proportion – 1.96 SE and true proportion + 1.96 SE marked]

Confidence Interval for Sample Proportion
•   95% Confidence Interval for Sample Proportion

     (sample prop – 1.96 * SE, sample prop + 1.96 * SE)

•   Interpretation: If the study were repeated many times, about 95 of every
    100 intervals computed this way would contain the true population proportion

•   95% CI for % population with insurance (n = 978, prop = 0.81)

        Check:       (0.81) * (1-0.81) * 978 = 150.5 > 10

        95% CI:      (0.81 – 1.96 * 0.01, 0.81 + 1.96 * 0.01) =
                     (0.79, 0.83)

•   For greater confidence (e.g. 99%CI), wider interval
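
A sketch of the interval calculation, including the n * prop * (1 – prop) > 10 check. The critical value comes from `statistics.NormalDist`; the function name and the choice to raise an error when the check fails are illustrative assumptions, not part of the lecture.

```python
import math
from statistics import NormalDist

def proportion_ci(prop, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    if n * prop * (1 - prop) <= 10:
        raise ValueError("normal approximation not appropriate; "
                         "use exact binomial methods")
    se = math.sqrt(prop * (1 - prop) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return prop - z * se, prop + z * se

low95, high95 = proportion_ci(0.81, 978)
low99, high99 = proportion_ci(0.81, 978, confidence=0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

The 99% interval comes out wider than the 95% interval, as the last bullet states.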

Standard Error for Sample Mean
• Standard error is the standard deviation of the sample,
  divided by the square root of the sample size

                                 SE = SD/√n

• Ex: n = 16
      mean SBP = 123.4 mm Hg
      sd = 14.0 mm Hg

• SE of mean = SD/√n = 14.0/ √16 = 3.5
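
A short sketch of SE = SD/√n for the SBP example. The sample sizes other than n = 16 are added only to illustrate how the SE shrinks as n grows.

```python
import math

def se_mean(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

for n in (16, 100, 400):
    print(f"n = {n:>3}: SE = {se_mean(14.0, n):.2f} mm Hg")
```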


Sampling Distribution for a Mean
• Sample means follow a t-distribution IF

     – underlying data is (approximately) normal
            OR
     – n is large

• A sample mean from a sample of size n will
  have a t distribution with n-1 degrees of freedom,
  written tn-1 (e.g. t15 for n = 16)
Sampling Distribution for Sample Mean

[Figure: density curves of the Normal Distribution, t20, and t15, all centered at the true mean]

For a sample of size n = 16, the sample mean would have a t15 distribution,
centered at the true population mean and with estimated standard error sd/√n.
Confidence Interval for Sample Mean
• Ex: n = 16
      mean SBP = 123.4 mm Hg
      sd = 14.0 mm Hg

• SE of mean = SD/√n = 14.0/ √16 = 3.5

• Assume that SBP is approximately normally distributed
  (at least symmetric, bell-shaped, not skewed) in the
  population

• 95% CI for sample mean, using the t15 distribution
     mean ± 2.131 * SE = 123.4 ± 2.131 * 3.5
                       = (115.9, 130.9) mm Hg
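
A sketch of the interval calculation. The t critical value is passed in as an argument, taken from the slide or a t table, because the Python standard library has no t distribution. The function name is illustrative.

```python
import math

def mean_ci(mean, sd, n, t_crit):
    """Confidence interval mean +/- t_crit * sd / sqrt(n).

    t_crit must be the critical value of the t distribution with n - 1
    degrees of freedom for the chosen confidence level.
    """
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

low, high = mean_ci(mean=123.4, sd=14.0, n=16, t_crit=2.131)  # t15, 95%
print(f"95% CI: ({low:.1f}, {high:.1f}) mm Hg")
```

The same function covers other sample sizes, for example `mean_ci(123.4, 14.0, 100, 1.984)` for n = 100 with the t99 critical value.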
Confidence Interval for Sample Mean
• Suppose instead mean and sd were for sample
  of size n = 100

• 95 % CI for sample mean
     – n = 100, using the t99 distribution
     mean ± 1.984 * sd/√n = 123.4 ± 1.984 * 14/ √100
                        = (120.6, 126.2) mm Hg
     – n = 16, using the t15 distribution
     mean ± 2.131 * sd/√n = 123.4 ± 2.131 * 14/ √16
                        = (115.9, 130.9) mm Hg


Confidence Interval for a Sample Mean
• Confession: it’s never incorrect to use a
  t-distribution (as long as underlying
  population normal or n large)

     BUT

   as n gets large the t-distribution and
   normal distribution become
   indistinguishable
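
To illustrate the convergence, the sketch below compares the normal 95% critical value with the two t critical values quoted on the earlier slides. Only the normal value is computed; the t values are copied from the slides.

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value

# t critical values quoted on the earlier slides (degrees of freedom -> value)
t_crit = {15: 2.131, 99: 1.984}

print(f"normal: {z_crit:.3f}")
for df, value in t_crit.items():
    extra = 100 * (value / z_crit - 1)
    print(f"t with {df} df: {value:.3f} ({extra:.1f}% wider interval)")
```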
Hypothesis Testing


•     Confidence intervals tell you
     – best estimate
     – variability of best estimate


•     Hypothesis testing
     – Is there really a difference between my
       observed value and another value?
Hypothesis Testing
• From our sample of n = 978, we estimated
  that 81% of the people in our population
  had health insurance.
• From previous census records, you know
  that 10 years ago, 78.5% of your
  population had health insurance.
• Has the percent of people with insurance
  really changed?

Hypothesis Testing
•   Suppose the true percent with health insurance is 78.5%
     – Called the null hypothesis, or H0

•   What is the probability of observing, for a sample of n = 978, a result
    as or more extreme than 81%, given the true percent is 78.5%?
     – Called the p-value, computed using normal distribution for sample
       proportions (and t-distribution for sample means)

•   If probability is small, conclude that supposition might not be right.
     – Reject the null hypothesis in favor of the alternative hypothesis, Ha
       (opposite of the null hypothesis, related to what you think may be true)

•   If the probability is not small, conclude do not have evidence to
    reject the null hypothesis
     – Not the same as ‘accepting’ the null hypothesis, or showing that the null
       hypothesis is true


Hypothesis Testing
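     H0: True proportion is 78.5%
     Ha: True proportion is not 78.5%
     (H0 and Ha are opposites)

     P-value = Prob(sample prop > 0.81 or sample prop < 0.76
     given true prop = 0.785) = 0.012, using the rounded SE of 0.01

     [Figure: normal curve centered at the supposed true proportion = 0.785, with 0.76 and 0.81 marked]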
Hypothesis Testing
• It is not very likely (p = 0.012) to observe our data if the
  true proportion is 78.5%.

• We conclude that we have sufficient evidence that the
  proportion insured is no longer 78.5%.

• “Two-sided” test: observations as extreme as 0.81
     > 0.81 or < 0.76

• “One-sided” test: observations larger than 0.81
     –   Has the percent of people with insurance really increased?
     –   Ha: True proportion greater than 78.5%
     –   p-value = 0.006 (only area above 0.81)
     –   Only ok if previous research suggests that the prop is larger
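
A sketch of the one-sided and two-sided calculations under the normal approximation, using `statistics.NormalDist`. The inputs are the rounded values from the slides (prop = 0.81, SE = 0.01); the function name is illustrative.

```python
from statistics import NormalDist

def z_test_p_values(sample_prop, null_prop, se):
    """Return (z, one-sided upper-tail p, two-sided p) under a normal approximation."""
    z = (sample_prop - null_prop) / se
    upper_tail = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, upper_tail, two_sided

# Inputs as rounded on the slides: prop = 0.81, SE = 0.01
z, p_one, p_two = z_test_p_values(sample_prop=0.81, null_prop=0.785, se=0.01)
print(f"z = {z:.2f}, one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

P-values are sensitive to rounding of the inputs. With the unrounded proportion 797/978 and the SE computed under H0, √(0.785 * 0.215 / 978) ≈ 0.0131, the same function gives a two-sided p-value of about 0.02, so it is worth carrying full precision in a real analysis.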

Hypothesis Testing
• Similar for sample means, except that p-values
  are computed using t-distribution, if appropriate

• n = 16 with SBP = 123.4, sd = 14.0 mm Hg

• Previous research suggests that population may
  have lifestyle habits protective for high BP

• Do we have evidence to suggest that our
  population has SBP lower than 125 mm Hg
  (pre-hypertensive)?
Hypothesis Testing
     H0: True mean is 125 mm Hg
     Ha: True mean is less than 125 mm Hg
     (H0 and Ha are almost opposites)

• Using t15 distribution, we find a one-sided
  p-value of 0.32

• This is too large to reject the null hypothesis
  in favor of the alternative.
    – Usually need p < 0.05 to “reject null” in favor
      of alternative
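
A sketch of how the one-sided p-value can be obtained. The Python standard library has no t distribution, so the lower-tail probability is approximated here by integrating the t density numerically with Simpson's rule. In practice a statistics package would be used; the helper names `t_pdf` and `t_cdf` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on the density between 0 and |t|."""
    h = abs(t) / steps          # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

mean, null_mean, sd, n = 123.4, 125.0, 14.0, 16
t_stat = (mean - null_mean) / (sd / math.sqrt(n))
p_lower = t_cdf(t_stat, df=n - 1)     # one-sided: Ha is "true mean < 125"
print(f"t = {t_stat:.3f}, one-sided p = {p_lower:.2f}")
```

The result is a p-value of roughly one third, far above 0.05, which matches the conclusion on the slide.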
Misinterpretations of the p-value
• A p-value of 0.32 (or > 0.05) DOES NOT mean that we
  “accept the null”, or that there is a 32% chance that the
  null is true

     – we haven’t shown that the SBP of this group is equal to 125
     – true value may be 124.5 or 125.1 mmHg

• Can only “reject the null in favor of the alternative” or “fail
  to reject the null”

• If you “fail to reject” that doesn’t mean the alternative
  isn’t true. You may not have had a large enough n!

Distribution-Free Alternatives
• Recall for the normal distribution to be useful for sample
  proportions, we need n * prop * (1-prop) > 10

     – If not, use exact binomial methods (not covered here)

• For the t-distribution to be useful for sample means, we
  need the underlying population to be normal or n large

     – What if this isn’t the case?
            • data are skewed or n is small
            • e.g. length of hospital stay
     – Use nonparametric methods
            • look at ranks, not mean
            • the median is a nonparametric estimate

Nonparametric Methods
• nonparametric methods are methods that don’t require
  assuming a particular distribution

• a.k.a distribution-free or rank methods

• nonparametric tests well suited to hypothesis testing

• not as useful for point estimates or CIs

• especially useful when data are ranks or scores
     – Apgar scores
     – Vision (20/20, 20/40, etc.)

Nonparametric Methods for Sample Median
• Alternative to using sample mean & t-distribution
• Instead do inference for median value
     – CI for median uses ranks
            • n = 16, ~ 95% CI for median
                (4th, 13th values in sorted order)
            • n = 25, ~ 95% CI for median
                (8th, 18th values in sorted order)
            • necessary ranks depend on n and confidence level;
                use stat packages or tables


     – Hypothesis test for median: Sign Test
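
The ranks quoted above can be reproduced from the binomial distribution. This sketch assumes the standard construction: the number of observations below the true median is Binomial(n, 1/2), and the interval runs between the k-th smallest and k-th largest values for the largest k that keeps each tail probability at or below 2.5%. The function name is illustrative.

```python
from math import comb

def median_ci_ranks(n, confidence=0.95):
    """Ranks (lower, upper) of the sorted values that bound a CI for the median.

    Based on X ~ Binomial(n, 1/2), the number of observations below the true
    median: picks the largest k with P(X <= k - 1) <= (1 - confidence) / 2.
    """
    alpha_half = (1 - confidence) / 2
    cumulative = 0.0
    k = 0
    while cumulative + comb(n, k) / 2 ** n <= alpha_half:
        cumulative += comb(n, k) / 2 ** n
        k += 1
    if k == 0:
        raise ValueError("n is too small for this confidence level")
    return k, n + 1 - k

for n in (16, 25):
    print(n, median_ci_ranks(n))
```

Because the binomial distribution is discrete, the actual coverage is at least 95% rather than exactly 95%, which is why the slide writes "~ 95%".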

Sign Test for the Median
• Hypothesis about median, not mean

• Assuming hypothesized value of median is
  correct, expect to observe about half
  sample below median and half above.

• Compute probability for (as or more
  extreme) proportion above median
Sign Test Example
• Average daily energy intake over 10
  days for 11 healthy subjects

• Are these subjects getting the
  recommended level of 7725 kJ?

• Data aren’t strongly skewed, but we
  don’t know if underlying population
  is normal, and n is small

• Use sign test to see if median level
  for population might be 7725 kJ

     ID     Energy Intake (kJ)
     1         5260
     2         5470
     3         5640
     4         6180
     5         6390
     6         6515
     7         6805
     8         7515
     9         7515
     10        8230
     11        8770


Sign Test
• Expect about half values above 7725 kJ

• Only 2/11 subjects’ values are above 7725 kJ

• If the probability of selecting a subject whose
  value is above 7725 kJ is ½, what’s the
  probability of obtaining a sample of 11 subjects
  in which 2 or fewer are above 7725 kJ?

• p-value = 67/2048 ≈ 0.033 one-sided (≈ 0.065 two-sided, exact binomial)
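
A sketch of the exact binomial calculation behind the question above, under the stated assumption that each subject independently falls above 7725 kJ with probability ½. Observations exactly equal to the hypothesized median would need special handling; there are none in this data set, so the sketch ignores that case. The function name is illustrative.

```python
from math import comb

def sign_test(n_above, n_total):
    """Exact sign test: (one-sided, two-sided) p-values for n_above of n_total.

    Under H0 each observation falls above the hypothesized median with
    probability 1/2, so the count above is Binomial(n_total, 1/2).
    """
    k = min(n_above, n_total - n_above)
    one_sided = sum(comb(n_total, i) for i in range(k + 1)) / 2 ** n_total
    return one_sided, min(1.0, 2 * one_sided)

intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
n_above = sum(x > 7725 for x in intake)
one_sided, two_sided = sign_test(n_above, len(intake))
print(f"{n_above} of {len(intake)} above 7725 kJ: "
      f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

The one-sided value is the probability of 2 or fewer subjects above 7725 kJ, (1 + 11 + 55) / 2048; doubling it gives the two-sided p-value.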
Parametric vs. Nonparametric
• Nonparametric always okay
• Nonparametric more conservative than
  parametric
     – e.g. 95% CIs for medians sometimes twice as
       wide as those for the mean
• If your data satisfy the requirements, or n
  is fairly large, probably best to use
  parametric procedures

Useful References
• Intuitive Biostatistics, Harvey Motulsky, Oxford
  University Press, 1995
     – Highly readable, minimally technical

• Practical Statistics for Medical Research,
  Douglas G. Altman, Chapman & Hall, 1991
     – Readable, not too technical

• Fundamentals of Biostatistics, Bernard Rosner,
  Duxbury, 2000
     – Useful reference, somewhat technical

Next Time



                     Two group comparisons




Contact BCC
• Please go to our web site at:
  http://www.medschool.northwestern.edu/depts/bcc/index.htm


• Fill out our online request form for further
  collaboration!


10/1/2009   Biostatistics Collaboration Center (BCC)   Lecture 1: Basic Concepts   59

Weitere ähnliche Inhalte

Was ist angesagt?

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAileen Balbido
 
bio statistics for clinical research
bio statistics for clinical researchbio statistics for clinical research
bio statistics for clinical researchRanjith Paravannoor
 
Kolmogorov Smirnov
Kolmogorov SmirnovKolmogorov Smirnov
Kolmogorov SmirnovRabin BK
 
5. Biostatistics central tendency mean, median, mode for ungrouped data
5. Biostatistics central tendency mean, median, mode for ungrouped data5. Biostatistics central tendency mean, median, mode for ungrouped data
5. Biostatistics central tendency mean, median, mode for ungrouped dataSudhakar Khot
 
Statistical Graphics / Exploratory Data Analysis - DAY 2 - 8614 - B.Ed - AIOU
Statistical Graphics / Exploratory Data Analysis - DAY 2 - 8614 - B.Ed - AIOUStatistical Graphics / Exploratory Data Analysis - DAY 2 - 8614 - B.Ed - AIOU
Statistical Graphics / Exploratory Data Analysis - DAY 2 - 8614 - B.Ed - AIOUEqraBaig
 
statistical estimation
statistical estimationstatistical estimation
statistical estimationAmish Akbar
 
Lec. biostatistics introduction
Lec. biostatistics  introductionLec. biostatistics  introduction
Lec. biostatistics introductionRiaz101
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatisticsAli Al Mousawi
 
One Way Anova
One Way AnovaOne Way Anova
One Way Anovashoffma5
 
Unit-I Measures of Dispersion- Biostatistics - Ravinandan A P.pdf
Unit-I Measures of Dispersion- Biostatistics - Ravinandan A P.pdfUnit-I Measures of Dispersion- Biostatistics - Ravinandan A P.pdf
Unit-I Measures of Dispersion- Biostatistics - Ravinandan A P.pdfRavinandan A P
 
Assumptions underlying the one way anova
Assumptions underlying the one way anovaAssumptions underlying the one way anova
Assumptions underlying the one way anovaTAYYABA MAHR
 
Sample size determination.pptx
Sample size determination.pptxSample size determination.pptx
Sample size determination.pptxMohammedAbdela7
 

Was ist angesagt? (20)

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
bio statistics for clinical research
bio statistics for clinical researchbio statistics for clinical research
bio statistics for clinical research
 
Lecture 1 basic concepts2009

  • 1. Basic Biostatistics in Medical Research Lecture 1: Basic Concepts Leah J. Welty, PhD Biostatistics Collaboration Center (BCC) Department of Preventive Medicine NU Feinberg School of Medicine
  • 2. Objectives • Assist participants in interpreting statistics published in medical literature • Highlight different statistical methodology for investigators conducting own research • Facilitate communication between medical investigators and biostatisticians 10/1/2009 Biostatistics Collaboration Center (BCC) Lecture 1: Basic Concepts 2
  • 3. Lecture 1: Basic Concepts • Data Types • Summary Statistics • One Sample Inference
  • 4. Data Types • Categorical (Qualitative) – subjects sorted into categories • diabetic/non-diabetic • blood type: A/B/AB/O • Numerical (Quantitative) – measurements or counts • weight in lbs • # of children
  • 5. Categorical Data • Nominal – categories unordered • blood type: A/B/AB/O – called binary or dichotomous if 2 categories • gender: M/F • Ordinal – categories ordered • cancer stage I, II, III, IV • non-smoker/ex-smoker/smoker
  • 6. Numerical Data • Discrete – limited # possible values • # of children: 0, 1, 2, … • Continuous – range of values • weight in lbs: >0
  • 7. Determining Data Types • Ordinal (Categorical) vs Discrete (Numerical) • Ordinal – Cancer Stage I, II, III, IV – Stage II ≠ 2 times Stage I – Categories could also be A, B, C, D • Discrete – # of children: 0, 1, 2, … – 4 children = 2 times 2 children
  • 8. Determining Data Types • Type of data determines which types of analyses are appropriate • Continuous vs. Categorical
  • 9. Describing Data • General summary – condense information to manageable form – numerically • ‘center’ of data values • spread of data; variability – graphically/visually • What analyses are / are not appropriate
  • 10. Summarizing Categorical Variables • Compute #, fraction, or % in each category • Display in table • Raw data (SubjNo: Smoker): 1: Never; 2: Current; 3: Never; …; 2738: Former; 2739: Former
  • 11. Summarizing Categorical Variables • Raw data (SubjNo: Smoker): 1: Never; 2: Current; …; 2739: Former • Summary table, Smoking Status: Never 621 (23%); Former 776 (28%); Current 1342 (49%); Total 2739
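The category percentages in the smoking-status table can be recomputed from the counts. The following is a minimal sketch that assumes only the three counts shown on the slide; the variable names are illustrative.

```python
# Counts taken from the smoking-status summary table on the slide.
counts = {"Never": 621, "Former": 776, "Current": 1342}

total = sum(counts.values())

# Percent in each category, rounded to whole percents as on the slide.
percents = {status: round(100 * n / total) for status, n in counts.items()}

for status, n in counts.items():
    print(f"{status:8s}{n:6d} ({percents[status]}%)")
print(f"{'Total':8s}{total:6d}")
```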
  • 12. Summarizing Numerical Variables • Numeric Summaries – center of data • mean • median – spread/variability • standard deviation • quartiles; interquartile range • Graphical Summaries – histogram, boxplot
  • 13. Numeric Summaries: Center • Mean – average value – not robust to outlying values • Median – 50th percentile or quantile – the data point “in the middle” – ½ observations below, ½ observations above
  • 14. Numeric Summaries: Center • Hospital stays (days): 3, 5, 7, 8, 8, 9, 10, 12, 35 • Mean: (3 + 5 + 7 + … + 35) / 9 = 10.78 days. Only two patients stay > 10.78 days! • Median: 8 days, the middle value of 3 5 7 8 8 9 10 12 35. Same even if longest stay were 100 days!
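As a quick check of the hospital-stay example, the sketch below computes the mean and median with Python's standard `statistics` module, then repeats the calculation with the longest stay changed to 100 days. The variable names are illustrative.

```python
import statistics

# Hospital stays in days, from the slide.
stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]

mean_stay = statistics.mean(stays)
median_stay = statistics.median(stays)
print(f"mean = {mean_stay:.2f} days, median = {median_stay} days")

# Replace the longest stay with 100 days: the mean moves, the median does not.
stays_long = stays[:-1] + [100]
mean_long = statistics.mean(stays_long)
median_long = statistics.median(stays_long)
print(f"mean = {mean_long:.2f} days, median = {median_long} days")
```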
  • 15. Numeric Summaries: Spread • Standard deviation (s, sd) – reported with the mean – based on average distance of observations from mean: $\mathrm{sd} = \sqrt{\sum (\mathrm{obs} - \mathrm{mean})^2 / (n-1)}$ – Hospital stays: 3, 5, 7, 8, 8, 9, 10, 12, 35; sd = 9.46 days – not robust (sd = 30.9 if longest stay 100 days) – sometimes ~95% of obs w/in 2 sd of mean
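The standard deviation formula can be written out directly and compared with `statistics.stdev`, which also uses the n − 1 denominator. This is a small sketch; the helper name `sample_sd` is illustrative.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((obs - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]
stays_long = stays[:-1] + [100]  # longest stay changed to 100 days

sd = sample_sd(stays)
sd_long = sample_sd(stays_long)
print(f"sd = {sd:.2f} days; with a 100-day stay, sd = {sd_long:.1f} days")

# The hand-written formula agrees with the standard library.
assert math.isclose(sd, statistics.stdev(stays))
```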
  • 16. Numeric Summaries: Spread • Interquartile Range (IQR), Quartiles – reported with median (+ min, max, quartiles) – IQR = distance between 75th & 25th %iles – for 3 5 7 8 8 9 10 12 35: 25th %ile = 7, 75th %ile = 10, so IQR = 10 – 7 = 3 days – range of middle 50% of data – same even if longest stay 100 days!
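Quartile conventions differ between software packages, so the quartiles reported for small samples can vary. As a sketch, the `"inclusive"` method of `statistics.quantiles` happens to reproduce the slide's quartiles of 7 and 10 for the hospital-stay data; that choice of method is an assumption, not something the slides specify.

```python
import statistics

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]

# n=4 splits the data into quartiles; "inclusive" treats the data as
# containing the minimum and maximum of the population.
q1, q2, q3 = statistics.quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr} days")

# Changing the longest stay to 100 days leaves the quartiles untouched.
stays_long = stays[:-1] + [100]
q1_long, _, q3_long = statistics.quantiles(stays_long, n=4, method="inclusive")
```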
  • 17. Numeric Summaries • Center: Mean, Spread: Std dev (sd) – common & mathematically convenient • Center: Median, Spread: IQR – robust to outliers • How to choose which to use?
  • 18. Graphical Summaries • Histogram – divide data into intervals – compute # or fraction of observations in each interval – good for many observations • [Figure: histogram of hypothetical cholesterol levels for n = 100 subjects; x-axis: Cholesterol (mg/100mL); y-axis: Frequency]
  • 19. Graphical Summaries: Histogram • Both means = 2, sd = 1.9, n = 1000 • [Figure: side-by-side histograms of Measurement A and Measurement B]
  • 20. Graphical Summaries: Histogram • Both means = 2, sd = 1.9, n = 1000 • [Figure: side-by-side histograms of Measurement A and Measurement B; panel annotations in order of appearance: median = 2, IQR = 2.8; median = 1.4, IQR = 2.1]
  • 21. Graphical Summaries: Boxplots • Graphical extension of median, IQR • Useful for comparing two variables • Help identify ‘outliers’
  • 22. Graphical Summaries: Boxplots • [Figure: boxplot of Cholesterol Levels (mg/100mL), labeled from top to bottom: Maximum, Q3 (75th %ile), Median, Q1 (25th %ile), Minimum; the IQR spans Q1 to Q3]
  • 23. Graphical Summaries: Boxplots • [Figure: boxplots of Var 1 (Measurement A) and Var 2 (Measurement B), labeled: Maximum; “outliers”, all greater than Q3 + 1.5 * IQR; largest obs less than Q3 + 1.5 * IQR; IQR; Minimum]
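The boxplot rule above flags points beyond Q3 + 1.5 * IQR as outliers. The sketch below applies it to the hospital-stay data from the earlier slides rather than to the plotted measurements, whose raw values are not given. The symmetric lower fence Q1 − 1.5 * IQR and the `"inclusive"` quartile method are assumptions added for illustration.

```python
import statistics

stays = [3, 5, 7, 8, 8, 9, 10, 12, 35]

q1, _, q3 = statistics.quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1

# Fences used by the usual boxplot convention.
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr  # assumed symmetric rule for the lower side

outliers = [x for x in stays if x > upper_fence or x < lower_fence]

# The upper whisker ends at the largest observation inside the fence.
upper_whisker = max(x for x in stays if x <= upper_fence)

print(f"fences: ({lower_fence}, {upper_fence}); outliers: {outliers}")
print(f"upper whisker ends at {upper_whisker}")
```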
  • 24. Data Type & Data Summaries • Type determines appropriate summary • Characteristics of the data revealed by the summary determine appropriate analyses – more than just mean, sd – e.g. should not use t for strongly skewed data – median (IQR) appropriate if skewed, outliers
  • 25. Statistical Inference • Estimation of quantity of interest – estimate itself – quantify how good an estimate it is • Hypothesis testing – Is there evidence to suggest that the estimate is different from another value? • Today: proportion, mean, median
  • 26. Statistical Inference • Take results obtained in a (random) sample as best estimate of what is true for the population • Estimate may differ from the population value by chance, but it should be close • How good is the estimate? • If we took more samples, and got even more estimates, how would they vary?
  • 27. Statistical Inference • Ex: proportion of persons in a population who have health insurance, sample size n = 978 • Sample 1: 797/978 = 0.81 • We conclude that the estimated % of people with health insurance is 81%. • How variable is our estimate?
  • 28. Sampling Variability • Need to know the sampling distribution for the quantity of interest • One option: take lots of samples, and make a histogram • Sample 2: 782/978 = 0.80 • Sample 3: 812/978 = 0.83
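The "take lots of samples" idea can be imitated by simulation. The sketch below assumes a hypothetical population in which the true proportion insured is 0.81 (the slides do not state the true value). It draws 1000 samples of n = 978 and prints a crude text histogram of the resulting sample proportions.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)   # fixed seed so the run is repeatable

TRUE_PROP = 0.81         # assumed population proportion (hypothetical)
N = 978                  # sample size from the slides
NUM_SAMPLES = 1000       # number of repeated samples (illustrative)

def sample_proportion():
    """Proportion insured in one simulated random sample of size N."""
    insured = sum(rng.random() < TRUE_PROP for _ in range(N))
    return insured / N

estimates = [sample_proportion() for _ in range(NUM_SAMPLES)]

# Text histogram: bin the estimates to two decimal places.
bins = Counter(round(p, 2) for p in estimates)
for value in sorted(bins):
    print(f"{value:.2f} {'#' * (bins[value] // 10)}")

print(f"mean of estimates = {statistics.mean(estimates):.3f}")
print(f"sd of estimates   = {statistics.stdev(estimates):.4f}")
```

The standard deviation of the simulated estimates is what the standard error tries to approximate from a single sample.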
  • 29. Sampling Variability • Not practical to keep repeating a study! • Statistical theory – the sampling distributions for means and proportions often look “normal”, bell-shaped – from a single sample, can calculate the standard error (variability) of our estimated mean or proportion
  • 30. Standard Error • Standard error (SE) measures the variability of your sample statistic (e.g. a mean or proportion) • small SE means estimate more precise • SE is not the same as SD! • SD measures the variability of the sample data; SE measures the variability of the statistic
  • 31. Standard Error vs Standard Deviation • Averages less variable than the individual observations • Averages over larger n less variable than over smaller n • Ex: heights – most people from 5’0” – 6’2” – average height for sample of 100 people ~ between 5’5” – 5’7” • SE ≤ SD!
  • 32. Inference for Sample Proportions • Formula for SE of sample proportion: $\mathrm{SE} = \sqrt{\mathrm{prop} \cdot (1 - \mathrm{prop}) / n}$ • For health insurance example – n = 978, prop = 0.81 – SE = √((0.81) * (0.19)/978) ≈ 0.01 – The standard error of the sample proportion is 0.01.
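The standard error formula for a proportion is a one-liner. The sketch below applies it to the health insurance example; the function name `se_proportion` is illustrative.

```python
import math

def se_proportion(prop, n):
    """Standard error of a sample proportion: sqrt(prop * (1 - prop) / n)."""
    return math.sqrt(prop * (1 - prop) / n)

se = se_proportion(0.81, 978)
print(f"SE = {se:.4f}  (about {se:.2f} to two decimals)")
```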
  • 33. Sampling Distribution of Sample Proportion • Sample proportions are approximately normally distributed if (prop) * (1-prop) * n > 10 • if ≤ 10, then need to use exact binomial (not covered here) • ~95% of sample proportions fall within ~2 SEs of the true proportion, i.e. between true proportion – 1.96 SE and true proportion + 1.96 SE
  • 34. Confidence Interval for Sample Proportion • 95% Confidence Interval for Sample Proportion: (sample prop – 1.96 * SE, sample prop + 1.96 * SE) • Interpretation: For about 95/100 samples, this interval will contain the true population proportion • 95% CI for % population with insurance (n = 978, prop = 0.81) – Check: (0.81) * (1-0.81) * 978 = 150.5 > 10 – 95% CI: (0.81 – 1.96 * 0.01, 0.81 + 1.96 * 0.01) = (0.79, 0.83) • For greater confidence (e.g. 99% CI), wider interval
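The normality check and the 95% interval can be combined into one small function. This is a sketch using the normal approximation described above; the function name and the choice to raise an error when the check fails are illustrative assumptions.

```python
import math

def proportion_ci_95(prop, n):
    """95% CI for a proportion using the normal approximation."""
    check = prop * (1 - prop) * n
    if check <= 10:
        # The slides call for an exact binomial method in this case.
        raise ValueError("normal approximation not appropriate")
    se = math.sqrt(prop * (1 - prop) / n)
    return prop - 1.96 * se, prop + 1.96 * se

check = 0.81 * (1 - 0.81) * 978
lower, upper = proportion_ci_95(0.81, 978)
print(f"check = {check:.1f}; 95% CI = ({lower:.2f}, {upper:.2f})")
```

The code uses the unrounded standard error; to two decimals the interval agrees with the slide's (0.79, 0.83).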
  • 35. Standard Error for Sample Mean • Standard error is the standard deviation of the sample, divided by the square root of the sample size: SE = SD/√n • Ex: n = 16, mean SBP = 123.4 mm Hg, sd = 14.0 mm Hg • SE of mean = SD/√n = 14.0/√16 = 3.5 mm Hg
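A short sketch of SE = SD/√n for the blood pressure example, with a second call showing how the standard error shrinks for a larger sample. The function name `se_mean` is illustrative.

```python
import math

def se_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

se_16 = se_mean(14.0, 16)    # example from the slide
se_100 = se_mean(14.0, 100)  # same sd with a larger sample
print(f"n = 16: SE = {se_16} mm Hg; n = 100: SE = {se_100} mm Hg")
```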
  • 36. Sampling Distribution for a Mean • Sample means follow a t-distribution IF – underlying data is (approximately) normal OR – n is large • A sample mean from a sample of size n will have a t distribution with n-1 degrees of freedom, written t_{n-1}
  • 37. Sampling Distribution for Sample Mean • [Figure: density curves for the Normal Distribution, t20, and t15, centered at the true mean] • For a sample of size n = 16, the sample mean would have a t15 distribution, centered at the true population mean and with estimated standard error sd/√n.
  • 38. Confidence Interval for Sample Mean • Ex: n = 16, mean SBP = 123.4 mm Hg, sd = 14.0 mm Hg • SE of mean = SD/√n = 14.0/√16 = 3.5 • Assume that SBP is approximately normally distributed (at least symmetric, bell-shaped, not skewed) in the population • 95% CI for sample mean: mean ± 2.131 * SE = 123.4 ± 2.131 * 3.5 = (115.9, 130.9) mm Hg, where 2.131 is for the t15 distribution
  • 39. Confidence Interval for Sample Mean • Suppose instead mean and sd were for sample of size n = 100 • 95% CI for sample mean – n = 100: mean ± 1.984 * sd/√n = 123.4 ± 1.984 * 14/√100 = (120.6, 126.2) mm Hg, for t99 distribution – n = 16: mean ± 2.131 * sd/√n = 123.4 ± 2.131 * 14/√16 = (115.9, 130.9) mm Hg, for t15 distribution
  • 40. Confidence Interval for Sample Mean • Suppose instead mean and sd were for sample of size n = 100 • 95% CI for sample mean – n = 100: mean ± 1.984 * sd/√n = 123.4 ± 1.984 * 14/√100 = (120.6, 126.2) mm Hg – n = 16: mean ± 2.131 * sd/√n = 123.4 ± 2.131 * 14/√16 = (115.9, 130.9) mm Hg
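The sketch below recomputes both intervals from the mean, sd, and t critical values given on the slides. Python's standard library has no t quantile function, so the critical values 2.131 (t15) and 1.984 (t99) are hard-coded from the slides rather than computed. The function name is illustrative.

```python
import math

def mean_ci(mean, sd, n, t_crit):
    """CI for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

MEAN_SBP = 123.4  # mm Hg, from the slides
SD_SBP = 14.0     # mm Hg, from the slides

# Critical values copied from the slides (97.5th percentiles of t15 and t99).
ci_16 = mean_ci(MEAN_SBP, SD_SBP, 16, t_crit=2.131)
ci_100 = mean_ci(MEAN_SBP, SD_SBP, 100, t_crit=1.984)

print(f"n = 16:  ({ci_16[0]:.1f}, {ci_16[1]:.1f}) mm Hg")
print(f"n = 100: ({ci_100[0]:.1f}, {ci_100[1]:.1f}) mm Hg")
```

The larger sample gives a narrower interval both because the standard error is smaller and because the t critical value is closer to the normal value of 1.96.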
  • 41. Confidence Interval for a Sample Mean • Confession: it’s never incorrect to use a t-distribution (as long as underlying population normal or n large) BUT as n gets large the t-distribution and normal distribution become indistinguishable
  • 42. Hypothesis Testing • Confidence intervals tell you – best estimate – variability of best estimate • Hypothesis testing – Is there really a difference between my observed value and another value?
  • 43. Hypothesis Testing • From our sample of n = 978, we estimated that 81% of the people in our population had health insurance. • From previous census records, you know that 10 years ago, 78.5% of your population had health insurance. • Has the percent of people with insurance really changed?
  • 44. Hypothesis Testing • Suppose the true percent with health insurance is 78.5% – Called the null hypothesis, or H0 • What is the probability of observing, for a sample of n = 978, a result as or more extreme than 81%, given the true percent is 78.5%? – Called the p-value, computed using normal distribution for sample proportions (and t-distribution for sample means) • If probability is small, conclude that supposition might not be right. – Reject the null hypothesis in favor of the alternative hypothesis, Ha (opposite of the null hypothesis, related to what you think may be true) • If the probability is not small, conclude that we do not have evidence to reject the null hypothesis – Not the same as ‘accepting’ the null hypothesis, or showing that the null hypothesis is true
  • 45. Hypothesis Testing H0: True proportion is 78.5% opposites Ha: True proportion is not 78.5% P-value = Prob(sample prop > 0.81 or sample prop < 0.76 given true prop = 0.785) = 0.012 0.76 0.81 Supposed true proportion = 0.785 10/1/2009 Biostatistics Collaboration Center (BCC) Lecture 1: Basic Concepts 45
  • 46. Hypothesis Testing • It is not very likely (p = 0.012) to observe our data if the true proportion is 78.5%. • We conclude that we have sufficient evidence that the proportion insured is no longer 78.5%. • “Two-sided” test: observations as extreme as 0.81 in either direction (> 0.81 or < 0.76) • “One-sided” test: observations larger than 0.81 – Has the percent of people with insurance really increased? – Ha: True proportion greater than 78.5% – p-value = 0.006 (only area above 0.81) – Only OK if previous research suggests that the proportion is larger
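The two-sided and one-sided calculations described on slides 44 to 46 can be sketched as follows. This is a minimal illustration, assuming the usual normal approximation with the standard error computed under H0; the function names are hypothetical and not from the lecture. With the rounded inputs shown on the slides (0.81, 0.785, n = 978) it gives a two-sided p-value of roughly 0.06 rather than the quoted 0.012, so the slide figures were presumably computed from unrounded data or a different standard error.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def proportion_p_values(p_hat, p0, n):
    """Normal-approximation test of H0: true proportion == p0.

    Returns (z, two_sided_p, upper_one_sided_p).
    """
    se = math.sqrt(p0 * (1 - p0) / n)         # standard error assuming H0 is true
    z = (p_hat - p0) / se                     # how many SEs the sample is from p0
    upper = 1 - normal_cdf(z)                 # area above the observed proportion
    two_sided = 2 * (1 - normal_cdf(abs(z)))  # both tails, equally far from p0
    return z, two_sided, upper

# Rounded values from the slides (assumed inputs, not the raw data)
z, p_two, p_upper = proportion_p_values(0.81, 0.785, 978)
print(f"z = {z:.3f}, two-sided p = {p_two:.4f}, one-sided p = {p_upper:.4f}")
```

The one-sided p-value is half the two-sided one whenever the observed proportion lies in the direction named by Ha, which is the relationship between the 0.012 and 0.006 on the slide.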
  • 47. Hypothesis Testing • Similar for sample means, except that p-values are computed using the t-distribution, if appropriate • n = 16 with mean SBP = 123.4 mm Hg, sd = 14.0 mm Hg • Previous research suggests that the population may have lifestyle habits protective against high BP • Do we have evidence to suggest that our population has SBP lower than 125 mm Hg (pre-hypertensive)?
  • 48. Hypothesis Testing • H0: True mean is 125 mm Hg • Ha: True mean is less than 125 mm Hg (H0 and Ha are “almost opposites”) • Using the t15 distribution (t with n − 1 = 15 degrees of freedom), we find a one-sided p-value of 0.32 • This is too large to reject the null hypothesis in favor of the alternative. – Usually need p < 0.05 to “reject null” in favor of alternative
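The one-sided p-value on slide 48 can be reproduced approximately with the sketch below. It assumes the standard one-sample t statistic, t = (mean − 125) / (sd / √n), and evaluates the t CDF by numerical integration so that only the Python standard library is needed; the helper names are hypothetical. A statistics package would normally supply the t CDF directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """t CDF via Simpson's rule on [0, |x|], using symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # integral of the pdf from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_lower(mean, mu0, sd, n):
    """Return (t statistic, one-sided p-value for Ha: true mean < mu0)."""
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, t_cdf(t_stat, n - 1)

# Values from slide 47: n = 16, mean SBP = 123.4, sd = 14.0, H0 mean = 125
t_stat, p_lower = one_sample_t_lower(123.4, 125.0, 14.0, 16)
print(f"t = {t_stat:.3f}, one-sided p = {p_lower:.3f}")
```

The sample mean is less than half a standard error below 125, which is why the p-value is nowhere near 0.05.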
  • 49. Misinterpretations of the p-value • A p-value of 0.32 (or > 0.05) DOES NOT mean that we “accept the null”, or that there is a 32% chance that the null is true – we haven’t shown that the SBP of this group is equal to 125 – true value may be 124.5 or 125.1 mm Hg • Can only “reject the null in favor of the alternative” or “fail to reject the null” • If you “fail to reject” that doesn’t mean the alternative isn’t true. You may not have had a large enough n!
  • 50. Distribution-Free Alternatives • Recall for the normal distribution to be useful for sample proportions, we need n * prop * (1-prop) > 10 – If not, use exact binomial methods (not covered here) • For the t-distribution to be useful for sample means, we need the underlying population to be normal or n large – What if this isn’t the case? • data are skewed or n is small • e.g. length of hospital stay – Use nonparametric methods • look at ranks, not mean • the median is a nonparametric estimate
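The rule of thumb for sample proportions on slide 50 is easy to check before choosing a method. A minimal sketch, with a hypothetical helper name and the threshold of 10 taken from the slide:

```python
def normal_approx_ok(n, prop, threshold=10):
    """Rule of thumb from the slide: n * prop * (1 - prop) must exceed 10."""
    return n * prop * (1 - prop) > threshold

# Insurance example from the earlier slides: n = 978, proportion 0.785
print(normal_approx_ok(978, 0.785))   # 978 * 0.785 * 0.215 is about 165
# A small sample with a rare outcome (illustrative numbers, not from the lecture)
print(normal_approx_ok(30, 0.05))     # 30 * 0.05 * 0.95 is about 1.4
```

When the check fails, the slide's advice is to switch to exact binomial methods.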
  • 51. Nonparametric Methods • nonparametric methods are methods that don’t require assuming a particular distribution • a.k.a. distribution-free or rank methods • nonparametric tests are well suited to hypothesis testing • not as useful for point estimates or CIs • especially useful when data are ranks or scores – Apgar scores – Vision (20/20, 20/40, etc.)
  • 52. Nonparametric Methods for Sample Median • Alternative to using sample mean & t-distribution • Instead do inference for median value – CI for median uses ranks • n = 16, ~ 95% CI for median (4th largest value, 13th largest value) • n = 25, ~ 95% CI for median (8th largest value, 18th largest value) • necessary ranks depend on n and confidence level; get them from stat packages or tables – Hypothesis test for median: Sign Test
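The ranks quoted on slide 52 can be derived from the binomial distribution. The sketch below assumes the usual order-statistic argument: if X is the number of observations below the true median, X follows Binomial(n, 1/2), and the interval from the r-th to the (n − r + 1)-th ordered value covers the median with probability 1 − 2·P(X ≤ r − 1). The function name is hypothetical, and it picks the narrowest such interval whose coverage is at least the requested level.

```python
from math import comb

def median_ci_ranks(n, conf=0.95):
    """Ranks (lower, upper) of the ordered sample giving a CI for the median.

    Returns (lower_rank, upper_rank, exact_coverage), or None if even the
    full range of the sample does not reach the requested confidence level.
    """
    best = None
    cum = 0.0
    for r in range(1, n // 2 + 1):
        cum += comb(n, r - 1) / 2 ** n   # cum = P(X <= r - 1), X ~ Binomial(n, 1/2)
        coverage = 1 - 2 * cum           # P(r <= X <= n - r)
        if coverage >= conf:
            best = (r, n - r + 1, coverage)
        else:
            break                        # coverage only shrinks as r grows
    return best

for n in (16, 25):
    lo, hi, cov = median_ci_ranks(n)
    print(f"n = {n}: ranks ({lo}, {hi}), exact coverage {cov:.3f}")
```

Because the ranks are whole numbers, the exact coverage is a little above 95% rather than exactly 95%, which is why the slide writes "~ 95%".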
  • 53. Sign Test for the Median • Hypothesis about median, not mean • Assuming the hypothesized value of the median is correct, expect to observe about half the sample below the median and half above. • Compute the probability of a proportion above the median as or more extreme than the one observed
  • 54. Sign Test Example • Average daily energy intake over 10 days for 11 healthy subjects • Are these subjects getting the recommended level of 7725 kJ? • Data aren’t strongly skewed, but we don’t know if underlying population is normal, and n is small • Use sign test to see if median level for population might be 7725 kJ
      ID   Energy Intake (kJ)
       1   5260
       2   5470
       3   5640
       4   6180
       5   6390
       6   6515
       7   6805
       8   7515
       9   7515
      10   8230
      11   8770
  • 55. Sign Test • Expect about half of the values to be above 7725 kJ • Only 2/11 subjects have values above 7725 kJ • If the probability of selecting a subject whose value is above 7725 kJ is ½, what’s the probability of obtaining a sample of 11 subjects in which 2 or fewer are above 7725 kJ? • p-value = 0.1019 (from stat software)
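The question posed on slide 55 is a binomial tail probability, which can be computed directly. This is a sketch under stated assumptions: an exact sign test with success probability ½, values equal to the hypothesized median dropped, and a two-sided p-value taken as twice the smaller tail. The function name is hypothetical. For the slide's data the exact tail probability is 67/2048 ≈ 0.033 (about 0.065 when doubled), which differs from the 0.1019 quoted from statistical software; that figure may reflect a different procedure or setting.

```python
from math import comb

def sign_test(values, median0):
    """Exact sign test of H0: population median == median0.

    Returns (n_above, n_below, one_sided_p, two_sided_p).
    """
    above = sum(v > median0 for v in values)
    below = sum(v < median0 for v in values)
    n = above + below                    # values equal to median0 are dropped
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 1/2): probability of a split this lopsided
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    two_sided = min(1.0, 2 * one_sided)
    return above, below, one_sided, two_sided

# Energy intake data (kJ) from slide 54
intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
above, below, p_one, p_two = sign_test(intake, 7725)
print(f"{above} above, {below} below, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```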
  • 56. Parametric vs. Nonparametric • Nonparametric methods are always okay • Nonparametric methods are more conservative than parametric ones – e.g. 95% CIs for medians are sometimes twice as wide as those for the mean • If your data satisfy the requirements, or n is fairly large, it is probably best to use parametric procedures
  • 57. Useful References • Intuitive Biostatistics, Harvey Motulsky, Oxford University Press, 1995 – Highly readable, minimally technical • Practical Statistics for Medical Research, Douglas G. Altman, Chapman & Hall, 1991 – Readable, not too technical • Fundamentals of Biostatistics, Bernard Rosner, Duxbury, 2000 – Useful reference, somewhat technical
  • 58. Next Time • Two group comparisons
  • 59. Contact BCC • Please go to our web site at: http://www.medschool.northwestern.edu/depts/bcc/index.htm • Fill out our online request form for further collaboration!