General statistics, the importance of statistics in healthcare, types of statistics, methods of sampling, errors in sampling, different types of tests, measures of dispersion, correlation, types of correlation
2. STATISTICS- is a science of compiling, classifying, and
tabulating numerical data and expressing the
results in a mathematical and graphical form.
BIOSTATISTICS- is that branch of statistics concerned
with the mathematical facts and data related to
biological events.
3. • Constant
– Quantities that do not vary, e.g. in biostatistics the mean and standard deviation are considered constants for a population.
• Variable
– Characteristics that take different values for different persons, places or things, such as height, weight and blood pressure.
4. • Parameter
– A constant that describes a population, e.g. 40% of the students in a college are girls. This describes the population, hence it is a parameter.
• Statistic
– A constant that describes a sample, e.g. in a sample of 200 students from the same college, 45% are girls. This 45% is a statistic, as it describes the sample.
• Attribute
– A characteristic by which the population can be classified into categories or classes, e.g. gender, caste, religion.
5. HISTORY
• The science of statistics is said to have originated
from two main sources:
1. Government records
2. Mathematics
• It developed from the registration of heads of families in ancient Egypt and the Roman census of military strength, births and deaths, and gradually found application in the field of health and medicine.
6. • John Graunt, who was neither a physician nor a mathematician, is regarded as the FATHER OF HEALTH STATISTICS.
7. WHAT IS STATISTICS?
• The following essential features of statistics are evident from its various definitions:
a) Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:
1. Observational data, qualitative data.
2. Data that have been obtained by a repetitive operation.
3. Data affected to a marked degree by a multiplicity of causes.
b) The science and art of dealing with variation in such a way as to obtain reliable results.
8. c) Controlled objective methods whereby group
trends are abstracted from observations on
many separate individuals.
d) The science of experimentation which may be
regarded as mathematics applied to
observational data.
9. WHY STATISTICS?
• Variability in measurement can be handled using statistics, e.g. an investigator makes observations according to his judgement of the situation (depending upon his skills, knowledge and experience).
• Epidemiology and biostatistics are sister sciences or disciplines.
• Epidemiology collects facts relating to groups of population in places, times and situations.
• Biostatistics converts these facts into figures and finally translates the figures back into facts, interpreting the significance of the results.
10. • Epidemiology and biostatistics both deal with the facts-figures-facts sequence (QUANTITATIVE METHODOLOGY).
11. USES OF BIOSTATISTICS
1. To test whether the difference between two populations is real or
by chance occurrence.
2. To study the correlation between attributes in the same population.
3. To evaluate the efficacy of vaccines.
4. To measure mortality and morbidity.
5. To evaluate the achievements of public health programs
6. To fix priorities in public health programs
7. To help promote health legislation and create administrative
standards for oral health.
12. COLLECTION OF DATA
• The collective recording of observations either
numerical or otherwise is called data.
• Demographic data comprise details of population size, distribution, geographic distribution, ethnic groups, socio-economic factors and their trends over time.
• It is obtained from census and other public service
reports.
13. • Depending upon the nature of the variable, data are classified into:
1. Qualitative data - attributes or qualities.
2. Quantitative data - obtained through measurements, e.g. using calipers.
a) discrete
b) continuous
14. Sources of statistical data
Data can be collected from:
• EXPERIMENTS – Performed to collect data for investigations and research by one or more workers.
• SURVEYS – Carried out in the field by trained teams for epidemiological studies, to find the incidence or prevalence of health or disease in a community.
• RECORDS – Maintained as a routine in registers and books over a long period of time; they provide readymade data.
Data may be:
• PRIMARY – Data obtained by the investigator himself.
• SECONDARY – Data that have already been recorded, e.g. hospital records.
15. Primary data can be obtained using any one of the following methods:
Direct personal interviews
• Face-to-face contact with the person.
• Useful for subjective phenomena.
• Accurate, and any ambiguity can be clarified.
• Cannot be used in extensive studies.
Oral health examination
• Used when information is needed on health status.
• Cannot be used in extensive studies.
• Includes treatment.
Questionnaire method
• A list of questions pertaining to the survey (the “questionnaire”) is prepared.
• Various informants are requested to supply the information.
16. Sampling and sample design
• Population:- group of all individuals who are the focus
of the investigation is known as population.
• Census enumeration:- the information is obtained from each and every individual in the population.
• Sample:- the group of individuals who are actually available for investigation.
• Sampling units: the individual entities that form the
focus of the study.
• Sampling frame/list: list of sampling units
17. Sample selection
Purposive selection
• Aims at representing the population as a whole.
• There is a great temptation to deliberately or purposively select the individuals who seem to represent the population under study.
• Easy to carry out.
• Does not need the preparation of a sampling frame.
Random selection
• The sample of units is selected in such a way that all the characteristics of the population are reflected in the sample.
• Random indicates that whether a population unit is selected in the sample is decided by chance.
18. Sampling Design
BASED UPON TYPE AND NATURE OF THE POPULATION AND
THE OBJECTIVES OF THE INVESTIGATION.
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multiphase sampling
6. Pathfinder survey
19. Simple random sampling
• Each and every unit in the population has an equal chance of being included in the sample.
• Selection of a unit is by chance only.
Two methods:
Lottery method
• Population units are numbered on separate slips.
• The slips are shuffled and selected blindfold.
Table of random numbers
• A random arrangement of the digits 0-9 in rows and columns.
• Selection is done in either a horizontal or a vertical direction.
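A minimal Python sketch of simple random selection, standing in for the lottery method: `random.sample` draws without replacement, so every numbered unit has the same chance of selection. The sampling frame (units numbered 1 to 100), the sample size and the seed are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement, each unit having an equal chance."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(units, n)

# Hypothetical sampling frame: population units numbered 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```

Fixing the seed plays the role of recording which page and column of a random number table was used, so the same selection can be reproduced later.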
20. Systematic random sampling
• Select one unit at random, then select additional units at evenly spaced intervals until a sample of the required size has been drawn.
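A sketch of systematic random sampling, assuming the sampling interval is taken as k = N // n (population size divided by sample size) and the random start is chosen within the first interval. The helper name and the 100-unit frame are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(units) // n              # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start: index 0 .. k-1
    return [units[start + i * k] for i in range(n)]

# Hypothetical frame of 100 units; a sample of 10 gives an interval of 10
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=1)
print(sample)
```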
Stratified random sampling
• The population to be sampled is subdivided into groups (e.g. by age, sex or genetic characteristics) known as strata, each of which is homogeneous in its characteristics.
• A simple random sample is then drawn from each stratum.
• More representative, provides greater accuracy and can cover a wider geographical area.
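The two steps above (divide into strata, then draw a simple random sample within each) can be sketched as follows. The age-group strata, the unit IDs and the proportional allocation (12 and 8 out of a total of 20) are hypothetical assumptions, not figures from the source.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the given size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

# Hypothetical strata built by age group (unit IDs are illustrative)
strata = {
    "5-11 years": list(range(1, 61)),     # 60 units
    "12-15 years": list(range(61, 101)),  # 40 units
}
# Assumed proportional allocation: 60% and 40% of a total sample of 20
sample = stratified_sample(strata, {"5-11 years": 12, "12-15 years": 8}, seed=7)
print(sample)
```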
21. Cluster sampling
• The population forms natural groups or clusters, such as villages, wards, blocks or the children of a school.
• A sample of the clusters is selected, and then all the units in each selected cluster are surveyed.
• Simpler, and needs less time and cost.
• Higher standard errors.
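A sketch of cluster sampling, assuming the clusters are supplied as a mapping from cluster name to the units it contains. Whole clusters are chosen at random and every unit in a chosen cluster is surveyed. The school names and child IDs are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then survey every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical schools (clusters) with illustrative child IDs
clusters = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2", "D3"],
}
selected = cluster_sample(clusters, 2, seed=3)
surveyed = [child for units in selected.values() for child in units]
print(list(selected), surveyed)
```

Because children in the same school tend to resemble each other, surveying whole clusters gives less independent information per child than simple random sampling, which is why the standard errors are higher.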
22. Multiphase sampling
• Part of the information is collected from the whole sample and part from a subsample.
• First phase: all the children in a school are surveyed.
• Second phase: only those with oral health problems.
• Third phase: those who need treatment are selected.
• The subsamples become successively smaller.
• Adopted when the interest is in a specific disease.
23. Multistage sampling
• First stage is to select the groups or clusters.
• Then subsamples are taken in as many subsequent
stages as necessary to obtain the desired sample.
24. Errors in sampling
Sampling errors
• Faulty sample design
• Small sample size
Non-sampling errors
• Coverage errors - due to non-response or non-cooperation of the informant.
• Observational errors - interviewer bias, imperfect experimental technique.
• Processing errors - errors in statistical analysis.
25. Data presentation
Two main types of data presentation are:
• Tabulation
• Graphic representation - charts and diagrams
Tabulation
– Tables are simple devices used for the presentation of statistical data.
PRINCIPLES:
– Tables should be as simple as possible (two or three small tables are preferable to a single large one).
– Data should be presented according to size or importance, chronologically or alphabetically.
– Tables should be self-explanatory.
– Each row and column should be labelled concisely and clearly.
26. – Specific unit of measure for the data should be given.
– Title should be clear, concise and to the point.
– Total should be shown.
– Every table should contain a title stating what is depicted in the table.
– In small tables, vertical lines separating the columns may not be necessary.
– If the data are not original, their source should be given in a footnote.
27. TYPES OF TABLES
• MASTER TABLE – Contains all the data obtained from a survey.
• SIMPLE TABLE – One-way tables which supply the answer to questions about one characteristic of the data only.
• FREQUENCY DISTRIBUTION TABLE – A two-column frequency table. The first column lists the classes into which the data are grouped; the second column lists the frequency for each class.
28. Charts and diagrams
• One of the most convincing and appealing ways of depicting statistical results.
Principles
1. Every diagram must be given a title that is self-explanatory.
2. Simple and consistent with the data.
3. The values of the variable are presented on the horizontal or X-axis and the frequency on the vertical or Y-axis.
4. The number of lines drawn in any graph should be kept small.
5. The scale of presentation for the X-axis and Y-axis should be mentioned.
6. The scale divisions of both axes should be proportional, and the divisions should be marked with the details of the variable and frequencies presented on the axes.
29. Bar chart
• Represents qualitative data.
• Bars can be either vertical or horizontal.
• A suitable scale is chosen.
• Bars are usually equally spaced.
• They are of three types:
• simple bar chart - represents only one variable.
• multiple bar chart - for each category of a variable there is a set of bars.
• component/proportional bar chart - each bar is divided into two or more parts.
31. Pie chart
• Entire graph looks like a pie.
• It is divided into different sectors corresponding to
the frequencies.
32. Line diagram
• Useful for studying changes in the values of a variable over time; it is the simplest type of diagram.
• Time may be expressed in hours, days, weeks, months or years.
33. Histogram
• Pictorial presentation of a frequency distribution.
• No space is left between the cells of a histogram.
• Class intervals are given on the horizontal axis and frequencies on the vertical axis.
• The area of each rectangle is proportional to the frequency.
34. Frequency polygon
• Obtained by joining the midpoints of the tops of the histogram blocks (at the height of the frequency) with straight lines, usually forming a polygon.
35. Frequency curve
• When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve.
36. Pictogram
• Popular method of presenting data to the
common man through small pictures or
symbols.
Spot map/shaded map/Cartogram
• These maps are prepared to show geographic
distribution of frequencies of characteristics
37. Measures of statistical averages or central tendency
• The central value around which all the other observations are distributed.
• The main objective is to condense the entire mass of data and to facilitate comparison.
• The most common measures of central tendency used in dental sciences are:
– mean
– median
– mode
38. Mean
• Refers to the arithmetic mean.
• It is obtained by adding the individual observations and dividing the sum by the total number of observations.
• Advantages – it is easy to calculate and is the most useful of all the averages.
• Disadvantages – it is influenced by extreme (abnormal) values.
39. Median
• When all the observations are arranged in either ascending or descending order, the middle observation is known as the median.
• When the number of observations is even, the average of the two middle values is taken.
• The median is a better indicator of the central value when extreme values are present, as it is not affected by them.
40. Mode
• The most frequently occurring observation in a data set is called the mode.
• Not often used in medical statistics.
• EXAMPLE
• Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
• Mean = 35 / 10 = 3.5
• Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so Median = (2 + 3) / 2 = 2.5
• Mode = 2 (occurs 3 times)
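The decayed-teeth example can be checked with Python's standard `statistics` module. Note that the ten values add up to 35, so the mean is 3.5; the median and mode agree with the hand calculation.

```python
from statistics import mean, median, mode

# Number of decayed teeth in 10 children (data from the example)
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

print(sum(decayed), mean(decayed))   # 35 3.5
print(sorted(decayed))               # [0, 1, 2, 2, 2, 3, 3, 4, 8, 10]
print(median(decayed))               # 2.5 (average of the 5th and 6th values)
print(mode(decayed))                 # 2 (occurs 3 times)
```

The single large value (10) pulls the mean above the median, which illustrates why the median is preferred when extreme values are present.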
41. Types of variability
• There are three types of variability:
– Biological variability
– Real variability
– Experimental variability
42. Biological variability
• It is the natural difference which occurs in
individuals due to age, gender and other
attributes which are inherent
• This difference is small and occurs by chance
and is within certain accepted biological limits
• e.g. vertical dimension may vary from patient
to patient
43. Real Variability
• Such variability is more than the normal
biological limits
• the cause of difference is not inherent or
natural and is due to some external factors
• e.g. difference in incidence of cancer among
smokers and non smokers may be due to
excessive smoking and not due to chance only
44. Experimental Variability
• It occurs due to the experimental study
• they are of three types
– Observer error
• the investigator may alter some information or not record the
measurement correctly
– Instrumental error
• this is due to defects in the measuring instrument
• Both the observer error and the instrumental error are called non-sampling errors.
– Sampling error or errors of bias
• this is the error which occurs when the samples are not chosen at
random from population.
• Thus the sample does not truly represent the
population.
45. MEASURES OF DISPERSION
• Dispersion is the degree of spread or variation of
the variable about a central value.
• Helps to know how widely the observations are
spread on either side of the average.
• Most common measures of dispersion are:
1. RANGE
2. MEAN DEVIATION
3. STANDARD DEVIATION
46. RANGE
• Defined as the difference between the value of the largest item and the smallest item.
• Gives no information about the values that lie between the extreme values.
MEAN DEVIATION
• It is the average of the absolute deviations from the arithmetic mean.
• M.D = Ʃ|Xi - X| / n
where Ʃ = sum of, X = arithmetic mean, Xi = value of each observation in the data, n = number of observations in the data.
STANDARD DEVIATION
• The most important and widely used measure of studying dispersion.
• The greater the S.D, the greater the magnitude of dispersion from the mean.
• A smaller S.D means a higher degree of uniformity of the observations.
• S.D = √[Ʃ(Xi - X)² / n] (same symbols as above; n - 1 is commonly used in place of n when the S.D is calculated from a small sample).
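The three formulas above can be sketched in Python as follows, using the divisor n exactly as in the S.D formula. The observations are hypothetical, chosen so that the results are easy to verify by hand (mean = 5).

```python
import math

def range_md_sd(values):
    """Return range, mean deviation and standard deviation (divisor n)."""
    n = len(values)
    m = sum(values) / n                                     # arithmetic mean X
    rng = max(values) - min(values)                         # largest - smallest
    md = sum(abs(x - m) for x in values) / n                # Σ|Xi - X| / n
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / n)  # √[Σ(Xi - X)² / n]
    return rng, md, sd

# Hypothetical observations
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(range_md_sd(values))   # (7, 1.5, 2.0)
```

The absolute value in the mean deviation matters: without it the positive and negative deviations from the mean always cancel to zero.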
47. Coefficient of variation
• It is used to compare the variability of attributes having different units of measurement, e.g. height and weight.
• Denoted by CV.
• CV = SD × 100 / Mean
• It is expressed as a percentage.
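A short sketch of the CV formula. The SD and mean figures for height and weight are hypothetical; the point is that CV is unit-free, so variability in centimetres and kilograms can be compared directly.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD x 100 / Mean; unit-free, so different units can be compared."""
    return sd * 100 / mean

# Hypothetical summary figures for one group of adults
cv_height = coefficient_of_variation(sd=6.0, mean=160.0)   # height in cm
cv_weight = coefficient_of_variation(sd=9.0, mean=60.0)    # weight in kg
print(cv_height, cv_weight)   # 3.75 15.0
```

With these assumed figures, weight varies relatively more than height (15% against 3.75%), even though the SDs are in different units.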
48. Normal distribution/normal curve/Gaussian distribution
• When the data are collected from a very large number of people and a frequency distribution is made with narrow class intervals, the resulting curve is smooth and symmetrical - the NORMAL CURVE.
• Limits on either side of the mean within which a given proportion of observations lies (e.g. mean ± 1.96 SD, which includes 95% of observations) are called confidence limits.
49. STANDARD NORMAL CURVE
• There may be many normal curves but only one standard normal curve.
Characteristics
• Bell shaped.
• Perfectly symmetrical.
• Frequency increases from one side, reaches its highest point and then decreases exactly the way it had increased.
• The total area under the curve is one, its mean is zero and its standard deviation is one.
• The highest point denotes the mean, median and mode, which coincide.
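Any normal curve can be converted to the standard normal curve with Z = (X - mean) / SD. The sketch below uses `statistics.NormalDist` (Python 3.8+) to find the proportion of the total area lying within mean ± k SD; `area_within` and `standardize` are hypothetical helper names, and the blood-pressure figures are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal curve

def standardize(x, mean, sd):
    """Convert an observation from any normal curve to a Z value."""
    return (x - mean) / sd

def area_within(k):
    """Proportion of the total area lying within mean ± k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"mean ± {k} SD: {area_within(k):.4f}")

# Hypothetical: a reading of 130 where mean = 120 and SD = 5 lies 2 SD above the mean
print(standardize(130, 120, 5))   # 2.0
```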
50. Z-TEST
• Used to test the significance of difference in means
for large samples.
Criteria:
1. Sample must be randomly selected.
2. Data must be quantitative.
3. The variable is assumed to follow a normal
distribution in the population.
4. Samples should be larger than 30.
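A minimal sketch of the Z-test for the difference between two large-sample means, assuming the usual standard error of the difference, SE = √(SD1²/n1 + SD2²/n2), and a two-sided P value from the standard normal curve. The function name and the group figures are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z for the difference between two large-sample means, with two-sided P."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference
    z = (mean1 - mean2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))              # two-tailed probability
    return z, p

# Hypothetical large samples (n > 30 in each group)
z, p = z_test_two_means(mean1=120.0, sd1=10.0, n1=50, mean2=115.0, sd2=10.0, n2=50)
print(z, round(p, 4))
```

Here Z = 5 / 2 = 2.5, which lies beyond 1.96, so the difference would be called significant at the 5% level.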
51. Tests of significance
• When different samples are drawn from the same population, the estimates might differ - this is sampling variability.
• Tests of significance deal with techniques to find out how far the difference between the estimates of different samples is due to sampling variation.
a) Standard error of mean
b) Standard error of proportion
c) Standard error of difference between two means
d) Standard error of difference between two proportions
52. 1. Standard error of mean: Gives the standard
deviation of the means of several samples from
the same population.
Example: Suppose we obtained a random sample of 25 males, aged 20-24 years, whose mean temperature was 98.14 °F with a standard deviation of 0.6 °F. What can we say of the true mean of the universe from which the sample was drawn?
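One common way to answer the question is to compute the standard error of the mean, SE = SD / √n, and the 95% confidence limits, mean ± 1.96 SE. The sketch below applies this to the figures in the example.

```python
import math

mean_temp = 98.14   # sample mean temperature, °F
sd = 0.6            # sample standard deviation, °F
n = 25              # sample size

se = sd / math.sqrt(n)                       # SE of mean = SD / √n
lower = mean_temp - 1.96 * se
upper = mean_temp + 1.96 * se
print(round(se, 2), round(lower, 2), round(upper, 2))   # 0.12 97.9 98.38
```

So, with about 95% confidence, the true mean temperature of the universe lies between roughly 97.90 and 98.38 °F.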
53. Standard Error of Proportion
•Standard error of proportion may be defined as a unit that
measures variation which occurs by chance in the proportions of a
character from sample to sample or from sample to population or
vice versa in a qualitative data.
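A sketch of the standard error of a proportion, assuming the usual formula SE = √(p × q / n) with p and q = 100 - p expressed as percentages. The helper name and the survey figures are hypothetical.

```python
import math

def se_proportion(p, n):
    """SE of a proportion (in %) = √(p × q / n), where q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)

# Hypothetical survey: 40% of 96 children have the condition
se = se_proportion(p=40, n=96)
print(se)                                    # 5.0
print(40 - 1.96 * se, 40 + 1.96 * se)        # approximate 95% limits of the proportion
```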
54. Standard Error of Difference Between two Means
•The standard error of difference between the two means is 7.5.
•The actual difference between the two means is 370 - 318 = 52, which is more than twice the standard error of difference between the two means (2 × 7.5 = 15), and therefore "significant".
55. Standard Error of Difference Between Proportions
•The standard error of difference is 6, whereas the observed difference (24.4 - 16.2) was 8.2.
• In other words, the observed difference between the two groups is less than twice the S.E. of difference, i.e. less than 2 × 6 = 12.
• There was no strong evidence of any difference between the efficacy of the two vaccines. Therefore, the observed difference could easily be due to chance.
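The two slides apply the same rule of thumb: a difference larger than about twice its standard error is taken as significant. A sketch, assuming the standard formulas √(SD1²/n1 + SD2²/n2) for two means and √(p1q1/n1 + p2q2/n2) for two percentages; the helper names and the inputs used to demonstrate the formulas are hypothetical, while 52, 7.5, 8.2 and 6 are the figures quoted on the slides.

```python
import math

def se_diff_means(sd1, n1, sd2, n2):
    """SE of the difference between two means = √(SD1²/n1 + SD2²/n2)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of the difference between two percentages, with q = 100 - p."""
    return math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

def exceeds_twice_se(difference, se):
    """Rule of thumb used on the slides: significant if |difference| > 2 SE."""
    return abs(difference) > 2 * se

# Hypothetical inputs, only to show the formulas
print(se_diff_means(24, 32, 24, 32))           # 6.0
print(se_diff_proportions(50, 50, 50, 50))     # 10.0

# Figures quoted on the slides
print(exceeds_twice_se(370 - 318, 7.5))        # True  (52 > 15)
print(exceeds_twice_se(24.4 - 16.2, 6))        # False (8.2 < 12)
```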
56. • A null hypothesis or hypothesis of no difference (H0) asserts that there is no real difference between the sample and the population in the particular matter under consideration, and that the difference found is accidental and arises out of sampling variation.
• The alternative hypothesis of significant difference (H1) states that there is a difference between the two groups compared.
57. • A test of significance such as Z-test is performed to
accept the null hypothesis H0 or to reject it and
accept the alternative hypothesis H1.
• To make the minimum error in rejecting or accepting H0, we divide the sampling distribution, or the area under the normal curve, into two regions or zones:
i. A zone of acceptance
ii. A zone of rejection.
58. • The distance from the mean at which H0 is rejected
is called the level of significance.
• It falls in the zone of rejection for H0 (the shaded areas under the curve) and is denoted by the letter P, which indicates the probability or relative frequency of occurrence of the difference by chance.
• The greater the Z value, the smaller the P value.
59. i. Zone of acceptance: If the result of a sample falls in the plain area, i.e. within mean ± 1.96 SE, the null hypothesis is accepted; hence this area is called the zone of acceptance for the null hypothesis.
ii. Zone of rejection: If the result of a sample falls in the shaded area, i.e. beyond mean
± 1.96 SE it is significantly different from the universe value. Hence, the H0 of no
difference is rejected and the alternate H1 is accepted. This shaded area, therefore, is
called the zone of rejection for null hypothesis.
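A minimal sketch of the decision rule, assuming a two-sided test at the 5% level. `decide` is a hypothetical helper; the critical value comes from the inverse of the standard normal CDF (1.96 for 5%) rather than from a printed table.

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Two-sided test: H0 is rejected when |Z| lies beyond the critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-tailed P value
    zone = "rejection" if abs(z) > critical else "acceptance"
    return round(critical, 2), round(p, 3), zone

print(decide(1.5))   # (1.96, 0.134, 'acceptance')
print(decide(2.5))   # (1.96, 0.012, 'rejection')
```

The output also shows the relationship stated earlier: the larger Z (2.5) gives the smaller P.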
60. • Degrees of freedom:
Defined as the number of independent members in the sample.
EXAMPLE:-
(X + Y + Z) / 3 = 5
Out of the 3 values, we can choose only 2 freely; the third is fixed by the fact that the total of the three values must be 15. Hence the degrees of freedom are 3 - 1 = 2.
61. SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS OF
SMALL SAMPLES BY STUDENT’S t-TEST
• Small samples, or their Z values, do not follow the normal distribution as large ones do.
• So the Z value based on the normal distribution will not give the correct level of significance or probability of a small-sample value occurring by chance.
• For small samples, the t-test is applied instead of the Z-test.
• It was designed by W. S. Gosset, whose pen name was Student. Hence, this test is also called Student's t-test.
62. • There are two types of Student's t-test:
– Unpaired t-test
– Paired t-test
Criteria for applying t-test
• 1. Random samples
• 2. Quantitative data
• 3. Variable normally distributed
• 4. Sample size less than 30.
63. Unpaired t test
• This test is applied to unpaired data of independent observations made on individuals of two different or separate groups, or samples drawn from two populations, to test whether the difference between the two means is real or can be attributed to sampling variability.
• EXAMPLE: comparing the means of the control and experimental groups.
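A sketch of the unpaired t-test, assuming the pooled-variance form of Student's t (both populations taken to have equal variances), with n1 + n2 - 2 degrees of freedom. The helper name and the scores are hypothetical.

```python
import math
from statistics import mean, variance

def unpaired_t(group1, group2):
    """Student's t for two independent small samples (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance from the two sample variances (divisor n - 1)
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # SE of the difference between means
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2                     # t value and degrees of freedom

# Hypothetical scores in the control and experimental groups
control = [10, 12, 11, 13, 14]
experimental = [14, 15, 13, 16, 17]
t, df = unpaired_t(control, experimental)
print(round(t, 2), df)
```

Here |t| = 3.0 with 8 degrees of freedom; it would be compared with the tabulated t value (2.306 for 8 df at P = 0.05).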
64. Paired t test
• It is applied to paired data of dependent observations from one sample only, when each individual gives a pair of observations, e.g. observations before and after taking a drug.
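A sketch of the paired t-test: the difference is taken for each individual, and t = mean difference / (SD of differences / √n) with n - 1 degrees of freedom. The helper name and the before/after readings are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Student's t for paired observations on the same individuals."""
    diffs = [a - b for b, a in zip(before, after)]   # change in each individual
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)                 # SE of the mean difference
    return mean(diffs) / se, n - 1                   # t value and degrees of freedom

# Hypothetical readings before and after taking a drug
before = [140, 150, 145, 160, 155]
after = [135, 144, 141, 152, 148]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

Working on the differences removes the variation between individuals, which is why pairing the observations makes the test more sensitive.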
65. THE CHI-SQUARE TEST FOR QUALITATIVE DATA (χ² TEST)
• Developed by Karl Pearson.
• The chi-square (χ²) test offers an alternative method of testing the significance of difference between two proportions. It has the advantage that it can also be used when more than two groups are to be compared.
• It is most commonly used when data are in frequencies, such as the number of responses in two or more categories.
66. • Important applications in medical statistics as test
of:
• 1. Proportion
• 2. Association
• 3. Goodness of fit.
• Test of Proportions
• As an alternate test to find the significance of
difference in two or more than two proportions.
67. • Test of Association
• The test of association between two events in
binomial or multinomial samples is the most
important application of the test in statistical
methods. It measures the probability of association
between two discrete attributes.
• Two events can often be studied for their
association such as smoking and cancer, treatment
and outcome of a disease, vaccination and
immunity, nutrition and intelligence, etc.
68. • Test of Goodness of Fit
• The chi-square (χ²) test is also applied as a test of “goodness of fit”, to determine whether the actual numbers are similar to the expected or theoretical numbers (goodness of fit to a theory).
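A sketch of the goodness-of-fit calculation, assuming k - 1 degrees of freedom (no parameters estimated from the data). The 100 births and the theoretical 1 : 1 sex ratio are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness of fit: Σ (O - E)² / E with k - 1 degrees of freedom."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical: 100 births compared with a theoretical 1 : 1 sex ratio
chi2, df = chi_square_gof(observed=[60, 40], expected=[50, 50])
print(chi2, df)   # 4.0 1
```

Since 4.0 exceeds 3.84 (the 5% value for 1 degree of freedom), these illustrative numbers would not fit the 1 : 1 theory.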
69. Analysis of Variance (ANOVA) Test
• It is not confined to comparing two sample means; it compares more than two samples drawn from corresponding normal populations.
• Eg. In experimental situations where several different treatments (various therapeutic approaches to a specific problem, or various dose levels of a particular drug) are under comparison.
• It is the standard method for testing the equality of three or more group means.
70. • Requirements
– Data for each group are assumed to be independent and
normally distributed
– Sampling should be at random
• One way ANOVA
– Where only one factor affects the result between groups
• Two way ANOVA
– Where we have 2 factors that affect the result or outcome
• Multi way ANOVA
– Three or more factors affect the result or outcomes between
groups
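A sketch of one-way ANOVA: the total variation is split into between-groups and within-groups sums of squares, and F = (SS between / (k - 1)) / (SS within / (n - k)). The helper name and the three treatment groups are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the F ratio and its degrees of freedom for one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical outcome scores under three treatments
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f, df1, df2 = one_way_anova(groups)
print(f, df1, df2)   # 12.0 2 6
```

The calculated F is then compared with the tabulated F value for (2, 6) degrees of freedom.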
71. CORRELATION AND REGRESSION
• Correlation: When dealing with measurements on 2 sets of variables in the same person, one variable may be related to the other in some way (i.e. a change in one variable may result in a change in the value of the other variable).
• Correlation is the relationship between two sets of variables.
• Correlation coefficient is the magnitude or degree
of relationship between 2 variables. (varies from -1
to +1).
72. • Obtained by plotting scatter diagram (i.e one variable
on x-axis and other on y-axis).
• Perfect Positive Correlation
• In this, the two variables denoted by letter X and Y are
directly proportional and fully correlated with each
other.
• The correlation coefficient (r) = +1, i.e. both variables rise or fall in the same proportion.
• Perfect Negative Correlation
• Values are inversely proportional to each other, i.e.
when one rises, the other falls in the same proportion,
i.e. the correlation coefficient (r) = –1.
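A sketch of Pearson's correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²], which reproduces the two perfect cases described above. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r; always lies between -1 and +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r(x, [2, 1, 4, 3, 5]))    # between 0 and +1 (partial positive)
```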
74. Regression
• To know in an individual case the value of one variable,
knowing the value of the other, we calculate what is known
as the regression coefficient of one measurement to the
other.
• It is customary to denote the independent variate by x and the dependent variate by y; the regression line is then written y = a + bx, where a is the intercept.
• The value of b is called the regression coefficient of y upon x. Similarly, we can obtain the regression of x upon y.
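A sketch of the least-squares regression of y on x, assuming the standard formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄. Once a and b are known, the value of y can be estimated for a given x. The helper name and the paired measurements are hypothetical.

```python
def regression_y_on_x(x, y):
    """Least-squares line y = a + b*x; b is the regression coefficient of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = regression_y_on_x(x, y)
print(a, b)          # 1.0 2.0
print(a + b * 6)     # estimated y when x = 6: 13.0
```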