Introduction
The chi-square test is one of the most commonly used non-parametric tests.
It was introduced by Karl Pearson as a test of association.
The Greek letter χ² (chi-square) is used to denote this test.
The chi-squared distribution with k degrees of freedom is the distribution of a
sum of the squares of k independent standard normal random variables.
Its shape is determined by the degrees of freedom.
The simplest chi-squared distribution (one degree of freedom) is the distribution of the square of a standard normal random variable.
The chi-squared distribution is used primarily in hypothesis testing.
It can be applied to categorical (qualitative) data arranged in a contingency table.
It is used to evaluate unpaired (unrelated) samples and proportions.
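The definition of the distribution can be checked by simulation. The sketch below uses only the Python standard library; the helper names `simulate_chi_squared` and `empirical_cdf` are illustrative, not from any library. It draws sums of k squared standard normal values and compares the empirical CDF with known exact values: for k = 2 the chi-squared CDF is 1 − e^(−x/2), and for k = 1 the variable is the square of a single standard normal, so P(Z² ≤ 1) = P(|Z| ≤ 1).

```python
import math
import random

def simulate_chi_squared(k, n_samples=100_000, seed=0):
    """Draw n_samples values of Z1**2 + ... + Zk**2, Zi independent N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_samples)]

def empirical_cdf(samples, x):
    """Fraction of simulated values that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

# k = 2: the exact chi-squared CDF is 1 - exp(-x/2).
samples_k2 = simulate_chi_squared(k=2)
print(empirical_cdf(samples_k2, 3.0), 1 - math.exp(-3.0 / 2))

# k = 1: the square of one standard normal, so P(Z**2 <= 1) = P(|Z| <= 1).
samples_k1 = simulate_chi_squared(k=1)
print(empirical_cdf(samples_k1, 1.0), math.erf(1 / math.sqrt(2)))
```

Each pair of printed numbers should agree to about two decimal places; the remaining gap is simulation noise.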
It is a mathematical expression that measures the discrepancy between the experimentally obtained results (O) and the theoretically expected results (E) under a certain hypothesis.
It uses data in the form of frequencies (i.e., the number of occurrences of an event).
The chi-square statistic is calculated by squaring the deviation between the observed and expected frequency in each cell, dividing it by the expected frequency, and summing over all cells:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
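The cell-by-cell calculation can be written as a short function. This is a minimal sketch; the name `chi_square_statistic` is illustrative, and the sample numbers are the observed and expected frequencies of the drug example later in this section.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)**2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed and expected frequencies from the drug example (row by row).
observed = [150, 30, 70, 130, 40, 80]
expected = [140, 35, 75, 140, 35, 75]
print(round(chi_square_statistic(observed, expected), 3))  # 3.524
```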
Degrees of Freedom
The number of independent pieces of information that are free to vary and that go into the estimate of a parameter is called the degrees of freedom.
The number of degrees of freedom of an estimate of a parameter is equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself.
The number of degrees of freedom for ‘n’ observations is ‘n − k’ and is usually denoted by ‘ν’, where ‘k’ is the number of independent linear constraints imposed upon them.
Chi Square Distribution
The mean of the distribution is equal to the number of degrees of freedom: μ = ν.
The variance is equal to two times the number of degrees of freedom: σ² = 2ν.
When the degrees of freedom are greater than or equal to 2, the density (Y) reaches its maximum at χ² = ν − 2.
As the degrees of freedom increase, the chi-square curve
approaches a normal distribution.
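These properties can be checked numerically from the chi-squared density f(x; ν) = x^(ν/2 − 1) e^(−x/2) / (2^(ν/2) Γ(ν/2)). The sketch below integrates with a simple midpoint grid and finds the mode by grid search; the grid size and the cut-off at x = 100 are assumptions that are adequate only for small ν. The printed mean, variance and mode should be close to ν, 2ν and ν − 2.

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom, for x > 0."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def moments_and_mode(nu, upper=100.0, steps=100_000):
    """Approximate mean, variance and mode on a midpoint grid over (0, upper)."""
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    ys = [chi2_pdf(x, nu) for x in xs]
    mean = sum(x * y for x, y in zip(xs, ys)) * h
    var = sum((x - mean) ** 2 * y for x, y in zip(xs, ys)) * h
    mode = xs[max(range(steps), key=ys.__getitem__)]  # grid point with the largest density
    return mean, var, mode

for nu in (2, 5, 10):
    mean, var, mode = moments_and_mode(nu)
    print(nu, round(mean, 3), round(var, 3), round(mode, 3))
```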
If there are two, three, or four classes, the degrees of freedom would be 2 − 1 = 1, 3 − 1 = 2, and 4 − 1 = 3, respectively.
In a contingency table, the degrees of freedom are calculated in a different manner: d.f. = (r − 1)(c − 1), where r = number of rows in the table and c = number of columns in the table.
Thus, in a 2×2 contingency table, the degrees of freedom are (2 − 1)(2 − 1) = 1.
Similarly, in a 3×3 contingency table, the number of degrees of freedom is (3 − 1)(3 − 1) = 4.
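Both rules fit in two one-line functions. A minimal sketch with illustrative names: for a goodness-of-fit test the only constraint is usually that the expected counts sum to the observed total, so k = 1 by default.

```python
def df_goodness_of_fit(n_classes, n_constraints=1):
    """nu = n - k: number of classes minus independent linear constraints."""
    return n_classes - n_constraints

def df_contingency(rows, cols):
    """nu = (r - 1)(c - 1) for an r x c contingency table."""
    return (rows - 1) * (cols - 1)

print([df_goodness_of_fit(n) for n in (2, 3, 4)])                        # [1, 2, 3]
print(df_contingency(2, 2), df_contingency(3, 3), df_contingency(2, 3))  # 1 4 2
```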
Characteristics of Chi Square
This test is based on frequencies and not on the
parameters like mean and standard deviation.
The test is used for hypothesis testing and is not useful for estimation.
This test possesses the additive property: the sum of independent χ² values is itself a χ² value whose degrees of freedom equal the sum of the individual degrees of freedom.
This test can also be applied to a complex contingency table with several classes, and as such is a very useful test in research work.
This test is an important non-parametric test, as no rigid assumptions are necessary regarding the type of population, no parameter values are needed, and relatively little mathematical detail is involved.
Conditions for applying the Chi-Square test
1. The frequencies used in the chi-square test must be absolute counts, not relative frequencies.
2. The total number of observations collected for this test must be large.
3. The observations that make up the sample must be independent of one another.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution.
5. Expected values should be greater than 5 in 80% or more of the cells.
6. Moreover, if the number of cells is fewer than 5, then all expected values must be greater than 5.
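Conditions 5 and 6 can be checked mechanically once the expected frequencies are known. A minimal sketch; `expected_counts_ok` is a hypothetical helper, and the second input below is a made-up list used only to show a failing case.

```python
def expected_counts_ok(expected):
    """Apply conditions 5 and 6 to a flat list of expected frequencies."""
    if len(expected) < 5:
        # Fewer than 5 cells: every expected value must exceed 5.
        return all(e > 5 for e in expected)
    # Otherwise at least 80% of cells need an expected value above 5
    # (the integer comparison 5 * above >= 4 * n avoids floating-point rounding).
    above = sum(e > 5 for e in expected)
    return 5 * above >= 4 * len(expected)

print(expected_counts_ok([140, 35, 75, 140, 35, 75]))  # True
print(expected_counts_ok([12.0, 4.5, 9.0, 20.0]))      # False: 4 cells, one E <= 5
```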
Steps Required
1. Identify the problem.
2. Make a contingency table and record the observed frequency (O) for each class of one event row-wise (horizontally), and for each group of the other event column-wise (vertically).
3. Set up the null hypothesis (H0): according to the null hypothesis, no association exists between the attributes. This also requires setting up an alternative hypothesis (HA).
4. Calculate the expected frequencies (E).
5. Find the difference between the observed and expected frequency in each cell (O − E).
6. Calculate the chi-square value by applying the formula below. The value ranges from zero to infinity.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
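The calculation steps for a test of independence can be sketched as one function that takes an r × c table of observed counts. The function name is illustrative; the expected frequency of each cell is taken as row total × column total / grand total, as in the drug example in this section, whose data are reused here.

```python
def chi_square_independence(table):
    """Expected frequencies, chi-square value and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency of each cell: row total x column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    # Sum (O - E)**2 / E over every cell.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

observed = [[150, 30, 70],   # drug: helped, harmed, no effect
            [130, 40, 80]]   # sugar pills
chi2, df, expected = chi_square_independence(observed)
print(expected)            # [[140.0, 35.0, 75.0], [140.0, 35.0, 75.0]]
print(round(chi2, 3), df)  # 3.524 2
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value; note that by default it applies Yates' continuity correction when there is one degree of freedom.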
Uses of Chi Square Test
Tests for independence of attributes: In the test for independence, the null hypothesis is that the row and column variables are independent of each other. As studied earlier, hypothesis testing is done under the assumption that the null hypothesis is true.
Test of goodness of fit: The goodness-of-fit test of a statistical model measures how accurately the model fits a set of observations.
Steps in Testing Goodness of fit
Null and alternative hypotheses are established, and a significance level is selected for rejection of the null hypothesis.
A random sample of observations is drawn from a
relevant statistical population.
A set of expected frequencies is derived under the
assumption that the null hypothesis is true.
The observed frequencies are compared with the expected frequencies.
The calculated value of the chi-square goodness-of-fit statistic is compared with the table value. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies.
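The goodness-of-fit steps can be sketched as a function that takes observed counts, the class proportions expected under the null hypothesis, and the critical value read from a chi-square table. This is a minimal sketch with an illustrative function name; the die counts in the usage line are hypothetical, and 11.07 is the tabulated 5% critical value for 5 degrees of freedom.

```python
def goodness_of_fit(observed, expected_proportions, table_value):
    """Chi-square goodness-of-fit test against a tabulated critical value.

    table_value: critical chi-square value for df = (number of classes - 1)
    at the chosen significance level, read from a chi-square table.
    """
    n = sum(observed)
    expected = [p * n for p in expected_proportions]  # expected frequencies under H0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1            # one constraint: expected total equals n
    reject_h0 = chi2 > table_value    # reject H0 if calculated value exceeds table value
    return chi2, df, reject_h0

# Hypothetical counts from 60 rolls of a die, tested against a fair die.
chi2, df, reject = goodness_of_fit([8, 9, 12, 11, 6, 14], [1 / 6] * 6, table_value=11.07)
print(round(chi2, 2), df, reject)  # 4.2 5 False
```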
A certain drug is claimed to be effective in curing colds. In an experiment on 500 persons with colds, half of them were given the drug and half were given sugar pills. The patients' reactions to the treatment are recorded in the following table.
On the basis of these data, can it be concluded that there is a significant difference in the effect of the drug and the sugar pills?
Helped Harmed No Effect Total
Drug 150 30 70 250
Sugar Pills 130 40 80 250
Total 280 70 150 500
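H0: There is no significant difference in the effect of the drug and the sugar pills.
Expected frequency:
$$E = \frac{RT \times CT}{GT}$$
where RT is the row total, CT the column total and GT the grand total.
Helped Harmed No Effect Total
Drug 140 35 75 250
Sugar Pills 140 35 75 250
Total 280 70 150 500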
O E (O − E)² (O − E)²/E
150 140 100 0.714
130 140 100 0.714
30 35 25 0.714
40 35 25 0.714
70 75 25 0.333
80 75 25 0.333
Total 3.524
$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 3.524$$
(Summing the rounded cell values in the last column gives 3.522.)
Degrees of freedom: ν = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
Table value: χ² at the 0.05 level with ν = 2 is 5.99.
The calculated value of chi-square is less than the table value. Hence the null hypothesis is accepted: there is no significant difference in the effect of the drug and the sugar pills.
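For an even number of degrees of freedom ν = 2m, the upper-tail probability has the closed form P(χ² > x) = e^(−x/2) Σ_{i=0}^{m−1} (x/2)^i / i!. For ν = 2 this is simply e^(−x/2), so the 5% table value is −2 ln 0.05 ≈ 5.99, and the p-value of the calculated statistic can be found directly. The sketch below (illustrative function name, standard library only) confirms the decision above; for general ν, SciPy's `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` give the same quantities.

```python
import math

def chi2_sf_even_df(x, nu):
    """Upper-tail probability P(chi^2 > x) for an even, positive number of degrees of freedom."""
    if nu <= 0 or nu % 2:
        raise ValueError("this closed form only holds for even, positive nu")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(nu // 2))

chi2_value = 3.524               # calculated chi-square for the drug example (exact sum 3.5238...)
nu = 2
critical = -2 * math.log(0.05)   # for nu = 2 the 5% point solves exp(-x/2) = 0.05
print(round(critical, 2))                         # 5.99
print(round(chi2_sf_even_df(chi2_value, nu), 3))  # 0.172, greater than 0.05
```

Because the p-value exceeds 0.05 (equivalently, 3.524 < 5.99), the null hypothesis is not rejected at the 5% level.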
Limitations
1. A reasonably strong association may not come up as significant if the sample size is small; conversely, in large samples we may find statistical significance when the findings are small and uninteresting, i.e., the findings are not substantively significant, although they are statistically significant.
2. Chi-square is highly sensitive to sample size. As sample size increases, absolute differences become a smaller and smaller proportion of the expected value.
3. Chi-square is also sensitive to small frequencies in the cells of tables.
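The sensitivity to sample size can be illustrated with the drug example: multiplying every count by 10 keeps all proportions identical but multiplies the chi-square value by 10. The tenfold table is hypothetical, and the function name is illustrative.

```python
def chi_square_independence_stat(table):
    """Chi-square statistic for an r x c table, with E = row total x column total / n."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(col_totals))
    )

observed = [[150, 30, 70], [130, 40, 80]]
scaled = [[10 * x for x in row] for row in observed]  # hypothetical: same proportions, 10x the sample

print(round(chi_square_independence_stat(observed), 2))  # 3.52
print(round(chi_square_independence_stat(scaled), 2))    # 35.24
```

With ν = 2 the 5% table value is 5.99, so the same pattern of responses is not significant in the original sample but highly significant in the tenfold one.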
Conclusions
The rule of thumb here is that if either (i) an expected value in a cell is less than 1 or (ii) more than 20% of the expected values in the cells are less than 5, then chi-square should not be (and usually is not) computed.