The document discusses sampling and sampling distributions for estimation. It notes that sampling is used when the population is too large to observe entirely, like India's population of TV viewers. Random sampling of 10,000 TV sets is used to determine viewing preferences. The chapter examines questions around sample size, selection methods, and knowing when a sample accurately reflects the population. Simple random sampling, systematic sampling, stratified sampling and cluster sampling are probability sampling methods discussed. The central limit theorem states that as sample size increases, the sampling distribution of means approaches a normal distribution.
2. India’s population = 132 Cr.
TV Viewership = 66 Cr.
No. of TV Sets = 16 Cr. (hypothetical)
We want to determine what programs Indians watch, so 10,000
TV sets are sampled.
Why select only 10,000 sets out of 16 Cr.?
Because the time and average cost of an interview prohibit the
rating companies from trying to reach millions of people.
BirinderSingh,AssistantProfessor,PCTE
Ludhiana
3. IN THIS CHAPTER, WE EXAMINE
QUESTIONS SUCH AS
How many people should be interviewed?
How should they be selected?
How do we know when our sample accurately
reflects the entire population?
4. WHY SAMPLING?
The testing process is destructive (Time Constraint)
The population is too large to be completely tested
It is almost impossible to define the population
Average Cost is too high
5. DEFINITIONS
Population: All items that have been chosen for study.
Sample: A portion chosen from the population.
Parameters: Characteristics that describe a population
Statistics: Characteristics that describe a sample
Census: Process of obtaining responses from/about each
member of the population
Sampling: Process of selecting a subset from members of the
population
6. CONVENTIONS TO BE USED
Characteristic | Population Parameter | Sample Statistic
Size | N | n
Mean | µ | $\bar{x}$
Std. Deviation | σ | s
Proportion | p or π | $\bar{p}$ or p
7. SAMPLING METHODS
Non-Probability Sampling Methods
A sample in which the probability that an element of the population will
be drawn is not known.
Classifications:
Convenience Sampling
Judgemental Sampling
Voluntary Response Sampling
Probability Sampling Methods
A sample in which the probability that an element of the population will
be drawn is known. It is also called random sampling.
Methods:
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
8. SIMPLE RANDOM SAMPLING
Simple Random Sampling selects samples by methods that
allow each possible sample to have an equal probability of
being picked and each item in the entire population to have an
equal chance of being included in the sample.
Ex: Selecting a pair of students from four students A, B, C, D: each of the
six possible pairs (AB, AC, AD, BC, BD, CD) is equally likely to be picked.
How to do Random Sampling:
The easiest way is to use random numbers. These numbers can be
generated by a computer programmed to scramble numbers or taken from a
table of random numbers/digits.
Another method is to write the name of each element on a slip of
paper, deposit the slips in a box, and draw the required number of slips.
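The four-student example can be sketched in a few lines of Python. This is only an illustration: the fixed seed and the variable names are assumptions made for repeatability, not part of the slides.

```python
import itertools
import random

students = ["A", "B", "C", "D"]

# Every possible sample of size 2 (order does not matter): 4C2 = 6 pairs.
all_pairs = list(itertools.combinations(students, 2))

# Simple random sampling: draw 2 students without replacement, so each of
# the 6 pairs is equally likely to be the one selected.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
chosen_pair = tuple(rng.sample(students, 2))

print(all_pairs)
print(chosen_pair)
```

Drawing with `rng.sample` plays the role of the slips of paper in the box.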
9. SIMPLE RANDOM SAMPLING
Merits:
It eliminates bias, hence is more representative of the population.
Its theory is more reliable & highly developed.
It saves time & effort.
Demerits:
It requires an up-to-date & complete list of the population units to be
sampled.
If the area of coverage is large, random samples are also widely scattered
geographically.
10. SYSTEMATIC SAMPLING
In systematic sampling, elements are selected from the
population at a uniform interval that is measured in time,
order or space.
Ex: If we wanted to interview every 20th student on a
college campus, we would choose a random starting point in
the first 20 names in the student directory and then pick
every 20th name thereafter.
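The every-20th-student example might look like the sketch below. The 200-name directory and the function name `systematic_sample` are hypothetical, introduced only for illustration.

```python
import random

def systematic_sample(items, interval, rng=None):
    """Pick a random start among the first `interval` items,
    then take every `interval`-th item after it."""
    rng = rng or random.Random()
    start = rng.randrange(interval)  # index 0 .. interval-1
    return items[start::interval]

# Hypothetical student directory with 200 names.
directory = [f"student_{i:03d}" for i in range(1, 201)]

sample = systematic_sample(directory, 20, random.Random(7))
print(len(sample), sample[:3])
```

With 200 names and an interval of 20, every possible starting point yields exactly 10 students.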
11. STRATIFIED RANDOM SAMPLING VS CLUSTER SAMPLING
In stratified random sampling, we divide the population
into relatively homogeneous groups called strata and draw a random sample
from each stratum.
Each group has small variation within itself but there is a
wide variation between the groups.
In cluster random sampling, we divide the population into
groups or clusters and then select a random sample of these
clusters.
Each group has considerable variation within itself but there
is a noticeable similarity between the groups.
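The difference between the two designs is easiest to see side by side. The sketch below uses a made-up population of four groups of 25 units; the group labels, sizes, and function names are assumptions for illustration only.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 groups of 25 units, each tagged (group, unit_id).
population = [(g, i) for g in ("G1", "G2", "G3", "G4") for i in range(25)]

def stratified_sample(units, per_stratum, rng):
    """Treat each group as a stratum and sample from EVERY stratum."""
    strata = {}
    for group, unit in units:
        strata.setdefault(group, []).append((group, unit))
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, per_stratum))
    return picked

def cluster_sample(units, n_clusters, rng):
    """Treat each group as a cluster, pick some clusters at random,
    and keep ALL units of the chosen clusters."""
    groups = sorted({group for group, _ in units})
    chosen = set(rng.sample(groups, n_clusters))
    return [u for u in units if u[0] in chosen]

strat = stratified_sample(population, 5, rng)
clus = cluster_sample(population, 2, rng)
print(len(strat), len(clus))
```

Stratified sampling touches every group but only a few units in each; cluster sampling touches only a few groups but every unit inside them.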
12. SAMPLING DISTRIBUTIONS
Sampling Distribution of the Mean: the probability distribution of all
the possible means of samples of a given size drawn from a population.
Ex: Suppose our samples each consist of ten 25-year-old
women from a city with a population of 1,00,000.
By computing the mean height and SD of each of
these samples, we would quickly see that the mean and
SD differ from sample to sample.
Sampling Distribution of the Proportion: defined in the same way, but for
the sample proportion instead of the sample mean.
13. SAMPLING DISTRIBUTION – EXAMPLE
Population | Sample | Sample Statistic | Sampling Distribution
All professional basketball teams | Group of 5 players | Mean height | Sampling distribution of the mean
All parts produced by manufacturing process | 50 parts | Proportion defective | Sampling distribution of the proportion
14. A SAMPLING DISTRIBUTION
Let’s create a sampling distribution of means…
Take a sample of size 1,500 from the US. Record the mean
income. Our census said the mean is $30K.
$30K
15. A SAMPLING DISTRIBUTION
Let’s create a sampling distribution of means…
Take another sample of size 1,500 from the US. Record the
mean income. Our census said the mean is $30K.
$30K
23. A SAMPLING DISTRIBUTION
Let’s create a sampling distribution of means…
Let’s repeat sampling of sizes 1,500 from the US. Record the
mean incomes. Our census said the mean is $30K.
$30K
The sample means would stack
up in a normal curve. A normal
sampling distribution.
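25. A SAMPLING DISTRIBUTION
Say that the standard deviation of this sampling distribution is $10K.
Think back to the empirical rule. What are the odds you
would get a sample mean that is more than $20K off?
The sample means would stack up in a normal curve centred on $30K: a
normal sampling distribution.
$20K is 2 standard deviations (z = ±2), so roughly 2.5% of sample means
fall in each tail, about 5% in total.
[Figure: normal curve of sample means centred at $30K, horizontal axis marked from z = -3 to z = +3, with 2.5% in each tail]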
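The empirical-rule answer can be checked against the exact normal tail areas. This is a small sketch using the slide's figures (mean $30K, standard deviation $10K); the variable names are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of the mean, in $K, using the slide's figures.
sampling_dist = NormalDist(mu=30, sigma=10)

lower_tail = sampling_dist.cdf(10)      # P(sample mean < $10K)
upper_tail = 1 - sampling_dist.cdf(50)  # P(sample mean > $50K)
prob_off_by_20k = lower_tail + upper_tail

print(round(prob_off_by_20k, 4))
```

The exact two-tailed area is a little under the empirical rule's 5%, because the rule rounds "within 2 standard deviations" to 95%.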
26. STANDARD ERROR (S.E.)
The standard deviation of the distribution of a
sample statistic is known as the standard error of
the statistic.
SE indicates how spread out (dispersed) the means
of the sample are.
SE indicates not only the size of the chance error
that has been made, but also the accuracy we are
likely to get if we use a sample statistic to estimate
a population parameter.
A distribution of sample means that is less spread
out (having a small SE) is a better estimator of the
population mean.
27. RELATIONSHIPS BETWEEN POPULATION PARAMETERS AND
THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
The expected value of the sample mean is equal to the population mean:
$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$
The variance of the sample mean is equal to the population variance divided by
the sample size:
$V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$
The standard deviation of the sample mean, known as the standard error of
the mean, is equal to the population standard deviation divided by the square
root of the sample size:
$SD(\bar{X}) = \text{s.e.}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
28. CENTRAL LIMIT THEOREM
As sample size increases, the sampling distribution of means (SDM)
approaches a normal distribution, irrespective of the nature of the
population distribution.
As a rule of thumb, for n≥30, the SDM is taken to be normally
distributed.
This is called the Central Limit Theorem (CLT).
The significance of the CLT is that it permits us to use sample
statistics to make inferences about population parameters
without knowing anything about the shape of the frequency
distribution of that population.
Sample means from populations that are normally distributed
are also normally distributed, regardless of the size of the sample.
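A quick simulation can make the theorem concrete. The sketch below draws repeated samples of size 30 from a strongly skewed population; the exponential population, the 2,000 repetitions, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 30                  # sample size (the rule-of-thumb threshold)
num_samples = 2000      # hypothetical number of repeated samples

# Exponential population with rate 1: skewed, with mean 1 and SD 1.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)  # should be close to the population mean
se_observed = stdev(sample_means)   # spread of the sample means
se_theory = 1 / sqrt(n)             # sigma / sqrt(n)

# Share of sample means within 2 standard errors of the population mean.
within_2se = sum(abs(m - 1.0) <= 2 * se_theory for m in sample_means) / num_samples

print(mean_of_means, se_observed, se_theory, within_2se)
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to σ/√n.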
30. WORKING METHODOLOGY
Make sure the population is infinite, i.e. N is not given.
Check whether n≥30; if yes, the SDM is considered to
be normally distributed.
Find the Z score using the formula:
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
where
$\bar{x}$ = sample mean;
$\mu_{\bar{x}}$ = mean of means, $\mu_{\bar{x}} = \mu$;
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
31. PRACTICE PROBLEMS – SDM / CLT
A bank calculates that its individual savings accounts have a
mean of $2000 and SD of $600. If the bank takes a random sample of
100 accounts, what is the probability that the sample mean will lie
between $1900 and $2050? (0.75)
32. PRACTICE PROBLEMS – SDM / CLT
A continuous manufacturing process produces items whose
weights are normally distributed with a mean of 8 kg and SD of 3
kg. A random sample of 16 items is to be drawn. What is the
probability that the sample mean exceeds 9 kg? (9.18%)
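Both SDM practice problems (the bank accounts and the manufacturing process) follow the same steps: compute the standard error σ/√n, then read probabilities from the normal distribution of the sample mean. A minimal sketch, with the helper name `sample_mean_distribution` being an illustrative assumption:

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Sampling distribution of the mean for an (effectively) infinite
    population: normal with mean mu and standard error sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# Bank problem: mu = 2000, sigma = 600, n = 100.
bank = sample_mean_distribution(2000, 600, 100)
p_bank = bank.cdf(2050) - bank.cdf(1900)

# Manufacturing problem: mu = 8, sigma = 3, n = 16.
items = sample_mean_distribution(8, 3, 16)
p_items = 1 - items.cdf(9)

print(round(p_bank, 4), round(p_items, 4))
```

The results agree with the quoted answers to within table rounding; printed z-tables round z to two decimals, which explains small differences in the last digits.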
33. THE FINITE POPULATION MULTIPLIER
Most populations examined by decision makers are finite,
i.e. of limited size.
The standard error of the mean for a finite population is given by:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$
where $\sqrt{\frac{N-n}{N-1}}$ is called the Finite Population Multiplier,
N = size of population
n = sample size
Population & Sampling Ratio
If n/N > 0.05, the population is treated as finite.
If n/N ≤ 0.05, the population is treated as infinite.
When the sampling fraction n/N is 0.05 or less, the finite population
multiplier need not be used.
34. PRACTICE PROBLEMS
From a population of 125 items with a mean of 105 and SD of
17, 64 items were chosen.
Find the Standard Error. (1.4904)
What is $P(107.5 < \bar{x} < 109)$? (0.0428)
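Here n/N = 64/125 is well above 0.05, so the finite population multiplier applies. A small sketch of the calculation; the function name `finite_population_se` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def finite_population_se(sigma, n, N):
    """Standard error of the mean with the finite population multiplier."""
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))

se = finite_population_se(sigma=17, n=64, N=125)

# Probability that the sample mean lies between 107.5 and 109.
dist = NormalDist(105, se)
p = dist.cdf(109) - dist.cdf(107.5)

print(round(se, 4), round(p, 4))
```

The standard error matches the quoted 1.4904; the probability lands within table rounding of the quoted 0.0428.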
35. PRACTICE PROBLEMS
From a population of 75 items with a mean of 364 and
Variance of 18, 32 items were chosen.
Find the Standard Error.
What is $P(363 < \bar{x} < 366)$?
37. ESTIMATION
When you are ready to cross a street, you
estimate the speed of the car that is approaching
towards you, the distance between you and the
car and your own speed.
Based on these quick estimates, you decide
whether to wait, walk or run…..
38. REASONS FOR ESTIMATES
A unit head estimates next year's admissions.
A credit manager estimates whether a purchaser will eventually
pay his bills.
Homemakers estimate the increase in commodity prices.
All these people make estimates without worrying about whether
they are scientific, but with the hope that the estimates bear a
reasonable resemblance to the outcome.
39. TYPES OF ESTIMATES
Point Estimate
A single number that is used to estimate an unknown population
parameter.
Ex: The department head estimates, from current data, that the MBA
course will have 300 students next year.
It falls short in two ways:
It is often insufficient, as it is either right or wrong.
Evaluation of the precision of the estimate is not possible.
Interval Estimate
A range of values that is used to estimate an unknown population
parameter.
Ex: The department head estimates, from current data, that the MBA
course will have 280-320 students next year.
It indicates the error in two ways:
By the extent of the range.
By the probability of the true population parameter lying within that
range.
40. ESTIMATOR & ESTIMATES
An estimator is a sample statistic used to estimate a population
parameter.
Sample Mean $\bar{x}$ can be an estimator of the Population Mean µ.
Sample Proportion $\bar{p}$ can be an estimator of the Population Proportion p.
An estimate is a specific observed value (numerical value) of a
statistic.
Population in which we are interested | Population parameter we wish to estimate | Sample statistic we will use as an estimator | Estimate we make
Employees in a furniture factory | Mean turnover per year | Mean turnover for a period of 1 month | 8.9% turnover per year
Teenagers in a given community | Proportion who have criminal record | Proportion of a sample of 50 teenagers | 2% have criminal records
41. CHARACTERISTICS (CRITERIA) OF A GOOD
ESTIMATOR
It should be unbiased: The sample mean is an unbiased estimator of the
population mean because the mean of the sampling distribution of means is
equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.
It should be efficient: Efficiency refers to the size of the standard
error of the statistic. Of two estimators, the one with the smaller
standard error is preferred.
It should be consistent: As the sample size increases, it becomes almost
certain that the value of the statistic comes very close to the value of
the population parameter.
It should be sufficient: No other estimator should be able to extract
more information from the sample about the parameter being estimated.
42. PRACTICE PROBLEMS – POINT ESTIMATES
ABC Co. Ltd is considering expanding its seating capacity and
needs to know both the average number of people who attend
events there and the variability in this number. The following
are the attendances (in thousands) at nine randomly selected
sporting events. Find point estimates of the mean and the
variance of the population from which the sample was drawn. 8.8,
14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0
(14.28, 21.12)
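The point estimates can be reproduced with Python's `statistics` module. Note that `statistics.variance` divides by n - 1, which is the sample variance used as the point estimate of the population variance; the variable names are illustrative.

```python
from statistics import mean, variance

# Attendances (in thousands) at the nine sampled sporting events.
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]

x_bar = mean(attendance)          # point estimate of the population mean
s_squared = variance(attendance)  # sample variance, divisor n - 1

print(round(x_bar, 2), round(s_squared, 2))
```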
43. INTERVAL ESTIMATE
Interval Estimate: Range of values within which a
population parameter is likely to be.
Confidence Level: Probability that is associated with an
interval estimate.
Confidence Interval: Range of the estimate for a given
confidence level.
$\bar{x} - z\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z\,\sigma_{\bar{x}}$
where $\bar{x}$ = sample mean (point estimate of the mean), z = confidence
coefficient, $\sigma_{\bar{x}}$ = standard error, µ = population mean.
45. INTERVAL ESTIMATES OF MEAN FROM
LARGE SAMPLES
There are two cases:
Case 1: When Population SD is known
Case 2: When Population SD is not known
46. COMPUTATIONAL PROCEDURE
Choose level of confidence
Find ‘Z’ for chosen level
Compute Standard Error
If σ is known
For infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
For finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$
If σ is not known
Sample SE = $s_{\bar{x}} = \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$
where Sample SD = $s = \hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$
The sample SD s is used as an estimate of the population SD.
Construct Confidence Interval
$LCL = \bar{x} - z\,\sigma_{\bar{x}}$
$UCL = \bar{x} + z\,\sigma_{\bar{x}}$
47. PRACTICE PROBLEMS – ESTIMATION
Sample mean life of 200 batteries of a make is 36 months.
Estimate the mean life of that make of batteries with 95%
confidence. Standard Deviation of population is known to be 10
months. (34.61 ≤ µ≤ 37.39)
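The battery problem is the simplest case: σ is known and the population is treated as infinite. A minimal sketch follows; the helper name `z_confidence_interval` is an assumption, and `NormalDist().inv_cdf` stands in for looking up z in a table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, std_error, confidence):
    """Return (LCL, UCL) = x_bar -/+ z * SE for a two-sided interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z * std_error, x_bar + z * std_error

# n = 200 batteries, sample mean 36 months, known sigma = 10 months.
se = 10 / sqrt(200)
lcl, ucl = z_confidence_interval(36, se, 0.95)

print(round(lcl, 2), round(ucl, 2))
```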
48. PRACTICE PROBLEMS – ESTIMATION
50 randomly selected pieces of plastic rope had a mean
breaking strength of 25 psi & SD of 1.4 psi. Find the mean
breaking strength at 99% confidence level. (psi = pounds per
square inch) (24.49 ≤ µ≤ 25.51)
49. PRACTICE PROBLEMS – ESTIMATION
A large automotive parts wholesaler needs an estimate of the
mean life it can expect from windshield wiper blades under
typical driving conditions. Already, management has
determined that the SD of the population life is 6 months. A
random sample of 100 wiper blades has been selected with
mean life of 21 months. Find an interval estimate of mean life
with confidence level of 95%. (19.82 ≤ µ≤ 22.18)
50. PRACTICE PROBLEMS – ESTIMATION
From a population of 540, a sample of 60 individuals is taken.
From this sample, the mean is found to be 6.2 and the SD is
1.368.
Find the estimated standard error of the mean (0.167)
Construct a 96 percent confidence interval of the mean.
(5.86 ≤ µ≤ 6.54)
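In this problem the sampling fraction is 60/540, above 0.05, so the estimated standard error includes the finite population multiplier, and s replaces the unknown σ. A hedged sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

N, n = 540, 60           # population and sample sizes
x_bar, s = 6.2, 1.368    # sample mean and sample SD

# Estimated standard error with the finite population multiplier.
se_hat = s / sqrt(n) * sqrt((N - n) / (N - 1))

# 96% two-sided interval: 2% in each tail.
z = NormalDist().inv_cdf(0.98)
lcl, ucl = x_bar - z * se_hat, x_bar + z * se_hat

print(round(se_hat, 3), round(lcl, 2), round(ucl, 2))
```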
51. INTERVAL ESTIMATES OF MEAN FROM
SMALL SAMPLES (T DISTRIBUTION)
The normal distribution is not the appropriate sampling distribution
when two conditions hold together:
The sample size is small, i.e. less than 30.
The population standard deviation is unknown and must be estimated
from the sample.
In such cases the appropriate distribution is the t-distribution,
also called Student’s distribution.
52. T - DISTRIBUTION
The shape of the t distribution is very similar to the shape
of the standard normal distribution.
The t distribution has a (slightly) different shape for each
possible sample size.
They are all symmetric and unimodal.
They are somewhat broader than Z, reflecting the
additional uncertainty resulting from using s in place of σ.
As n gets larger and larger, the shape of the t distribution
approaches the standard normal.
It contains more area under the tails.
We need to know the degrees of freedom in the t distribution. If the
sample size is n, then df = n – 1.
53. CONDITIONS FOR T DISTRIBUTION
n<30
Population SD (σ) is not known.
Population is assumed to be normal or nearly normal.
Note:
Since σ is not known, $\hat{\sigma}_{\bar{x}}$ is used in lieu of $\sigma_{\bar{x}}$.
The interval estimate of the population mean is
$\bar{x} - t\,\hat{\sigma}_{\bar{x}} \le \mu \le \bar{x} + t\,\hat{\sigma}_{\bar{x}}$, where $t = \frac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}}$
The t-distribution table shows areas and t-values for
only a few percentages (10%, 5%, 2%, 1%).
54. COMPUTATIONAL PROCEDURE
Choose the confidence level.
Find the total chance of error, i.e. α = 1 – CL.
Find the degrees of freedom, i.e. df = n – 1.
Extract the t value using df & α.
Compute the interval estimate.
55. PRACTICE PROBLEMS – T DISTRIBUTION
Determine the 95% Confidence Interval for mean burning time
of marine flares if 9 flares were tested and yielded a mean
burning time of 40 minutes with a SD of 10 minutes.
(32.32 ≤ µ≤ 47.68)
56. PRACTICE PROBLEMS – T DISTRIBUTION
Seven homemakers were randomly sampled and it was
determined that the distances they walked in their housework
had an average of 39.2 miles per week and a SD of 3.2 miles
per week. Construct a 95% confidence interval for the
population mean (36.24 ≤ µ≤ 42.16)
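Both t-distribution problems (the marine flares and the homemakers) can be worked with the same few steps. Python's standard library has no t quantile function, so this sketch hard-codes the two values it needs as read from a standard t table (two-tailed α = 0.05); the dictionary and function names are assumptions for illustration.

```python
from math import sqrt

# Two-tailed 5% critical values from a standard t table, keyed by df.
T_TABLE_95 = {6: 2.447, 8: 2.306}

def t_confidence_interval_95(x_bar, s, n):
    """95% interval for the mean when n < 30 and sigma is unknown."""
    df = n - 1
    t = T_TABLE_95[df]     # table lookup; only df 6 and 8 are listed here
    se_hat = s / sqrt(n)   # estimated standard error
    return x_bar - t * se_hat, x_bar + t * se_hat

flares = t_confidence_interval_95(x_bar=40, s=10, n=9)
homemakers = t_confidence_interval_95(x_bar=39.2, s=3.2, n=7)

print([round(v, 2) for v in flares])
print([round(v, 2) for v in homemakers])
```

The intervals agree with the quoted answers to within rounding of the intermediate standard error.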
57. DECISION FLOW DIAGRAM - ESTIMATION
Start
Is n≥30?
  Yes: Use ‘Z’ table. Stop.
  No: Is the population known to be normally distributed?
    No: Use a statistician. Stop.
    Yes: Is the SD known?
      Known: Use ‘Z’ table. Stop.
      Not known: Use ‘t’ table. Stop.
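The decision flow can also be written as a small function. This sketch assumes the branch order read from the diagram (sample size first, then normality of the population, then whether the SD is known); the function name and return strings are illustrative.

```python
def choose_table(n, population_normal, sd_known):
    """Return which table the decision flow points to."""
    if n >= 30:
        return "z"                       # large sample: use the Z table
    if not population_normal:
        return "consult a statistician"  # small sample, non-normal population
    return "z" if sd_known else "t"      # small sample, normal population

print(choose_table(n=100, population_normal=False, sd_known=True))
print(choose_table(n=9, population_normal=True, sd_known=False))
```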
58. SAMPLING DISTRIBUTION OF
PROPORTIONS (SDP)
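Quantity | Means | Proportions
Population value | µ | p
Sample value | $\bar{x}$ | $\bar{p}$
Mean of sampling distribution | $\mu_{\bar{x}} = \mu$ | $\mu_{\bar{p}} = p$
SD of sampling distribution | $\sigma_{\bar{x}}$ | $\sigma_{\bar{p}}$
Estimate of SD of sampling distribution | $\hat{\sigma}_{\bar{x}}$ | $\hat{\sigma}_{\bar{p}}$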
60. PRACTICE PROBLEMS – SDP
A TV company wishes to find out the proportion of families in a
city who own a TV. A sample survey of 400 families revealed
that 320 of them owned a TV. Estimate, with 95%
confidence, the percentage of families in the entire city who own a
TV. (76.08% ≤ p≤ 83.92%)
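The interval for a proportion has the same shape as the interval for a mean. The sketch below assumes the usual estimated standard error of a proportion, $\hat{\sigma}_{\bar{p}} = \sqrt{\bar{p}(1-\bar{p})/n}$, a standard formula that does not appear in the slides shown here; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Two-sided interval for a population proportion (large sample)."""
    p_bar = successes / n
    se_hat = sqrt(p_bar * (1 - p_bar) / n)  # estimated SE of the proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_bar - z * se_hat, p_bar + z * se_hat

# 320 of 400 sampled families own a TV; 95% confidence.
lcl, ucl = proportion_confidence_interval(320, 400, 0.95)

print(f"{lcl:.2%} to {ucl:.2%}")
```

The same function can be reused for the other proportion problems by changing the counts and the confidence level.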
61. PRACTICE PROBLEMS – SDP
Delhi police intends to introduce a new uniform for the officers'
cadre. A survey estimates the proportion of officers who would
prefer the change. Results showed that 45 out of 75 favored
the change. Estimate the population proportion in favor of the proposal
with 90% confidence level. (50.65% ≤ p≤ 69.35%)
62. PRACTICE PROBLEMS – SDP
Dr. Benjamin, a noted social psychologist, surveyed 150 top
executives and found that 42% of them were unable to add
fractions correctly.
Estimate the standard error of the proportion. (0.0403)
Construct a 99% confidence interval for the true proportion of top
executives who cannot correctly add fractions. (0.316 ≤ p≤ 0.524)