Cluster Sampling, Schemes and Mind Maps of Design history

University of San Agustin (USA), Design history

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 08/01/2022

hal_s95 🇵🇭

Cluster Sampling

A cluster sample is a probability sample in which each sampling unit is a collection or a group of elements.
It is useful when:

(i) A list of elements of the population is not available but it is easy to obtain a list of clusters.
(ii) The cost of obtaining observations increases with the distance separating the elements.

If only a sample of elements is taken from each selected cluster, the method is known as two-stage sampling.
Often a hierarchy of clusters is used: First some large clusters are selected, next some smaller clusters are drawn within the selected large clusters, and so on until finally elements are selected within the final-stage clusters.

EXAMPLE: In a survey of students from a city, we first select a sample of schools, then we select a sample of classrooms within the selected schools, and finally we select a sample of students within the selected classes.
This general method is known as multistage sampling, although it is also sometimes loosely described as cluster sampling.
Although strata and clusters are both groupings of elements, they serve entirely different sampling purposes.
Since every stratum is represented in the sample, it is advantageous if strata are internally homogeneous in the survey variables.
Since only a sample of clusters is selected, the selected clusters need to represent the unselected ones; this works best when the clusters are as internally heterogeneous in the survey variables as possible.
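The point about internal heterogeneity can be illustrated with a small sketch. The eight population values and both groupings below are hypothetical and chosen only for illustration: the same values are split into four clusters of two elements, once homogeneously and once heterogeneously, and the spread of the cluster means is compared.

```r
# Hypothetical population of 8 values (illustration only).
values <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Grouping A: internally homogeneous clusters (neighbouring values together).
homogeneous <- c(1, 1, 2, 2, 3, 3, 4, 4)

# Grouping B: internally heterogeneous clusters (a low value paired with a high one).
heterogeneous <- c(1, 2, 3, 4, 4, 3, 2, 1)

# Mean of each cluster under both groupings.
means.hom <- tapply(values, homogeneous, mean)
means.het <- tapply(values, heterogeneous, mean)

# Variability between cluster means: the smaller it is, the better any
# selected cluster represents the clusters that were not selected.
var.hom <- var(as.numeric(means.hom))
var.het <- var(as.numeric(means.het))

print(rbind(homogeneous = means.hom, heterogeneous = means.het))
print(c(var.hom = var.hom, var.het = var.het))
```

In grouping B every cluster mean equals the population mean, so any single cluster represents the whole population; in grouping A the estimate depends heavily on which clusters happen to be drawn.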

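Let $N$ be the number of clusters in the population, $n$ the number of clusters in the sample, $m_i$ the number of elements in cluster $i$, and $y_i$ the total of the observations in cluster $i$.

The estimator of the population mean $\mu$ is the sample mean $\bar{y}$, given by

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}$$

The estimator has the form of a ratio estimator; therefore the estimated variance of $\bar{y}$ is

$$\widehat{Var}(\bar{y}) = \left(\frac{N - n}{N n \bar{M}^2}\right) \frac{\sum_{i=1}^{n} (y_i - \bar{y}\, m_i)^2}{n - 1}$$

where $\bar{M} = M/N$ is the average cluster size for the population, which can be estimated by $\bar{m}$ (the average cluster size in the sample) if $M$ (the number of elements in the population) is unknown.

The estimated variance is biased unless the cluster sizes $m_i$ are all equal.

Nevertheless, it is a good estimator of $Var(\bar{y})$ when $n \geq 20$.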

Example: A firm is interested in estimating the average per capita income in a certain city. No list of resident adults is available. The city is marked off into rectangular blocks, except for two industrial areas and three parks which contain a few houses. The researchers decide that each of the city blocks will be considered a cluster, the two industrial areas will be considered a cluster and, finally, the three parks will be considered a cluster. The clusters are numbered from 1 to 60, and the budget allows sampling n = 20 clusters and interviewing every household within each sampled cluster.

Cluster   Number of residents (m_i)   Total income per cluster (y_i)
 1        55                          2210
 2        60                          2390
 3        63                          2430
 4        58                          2380
 5        71                          2760
 6        78                          3110
 7        69                          2780
 8        58                          2370
 9        52                          1990
10        71                          2810
11        73                          2930
12        64                          2470
13        69                          2830
14        58                          2370
15        63                          2390
16        75                          2870
17        78                          3210
18        51                          2430
19        67                          2730
20        70                          2880

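cluster.mu <- function(N, m.vec, y, total = TRUE, M = NA) {
  # The function header and the definitions of n and mbar are not visible in
  # this preview; they are reconstructed from the call below (assumption).
  # `total` is kept only to match that call and is not used in the visible code.
  n <- length(m.vec)                              # number of sampled clusters
  mbar <- if (is.na(M)) mean(m.vec) else M / N    # average cluster size
  mu.hat <- sum(y) / sum(m.vec)                   # ratio estimate of the mean
  s2.c <- sum((y - mu.hat * m.vec)^2) / (n - 1)
  var.mu.hat <- ((N - n) / (N * n * mbar^2)) * s2.c
  B <- 2 * sqrt(var.mu.hat)                       # bound on the error of estimation
  cbind(mu.hat, s2.c, var.mu.hat, B)
}

Example

m <- c(55, 60, 63, 58, 71, 78, 69, 58, 52, 71,
       73, 64, 69, 58, 63, 75, 78, 51, 67, 70)
y <- c(2210, 2390, 2430, 2380, 2760, 3110, 2780, 2370, 1990, 2810,
       2930, 2470, 2830, 2370, 2390, 2870, 3210, 2430, 2730, 2880)

cluster.mu(60, m.vec = m, y, total = TRUE, M = NA)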
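As a quick check of the point estimate, the sums below were added up by hand from the 20 sampled clusters in the income example; the variance and the bound on the error are left to the function.

$$\sum_{i=1}^{20} y_i = 52340, \qquad \sum_{i=1}^{20} m_i = 1303, \qquad \bar{y} = \frac{52340}{1303} \approx 40.17, \qquad \bar{m} = \frac{1303}{20} = 65.15$$

Since $M$ is not supplied in the call (M = NA), $\bar{m} = 65.15$ would stand in for $\bar{M}$ in the variance formula.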

Cluster sampling is an ideal situation to use pps sampling (sampling with probabilities proportional to size), since the number of elements in a cluster mi forms a natural measure of the size of the cluster and it is convenient to sample with probabilities proportional to mi.
In this case, $\pi_i = m_i / M$ and the estimator of the population mean $\mu$ is

$$\hat{\mu}_{pps} = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_i$$

where $\bar{y}_i$ is the mean for the i-th cluster, and the estimated variance of $\hat{\mu}_{pps}$ is

$$\widehat{Var}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\bar{y}_i - \hat{\mu}_{pps})^2$$

Assume that the sampled units are divisions 3, 6 and 8, where the total numbers of sick days are, respectively,

$$y_1 = 4320, \qquad y_2 = 4160, \qquad y_3 = 5790$$

In this case,

$$\bar{y}_1 = \frac{4320}{2100} = 2.06, \qquad \bar{y}_2 = \frac{4160}{1910} = 2.18, \qquad \bar{y}_3 = \frac{5790}{3200} = 1.81$$

hence

$$\hat{\mu}_{pps} = \frac{1}{3} \sum_{i=1}^{3} \bar{y}_i = 2.02$$

The estimated variance of $\hat{\mu}_{pps}$ is

$$\widehat{Var}(\hat{\mu}_{pps}) = \frac{1}{3 \cdot 2} \sum_{i=1}^{3} (\bar{y}_i - \hat{\mu}_{pps})^2 \approx 0.012$$

And the 95% confidence interval is

$$2.02 \pm 1.96 \cdot \sqrt{0.012} \Rightarrow [1.8053;\ 2.2347]$$
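The sick-days calculation can be reproduced with a short sketch. The function name pps.mean is hypothetical, and the division sizes 2100, 1910 and 3200 are read off the denominators of the cluster means in the example. Because the code keeps full precision while the hand calculation rounds each cluster mean to two decimals, the results agree only approximately (the mean comes out near 2.01 rather than 2.02).

```r
# Sketch of the pps estimator of the mean and its estimated variance.
# y: cluster totals, m: cluster sizes (both for the sampled clusters).
pps.mean <- function(y, m) {
  ybar <- y / m                       # mean of each sampled cluster
  n <- length(ybar)                   # number of sampled clusters
  mu.pps <- mean(ybar)                # (1/n) * sum of cluster means
  var.pps <- sum((ybar - mu.pps)^2) / (n * (n - 1))
  list(ybar = ybar, mu.pps = mu.pps, var.pps = var.pps)
}

# Data from the example: total sick days and sizes of divisions 3, 6 and 8.
y <- c(4320, 4160, 5790)
m <- c(2100, 1910, 3200)

res <- pps.mean(y, m)

# Approximate 95% confidence interval, as in the text.
half.width <- 1.96 * sqrt(res$var.pps)
ci <- c(lower = res$mu.pps - half.width, upper = res$mu.pps + half.width)

print(res)
print(ci)
```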

Cluster Sampling with Stratification

Cluster sampling can be combined with stratified sampling: the population is divided into L strata and a cluster sample is selected from each stratum.
As with ratio estimators, we can consider separate estimators and combined estimators.
Usually the total number of elements in each cluster is not known and we cannot calculate weights. For this reason, the combined estimators are the ones usually used in cluster sampling.
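The text does not give the formula for the combined estimator, so the following is only a sketch of one common ratio-type form, stated as an assumption: the cluster totals and the cluster sizes are each expanded to the population with the stratum factors N_h / n_h, and the two expanded totals are then divided. The function name combined.cluster.mean and the two-stratum data are hypothetical.

```r
# Sketch of a combined ratio-type estimator of the mean for stratified
# cluster sampling (assumed form, not taken from the source):
#   sum_h (N_h / n_h) * sum_i y_hi  /  sum_h (N_h / n_h) * sum_i m_hi
# N.h:    number of clusters in each stratum
# y.list: per stratum, totals of the sampled clusters
# m.list: per stratum, sizes of the sampled clusters
combined.cluster.mean <- function(N.h, y.list, m.list) {
  n.h <- sapply(y.list, length)             # sampled clusters per stratum
  w.h <- N.h / n.h                          # expansion factor per stratum
  y.tot <- sum(w.h * sapply(y.list, sum))   # estimated population total of y
  m.tot <- sum(w.h * sapply(m.list, sum))   # estimated number of elements
  y.tot / m.tot
}

# Hypothetical data: two strata, two sampled clusters in each.
N.h <- c(10, 20)
y.list <- list(c(10, 20), c(30, 50))
m.list <- list(c(2, 4), c(5, 5))

mu.c <- combined.cluster.mean(N.h, y.list, m.list)
print(mu.c)
```

Only the sizes of the sampled clusters enter the calculation, which is why this form remains usable when the number of elements in the unsampled clusters is unknown.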

Program in Stata:

use http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/oscs1, clear

use C:\QM\Eje1Cluster.dta, clear
count

* fpc = 757 (total: 757 school districts)
* pw = 757/189 (sample of 189 districts)
* dnum: identification number of each district

svyset dnum [pweight=pw], fpc(fpc)
svydes

svy: mean api
svy: total stype

* Compute the average proportion of English language learners
* and students eligible for subsidized school meals for elementary,
* middle, and high schools

svy: mean ell meals, over(stype)

* Regression models show that these socioeconomic variables
* predict API score and whether the school achieved
* its API target

svy: reg api00 ell meals

Program in R (survey package):

library(survey)
# dclus1 is the survey design object; its definition is not shown in this preview.

# Compute the average proportion of English language learners
# and students eligible for subsidized school meals for elementary,
# middle, and high schools

svyby(~ell + meals, ~stype, design = dclus1, svymean)

# Regression models show that these socioeconomic variables
# predict API score and whether the school achieved
# its API target

regmodel <- svyglm(api00 ~ ell + meals, design = dclus1)
summary(regmodel)

Observations:

With cluster sampling, the smaller the clusters, the better. When there is a hierarchy of clusters, the smallest ones will generally be the preferred choice.
For example, in a high school the students could be grouped by grade levels or by classes; grade levels are too large to serve as clusters for sampling purposes, and classes are the obvious choice.
The problem with cluster sampling is that, because clusters usually consist of existing groupings that were formed for other purposes, even the lowest level of clustering often yields clusters that are too large to be used efficiently.
The solution to this problem is to divide the clusters into sub-clusters for sampling purposes; essentially this is what is done in multistage sampling.

In two-stage cluster sampling, the sample of elements is obtained as a result of two stages of sampling.
The population elements are first grouped into disjoint subpopulations, called primary sampling units (PSU). In the first-stage sampling, a sample of PSU is drawn.
Each PSU in the first-stage sample is then divided into second-stage sampling units (SSU), which may be clusters of elements.
A sample of SSU is drawn (second-stage sampling) from each PSU in the first-stage sample. When the SSU are clusters, every element in the selected SSU is surveyed.

Example of cluster sampling

The Swedish Board of Education conducts annual surveys in Sweden to measure drug use among young students. Data on drug use are collected through anonymous questionnaires from every student in a sample of ninth-grade classes. The sampling frame consists of a list of all ninth-grade classes.

Example of two-stage cluster sampling, with schools as PSU and with classes as SSU:
(i) A sample of schools is drawn from a frame containing all the schools in the country.
(ii) From every selected school, a sample of ninth-grade classes is drawn and all students in the selected classes are surveyed.
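The two stages in the school example can be sketched in a few lines. The frame below is hypothetical (6 schools, 4 ninth-grade classes per school, 5 students per class), as are the sample sizes of 3 schools and 2 classes per school; simple random sampling without replacement is assumed at both stages.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical frame: one row per student, identified by school and class.
frame <- expand.grid(student = 1:5, class = 1:4, school = 1:6)

# Stage 1: draw a sample of schools (PSU).
schools <- sample(unique(frame$school), 3)

# Stage 2: within each selected school, draw a sample of classes (SSU)
# and keep every student in the selected classes.
selected <- do.call(rbind, lapply(schools, function(s) {
  classes <- sample(unique(frame$class[frame$school == s]), 2)
  frame[frame$school == s & frame$class %in% classes, ]
}))

# Number of surveyed students per selected school and class.
print(table(school = selected$school, class = selected$class))
```

Every student in a selected class ends up in the sample, so the final sample size is fixed by the number of selected classes and their sizes, not by a third sampling step.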
