Multi-stage Sampling: A Complex Form of Cluster Sampling, Lecture notes of Law

An in-depth explanation of multi-stage sampling, a more complex form of cluster sampling. It covers the concept, its use in surveying teachers in Enugu, Nigeria, and the difference between it and convenience sampling. Additionally, it discusses the central limit theorem, normal distribution, mean, variance, descriptive and inferential statistics, and ANOVA. Useful for university students studying statistics, research methods, or sociology.

Typology: Lecture notes

2020/2021

Uploaded on 06/20/2022

jazmine-ary

lecture notes CJ 3347

*Multi-stage Multi-cluster sampling

AKA Area Probability Sampling - "representative of a population"

With or without replacement

No frame, but map

Area probability sampling is a form of multi-stage multi-cluster sampling. For example, it may use a sampling frame of addresses in the United States. However, this results in sampling bias because only individuals with an official, registered address can be selected.

Multi-stage sampling represents a more complicated form of cluster sampling in which larger clusters are further subdivided into smaller, more targeted groupings for the purposes of surveying.

EX: Iyoke et al. (2006) used a multi-stage sampling design to survey teachers in Enugu, Nigeria, in order to examine whether socio-demographic characteristics determine teachers' attitudes towards adolescent sexuality education. First-stage sampling used a simple random sample to select 20 secondary schools in the region. The second stage of sampling selected 13 teachers from each of these schools, who were then administered questionnaires.
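The two stages in this example can be sketched in Python. This is only an illustration: the list of 50 schools with 30 teachers each is made up, the function name `two_stage_sample` is hypothetical, and the notes do not say how the 13 teachers per school were chosen, so a simple random sample within each school is assumed.

```python
import random

def two_stage_sample(schools, n_schools, n_teachers, seed=None):
    """Stage 1: simple random sample of schools (the clusters).
    Stage 2: simple random sample of teachers inside each chosen school
    (an assumption; the notes do not state the second-stage method)."""
    rng = random.Random(seed)
    # Stage 1: draw clusters without replacement
    chosen_schools = rng.sample(sorted(schools), n_schools)
    # Stage 2: draw teachers without replacement within each cluster
    return {
        school: rng.sample(schools[school], n_teachers)
        for school in chosen_schools
    }

# Hypothetical population: 50 schools, 30 teachers in each (made-up data)
schools = {
    f"school_{i:02d}": [f"teacher_{i:02d}_{j:02d}" for j in range(30)]
    for i in range(50)
}

sample = two_stage_sample(schools, n_schools=20, n_teachers=13, seed=1)
total = sum(len(teachers) for teachers in sample.values())
print(len(sample), total)  # 20 schools, 260 teachers in total
```

With 20 schools and 13 teachers per school, the design yields 20 x 13 = 260 questionnaires, whatever the seed.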

*Convenience sampling

A form of non-probability sampling. Non-probability sampling often refers to convenience samples (using what you've got).

Convenience sampling is a non-probability sampling technique where subjects are selected because of their convenient accessibility and proximity to the researcher.

*You are concerned about your standard error being too large for your data. What concept states that if you increase your sample size, your standard error will decrease?

Law of Large Numbers
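The link between sample size and standard error can be made concrete with the usual formula for the standard error of the mean, SE = s / √n. The formula is not written out in these notes, and the standard deviation of 10 used below is a made-up value; this is only a small sketch.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation of 10; quadrupling n halves the standard error
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
# 25 2.0
# 100 1.0
# 400 0.5
```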

*A colleague of yours is concerned that your data will not approximate a normal distribution, which is important when conducting a regression analysis. What do you say to refute this claim?

A theoretical probability was used to select each case, therefore we have best approximated the population distribution.

*After having a conversation with your colleague about your distribution, they still don't understand how it is considered a normal distribution. Explain further by saying…

The central limit theorem states that, with a large sample size, the sampling distribution of the sample mean is approximately normal, whatever the shape of the population distribution. In addition, as your sample size approaches the population size, your standard error decreases and your sample mean will get closer to the population mean.

*Explain what a normal distribution looks like.

It has a mean of x̄ = 0 and a standard deviation of s = 1

Its properties are expressed in standardized scores

Symmetric, unimodal, theoretical distribution

50% of values lie left of center (less than the mean) and 50% lie right of center (greater than the mean)

Properties expressed in z-scores

SO: x̄ = 0, s = 1 and ~68% of the distribution lies between +1.00 and -1.00

Particular to normal shape, does not apply to skewed distributions

Mean=Median=Mode (all in the center)

Symmetry about the center

BELL CURVE
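The ~68% figure and the 50/50 split around the mean can be checked numerically. The sketch below uses the standard normal cumulative distribution function written in terms of the error function; the helper names `normal_cdf` and `share_within` are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal (mean 0, sd 1) up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_within(k):
    """Share of the distribution between -k and +k standard deviations."""
    return normal_cdf(k) - normal_cdf(-k)

print(normal_cdf(0.0))              # 0.5, half of the values lie below the mean
print(round(share_within(1.0), 4))  # 0.6827, the ~68% between -1.00 and +1.00
```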

*Variance

Summation notation: s² = Σ(X − x̄)² / (n − 1)

The average of the squared differences from the mean.

n − 1 is the degrees of freedom

You have one less than the sample size of cases to randomly assign

Less biased calculation

To calculate the variance:

Find the mean.

Calculate deviations from the mean for each value (X − x̄).

Square each of these values. Why do we do this?

Sum the squared deviations (sum of squares = SS = Σ(X − x̄)²).

Divide the sum of squares by n − 1.

Steps:

  1. Work out the Mean (the simple average of the numbers)
  2. Then for each number: subtract the Mean
  3. and square the result (the squared difference).
  4. Then work out the average of those squared differences
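The steps above can be sketched in Python. The data values are made up, and the last step divides by n − 1 (the sample variance described in these notes) rather than by n.

```python
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                      # 1. work out the mean
    deviations = [x - mean for x in values]     # 2. subtract the mean from each value
    squared = [d ** 2 for d in deviations]      # 3. square each deviation
    sum_of_squares = sum(squared)               #    SS = sum of squared deviations
    return sum_of_squares / (n - 1)             # 4. divide by the degrees of freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean 5, SS = 32
print(sample_variance(data))      # 32 / 7, about 4.57
print(statistics.variance(data))  # the standard library gives the same value
```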

*Sample

A sample is a set of observations drawn from a population.

A sample is a subset of people, items, or events from a larger population that you collect and analyze to make inferences. To represent the population well, a sample should be randomly collected and adequately large.

If the sample is random and large enough, you can use the information collected from the sample to make inferences about the population. For example, you could count the number of apples with bruises in a random sample and then use a hypothesis test to estimate the percentage of all the apples that have bruises.

*Descriptive statistics

They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of it. Descriptive statistics are broken down into measures of central tendency and measures of variability, or spread.

With descriptive statistics you are simply describing what is or what the data shows.

*Inferential statistics

The mathematical procedures whereby we convert information about the sample into intelligent guesses about the population fall under the rubric of inferential statistics.

EX: Blood samples, sampling pizza

Two methods:

-Interval estimation

-Hypothesis testing

Both use sample statistics to make estimations about population parameters

HT is more common (in CJ)

For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling. Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a sample is not expected to perfectly represent the population.

Population too large so take a sample instead.

To use the F-test to determine whether group means are equal, it's just a matter of including the correct variances in the ratio. In one-way ANOVA, the F-statistic is this ratio:

F = variation between sample means / variation within the samples

*R² (look more at equation)

The proportion of variance accounted for by the regression model.

The Pearson Correlation Coefficient Squared

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

R-squared = Explained variation / Total variation

R-squared is always between 0 and 100%:

0% indicates that the model explains none of the variability of the response data around its mean.

100% indicates that the model explains all the variability of the response data around its mean.

In general, the higher the R-squared, the better the model fits your data.

EX: One regression model accounts for 38.0% of the variance while another accounts for 87.4%. The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.

R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots.

R-squared does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data!
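The R-squared ratio can be sketched in Python. The sketch assumes a least-squares fit with an intercept, where explained variation equals total variation minus residual variation, so R-squared is computed as 1 − residual / total. The function name and the data are made up for illustration.

```python
def r_squared(observed, fitted):
    """R-squared as 1 - (residual variation / total variation).
    Assumes the fitted values come from a least-squares model with an
    intercept, so this equals explained variation / total variation."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_residual / ss_total

observed = [1.0, 2.0, 3.0, 4.0]  # hypothetical response values

# Fitted values equal to the observed values: the model explains everything
print(r_squared(observed, observed))   # 1.0, i.e. 100%

# Fitted values equal to the mean: the model explains nothing
print(r_squared(observed, [2.5] * 4))  # 0.0, i.e. 0%
```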

*Mean Squared error (look at equation)

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations, that is, the difference between the estimator and what is estimated.

The MSE is a measure of the quality of an estimator. It is always non-negative, and values closer to zero are better.
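As a rough sketch of the calculation, the MSE over a set of estimates is the average of the squared differences between each estimate and the value it was estimating. The function name and the numbers are hypothetical.

```python
def mean_squared_error(estimates, true_values):
    """Average of the squared errors (estimate minus true value)."""
    errors = [e - t for e, t in zip(estimates, true_values)]
    return sum(err ** 2 for err in errors) / len(errors)

# Hypothetical estimates and true values: errors are 1, 0 and -2
print(mean_squared_error([2.0, 4.0, 6.0], [1.0, 4.0, 8.0]))  # (1 + 0 + 4) / 3, about 1.67

# Perfect estimates give the best possible value, zero
print(mean_squared_error([1.0, 4.0, 8.0], [1.0, 4.0, 8.0]))  # 0.0
```

Squaring makes every term non-negative, which is why the MSE can never fall below zero.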

*Correlation (look for equation)

A correlation is a single number that describes the degree of relationship between two variables.

Correlation is the degree to which two variables vary together. It measures the magnitude and the direction of the relationship between two variables. The three types of relationships are positive, negative, and zero-relationship. Positive relationships indicate that both variables are varying in the same direction together. They can both decrease or increase together, but they must move in the same direction. Negative, or inverse, relationships indicate that the variables are moving in opposite directions of each other. One will increase as the other decreases, or vice versa. A zero-relationship indicates that as the X variable varies, the Y variable does nothing. This results in a straight horizontal line. A perfect positive relationship is identified by a correlation coefficient of +1.0, while a perfect negative relationship is identified by a correlation coefficient of -1.0.
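The notes do not give the equation, so the sketch below assumes the Pearson correlation coefficient, r = Σ(X − x̄)(Y − ȳ) / √(Σ(X − x̄)² · Σ(Y − ȳ)²). The function name and the data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of X and Y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # 1.0, perfect positive relationship
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0, perfect negative relationship
```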

*What are the four steps in correct order that must be followed in order to complete a hypothesis test?

  1. State the hypothesis
  2. Identify the critical value
  3. Compute the test statistic
  4. Draw your conclusion

Fully exhaustive - The listed outcomes cover all possible outcomes (a coin flip has to be heads or tails)

*ANOVA (look for equation)

ANOVA: three groups or more

Used a lot in experimental design psychology

Evaluates all components at once

Advantage: 2 or more means can collapse into a single, interpretable value

Disadvantage: does not allow for retrospective analysis of individual components (that value cannot be broken down into its original values)

As between-group variance increases, support for H_A increases

As within-group variance increases, less likely to reject null

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups).

ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

When we have only two samples we can use the t-test to compare the means of the samples, but it might become unreliable in case of more than two samples. If we only compare two means, then the t-test (independent samples) will give the same results as the ANOVA.

EXAMPLE: Suppose we want to test the effect of five different exercises. For this, we recruit 20 men and assign one type of exercise to 4 men (5 groups). Their weights are recorded after a few weeks. We may find out whether the effect of these exercises on them is significantly different or not, and this may be done by comparing the weights of the 5 groups of 4 men each.

As mentioned above, the t-test can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using many t-tests. But conducting such multiple t-tests can lead to severe complications and in such circumstances we use ANOVA. Thus, this technique is used whenever an alternative procedure is needed for testing hypotheses concerning means when there are several populations.

There are four basic ASSUMPTIONS used in ANOVA:

  1. The expected values of the errors are zero
  2. The variances of all errors are equal to each other
  3. The errors are independent
  4. They are normally distributed
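The one-way ANOVA ratio from these notes, F = variation between sample means / variation within the samples, can be sketched in Python. The sketch assumes the usual mean-square form (each sum of squares divided by its degrees of freedom), and the three small groups are made-up numbers, not the exercise data from the example.

```python
def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of cases
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: how far each case sits from its own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)        # df between = k - 1
    ms_within = ss_within / (n_total - k)    # df within = N - k
    return ms_between / ms_within

# Hypothetical data: three groups of three cases
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(groups))  # 3.0 (SS between = 6, SS within = 6)
```

A larger F means the group means differ more relative to the spread inside the groups; the value is then compared with a critical value from the F distribution, which this sketch does not compute.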