






An in-depth explanation of multi-stage sampling, a more complex form of cluster sampling. It covers the concept, its use in surveying teachers in Enugu, Nigeria, and the difference between it and convenience sampling. Additionally, it discusses the central limit theorem, normal distribution, mean, variance, descriptive and inferential statistics, and ANOVA. Useful for university students studying statistics, research methods, or sociology.
Typology: Lecture notes







lecture notes CJ 3347
*A colleague of yours is concerned that your data will not approximate a normal distribution, which is important when conducting a regression analysis. What do you say to refute this claim? Each case was selected using a theoretical (known) probability, that is, probability sampling, so our sample is our best approximation of the population distribution.
*After this conversation, your colleague still doesn't understand how the distribution can be treated as normal. Explain further by saying… The central limit theorem states that, with a sufficiently large random sample, the sampling distribution of the mean is approximately normal, even when the underlying data are not. As sample size increases, the standard error ($\sigma/\sqrt{n}$) decreases, so the sample mean tends to fall closer to the population mean.
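A small simulation can make the central limit theorem concrete. The sketch below is illustrative only: it assumes a skewed, clearly non-normal population (an exponential distribution with mean 1 and standard deviation 1, so theory predicts a standard error of $1/\sqrt{n}$), and the helper name `sampling_distribution_sd` is hypothetical. It draws many samples, records each sample mean, and reports how spread out those means are.

```python
import random
import statistics

def sampling_distribution_sd(sample_size, n_samples=2000, seed=42):
    """Hypothetical helper: draw many samples from a skewed (exponential, rate 1)
    population and return the standard deviation of their means, i.e. the
    empirical standard error of the mean."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

# Compare the simulated standard error with the theoretical 1 / sqrt(n).
for n in (5, 30, 200):
    print(n, round(sampling_distribution_sd(n), 3), round(1 / n ** 0.5, 3))
```

As n grows, the simulated standard error shrinks in line with $1/\sqrt{n}$, and a histogram of the stored means would look increasingly bell-shaped even though the individual values are skewed.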
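*Variance
Summation notation: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, roughly the average of the squared differences from the mean.
n - 1 is the degrees of freedom: once the mean is known, only n - 1 of the values are free to vary. Dividing by n - 1 rather than n gives a less biased (for a random sample, unbiased) estimate of the population variance.
Steps to calculate the variance:
Step 1: Find the mean.
Step 2: Calculate the deviation from the mean for each value, (X - Xbar).
Step 3: Square each of these deviations. Why do we do this? The raw deviations always sum to zero, so squaring keeps positive and negative deviations from cancelling out.
Step 4: Sum the squared deviations (sum of squares, $SS = \sum (X - \bar{X})^2$).
Step 5: Divide the sum of squares by n - 1.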
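The variance steps can be traced one at a time in code. This is a minimal sketch using made-up scores; the function name `sample_variance` is hypothetical, and the result is checked against Python's built-in `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sample_variance(values):
    """Hypothetical helper: sample variance s^2, following the steps in the notes."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n                    # Step 1: find the mean
    deviations = [x - mean for x in values]   # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    ss = sum(squared)                         # Step 4: sum of squares (SS)
    return ss / (n - 1)                       # Step 5: divide by n - 1

scores = [4, 8, 6, 5, 3, 7]  # hypothetical data
print(sum(x - 5.5 for x in scores))  # raw deviations cancel out → 0.0
print(sample_variance(scores))       # → 3.5
print(statistics.variance(scores))   # library check, also divides by n - 1
```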
*Inferential statistics
The mathematical procedures by which we convert information about a sample into intelligent guesses about the population fall under the rubric of inferential statistics. We take a sample because the population is too large to study directly. EX: blood samples, sampling pizza.
Two methods:
-Interval estimation
-Hypothesis testing
Both use sample statistics to make estimates about population parameters. Hypothesis testing is more common in CJ.
For instance, we use inferential statistics to infer from sample data what the population might think, or to judge the probability that an observed difference between groups is a dependable one rather than one that happened by chance in this study. Inferential statistics therefore let us generalize from samples to the populations from which they were drawn, so it is important that the sample accurately represents the population. The process of achieving this is called sampling. Because sampling naturally incurs sampling error, a sample is not expected to represent the population perfectly, and inferential statistics account for that error.
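Interval estimation, the first of the two methods, can be sketched as a confidence interval for a population mean. This is an illustration under stated assumptions: the data are simulated (hypothetical scores with mean 50 and standard deviation 10), the function name `confidence_interval` is hypothetical, and it uses the normal (z) approximation, which is reasonable for a large random sample; a t critical value would be more appropriate for small samples.

```python
import random
from statistics import NormalDist, fmean, stdev

def confidence_interval(sample, confidence=0.95):
    """Hypothetical helper: z-based interval estimate for a population mean.
    Assumes a reasonably large random sample."""
    n = len(sample)
    mean = fmean(sample)
    se = stdev(sample) / n ** 0.5                       # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

rng = random.Random(7)
sample = [rng.gauss(50, 10) for _ in range(100)]  # hypothetical scores
low, high = confidence_interval(sample)
print(f"95% CI for the population mean: {low:.2f} to {high:.2f}")
```

Raising the confidence level widens the interval, which is the trade-off between confidence and precision.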
To use the F-test to determine whether group means are equal, it's just a matter of including the correct variances in the ratio. In one-way ANOVA, the F-statistic is this ratio:
$F = \frac{\text{variation between sample means}}{\text{variation within the samples}} = \frac{MS_{between}}{MS_{within}}$
*R²
The proportion of variance in the outcome accounted for by the regression model. In simple linear regression (one predictor), it equals the square of the Pearson correlation coefficient. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
$R^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{SS_{residual}}{SS_{total}}$
R-squared is always between 0 and 100% (for a least-squares model with an intercept):
0% indicates that the model explains none of the variability of the response data around its mean.
100% indicates that the model explains all the variability of the response data around its mean.
In general, the higher the R-squared, the better the model fits your data. EX: One regression model accounts for 38.0% of the variance while another accounts for 87.4%. The more variance the regression model accounts for, the closer the data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.
R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots. R-squared also does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data!
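The R² formula can be computed directly from sums of squares. The sketch below fits a simple least-squares line by hand, computes $R^2 = 1 - SS_{residual}/SS_{total}$, and compares it with the squared Pearson correlation to illustrate that the two agree when there is a single predictor. The data (hours studied vs. exam score) and the helper names `fit_line`, `r_squared`, and `pearson_r` are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(x, y):
    slope, intercept = fit_line(x, y)
    y_bar = fmean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))     # unexplained variation
    return 1 - ss_residual / ss_total              # explained / total

def pearson_r(x, y):
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

hours = [1, 2, 3, 4, 5, 6]         # hypothetical predictor
scores = [52, 55, 61, 60, 68, 71]  # hypothetical outcome
print(round(r_squared(hours, scores), 3), round(pearson_r(hours, scores) ** 2, 3))
```

With more than one predictor, R² still follows the sums-of-squares formula, but it no longer equals a single squared pairwise correlation.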
*What are the four steps, in the correct order, for completing a hypothesis test?
Step 1: State the hypotheses.
Step 2: Identify the critical value.
Step 3: Compute the test statistic.
Step 4: Draw your conclusion.
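The four steps map cleanly onto a simple test. This sketch uses a two-tailed one-sample z-test, chosen only because it is the simplest case; it assumes the population standard deviation is known, and the numbers (null mean 100, σ = 15, a sample of 36 with mean 106) and the function name `z_test` are hypothetical.

```python
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Hypothetical helper: two-tailed one-sample z-test with known sigma."""
    # Step 1: state the hypotheses -> H0: mu = mu0, HA: mu != mu0
    # Step 2: identify the critical value for the chosen alpha (two-tailed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: compute the test statistic
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Step 4: draw a conclusion -> reject H0 if |z| exceeds the critical value
    reject = abs(z) > critical
    return z, critical, reject

z, crit, reject = z_test(sample_mean=106, mu0=100, sigma=15, n=36)
print(round(z, 2), round(crit, 2), reject)  # → 2.4 1.96 True
```

Here |z| = 2.4 exceeds 1.96, so at α = 0.05 we would reject the null hypothesis.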
*ANOVA
ANOVA compares three or more groups and is used a lot in experimental designs in psychology. It evaluates all components at once.
Advantage: two or more means collapse into a single, interpretable value (the F-statistic).
Disadvantage: it does not allow retrospective analysis of individual components; that single value cannot be broken back down into its original values, so a significant F does not tell you which means differ.
As between-group variance increases, support for $H_A$ increases.
As within-group variance increases, we are less likely to reject the null.
Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as "variation" among and between groups) used to analyze the differences among group means. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to running multiple two-sample t-tests, but is more conservative (it results in less type I error) and is therefore suited to a wide range of practical problems. With only two samples, we can use the t-test to compare their means, but running t-tests becomes unreliable when there are more than two samples. If we compare only two means, the independent-samples t-test gives the same result as the ANOVA (F = t²).
EX: Suppose we want to test the effect of five different exercises. We recruit 20 men and assign each exercise to 4 of them (5 groups of 4). Their weights are recorded after a few weeks. By comparing the weights of the 5 groups, we can find out whether the effects of these exercises differ significantly.
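The exercise example can be worked through as a one-way ANOVA by hand. This is a minimal sketch: the weights below are invented for illustration, and the function name `one_way_anova` is hypothetical. It computes the between-group and within-group mean squares and their ratio, F. Turning F into a p-value needs an F table or a library routine such as SciPy's `scipy.stats.f_oneway`.

```python
from statistics import fmean

def one_way_anova(groups):
    """Hypothetical helper: return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each value sits from its own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical weights (kg) for 5 exercise groups of 4 men each
weights = [
    [82, 85, 80, 83],
    [78, 76, 79, 77],
    [88, 90, 86, 87],
    [81, 80, 84, 82],
    [75, 77, 74, 76],
]
f, df_between, df_within = one_way_anova(weights)
print(f"F({df_between}, {df_within}) = {f:.2f}")
```

A larger spread between group means (relative to the spread inside groups) produces a larger F, which is the same intuition as the between/within bullets above.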
As mentioned above, the t-test can only test the difference between two means. When there are more than two means, it is possible to compare each mean with every other mean using many t-tests. However, each test carries its own chance of a type I error, so running many of them inflates the overall chance of a false positive. In such circumstances we use ANOVA. Thus, this technique is used whenever an alternative procedure is needed for testing hypotheses concerning means when there are several populations.
There are four basic ASSUMPTIONS used in ANOVA:
-The expected values of the errors are zero.
-The variances of all errors are equal to each other.
-The errors are independent.
-The errors are normally distributed.
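The problem with running many t-tests can be quantified. If each of m tests uses α = 0.05 and, as a simplifying assumption, the tests are treated as independent, the chance of at least one false positive is $1 - (1 - \alpha)^m$. The sketch below (the function name `familywise_error_rate` is hypothetical) counts the pairwise comparisons for a given number of groups and applies that formula.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Hypothetical helper: number of pairwise t-tests among n_groups, and the
    chance of at least one type I error, assuming independent tests."""
    m = comb(n_groups, 2)            # every pair of group means
    return m, 1 - (1 - alpha) ** m

m, rate = familywise_error_rate(5)   # e.g. the five exercise groups
print(m, round(rate, 3))             # → 10 0.401
```

With five groups, ten separate t-tests give roughly a 40% chance of at least one false positive, compared with the intended 5%, which is why a single ANOVA is preferred.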