








Descriptive Statistics, Inferential Statistics by Parina Patel.









October 15, Parina Patel

(a) Dependent Variable (DV): Variables whose change the researcher wishes to explain.
(b) Independent Variable (IV): Variables that help explain the change in the dependent variable.
(a) Nominal: A nominal variable has qualitative categories that cannot be ranked in a meaningful way in terms of degree or magnitude. Examples of nominal variables include RACE, TYPE OF BANKRUPTCY, TYPE OF CORPORATION, and NAME. All of these variables have qualitative categories that cannot be ordered in terms of magnitude or degree. This is the least powerful type of variable.^1
(b) Ordinal: An ordinal variable has qualitative categories that are ordered in terms of degree or magnitude. Examples of ordinal variables include CLASS or DEGREE OBTAINED. The variable DEGREE OBTAINED may include the following categories:

(^1) Alphabetizing the categories does not count as ordering the variable, because the ordering has to be in terms of degree or magnitude.
Frequency tables are a detailed description of the categories/values for one variable. A frequency table most often includes all of the following: 4
Table 1 provides an example of a frequency table for an ordinal variable named Socioeconomic Class (it is ordinal because its categories are qualitative and ordered). If numbers that were also ordered were assigned to each category, we could treat this as an interval level variable.
Table 1: Frequency Table, Socioeconomic Class

Socioeconomic Class   Frequency   Percent   Cumulative Percent
Upper                        50     7.14%                7.14%
Upper Middle                150    21.43%               28.57%
Middle                      300    42.86%               71.43%
Lower Middle                150    21.43%               92.86%
Lower                        50     7.14%                 100%
Total                       700      100%
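To make the arithmetic behind the Percent and Cumulative Percent columns explicit, here is a minimal Python sketch (an illustration, not a Stata procedure) that rebuilds them from the counts in Table 1. It assumes the categories are listed in their ordinal order, which is what makes a cumulative percentage meaningful; the function name `frequency_table` is hypothetical.

```python
# Category counts from Table 1, listed in their ordinal order (highest to lowest class).
counts = {
    "Upper": 50,
    "Upper Middle": 150,
    "Middle": 300,
    "Lower Middle": 150,
    "Lower": 50,
}

def frequency_table(counts):
    """Return (category, frequency, percent, cumulative percent) rows.

    Relies on dict insertion order to preserve the ordinal ordering of the categories.
    """
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.items():
        running += freq  # running total of frequencies up to this category
        rows.append((category, freq, 100 * freq / total, 100 * running / total))
    return rows

for category, freq, pct, cum_pct in frequency_table(counts):
    print(f"{category:<13}{freq:>5}{pct:>9.2f}%{cum_pct:>9.2f}%")
```

Each percent is the category's frequency divided by the total (700), and each cumulative percent adds up all categories down to and including the current one, which is why the last row reaches 100%.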
example below, we can find the percentage of young people that listen to music. 5
Table 2: Crosstab of Music Preference and Age

                        AGE
Preference    Young   Middle Age   Old
Music            14           10     3
News-talk         4           15    11
Sports            7            9     5
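As a minimal sketch, assuming percentages are computed within each age column (so that each age group sums to 100%), the share of young people who prefer music is 14 out of the 25 young respondents. The Python below (the helper `column_percentages` is hypothetical) applies the same calculation to every cell of Table 2.

```python
# Counts from Table 2: outer keys are age groups (columns), inner keys are preferences (rows).
crosstab = {
    "Young": {"Music": 14, "News-talk": 4, "Sports": 7},
    "Middle Age": {"Music": 10, "News-talk": 15, "Sports": 9},
    "Old": {"Music": 3, "News-talk": 11, "Sports": 5},
}

def column_percentages(table):
    """Convert each age column into percentages of that column's total."""
    result = {}
    for age, cells in table.items():
        column_total = sum(cells.values())
        result[age] = {pref: 100 * n / column_total for pref, n in cells.items()}
    return result

pcts = column_percentages(crosstab)
print(f"Young people who prefer music: {pcts['Young']['Music']:.1f}%")  # prints 56.0%
```

Percentaging within the categories of AGE lets the columns be compared directly even though the three age groups have different sizes (25, 34, and 19 respondents).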
Measures of central tendency identify the most frequently occurring value/category or the middle value/category of each variable. There are three measures of central tendency: mode, median, and mean. See Table 3 for a summary of measures of central tendency.
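The three measures can be illustrated with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical interval variable (the `ages` values are invented for illustration); with an even number of observations, the median is the average of the two middle values.

```python
import statistics

# Hypothetical interval-level data (e.g. ages of eight respondents), for illustration only.
ages = [21, 23, 23, 25, 30, 34, 40, 52]

print("mode:", statistics.mode(ages))      # most frequently occurring value -> 23
print("median:", statistics.median(ages))  # middle value of the sorted data -> 27.5
print("mean:", statistics.mean(ages))      # arithmetic average -> 31
```

Note that the mean (31) is pulled above the median (27.5) by the single large value 52, which is one reason the median is often reported alongside the mean.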
Measures of variability describe how spread out a variable's values are, that is, their dispersion (or deviation) away from the mean. Measures of variability only exist for interval level variables. There are three measures of variability: range, standard deviation, and variance. A discussion of each can be found below, followed by a summary table (Table 3).
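A minimal Python sketch of the three measures of variability, using hypothetical values chosen so the arithmetic can be checked by hand. It assumes the sample (n - 1) versions of the variance and standard deviation; the population versions (`statistics.pvariance`, `statistics.pstdev`) divide by n instead.

```python
import statistics

# Hypothetical interval-level values, chosen so the arithmetic is easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # largest value minus smallest value
# Sample variance: sum of squared deviations from the mean divided by (n - 1).
variance = statistics.variance(values)
# Standard deviation: the square root of the variance, in the variable's original units.
std_dev = statistics.stdev(values)

print("range:", value_range)
print("variance:", round(variance, 3))
print("standard deviation:", round(std_dev, 3))
```

Here the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7 and the standard deviation is its square root.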
(b) It is perfectly symmetrical.
(c) All measures of central tendency (mode, median, and mean) lie in the middle of the curve. These measures divide the curve in half (50% of the values lie to the left of the mean, and 50% lie to the right).
(d) Approximately 95% of the values lie within two standard deviations of the mean (in both directions).
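Properties (c) and (d) can be checked numerically with the standard normal distribution in Python's `statistics.NormalDist` (available from Python 3.8). This minimal sketch computes the share of values lying within k standard deviations of the mean; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print("left of the mean:", standard_normal.cdf(0))  # 0.5, the curve is split in half
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The value for two standard deviations comes out at about 95.4%, which is where the "approximately 95%" rule comes from.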
Many variables that are determined by nature, such as age, weight, and height, tend to be normally distributed (graphically, they follow a normal curve). It is important to understand what a normal curve looks like and what its characteristics are, because almost all of the methods described below assume normality. If this assumption is violated (i.e. a variable is not normally distributed), the statistical results can be misleading: a test may indicate significance when the relationship is not in reality significant, or fail to indicate significance when it is. If variables are not normally distributed, transformations such as taking the logarithm or the square root can often bring them closer to normality.^7
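To show what a logging transformation does, here is a minimal Python sketch that measures skewness before and after taking the natural log of a right-skewed variable. The data are hypothetical (each value doubles the previous one), and the `skewness` helper uses the simple moment-based formula, which is an assumption; statistical packages may report a slightly different small-sample adjustment.

```python
import math
import statistics

def skewness(data):
    """Moment-based skewness: mean cubed deviation divided by the cubed (population) SD."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (len(data) * sd ** 3)

# Hypothetical right-skewed values (each one doubles the previous one).
incomes = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in incomes]  # natural log transformation

print("skewness before:", round(skewness(incomes), 3))
print("skewness after log:", round(skewness(logged), 3))
```

The raw values have a long right tail (positive skewness), while the logged values are evenly spaced and therefore symmetric, with skewness of essentially zero. Real data will rarely become perfectly symmetric, so the result should still be checked, for example with a histogram.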
(a) One Null Hypothesis: The null hypothesis usually states that there is no relationship between the variables being tested. The null hypothesis is already determined by the method being used. Most null hypotheses state that one statistic or number is equal to another statistic or number. This is usually displayed as: $H_0: a = b$
(b) One Alternative Hypothesis:^8 The alternative hypothesis usually states that the two or more variables are somehow related.

(^7) The best way to see if an interval variable is normal is with a histogram. A histogram is a graph that places the values of the interval variable on the X axis and the frequency or density on the Y axis. In Stata, the command for a histogram is histogram var1, freq normal
(^8) This is also referred to as a research hypothesis. I refer to this as a research hypothesis or an alternative hypothesis.
the confidence level, the α level can be displayed as a percentage or a probability. The α level represents the probability (or percentage) of rejecting the null hypothesis when it is actually true and should not have been rejected. In other words, the α level is the probability or percentage of making this kind of mistake, so the lower the α level, the more confidence we can have in a decision to reject the null. A statistical significance (or α level) of 1% is therefore stronger than a statistical significance of 5%.

$\alpha \text{ (probability)} = 1 - \text{confidence level (probability)}$
$\alpha \text{ (percentage)} = 100 - \text{confidence level (percentage)}$

(f) What does a statistical test do: When performing any type of inferential statistics and any type of statistical testing, a value (a T, F, Z, or χ² statistic) is generated from the data, and this value is compared to a corresponding critical value^10 (T, F, Z, or χ²) in order to determine statistical significance.
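The comparison in (f) can be sketched in Python for the simplest case, a two-sided Z test, because the standard library provides the normal distribution but not the T, F, or χ² distributions. This is a minimal illustration under that assumption; the function names are hypothetical, and the Z values are made-up computed statistics.

```python
from statistics import NormalDist

def alpha_from_confidence(confidence):
    """alpha (as a probability) = 1 - confidence level (as a probability)."""
    return 1 - confidence

def two_sided_z_test(z_statistic, alpha):
    """Compare a computed Z value with the two-sided critical value for the given alpha."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return abs(z_statistic) > critical, critical

alpha = alpha_from_confidence(0.95)
for z in (1.2, 2.5):
    reject, critical = two_sided_z_test(z, alpha)
    print(f"Z = {z}: critical value = {critical:.2f}, reject H0: {reject}")
```

A computed Z of 2.5 exceeds the 5% critical value of about 1.96 but not the 1% critical value of about 2.58, which illustrates why significance at the 1% level is the stronger result.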
The following methods are used when a researcher wants to compare two or more group means and determine whether the difference between them is statistically significant or the result of sampling error. This portion provides a table of some of the differences between the methods, followed by a brief discussion of important details and a clarification of each method.
Table 4: Comparing Group Means

Method          Grouping Variable   Mean Variable   Stata Command
Two sample T    Dichotomous         Interval        ttest var1, by(var2)^11
Paired T        Before and After    Interval        ttest var1==var
One way ANOVA   Nominal             Interval        oneway var1 var2^12
research methods textbook.
(^11) In the two sample t test, var1 is the interval variable and var2 is the grouping variable. Note that the grouping variable MUST be a dummy with 2 categories.
(^12) In ANOVA, var1 is the interval variable and var2 is the grouping variable.
For instance, a researcher wishes to know if there is a difference in the amount of debt (in dollars) in Chapter 7 and Chapter 13 bankruptcies. The researcher is interested in two variables, (1) the amount of debt in dollars (interval), and (2) Chapter 7 or Chapter 13 (dummy). The null hypothesis for this example is that there is no difference in the amount of debt between Chapter 7 and Chapter 13 bankruptcies. In other words, the average amount of debt for Chapter 7 equals the average amount of debt for Chapter 13. The alternative hypothesis is that the average amount of debt is different for these two types of bankruptcies. This test assumes that the interval variable is normally distributed across both groups, and that the variances are equal across both groups.
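Because this test assumes equal variances across both groups, the t statistic uses a pooled variance. The Python below is a minimal sketch of that calculation (not Stata's `ttest` itself); the debt figures are hypothetical and chosen so the result can be checked by hand, and the function name `pooled_two_sample_t` is an assumption.

```python
import math
import statistics

def pooled_two_sample_t(group_a, group_b):
    """Two-sample t statistic assuming equal variances (pooled variance).

    Returns (t, degrees_of_freedom). H0: the two group means are equal.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Weighted average of the two sample variances, using n - 1 as the weights.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, n_a + n_b - 2

# Hypothetical debt amounts in thousands of dollars, for illustration only.
chapter_7 = [1, 2, 3, 4, 5]
chapter_13 = [3, 4, 5, 6, 7]

t, df = pooled_two_sample_t(chapter_7, chapter_13)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Here t = -2.00 with 8 degrees of freedom. Its absolute value is below the two-sided 5% critical value for 8 degrees of freedom (about 2.306), so with these made-up numbers we would fail to reject the null hypothesis of equal average debt.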
If you are looking to explain a variable using one or more other variables, then you want to use one of the methods described below, depending on the level of measurement of the dependent variable. Note that for all of the models described below, you can never have a nominal variable (with more than two categories) as an independent variable. If you want to use a nominal level variable as an IV, then you must recode it into one or more dummy variables. Suppose you wanted the model to control for race. Since race is a nominal variable, you cannot include it in your analysis as an IV; however, you could create dichotomous variables for the categories of race and include these newly formed dichotomous variables, leaving out one category to serve as the reference group.
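The recoding step can be sketched in Python as follows. This is a minimal illustration with a hypothetical three-category nominal variable (the category labels and the `dummy_code` helper are invented); it creates one 0/1 variable for every category except the reference category, which is identified by all dummies being 0.

```python
def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 dummy variables, one per non-reference category.

    The reference category gets no dummy of its own: it is the case where all dummies are 0.
    """
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }

# Hypothetical nominal variable with three categories.
group = ["Group A", "Group B", "Group C", "Group B", "Group A"]
dummies = dummy_code(group, reference="Group A")
for name, column in dummies.items():
    print(name, column)
```

Leaving one category out matters because including a dummy for every category alongside the model's intercept makes the dummies perfectly collinear; each included dummy's coefficient is then read as a difference from the reference category.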
Table 6: Explaining a Dependent Variable

Method              Dependent Variable   Independent Variable^13   Stata Command
OLS Regression      Interval^14          Dummy/Interval            regress dv iv1 iv2 iv3...^15
Logit/Probit        Dummy                Dummy/Interval            logit dv iv1 iv2 iv3...
Cumulative Logit    Ordinal              Dummy/Interval            ologit dv iv1 iv2 iv3...
Multinomial Logit   Nominal              Dummy/Interval            mlogit dv iv1 iv2 iv3...
Poisson             Count^16             Dummy/Interval            poisson dv iv1 iv2 iv3...
The null hypothesis for all of these methods is that the independent variable does not have an effect on the dependent variable. This null hypothesis is tested separately for each independent variable.^17
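For the simplest OLS case, with one interval IV, the test of $H_0: \beta = 0$ can be sketched by hand in Python. This is a minimal illustration on hypothetical data, not a replacement for Stata's `regress`, which handles several IVs and reports a t test of this kind for each coefficient; the function name `simple_ols` is an assumption.

```python
import math
import statistics

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and test H0: b = 0.

    Returns (intercept, slope, t statistic for the slope).
    """
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    residual_var = sum(r ** 2 for r in residuals) / (n - 2)
    se_slope = math.sqrt(residual_var / sxx)
    return intercept, slope, slope / se_slope

# Hypothetical data: one interval IV (x) and an interval DV (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, t = simple_ols(x, y)
print(f"intercept = {a:.2f}, slope = {b:.2f}, t = {t:.2f}")
```

The estimated slope is 0.6 with t of about 2.12 on 3 degrees of freedom. That is below the two-sided 5% critical value for 3 degrees of freedom (about 3.182), so with these five made-up observations we would not reject the null hypothesis that the slope is 0.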
(^13) You can also use ordinal level variables as independent variables, as long as the categories are ordered in terms of degree or magnitude and the numbers corresponding to the categories are also ordered.
(^14) The DV in regression analysis MUST be an interval variable. It cannot be an ordinal variable that is being treated as an interval variable, or a dichotomous variable being treated as an interval variable.
(^15) dv = your dependent variable, iv = your independent variable(s)
(^16) The dependent variable must not have any negative values and can be non-normal.
(^17) For all of these methods, the null hypothesis is that the coefficient (β) is equal to 0, and the alternative hypothesis is that the coefficient is not equal to 0. This means slightly different things for the different methods, but it can be interpreted the same way. For regression, a coefficient equal to zero means that there is no linear change in the dependent variable as the independent variable changes, i.e. the slope is equal to 0. For the logit and Poisson models, a coefficient equal to zero means that a change in the independent variable produces no change in the logit (the log of the odds) or in the log of the expected count, respectively.
3.4.1 Assumptions
All of these models have certain assumptions. If these assumptions are not met, it affects the way the results are interpreted and can lead to serious errors in the statistical tests. All of these models share some assumptions about the observations and about the independent and dependent variables. All models assume that the observations are independent and are randomly sampled from the entire population of interest.^18 The models also assume that no important independent variables are omitted from the model, and that all variables are measured without error. All of these models state that there is some functional relationship between the independent and dependent variables. The exact functional relationship will depend on the method you use to fit your data. For instance, OLS Regression assumes that the functional relationship between the independent and dependent variables is linear, i.e. that it takes the form of a straight line.
(^18) That is, cases are randomly sampled instead of being handpicked to ensure statistical significance.