Notes 1: Descriptive Statistics (EDUR 8131)
Version: 2/14/2005

1. Descriptive and Inferential Statistics
• descriptive statistics are used to describe data
• inferential statistics are used to draw inferences from a sample to a population
• statistic vs. parameter (M vs. µ); sample vs. census

2. Variables
Anything that varies, or takes different values, is a variable. Anything that does not vary is a constant.

3. Scales of Measurement
Scale      Criteria                                             Examples
Nominal    categories                                           types of flowers, sex, dropout/stay-in, vote/abstain
Ordinal    categories, rank                                     SES, Likert-scale responses, class rank
Interval   categories, rank, equal interval                     time, temperature (in the abstract, with no beginning or ending)
Ratio      categories, rank, equal interval, true zero point    age, weight, height, time to complete a task
• researchers usually do not distinguish between interval and ratio variables, and it is seldom necessary to do so
• the majority of variables in education are nominal or ordinal; very few interval or ratio variables exist in the social sciences

4. Types of Variables
• independent (IV, cause, predictor) and dependent (DV, effect, criterion): the IV and DV can be identified by noting the temporal order of the variables, because the IV is the one that comes first in the time sequence (e.g., "there will be a difference in mathematics scores between males and females": IV = sex, DV = mathematics scores, since sex occurs before mathematics scores)
• qualitative: also referred to as categorical or nominal (the simplest case being dichotomous, or binary)
• quantitative: usually any variable with an underlying continuum (variables measured at the ordinal, interval, or ratio scale)
• continuous: variables whose measurement could theoretically take ever more refined values (height, weight, IQ); with continuous variables one has only reported values, not exact values, to work with (e.g., IQ = 107), but one can establish the "limits of the exact value" by adding and subtracting one-half the unit of measurement from the reported value (e.g., IQ [106.5, 107.5])
• discrete: variables whose measurement takes only separate values, or whole numbers, such as counts of something

5. Hypotheses
Hypotheses are the researcher's expectations about research outcomes; they are directional, non-directional, or null. The matrix below describes each type for qualitative and for quantitative independent variables.
Directional
  - qualitative (categorical) IV: group differences exist; one group is expected to perform better than the other group(s). Example: Group A will do better than group B.
  - quantitative (continuous) IV: either a positive or a negative relationship will exist. Examples: Higher scores on A are associated with higher scores on B; higher scores on A are associated with lower scores on B.
Non-directional
  - qualitative (categorical) IV: group differences exist, but it is not clear which group will do better. Example: There will be a difference between groups A and B.
  - quantitative (continuous) IV: a relationship will exist, but it is not clear whether it will be positive or negative. Example: Variable A is associated with variable B.
Null
  - qualitative (categorical) IV: no difference is expected; the groups will do the same. Example: There is no difference between groups A and B.
  - quantitative (continuous) IV: no relationship is expected. Example: Variable A is not associated with variable B.

6. Frequency Distributions
A frequency distribution typically displays the raw scores of a distribution in rank order and indicates the number of times each raw score occurred. Two types of frequency distributions exist (un-grouped and grouped), and either can also be expressed as relative or cumulative relative frequencies:
• un-grouped frequency: indicates how often each raw score occurred (e.g., each IQ score, each age)
• grouped frequency: with a larger range of values, it is often better to group scores into classes (intervals) and count the frequency of each class; determine the class width by dividing the range by an appropriate number of classes, such as 10
• relative frequency: the proportion or percentage of occurrence; appropriate for either un-grouped or grouped frequencies
• cumulative relative frequency: the running total of relative frequency; shows the proportion or percentage of all scores at or below a given score
Examples
Below are the ages for a group of students in an introductory statistics course. The un-grouped frequency distribution of the ages is displayed, along with a grouped frequency distribution. Note the relative frequency in both examples.
Scores: 21, 22, 29, 22, 22, 31, 35, 43, 44, 51, 51, and 55
• skewed (positive and negative)

9. Percentiles and Percentile Ranks
• percentiles: points (or scores) in a distribution below which a given percent, P, of cases lie (e.g., for IQ, $P_{75} = 110$)
• percentile ranks: the percentage of a distribution that lies below a given score (e.g., for IQ = 110, $PR_{110} = 75$)
• quartiles: the three points $P_{25}$, $P_{50}$, and $P_{75}$ divide a distribution into four quarters, which are called quartiles; the first quartile, $Q_1$, represents the bottom 25% of scores, the second quartile, $Q_2$, the next 25%, and so on up to $Q_4$

10. Box-and-Whisker Display
Another graph that displays several pieces of information about a distribution. See the example below.
[Figure: box-and-whisker plot on a vertical scale from 60 to 105, marking Xmin, Q1, the median, Q3, and Xmax.]

11. Univariate Summary Measures
Univariate refers to one variable. Summary measures are indices that provide concise descriptions of distributions of scores, such as the mean (average). There are two common types of summary measures:
• central tendency: a typical score or average
• variability: the spread or dispersion of scores

12. Notation and Symbols
Before discussing measures of central tendency and variability, it is important to understand the mathematical symbols used in calculating univariate summary measures.
• $X_i$: the i-th raw score in the distribution; often written simply as X
• n: the sample size, i.e., the number of scores in a distribution; it sometimes carries a subscript, such as $n_2$ or $n_b$, to show which group it refers to (e.g., $n_g$ for the sample size of girls)
• $\sum$: means to sum (add) scores; sometimes it is displayed with limits, as in $\sum_{i=1}^{n=3}$, which means that the number of observations to sum is three and the summation starts at observation number 1
• $\sum X$: sum all the X's (e.g., for X = 1, 2, 3, 4, 5: $\sum X = 1 + 2 + 3 + 4 + 5 = 15$)
• $\sum X^2$: sum the squared X's (e.g., for X = 1, 2, 3: $\sum X^2 = 1^2 + 2^2 + 3^2 = 1 + 4 + 9 = 14$)
• $(\sum X)^2$: square the sum of the X's (e.g., for X = 1, 2, 3: $1 + 2 + 3 = 6$, then $6^2 = 36$)

13. Measures of Central Tendency
Central tendency refers to the typical or average scores in a distribution. There are three commonly used measures of central tendency:
• mode (Mo): the most frequent score in a distribution; a distribution may be uni-, bi-, tri-, or multi-modal; good for nominal (qualitative) data, but can also be used with ordinal, interval, and ratio variables
• median (Md, Mdn, $X_{50}$): the middle score when the scores are rank ordered, with 50% of scores above it and 50% below; good for ordinal data, or for interval/ratio data when the distribution is highly skewed (e.g., income in the US is positively skewed, so use Mdn); not appropriate for nominal data
• mean ($\bar{X}$, M): what one usually thinks of as the average, $\bar{X} = \frac{\sum X_i}{n}$; it is the balancing point of the distribution, the point about which the sum of squares (SS) is a minimum and the sum of deviation scores equals exactly zero (both discussed later); best used with interval or ratio data, but may be used with some ordinal data; not appropriate for nominal data

14. Central Tendency and Distributions
Placement of the mean, median, and mode for normal, bi-modal, rectangular, and skewed distributions

15. Mean of Groups
To find the mean of two (or more) groups, use the following formula (or a simple adaptation of it):
$$\bar{X}_{\cdot} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$
where $\bar{X}_1$ is the mean of group 1, $\bar{X}_2$ is the mean of group 2, $n_1$ is the sample size for group 1, $n_2$ is the sample size for group 2, and $\bar{X}_{\cdot}$ is the grand mean.

16. Inferences and Sampling Error
• population mean: for a sample, the mean is denoted by $\bar{X}$ or M; for the population (i.e., a census), the mean is denoted by µ (mu, the Greek letter for a lowercase m)
• sampling error: measures of central tendency can be used to make inferences from sample to population; any random (chance) error between a sample statistic (e.g., $\bar{X}$) and a population parameter (e.g., µ) is called sampling error: sampling error = statistic − parameter (e.g., $\bar{X} - \mu$)

17. Components of Variance
• deviation score: $x_i = X_i - \bar{X}$, the mean subtracted from the raw score
• sum of deviation scores: $\sum (X_i - \bar{X}) = \sum x_i$, the sum of all deviation scores (always exactly zero)
• sum of squares: $SS = \sum (X_i - \bar{X})^2 = \sum x_i^2$; square each deviation score and sum
• least-squares criterion: the mean, $\bar{X}$, is known as the least-squares criterion because the sum of squared deviations about the mean (SS) is smaller than the sum of squared deviations about any other value

18. Variability
Variability is the spread or dispersion of the scores (it may also be viewed as the tendency for scores to differ from one another, or to depart from the typical or average score). There are a number of indices of variability. The most commonly used are:
• exclusive range (R): the difference between the largest and smallest scores, $R = X_{max} - X_{min}$ (e.g., for $X_i$ = 3, 6, 2, 9: $X_{max} = 9$, $X_{min} = 2$, R = 9 − 2 = 7)
• inclusive range: not used in these notes, but some statisticians use it; it is the difference between the largest and smallest scores plus 1, $X_{max} - X_{min} + 1$ (e.g., for $X_i$ = 3, 6, 2, 9: R = 9 − 2 + 1 = 8)
• sample variance (VAR or $s^2$): find SS and divide it by n − 1:
  $$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum x^2}{n - 1} = \frac{SS}{n - 1}$$
  the computational formula is
  $$s^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$$
• population variance ($\sigma^2$):
  $$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
• sample standard deviation (SD or s):
  $$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\sum x_i^2}{n - 1}}$$
  so s carries the same information as the variance but is expressed in the original scale of measurement
• population standard deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$
Why isn't the range as good a measure of variability as s?
Note: Always calculate to three decimal places.

Example Calculation of Central Tendency and Variability
Find the three measures of central tendency, $s^2$, and s for the following: 9, 10, 5, 6, 5, 7, 8, and 5.
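The exercise above can be checked with a short Python sketch. It applies the definitions from sections 13, 17, and 18 directly (mode, median, mean, deviation scores, SS, the definitional and computational forms of $s^2$, and s). It then cross-checks the results against Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 denominator. The helper names `central_tendency`, `variability`, and `variance_computational` are hypothetical, chosen only for this illustration. Printed values are rounded to three decimal places, following the note above.

```python
import math
import statistics
from collections import Counter

scores = [9, 10, 5, 6, 5, 7, 8, 5]

def central_tendency(xs):
    """Return (modes, median, mean); modes is a list so multi-modal data is handled."""
    counts = Counter(xs)
    top = max(counts.values())
    modes = sorted(x for x, c in counts.items() if c == top)
    ordered = sorted(xs)                      # rank order the scores
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]                 # odd n: the single middle score
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle scores
    mean = sum(xs) / n
    return modes, median, mean

def variability(xs):
    """Return (exclusive range, SS, sample variance, sample SD) via definitional formulas."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]       # x_i = X_i - mean
    ss = sum(d ** 2 for d in deviations)      # SS = sum of squared deviations
    var = ss / (n - 1)                        # s^2 = SS / (n - 1)
    return max(xs) - min(xs), ss, var, math.sqrt(var)

def variance_computational(xs):
    """Computational formula: s^2 = [n * sum(X^2) - (sum X)^2] / [n(n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

modes, median, mean = central_tendency(scores)
r, ss, var, sd = variability(scores)
print("mode:", modes)
print("median:", median)
print(f"mean: {mean:.3f}")
print(f"range: {r}, SS: {ss:.3f}")
print(f"s^2: {var:.3f} (computational formula: {variance_computational(scores):.3f})")
print(f"s: {sd:.3f}")
print(f"sum of deviations: {sum(x - mean for x in scores):.3f}")

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(var, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```

For these scores the sketch reports a mode of 5, a median of 6.5, a mean of 6.875, SS = 26.875, $s^2$ = 3.839 by both formulas, and s = 1.959. The deviation scores sum to zero, as section 13 states.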
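The ages listed in section 6 (Frequency Distributions) can be tabulated with a small Python sketch. It builds an un-grouped frequency distribution and a grouped one, each with relative and cumulative relative frequencies. The class width of 10 and the starting point of 20 are assumptions made for this illustration. They are not necessarily the classes used in the original grouped table. The function names `frequency_table` and `grouped_table` are likewise hypothetical.

```python
from collections import Counter

ages = [21, 22, 29, 22, 22, 31, 35, 43, 44, 51, 51, 55]

def frequency_table(values):
    """Un-grouped table: rows of (score, f, relative f, cumulative relative f) in rank order."""
    n = len(values)
    counts = Counter(values)
    rows, running = [], 0
    for value in sorted(counts):
        f = counts[value]
        running += f                       # running total of frequencies
        rows.append((value, f, f / n, running / n))
    return rows

def grouped_table(values, start, width):
    """Grouped table with classes [start, start + width), [start + width, ...), and so on.

    Classes that contain no scores are omitted from the output.
    """
    n = len(values)
    # Map each score to the lower limit of its class.
    classes = Counter(start + ((v - start) // width) * width for v in values)
    rows, running = [], 0
    for lower in sorted(classes):
        f = classes[lower]
        running += f
        rows.append((f"{lower}-{lower + width - 1}", f, f / n, running / n))
    return rows

print("Ungrouped: score, f, rel f, cum rel f")
for value, f, rel, cum in frequency_table(ages):
    print(f"{value:>6} {f:>3} {rel:.3f} {cum:.3f}")

print("Grouped (assumed class width 10, starting at 20)")
for label, f, rel, cum in grouped_table(ages, start=20, width=10):
    print(f"{label:>6} {f:>3} {rel:.3f} {cum:.3f}")
```

With these assumed classes the counts come out as 5 (20-29), 2 (30-39), 2 (40-49), and 3 (50-59). The cumulative relative frequency reaches 1.000 in the last row of both tables. A complete grouped table would also list any empty classes with a frequency of 0, which this sketch skips.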
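The grand-mean formula in section 15 weights each group mean by its sample size. The following is a minimal Python sketch. It assumes any number of groups, taking that as the "simple adaptation" of the two-group formula, and it uses hypothetical group sizes and means that are not from the notes. It shows why the weighting matters and checks that the result equals the mean of all raw scores pooled together.

```python
def grand_mean(means, sizes):
    """Grand mean of k groups: sum(n_j * mean_j) / sum(n_j)."""
    if len(means) != len(sizes):
        raise ValueError("need exactly one sample size per group mean")
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

# Hypothetical data (not from the notes): group 1 has n1 = 10 and mean 80,
# group 2 has n2 = 30 and mean 60.
print(grand_mean([80, 60], [10, 30]))  # 65.0, pulled toward the larger group
print((80 + 60) / 2)                   # 70.0, the unweighted average ignores n

# Sanity check: the grand mean equals the mean of all raw scores pooled together.
group1, group2 = [1, 2, 3], [4, 5]
pooled = group1 + group2
weighted = grand_mean([sum(group1) / len(group1), sum(group2) / len(group2)],
                      [len(group1), len(group2)])
assert weighted == sum(pooled) / len(pooled)  # both equal 3.0
```

Averaging the two group means directly (70.0) would be correct only when $n_1 = n_2$. Otherwise the larger group must count for more, which is exactly what the $n_1\bar{X}_1 + n_2\bar{X}_2$ numerator does.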