Statistics and Data Analysis: Descriptive and Inferential Techniques - Prof. Roy Heatwole, Exams of Statistics

An overview of various statistical techniques, including bar graphs, pie charts, dot plots, stem-and-leaf plots, histograms, inferential statistics, and parametric and non-parametric methods. It covers concepts such as random sampling, variables (categorical and quantitative), relative frequencies, center and spread, time series analysis, and data analysis techniques like mean, median, mode, range, deviation, variance, standard deviation, empirical rule, percentiles, and residual analysis. It also introduces concepts like trend, correlation, and regression.

Typology: Exams

2009/2010

Uploaded on 12/15/2010

jdasdourian
Math 220: Elementary to Statistics, With Mr. Heatwole
Study Guide by Denton Asdourian

Chapter 1
• Statistics is the art and science of learning from data
• Design- planning an investigative study, such as how to obtain relevant data
• Sample- subjects for whom we have data
• Population- all subjects of interest
• Descriptive Statistics- methods of summarizing the data (graphs/averages/percents)
  o Bar Graph, Pie Chart, Dot Plot, Stem-and-Leaf Plot, Histogram
• Inferential Statistics- methods of making decisions or predictions about a population, based on data obtained from a sample of that population
• Parameter- a numerical summary of the population
• Statistic- a numerical summary of a sample taken from the population
• Random Sampling- each subject in the population has the same chance of being in the sample

Chapter 2
• Variable- any characteristic that is recorded for subjects in a study
• Categorical- each observation belongs to one of a set of categories
  o A key feature to describe is the relative number of observations (ex. %'s) in the various categories
  o Ex- gender, religious affiliation, place of residence, belief in life after death
• Quantitative- observations take numerical values that represent different magnitudes of the variable
  o Discrete- usually a count, ex. 0, 1, 2, 3, …
  o Continuous- has a continuum of infinitely many possible values
  o Key features to describe are the center (mean) and the spread (standard deviation)
  o Ex- quantity, magnitude, averages
• Relative Frequencies-
  o Proportion- the frequency of the observations in a category divided by the total number of observations
  o Percentage- the proportion multiplied by 100
• Overall Pattern-
  o Unimodal- data that has a single mound
  o Bimodal- data that has two distinct mounds
  o Shape-
    ▪ Symmetric- the side of the distribution below a central value is a mirror image of the side above that central value
    ▪ Skewed- one side of the distribution stretches out longer than the other side
      ▪ To the left- left tail is longer than the right
      ▪ To the right- right tail is longer than the left
• Time Series- a data set collected over time
• Time Plot- used to display time-series data graphically
• Trend- a common pattern indicating a long-term tendency of the data either to rise or to fall
• Mean- the sum of the observations divided by the number of observations
• Median- the midpoint of the observations when they are ordered from smallest to largest (or from largest to smallest)
• Outlier- an observation that falls well above or well below the overall bulk of the data
• Resistant- a numerical summary of the observations is resistant if extreme observations have little, if any, influence on its value (the median is resistant; the mean is not)
• Binary Data- take only two values, 0 and 1
• Mode- the value that occurs most frequently
• Range- the difference between the largest and the smallest observations
• Deviation- the difference between the observation and the sample mean
  o The deviation is positive when the observation falls above the mean and negative when the observation falls below the mean
  o The sum of the deviations always equals zero
• Variance- the average of the squared deviations (for a sample, the sum of squared deviations divided by n - 1)
  o The square root of the variance is the standard deviation
• Standard Deviation- $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$
  o $\sum (x - \bar{x})^2$ = sum of squared deviations, n - 1 = sample size - 1
  o The larger the standard deviation s, the greater the spread of the data
  o s = 0 only when all observations take the same value
  o s can be influenced by outliers
• Empirical Rule- if a distribution of data is bell-shaped, then:
  o 68% of the observations fall within 1 standard deviation of the mean
  o 95% of the observations fall within 2 standard deviations of the mean
  o All or nearly all observations fall within 3 standard deviations of the mean
• Percentile- the pth percentile is a value such that p percent of the observations fall below or at that value
  o Three useful percentiles are the quartiles
    ▪ First quartile- p = 25 (25th percentile)
    ▪ Second quartile- p = 50 (50th percentile, aka the median)
    ▪ Third quartile- p = 75 (75th percentile)
  o Interquartile range (IQR)- the distance between the third and first quartiles. IQR = Q3 - Q1
    ▪ An observation is a potential outlier if it falls more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3
• Five-number Summary- the min value, Q1, median, Q3, and the max value
• Z-Score- for an observation, the number of standard deviations that it falls from the mean: $z = (x - \bar{x}) / s$ = (observation - mean) divided by standard deviation

Chapter 3
• Response Variable- the outcome variable on which comparisons are made
• Explanatory Variable- defines the groups to be compared with respect to values on the response variable
• Association- exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable
  o Positive- as x goes up, y tends to go up
  o Negative- as x goes up, y tends to go down
• Correlation- summarizes the direction of the association between two quantitative variables and the strength of its straight-line trend.
  o Denoted by r, it takes values between -1 and +1
  o The closer r is to ±1, the closer the data points fall to a straight line, and the stronger the linear association.
  o The closer r is to 0, the weaker the linear association
  o Calculating the correlation r: $r = \frac{1}{n-1} \sum z_x z_y = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$
    ▪ n is the number of points, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations for x and y. The sum is taken over all n observations
• Case-control study- a retrospective observational study in which subjects who have a response outcome of interest (the cases) and subjects who have the other response outcome (the controls) are compared on an explanatory variable

Chapter 5
• Probability- the proportion of times that the outcome would occur in a long run of observations
  o It is a proportion, therefore it takes a value between 0 and 1
  o The total of the probabilities for all the possible outcomes equals 1
• Random Phenomena- everyday events for which the outcome is uncertain
  o With a small number of observations, outcomes may look quite different from what you expect
  o The proportion of times that something happens is highly random and variable in the short run but very predictable in the long run
• Randomness- randomly assigning subjects to treatments or randomly selecting people for a sample
• Trial- ex. each simulated roll of a die
• Cumulative Proportion- the proportion of trials so far on which the outcome has occurred, computed at each value of the number of trials
• Law of Large Numbers- as the number of trials increases, the proportion of occurrences of any given outcome approaches a particular number "in the long run"
  o Only guarantees long-run performance
• Independent- previous trials do not affect the trial that's about to occur
• Subjective Definition of Probability- the probability of an outcome is defined to be your degree of belief that the outcome will occur, based on the available information
• Bayesian Statistics- uses subjective probability as its foundation
• Sample Space- the set of possible outcomes for a random phenomenon
• Tree Diagram- a graph of branches showing what can happen on different trials
  o When each trial has two possible outcomes, the number of branches doubles at each stage
• Event- a subset of a sample space. Corresponds to a particular outcome or a group of possible outcomes
• Probability of an Event A- obtained by adding the probabilities of the individual outcomes in the event. Denoted by P(A)
  o When all the possible outcomes are equally likely, P(A) = # of outcomes in event A / # of outcomes in the sample space
• Complement of an Event A- consists of all the outcomes in the sample space that are not in A. It is denoted by $A^c$.
  o The probabilities of A and $A^c$ add to 1, so $P(A^c) = 1 - P(A)$
• Disjoint Events- events that do not share any outcomes in common (also called mutually exclusive)
  o If the events are disjoint, then P(A and B) = 0, so P(A or B) = P(A) + P(B)
• Intersection- of A and B consists of outcomes that are in both A and B
  o For the intersection of two independent events: P(A and B) = P(A) × P(B)
• Union- of A and B consists of outcomes that are in A or B (meaning A occurs or B occurs or both)
  o For the union of two events, P(A or B) = P(A) + P(B) - P(A and B)
• Conditional Probability of Event A, given ( | ) that event B has occurred- $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
  o $P(A \text{ and } B) = P(A \mid B) \times P(B)$ and $P(A \text{ and } B) = P(B \mid A) \times P(A)$
• Independent- two events A and B are independent when the probability that one occurs is not affected by whether or not the other occurs: $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$, and $P(A \text{ and } B) = P(A) \times P(B)$
• Probability Model- specifies the possible outcomes for a sample space and provides assumptions on which the probability calculations for events composed of those outcomes are based

Chapter 6
• Random Variable- a numerical measurement of the outcome of a random phenomenon.
• Probability Distribution- of a random variable, specifies its possible values and their probabilities
  o Discrete- when a random variable (X) has separate possible values
    ▪ Probability distribution for a discrete random variable: for each x, the probability P(x) falls between 0 and 1. The sum of the probabilities for all the possible x values equals 1.
    ▪ Mean- $\mu = \sum x P(x)$, a weighted average of the possible values
  o Continuous- having possible values that form an interval rather than a set of separate numbers
    ▪ Probability distribution for a continuous random variable: specified by a curve that determines the probability that the random variable falls in any particular interval of values
    ▪ Each interval has probability between 0 and 1. This is the area under the curve, above that interval.
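The "area under the curve" idea can be checked numerically. The following is a minimal Python sketch, not part of the original guide: the normal curve with mean 100 and standard deviation 15, the interval [85, 115], and the helper name `area_under_curve` are all assumptions chosen for illustration.

```python
from statistics import NormalDist

def area_under_curve(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical continuous random variable (illustrative values, not from the guide).
dist = NormalDist(mu=100, sigma=15)

# Probability of the interval [85, 115] as an area under the density curve.
p_interval = area_under_curve(dist.pdf, 85, 115)

# Cross-check using cumulative probabilities: P(X <= 115) - P(X <= 85).
p_exact = dist.cdf(115) - dist.cdf(85)

print(f"area by midpoint rule: {p_interval:.4f}")
print(f"difference of cumulative probabilities: {p_exact:.4f}")
```

Because 85 and 115 lie one standard deviation on either side of the assumed mean, both numbers should come out at roughly 0.68.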
    ▪ The interval containing all possible values has probability equal to 1, so the total area under the curve equals 1.
  o *Continuous variables are measured in a discrete manner because of rounding.
    ▪ A probability distribution for a continuous random variable is used to approximate the probability distribution for the possible rounded values.
  o Expected value of X- the mean of the probability distribution of a random variable X
• Normal Distribution- is symmetric, bell shaped, and characterized by its mean $\mu$ and standard deviation $\sigma$. The probability within any particular number of standard deviations of $\mu$ is the same for all normal distributions (0.68 within 1$\sigma$, 0.95 within 2$\sigma$, 0.997 within 3$\sigma$).
  o Cumulative Probability- the probability of falling below the point $x = \mu + z\sigma$
  o Complement Probability- the probability of falling above the point $x = \mu + z\sigma$
  o Z-score for a value x of a random variable- the number of standard deviations that x falls from the mean $\mu$. Equation: $z = (x - \mu) / \sigma$
  o Standard Normal Distribution- the normal distribution with mean $\mu = 0$ and S.D. $\sigma = 1$. It is the distribution of normal z-scores
    ▪ When a normal random variable's values are converted to z-scores by subtracting the mean and dividing by the S.D., the z-scores have the standard normal distribution ($\mu = 0$, $\sigma = 1$)
• Conditions for Binomial Distribution-
  o Each of n trials has two possible outcomes. The outcome of interest is called a "success" and the other outcome is called a "failure"
  o Each trial has the same probability of success, denoted by p
  o The probability of a failure is denoted by 1 - p
  o The n trials are independent.
  o The binomial random variable X is the number of successes in the n trials
  o Formula for binomial probabilities- $P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n
    ▪ n! is called n factorial = 1 x 2 x 3 x … x n
  o Mean $\mu$ and Standard Deviation $\sigma$: $\mu = np$, $\sigma = \sqrt{np(1-p)}$
• Sampling Distribution- specifies the possible values of a sample statistic (here, the sample proportion) and their probabilities
  o For binary data: Mean = p and Standard Deviation = $\sqrt{p(1-p)/n}$
  o If n is sufficiently large that the expected numbers of outcomes of the two types, np and n(1 - p), are both ≥ 15, then this sampling distribution is approximately normal
  o Standard Error- the standard deviation of a sampling distribution
    ▪ Standard error of $\bar{x}$- the standard deviation of the sampling distribution of the sample mean $\bar{x}$
  o For a quantitative variable, the sampling distribution of $\bar{x}$ has: center- mean = $\mu$ and spread- standard error = $\sigma / \sqrt{n}$
  o The sampling distribution of $\bar{x}$ becomes more bell shaped as n increases
• Central Limit Theorem- when the sample size n is sufficiently large, the sampling distribution of the sample mean is approximately normal

Chapter 7
• Statistical Inference- uses sample statistics to make decisions and predictions about population parameters
• Confidence Interval- an interval of numbers within which the unknown parameter value is believed to fall. It is constructed by adding a margin of error to, and subtracting it from, the point estimate; the margin of error is based on the standard error of the sampling distribution of that point estimate
• Confidence Level- the probability that this method produces an interval that contains the parameter
• Point Estimate- a single number that is our "best guess" for the parameter
• Interval Estimate- an interval of numbers within which the parameter value is believed to fall
• Margin of Error (m)- measures how accurate the point estimate is likely to be in estimating a parameter. It increases as the confidence level increases and decreases as the sample size increases.
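How the margin of error responds to the confidence level and the sample size can be seen in a short calculation. This is a minimal Python sketch under assumptions, not part of the original guide: it uses the proportion formulas from this chapter (se = sqrt(p̂(1 - p̂)/n), m = z × se), a made-up sample proportion of 0.40, and a hypothetical helper named `margin_of_error`.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error z * se for a sample proportion, with se = sqrt(p_hat * (1 - p_hat) / n)."""
    # z-score that leaves (1 - confidence) / 2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return z * se

p_hat = 0.40  # hypothetical sample proportion, chosen only for illustration

for n in (100, 400, 1600):
    for confidence in (0.90, 0.95, 0.99):
        m = margin_of_error(p_hat, n, confidence)
        print(f"n={n:5d}  confidence={confidence:.2f}  margin of error={m:.4f}")
```

Within each sample size the margin grows with the confidence level, and because n sits under a square root, quadrupling the sample size halves the margin.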
• Standard Error- an estimated standard deviation of a sampling distribution
• Confidence Interval estimating a Population Proportion-
  o Point Estimate: $\hat{p}$
  o Standard Error: $se = \sqrt{\hat{p}(1-\hat{p})/n}$. Then multiply se by the z-score to get m
  o Confidence Interval: $\hat{p} \pm z(se)$
  o Sample size for estimating a population proportion: $n = z^2\, \hat{p}(1-\hat{p}) / m^2$
    ▪ Sample size needed for large-sample confidence interval: $n\hat{p} \ge 15$ and $n(1-\hat{p}) \ge 15$
• Confidence Interval estimating a Population Mean-
  o Point Estimate: $\bar{x}$
  o Standard Error: $se = s/\sqrt{n}$. Then multiply se by the t-score to get m
  o Degrees of freedom- df = n - 1
  o Confidence Interval: $\bar{x} \pm t(se)$
  o Sample Size for Margin of Error m: $n = 4\sigma^2 / m^2$ (the 4 is $z^2 \approx 2^2$ for 95% confidence)
• A statistical method is said to be robust with respect to a particular assumption if it performs adequately even when that assumption is violated
• The key results for finding the sample size for a random sample are as follows:
  o The margin of error depends on the standard error of the sampling distribution of the point estimate.
  o The standard error itself depends on the sample size
• Bootstrap- a computational simulation method that resamples from the observed data

Chapter 8 (Not 8.5 or 8.6)
• Significance Test- a method of using data to summarize the evidence about a hypothesis
  1. Assumptions- specify the variable and parameter