Docsity

Data Analysis: Collecting, Summarizing, and Analyzing Statistics - Descriptive & Inferenti, Study notes of Statistics

An introduction to statistics covering descriptive and inferential statistics, probability, populations, samples, parameters, and statistics. It covers the importance of statistical inference, the difference between a census and a sample, and the concepts of observational units, variables, frequency distributions, and relative frequencies. It also introduces measures of center (mean and median), measures of variation (range, quartiles, and interquartile range), and the five-number summary.

Typology: Study notes

2009/2010

Uploaded on 11/17/2010

supercoolguy91

Statistics Test # 1

Chapter 1 ---

• Statistics – Science of collecting, summarizing, analyzing, and interpreting data (information, usually in the form of numbers)
  o Descriptive Statistics – Summarizing the data
  o Inferential Statistics – Analyzing and interpreting data. Inductive thought process
• Probability – Field of mathematics involved with determining the relative frequency of certain events. Deductive thought process
• Population – Collection of all elements (individuals or items) under consideration
  o Example: All the possible tosses of a coin
• Sample – A part of the population from which information is obtained
• Parameter – A numerical characteristic (descriptive measure) of the population
• Statistic – A numerical characteristic (descriptive measure) of a sample
• Statistical Inference – Inductive thought process – Based on observation
  o Either experimental or hypothesis testing
• Census – 100% enumeration of a population
• Why take a sample instead of a census?
  o Convenience, Necessity, and Accuracy
  o Random Sample – Elements are drawn at random with replacement
  o Simple Random Sample – Elements are drawn at random without replacement
  o Sampling Frame – List of the elements in the population from which the sample will be drawn (N = number of elements in the population)

Chapter 2 ---

• Observational Unit – The elements or objects described
• Variable – The measured characteristic for an observational unit
  o Qualitative – Simply classifies elements into categories or classes
    ▪ Example: non-numeric values, numeric values used only to rename objects, ordinal values
  o Quantitative – Numerical data for which differences in the values have meaning
    ▪ Example: discrete (possible values are finite or countably infinite) or continuous (possible values form an interval)
• Frequency Distribution – Two-column table including the values of the variable and the frequency of each value
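As a rough illustration of the two-column frequency table described above, the sketch below tallies a small data set and also reports each value's relative frequency (its frequency divided by the number of observations). The data and the function names are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_distribution(values):
    """Return {value: frequency}, ordered by value."""
    counts = Counter(values)
    return dict(sorted(counts.items()))


def relative_frequency_distribution(values):
    """Return {value: frequency / number of observations}."""
    n = len(values)
    freq = frequency_distribution(values)
    return {value: count / n for value, count in freq.items()}


# Hypothetical observations: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

freq = frequency_distribution(siblings)
rel = relative_frequency_distribution(siblings)

# Print the table: value, frequency, relative frequency (as a decimal).
print("value\tfreq\trel. freq")
for value in freq:
    print(f"{value}\t{freq[value]}\t{rel[value]:.2f}")
```

The relative frequencies in the last column should add up to 1, which makes a quick check on the table.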

• Relative Frequency Distribution – Two-column table including the values of the variable and the relative frequency (frequency divided by the number of observations) of each value
  o Relative frequencies always total 1; write them as decimals
• Bar Chart – Graph with values of a qualitative variable on the horizontal axis and either the frequencies or relative frequencies on the vertical axis
  o *** Bars don’t touch each other!!
• Stem and Leaf Diagram – Used for displaying the distribution of the values of a quantitative variable. Leading digits are used as stems and the next digits as leaves
  o Gives a quick way to look at the main features of the distribution
    ▪ Center, Spread, Shape, Outliers
    ▪ To determine shape (skewed to right/left or symmetric), turn the graph on its side; the skewness is toward the side with the longest tail
  o Use for small sets of data
• Classes – Categories for grouping data
• Frequency – The number of observations that fall in a class
• Frequency Distribution – A listing of all classes and their frequencies
• Relative Frequency – The ratio of the frequency of a class to the total number of observations
• Relative-frequency distribution – A listing of all classes and their relative frequencies
• Lower cutpoint – The smallest value that could go in a class
• Upper cutpoint – The smallest value that could go in the next higher class (equivalent to the lower cutpoint of the next higher class)
• Midpoint – The middle of the class, found by averaging its cutpoints
• Width – The difference between the cutpoints of a class
• Histogram – Graph with values of a quantitative variable on the horizontal axis and either the frequencies or the relative frequencies on the vertical axis
  o **** Bars touch each other!!!!
  o Use for large data sets
  o Want 5-12 bars
• Measures of center of a distribution
  o Mean – Arithmetic average of the measurements (Notation: $\bar{x} = \frac{1}{n}\sum x_i$)
    ▪ More susceptible to extreme measurements
  o Median – The number such that half of the measurements are greater and half are smaller than the number (Notation: m)
• Measures of Variation
  o Range – The distance between the smallest and the largest measurements (Notation: range = largest observation – smallest observation)
  o First Quartile – The value such that one fourth of the measurements are less than that value (Notation: Q1 = First Quartile)
  o Third Quartile – The value such that one fourth of the measurements are greater than that value (Notation: Q3 = Third Quartile)
  o IQR – The distance between the first and third quartiles (Notation: IQR = Q3 – Q1)
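The measures of center and variation listed above can be computed along these lines. This is only a sketch: the data set is made up, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one common textbook convention and an assumption here (other conventions give slightly different quartiles).

```python
from statistics import mean, median


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the number of observations is odd, the
    middle observation is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)


# Hypothetical measurements with one extreme value (40).
data = [2, 3, 5, 6, 7, 8, 9, 40]

x_bar = mean(data)                   # arithmetic average
m = median(data)                     # middle of the sorted data
data_range = max(data) - min(data)   # largest minus smallest
q1, q3 = quartiles(data)
iqr = q3 - q1

print("mean:", x_bar)       # 10, pulled upward by the extreme value
print("median:", m)         # 6.5
print("range:", data_range)
print("Q1, Q3, IQR:", q1, q3, iqr)
```

With this data the single extreme value moves the mean well above the median, which is the sense in which the mean is more susceptible to extreme measurements.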

• Rule of thumb for outliers: Less than Q1 – 1.5 x IQR, or greater than Q3 + 1.5 x IQR
• Five-Number Summary – Describe the distribution with the numbers: minimum, Q1, m, Q3, maximum
• Boxplot – Graphical display of the five-number summary
  o Good for comparing several distributions
• Measures of Spread
  o Variance – Measures spread as a function of the distance of measurements from the mean (Notation: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$)
  o Standard Deviation – The square root of the variance (Notation: $s = \sqrt{s^2}$)
• Measuring Center and Spread in populations
  o Population mean – $\mu = \frac{1}{N}\sum x_i$
  o Population Variance – $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
  o Population standard deviation – $\sigma = \sqrt{\sigma^2}$
• Chebyshev’s Theorem – For any distribution, a proportion of at least 1 – 1/k^2 of the measurements must lie within k standard deviations of the mean (for k > 1)
• Empirical Rule – For the normal distribution, approximately 68% of the measurements are within one standard deviation of the mean, approximately 95% of the measurements are within two standard deviations, and approximately 99.7% of the measurements are within three standard deviations of the mean
• Z score – Measures the number of standard deviations a measurement, x, is from the mean (Notation: $z = \frac{x - \mu}{\sigma}$, or $z = \frac{x - \bar{x}}{s}$ for a sample)
• Scatterplot – Plotting each element as a point on two-dimensional axes (x on the horizontal axis, y on the vertical)
  o Linear relationship – Points fall along a line with a slope not equal to 0
  o Positive linear relationship – Points fall along a line with a slope greater than 0
  o Negative linear relationship – Points fall along a line with a slope less than 0
• Linear correlation coefficient – Measures the strength of the linear relationship between two variables x and y (Notation: r =
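The outlier rule, the measures of spread, Chebyshev's bound, z scores, and the linear correlation coefficient can be tried out with a short sketch like the one below. Several things here are assumptions rather than part of the notes: the function names and data are hypothetical, the sample variance uses the usual n – 1 denominator (which is what Python's `statistics.variance` computes), and r is computed with the common sample formula r = Σ(x – x̄)(y – ȳ) / ((n – 1) · s_x · s_y).

```python
from statistics import mean, stdev, variance


def outlier_fences(q1, q3):
    """Rule-of-thumb limits: values outside (low, high) are flagged."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def chebyshev_bound(k):
    """Minimum proportion within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - 1 / k**2


def z_scores(data):
    """Number of sample standard deviations each value lies from the mean."""
    center, spread = mean(data), stdev(data)
    return [(x - center) / spread for x in data]


def correlation(xs, ys):
    """Sample linear correlation coefficient r (assumed formula, see above)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


# Hypothetical quartiles and paired measurements.
low, high = outlier_fences(q1=4.0, q3=8.5)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print("outlier fences:", low, high)
print("sample variance of xs:", variance(xs))
print("sample standard deviation of xs:", stdev(xs))
print("Chebyshev, k = 2:", chebyshev_bound(2))
print("z scores of xs:", z_scores(xs))
print("r:", round(correlation(xs, ys), 4))
```

For k = 2, Chebyshev's bound guarantees at least 75% of the measurements within two standard deviations for any distribution, which is weaker than the roughly 95% the Empirical Rule gives for the normal distribution. The positive r for this data matches a scatterplot whose points rise from left to right.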