Statistical Concepts
• Statistical concepts and methods provide insights into the behavior of many phenomena that we encounter in life, and thus are used in many fields of specialization in the humanities, sciences and engineering.
• This discipline is concerned with how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. Without uncertainty or variation there would be no need for statistical methods.
• Statistics provides methods for
– organizing and summarizing data
– drawing conclusions based on information contained in the data
– designing efficient ways of collecting data, depending on what inferences we are interested in.
…now we move to agree on some terminology
The Population is everything you wish to study, e.g. all RPI students who have come to campus before noon today.
A Variable is used to represent a characteristic of each member of the population, e.g. female vs. male, age, or standing (freshman, …, senior) of the individuals in the RPI population mentioned above.
A Census is a study of the entire population.
e.g. 1: we may study election results on only a subset of the population because we do not have the time/resources to conduct a census (as on election day). If we have data for all eligible voters, then we have a census.
e.g. 2: for the preceding example, all RPI students.
• The Size of the Population is the number of members of the population. It is referred to as N.
• The Size of the Sample is referred to as n.
• A Biased Sample is a sample which does not represent the population.
• Discrete Data is data that can take on only certain values.
These values are often integers or whole numbers.
• Continuous Data is data that can take on any one of an infinite number of possible values over an interval on the number line (how many real numbers are there in [0,1]?). These values are most often the result of measurement.
• Tools of Descriptive Statistics allow you to summarize data.
• The Techniques of Inferential Statistics allow us to draw inferences or conclusions about the population from a sample.
• An Inference is a deduction of a conclusion.
• The Relative Frequency of a classification is the number of times an observation falls into that classification, represented as a proportion of the total number of observations. It can be expressed as a fraction, decimal, or percentage.
• The Cumulative Relative Frequency of a class is the sum of the relative frequencies of all classes at or below that class, represented as a proportion of the total number of observations. It can be expressed as a fraction, decimal, or percentage.
• A Bar Chart represents the frequency or relative frequency from the table in the form of a rectangle or bar.
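The relative and cumulative relative frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the class-year counts and the function names are invented for the example and are not taken from the course data.

```python
from itertools import accumulate

# Hypothetical class-year counts, invented for illustration only.
counts = {"Freshman": 2, "Sophomore": 6, "Junior": 1, "Senior": 5}

def relative_frequencies(counts):
    """Count in each class divided by the total number of observations."""
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

def cumulative_relative_frequencies(counts):
    """Running sum of relative frequencies, in the order the classes are given."""
    rel = relative_frequencies(counts)
    return dict(zip(rel.keys(), accumulate(rel.values())))

rel = relative_frequencies(counts)
cum = cumulative_relative_frequencies(counts)
for label in counts:
    # Print each class with its relative and cumulative relative frequency.
    print(f"{label:<10} {rel[label]:6.1%} {cum[label]:6.1%}")
```

The cumulative column only makes sense when the classes have a natural order (as class years do); the last class always accumulates to 100%.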
[Figure: bar chart "Class Year of Students in Introductory Statistics"; x-axis: Class (Freshmen, Sophomore, Junior, Senior); y-axis: Number of Students]
[Figure: Example of a Histogram; x-axis: Compressive Strength (PSI); y-axis: Frequency / Relative Frequency]
[Figure: Example of a Cumulative Distribution Plot; x-axis: Strength; y-axis: Cumulative Frequency]
Rules of thumb for histogram construction
– Number of Classes and Class Interval for Continuous Data: #classes = √n
– Manual bin calculation based on problem objectives
[Figure: pie chart "Class Year of Introductory Statistics Students": Freshman 14%, Sophomore 43%, Junior 7%, Senior 36%]
Generate a Pie Chart Using the “Faculty.xls” File
• Generate school count with a pivot table
• Create a pie chart from the pivot table
• In a Dot Plot, each observation is plotted as a point on a single, horizontal axis. The axis is scaled so that each of the data points can be located uniquely on the axis. When there is more than one observation with the same value, the points are “stacked” on top of each other.
STEM-AND-LEAF DIAGRAM
• Each number in the data set should consist of at least two digits.
• Divide each number into two parts, stem and leaf.
– Stem: the leftmost one or two digits
– Leaf: remaining digits
• The Shape of a set of data describes how the data are spread out around the center with respect to the symmetry or skewness of the data.
• The Variability of a set of data describes how the data are spread out around the center with respect to the smoothness and magnitude of the variation.
• When data are not evenly spread out on either side of the center, we refer to the distribution as being skewed.
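The two construction rules above (the √n rule of thumb for the number of histogram classes, and the stem/leaf split) can be sketched as follows. This is a minimal sketch under stated assumptions: the data are made up, the function names are hypothetical, √n is rounded up to a whole number of classes, and the leaf is taken to be the final digit of a non-negative integer.

```python
import math
from collections import defaultdict

def sqrt_rule_classes(n):
    """Number of histogram classes suggested by the square-root rule.

    Rounding up to a whole number is an assumption; the rule itself only
    says #classes is about sqrt(n).
    """
    return math.ceil(math.sqrt(n))

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit).

    Assumes non-negative integers with at least two digits.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Made-up two-digit observations, for illustration only.
data = [12, 15, 21, 23, 23, 30, 34, 38, 41, 47]

print("suggested number of classes:", sqrt_rule_classes(len(data)))
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the leaves are kept in sorted order, each row of the printout reads like a small sideways histogram bar, which is the main appeal of the diagram.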
• The Sample Mean is the center of balance of a set of data, and is found by adding up all of the data values and dividing by the number of observations.
• The Population Mean is represented by the Greek letter μ.
• The Sample Median is the value of the middle observation in an ordered set of data.
• Consider sample data taken from a certain population.
• Let $x_i$ be the ith observation for i = 1, 2, …, n, where n is the number of observations in the sample. The sample mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
[Figure: Histogram of Bimodal Data; x-axis: X; y-axis: Frequency]
• With a population of size N, the population mean and variance are computed using:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
• The Empirical Rule says that for a bell-shaped, symmetric distribution:
- about 68% of all data values are within one standard deviation of the mean
- about 95% of all observations are within two standard deviations of the mean
- almost all (more than 99%) of the observations are within three standard deviations of the mean.
• The Percentile Rank of a value is the percentage of observations that are at or below the value of interest
– Pth percentile (p% at or below)
– Percentile rank: (b + e/2)/N, expressed as a percentage
b = number of data values below the value of interest
e = the number of other observations equal to the value (first set e = 0 to understand the equation)
N = population size
• A “p% Trimmed Mean” truncates p% of the distribution from each end of it.
It provides a measure of center that is not as ‘extreme’ as either the mean or the median.
Demonstration of Percentile Calculations:
ECSE-4500 Exam scores (20 values): 82 65 91 83 75 80 74 63 72 79 93 55 64 84 80 90 81 73 50 95
Scores in sequence: 50, 55, 63, 64, 65, 72, 73, 74, 75, 79, 80, 80, 81, 82, 83, 84, 90, 91, 93, 95
The 90th and 10th percentiles: 91, 55
• The Interquartile Range (IQR) is the difference between the third and first quartiles: Q3 - Q1
• The Inner Fences of a boxplot are located at:
– Q1 - 1.5 (IQR) and
– Q3 + 1.5 (IQR)
• The Outer Fences of a boxplot are located at:
– Q1 - 3 (IQR) and
– Q3 + 3 (IQR)
• Illustrate a box plot for problem 1-56 using Minitab
[Figure: Box Plot, annotated with the outer fences Q1 - 3(IQR) and Q3 + 3(IQR), the inner fences Q1 - 1.5(IQR) and Q3 + 1.5(IQR), Q1, Median, Q3, the IQR, and outlying points marked *]
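The percentile demonstration, the trimmed mean, and the fence formulas above can be tried out on the ECSE-4500 exam scores with a short Python sketch. The function names are hypothetical, and the percentile convention used here (the smallest ordered value with at least p% of the data at or below it) is an assumption chosen because it reproduces the 90th and 10th percentiles quoted above; Minitab and other packages interpolate differently, so their quartiles and fences may not match.

```python
import math

# ECSE-4500 exam scores from the demonstration above (20 values).
scores = [82, 65, 91, 83, 75, 80, 74, 63, 72, 79,
          93, 55, 64, 84, 80, 90, 81, 73, 50, 95]

def percentile(data, p):
    """Smallest ordered value with at least p% of the data at or below it.

    Assumed convention; other definitions interpolate between values.
    """
    ordered = sorted(data)
    k = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position
    return ordered[k - 1]

def trimmed_mean(data, p):
    """Mean after dropping p% of the observations from each end (p < 50)."""
    ordered = sorted(data)
    cut = int(len(ordered) * p / 100)  # observations removed per end, rounded down
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

def fences(data, k):
    """Return (Q1 - k*IQR, Q3 + k*IQR); k=1.5 gives inner, k=3 outer fences."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

print("90th and 10th percentiles:", percentile(scores, 90), percentile(scores, 10))
print("10% trimmed mean:", trimmed_mean(scores, 10))
print("inner fences:", fences(scores, 1.5))
print("outer fences:", fences(scores, 3))
```

Any score falling outside the inner fences would be flagged as a possible outlier on the box plot, and any score outside the outer fences as an extreme one.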