Computer Applications in IE
Introduction to Descriptive Statistics
Assoc. Prof. Ho Thanh Phong
HCMC University of Technology
Dept. of Industrial & Systems Engineering

Contents
• Introduction to Descriptive Statistics
• Sample and Population
• Grouped Data and the Histogram
• Percentiles and Quartiles
• Measures of Central Tendency
• Measures of Variability
• Mean and Standard Deviation
• Data displaying
• Exploratory Data Analysis

Terminology (English: Vietnamese)
Descriptive Statistics: thống kê mô tả
Inferential Statistics: thống kê suy luận
Population: quần thể
Sample: mẫu
Census: điều tra tổng thể

Simple Random Sample
Sampling from the population is often done randomly, such that every possible sample of the same size n has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.
[Figure: a population and its sample under simple random sampling, contrasted with biased sampling]

Two Types of Data
Qualitative (categorical, nominal, or non-metric). Examples:
❑ Color
❑ Gender
❑ Nationality
Quantitative (measurable, countable, or metric). Examples:
❑ Temperatures
❑ Salaries
❑ Number of points scored on a 100-point exam

Grouped Data and the Histogram
Grouping divides the data into groups (also called classes or intervals). Groups should be:
Mutually exclusive
❑ Not overlapping: every observation is assigned to only one group
Exhaustive
❑ Every observation is assigned to a group
Equal-width (if possible)
❑ First or last group may be open-ended

Frequency Distribution
❑ Class midpoint is the middle value of a group (class, interval)
❑ Relative frequency is the proportion of total observations that fall in each class
▪ Sum of relative frequencies = 1
❑ Cumulative frequency is a running total of frequencies through the classes

Terminology (English: Vietnamese)
Mutually exclusive: loại trừ lẫn nhau
Exhaustive: tính đầy đủ
Midpoint: điểm giữa
Relative frequency: tần suất tương đối
Cumulative frequency: tần suất tích lũy

Percentiles
Percentiles are measures of position that divide an ordered set of data into 100 parts. Given any set of numerical observations, order them into an ascending array. The position i of the Pth percentile is given by i = nP/100, where n is the number of observations in the set.
❑ If i is a whole number, the Pth percentile is the average of the values at positions i and i+1.
❑ If i is not a whole number, the Pth percentile is the value at the position given by the whole-number part of i+1.

Terminology (English: Vietnamese)
Histogram: biểu đồ cột
Percentiles: bách phân vị
An ascending array: dãy tăng dần
Whole number: số nguyên

Examples
A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 22 salespeople is shown, together with the same data sorted in ascending order. n = 22
Sales:        9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17 27 29
Sorted sales: 6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24 27 29
Order:        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Find the 50th, 80th, and 90th percentiles of this data set.
❑ To find the 50th percentile, compute the location nP/100 = (22)(50)/100 = 11, which is a whole number. The 50th percentile is the average of the 11th and 12th values: (16 + 17)/2 = 16.5.
❑ To find the 80th percentile, compute the location nP/100 = (22)(80)/100 = 17.6, which is not a whole number. The 80th percentile is the 18th value: 21.
❑ To find the 90th percentile, compute the location nP/100 = (22)(90)/100 = 19.8, which is not a whole number. The 90th percentile is the 20th value: 24.

Terminology (English: Vietnamese)
Quartiles: tứ phân vị
Median: trung vị
Interquartile Range: khoảng liên tứ phân

Examples
❑ The 1st quartile Q1 is the 25th percentile. Location is i = (22)(25)/100 = 5.5. Value is the 6th value = 14.
❑ The 2nd quartile Q2 is the 50th percentile. Location is i = (22)(50)/100 = 11. Value is the average of the 11th and 12th values = 16.5.
❑ The 3rd quartile Q3 is the 75th percentile. Location is i = (22)(75)/100 = 16.5. Value is the 17th value = 20.

Summary Measures: Population Parameters and Sample Statistics
❑ Measures of Central Tendency
▪ Median
▪ Mode
▪ Mean
❑ Measures of Variability
▪ Range
▪ Interquartile range
▪ Variance
▪ Standard Deviation
❑ Other summary measures:
▪ Skewness
▪ Kurtosis

Terminology (English: Vietnamese)
Median: trung vị
Mode: yếu vị
Mean: trung bình
Range: khoảng biến thiên
Interquartile Range: khoảng liên tứ phân
Variance: phương sai
Standard Deviation: độ lệch chuẩn

Variance and Standard Deviation
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} = \frac{\sum_{i=1}^{N}x_i^2 - \frac{\left(\sum_{i=1}^{N}x_i\right)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}{n-1}, \qquad s = \sqrt{s^2}$$

Calculation of Sample Variance
Find the sample mean and sample variance for the following series of data:
No.:   1 2 3 4 5 6 7 8 9 10 11
Value: 21 12 34 22 17 18 43 28 56 34 12

Skewness
Skewed to left: mean < median < mode
[Figure: histogram of a left-skewed distribution; horizontal axis x, vertical axis Frequency]
Symmetric: mean = median = mode
[Figure: histogram of a symmetric distribution; horizontal axis x, vertical axis Frequency]
Skewed to right: mode < median < mean
[Figure: histogram of a right-skewed distribution; horizontal axis x, vertical axis Frequency]

Kurtosis
Leptokurtic: peaked distribution
[Figure: histogram of a leptokurtic distribution; horizontal axis Y, vertical axis Frequency]

Relations between the Mean and Standard Deviation
Chebyshev’s Theorem
❑ Applies to any distribution, regardless of shape
❑ Places lower limits on the percentages of observations within a given number of standard deviations from the mean
Empirical Rule
❑ Applies only to roughly mound-shaped and symmetric distributions
❑ Specifies approximate percentages of observations within a given number of standard deviations from the mean

Terminology (English: Vietnamese)
Chebyshev’s Theorem: định lý Chebyshev
Empirical Rule: quy tắc thực nghiệm
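
The percentile location rule, the shortcut form of the sample variance, and Chebyshev's lower limit can be checked numerically. The Python sketch below is not part of the original slides: the function names `percentile`, `sample_variance`, and `chebyshev_lower_bound` are hypothetical, and the bound 1 − 1/k² is the usual statement of Chebyshev's theorem, which the slides name but do not write out. It uses the sales data and the 11-value series from the examples.

```python
import math

# Data copied from the slides.
SALES = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17,
         16, 24, 21, 22, 18, 19, 18, 20, 17, 27, 29]
VALUES = [21, 12, 34, 22, 17, 18, 43, 28, 56, 34, 12]


def percentile(values, p):
    """P-th percentile by the location rule i = n*P/100 (integer p, 0 < p < 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    n = len(data)
    # Split i = n*p/100 into whole part and remainder (avoids float rounding).
    whole, remainder = divmod(n * p, 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th ordered values.
        return (data[whole - 1] + data[whole]) / 2
    # i is not whole: take the value at 1-based position floor(i) + 1.
    return data[whole]


def sample_variance(values):
    """Sample variance via the shortcut formula (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


if __name__ == "__main__":
    for p in (25, 50, 75, 80, 90):
        print(f"P{p} =", percentile(SALES, p))
    s2 = sample_variance(VALUES)
    print("mean =", sum(VALUES) / len(VALUES))
    print("s^2 =", s2, " s =", math.sqrt(s2))
    print("Chebyshev, k = 2:", chebyshev_lower_bound(2))
```

Run on the sales data, the sketch reproduces the slide values: 16.5, 21, and 24 for the 50th, 80th, and 90th percentiles, and 14, 16.5, and 20 for the quartiles. For the 11-value series it gives a sample mean of 27 and a sample variance of about 188.8. Library routines may interpolate percentiles differently, so their results need not match this location rule exactly.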