




























































































A foundational overview of statistics, covering key concepts such as data collection, organization, analysis, and interpretation. It explores the distinction between descriptive and inferential statistics, highlighting their roles in understanding data and drawing conclusions. The document also delves into measures of central tendency, including mean, median, and mode, and discusses their applications in summarizing data. Additionally, it touches upon the concept of range and its significance in describing data variability.
Typology: Lecture notes





























































































Dean Faculty of Sciences: Prof. Dr. Irshad Ahmad Arshad
Course Development Coordinator: Dr. Muhammad Zakria
Writers: 1. Dr. Zahid Iqbal
Preface
Message from the Dean
Acknowledgments
Introduction of the Course
Objectives of the Course
Unit–1: Introduction
Unit–2: Data Presentation–I
Unit–3: Data Presentation–II
Unit–4: Measure of Central Tendency–I
Unit–5: Measure of Central Tendency–II
Unit–6: Measure of Dispersion
Unit–7: Correlation & Regression
Unit–8: Probability Theory
Unit–9: Statistical Inference
Mathematics and Statistics are branches of science that build relationships and fill the gaps between the other subjects of the natural sciences. Mathematics and Statistics may look a bit abstract by nature, but when we use them in other sciences, their applications become apparent at once. This course is much needed for our students to inculcate first-level concepts. Basic mathematics and statistics are beautifully covered in this book. I wish to express my sincere thanks, appreciation and congratulations to all those who have contributed to the development of this book.

(Prof. Dr. Irshad Ahmad Arshad)
Dean Faculty of Sciences
I am thankful to Almighty Allah for giving me the opportunity to develop this course. I would like to express my gratitude to the many people who saw me through the preparation of this book and to all those who provided support, talked things over, read, wrote, offered comments and assisted in the editing, proofreading and design. I would like to thank Prof. Dr. Irshad Ahmad Arshad, Dr. Muhammad Yameen Danish and Dr. Muhammad Zakria for their constant support and encouragement. I am especially grateful to Prof. Dr. Irshad Ahmad Arshad for his extremely detailed reviews of the manuscript and his suggestions on it, given many times until this book was finalized.

Many extraordinarily talented people are responsible for helping to create this text. I would like to thank Ms. Lubna Naz for helping with this course on a priority basis. Particular thanks go to each of my colleagues at AIOU. Our everyday conversations regarding Mathematics instruction have been of the utmost importance to the development of this text and to my teaching career.

I am highly grateful to Prof. Dr. Irshad Ahmad Arshad, Dean Faculty of Sciences, and Prof. Dr. Zia Ul-Qayyum, Vice-Chancellor, AIOU, for giving creative ideas for writing challenging mathematical concepts in a simple way and for providing all the facilities needed to complete this course in the shortest possible time. I also acknowledge the team of APCP, the editor Mr. Fazal Karim, the layout designer/composer Mr. Muhammad Hameed and Mr. Anwar-ul-Haq for their input in making this book more attractive and readable. I am grateful to the whole PPU team, who were very cooperative and helpful in printing and publishing the book.

Suggestions and criticism to improve the text will be highly appreciated. They can be forwarded to [email protected]

(Dr. Zahid Iqbal)
Course Coordinator
Department of Statistics
The course aims to teach students the basic concepts of Statistics and to show their value in practical situations. The objectives of the course are to develop students' understanding of, and interest in, Statistics, and to present the material in a way that motivates them to study the subject at a higher level.

How is the Course Organized?

The text is divided into nine units in one volume. An attempt has been made to present the material in an informal way. Only those topics of Mathematics and Statistics are covered which are thought to be useful for everyone. For example, we start studying Mathematics in class one and continue until class 10. In spite of this, most of us do not have full command of everyday statistics.

Unit–1 covers some basic concepts of Statistics and its history. This unit explains the definition, nature, importance and limitations of Statistics.
Unit–2 explains how large data sets are summarized and presented using different statistical tools.
Unit–3 explains some additional statistical tools for summarizing and presenting large data sets.
Unit–4 covers the calculation and properties of measures of central tendency, including the mean, mode and median.
Unit–5 covers some additional measures of central tendency, including the geometric mean and the harmonic mean.
Unit–6 is about different measures of dispersion and their application to data in different fields.
Unit–7 explains how bivariate data are handled statistically.
Unit–8 covers the basics of probability and its definitions.
Unit–9 is about statistical inference.
For a layman, ‘Statistics’ means numerical information expressed in quantitative terms. This information may relate to objects, subjects, activities, phenomena or regions of space. As a matter of fact, data have no limits as to their reference, coverage and scope. At the macro level, there are data on gross national product and on the shares of agriculture, manufacturing and services in GDP (Gross Domestic Product). At the micro level, individual firms, however small or large, produce extensive statistics on their operations. The annual reports of companies contain a variety of data on sales, production, expenditure, inventories, capital employed and other activities. These data are often field data, collected by employing scientific survey techniques. Unless regularly updated, such data are the product of a one-time effort and have limited use beyond the situation that called for their collection.

A student knows statistics more intimately as a subject of study, like economics, mathematics, chemistry, physics and others. It is a discipline that deals scientifically with data and is often described as the science of data. In dealing with statistics as data, the discipline has developed appropriate methods of collecting, presenting, summarizing and analyzing data, and thus consists of a body of these methods.
After reading this unit, the student will be able to understand the:
Bowley defined statistics in three ways: (i) statistics is the science of counting; (ii) statistics may rightly be called the science of averages; and (iii) statistics is the science of the measurement of the social organism, regarded as a whole in all its manifestations. Boddington defined it as follows: statistics is the science of estimates and probabilities. W.I. King defined statistics in a wider context: the science of statistics is the method of judging collective, natural or social phenomena from the results obtained by the analysis or enumeration or collection of estimates. Seligman stated that statistics is a science that deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry. Spiegel defines statistics by highlighting its role in decision-making, particularly under uncertainty: statistics is concerned with scientific methods for collecting, organising, summarising, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis. According to Prof. Horace Secrist, statistics is the aggregate of facts, affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose, and placed in relation to each other.

From the above definitions, we can highlight the major characteristics of statistics as follows:
(i) Statistics are aggregates of facts. A single figure is not statistics. For example, the national income of a country for a single year is not statistics, but the same figure for two or more years is.
(ii) Statistics are affected by a number of factors. For example, the sale of a product depends on a number of factors such as its price, quality, competition, the income of the consumers, and so on.
(iii) Statistics must be reasonably accurate. Wrong figures, if analysed, will lead to erroneous conclusions. Hence, conclusions must be based on accurate figures.
(iv) Statistics must be collected in a systematic manner. If data are collected in a haphazard manner, they will not be reliable and will lead to misleading conclusions.
(v) Statistics must be collected for a predetermined purpose.
(vi) Lastly, statistics should be placed in relation to each other. If one collects data unrelated to each other, then such data will be confusing and will not lead to any logical conclusion. Data should be comparable over time and over space.
Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to draw conclusions beyond the data we have analysed or to reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Descriptive statistics are very important because, if we simply presented our raw data, it would be hard to see what the data were showing, especially if there were a lot of them. Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation. For example, if we had the results of 100 pieces of students' coursework, we might be interested in the overall performance of those students. We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to examine both. How to properly describe data through statistics and graphs is an important topic in its own right.

Typically, two general types of statistic are used to describe data:

Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students, from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median and mean. These measures are covered in Units 4 and 5.

Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out: some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation. These measures are covered in Unit 6.

When we use descriptive statistics, it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical commentary (i.e., a discussion of the results).
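The measures named above can be computed with a few lines of Python. The following is only a minimal sketch: the ten marks are hypothetical values chosen so that the mean is 65, the function name `describe` is made up for illustration, and the variance and standard deviation are the population versions (dividing by the number of values), which is an assumption since the text does not say which version it intends.

```python
import statistics

# Hypothetical coursework marks (out of 100) for ten students; not data from the text.
marks = [52, 58, 61, 65, 65, 65, 68, 70, 71, 75]


def describe(values):
    """Return measures of central tendency and spread for a list of numbers."""
    ordered = sorted(values)
    mean = statistics.mean(ordered)
    # Quartiles using the module's default ("exclusive") method.
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        # Measures of central tendency
        "mean": mean,
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        # Measures of spread
        "range": ordered[-1] - ordered[0],
        "quartiles": (q1, q2, q3),
        "mean_abs_dev": sum(abs(x - mean) for x in ordered) / len(ordered),
        "variance": statistics.pvariance(ordered),  # population variance (assumed)
        "std_dev": statistics.pstdev(ordered),      # population standard deviation (assumed)
    }


summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For these marks the mean, median and mode all equal 65, while the range of 23 marks shows that individual scores still differ considerably from that central value.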
The population of England denotes the number of people within England’s boundaries, perhaps as enumerated at a census. But a physician might embark on a study to try to answer the question “What is the average systolic blood pressure of Englishmen aged 40-59?” Who are the “Englishmen” referred to here? Not all Englishmen live in England, and the social and genetic background of those who do may vary. A surgeon may study the effects of two alternative operations for gastric ulcer. But how old are the patients? What sex are they? How severe is their disease? Where do they live? And so on. The reader needs precise information on such matters to draw valid inferences from the sample that was studied to the population being considered.

Statistics such as averages and standard deviations, when taken from populations, are referred to as population parameters. They are often denoted by Greek letters: the population mean is denoted by μ (mu) and the standard deviation by σ (lowercase sigma).
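For reference, the two parameters just named are conventionally defined as follows for a finite population. The symbols N (the number of individuals in the population) and x_i (the value observed for the i-th individual) are notation introduced here for illustration and do not appear in the text above.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```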
A population commonly contains too many individuals to study conveniently, so an investigation is often restricted to one or more samples drawn from it. A well-chosen sample will contain most of the information about a particular population parameter, but the relation between the sample and the population must be such as to allow true inferences to be made about the population from that sample.

Consequently, the first important attribute of a sample is that every individual in the population from which it is drawn must have a known non-zero chance of being included in it; a natural suggestion is that these chances should be equal. We would also like the choices to be made independently; in other words, the choice of one subject should not affect the chance of other subjects being chosen. To ensure this, we make the choice by means of a process in which chance alone operates, such as spinning a coin or, more usually, using a table of random numbers.

Drawing a satisfactory sample sometimes presents greater problems than analysing statistically the observations made on it. Before drawing a sample, the investigator should define the population from which it is to come. Sometimes he or she can completely enumerate its members before beginning analysis, for example all the livers studied at necropsy over the previous year, or all the patients aged 20-44 admitted to hospital with perforated peptic ulcer in the previous 20 months. In retrospective studies of this kind, numbers can be allotted serially from any point in the table to each patient or specimen.

Suppose we have a population of size 150, and we wish to take a sample of size five. A random number table contains a set of computer-generated random digits arranged in groups of five. Choose any row and column, say the last column of five digits. Read only the first three digits, and go down the column starting with the first row. Thus we have 265, 881, 722, etc. If a number lies between 001 and 150, we include it in our sample. Thus, in order, the sample will contain subjects numbered 24, 59, 107, 73 and 65. If necessary, we can carry on down the next column to the left until the full sample is chosen.

The use of random numbers in this way is generally preferable to taking every alternate patient or every fifth specimen, or acting on some other such regular plan. The regularity of the plan can occasionally coincide by chance with some unforeseen regularity in the presentation of the material for study, for example when hospital appointments are made for patients from certain practices on certain days of the week, or when specimens are prepared in batches in accordance with some schedule.

As susceptibility to disease generally varies in relation to age, sex, occupation, family history, exposure to risk, inoculation state, country lived in or visited, and many other genetic or environmental factors, it is advisable to examine samples when drawn to see whether they are, on average, comparable in these respects. The random process of selection is intended to make them so, but sometimes it can by chance lead to disparities. To guard against this possibility, the sampling may be stratified. This means that a framework is laid down initially, and the patients or objects of the study in a random sample are then allotted to the compartments of the framework. For instance, the framework might have a primary division into males and females and then a secondary division of each of those categories into five age groups, the result being a framework with ten compartments. It is then important to bear in mind that the distributions of the categories in two samples made up on such a framework may be truly comparable, but they will not reflect the distribution of these categories in the population from which the sample is drawn unless the compartments in the framework have been designed with that in mind.
For instance, equal numbers might be admitted to the male and female categories, but males and females are not equally numerous in the general population, and their relative proportions vary with age. This is known as stratified random sampling.

When taking a sample from a long list, a compromise between strict theory and practicality is the systematic random sample. In this case we choose subjects a fixed interval apart on the list, say every tenth subject, but we choose the starting point within the first interval at random.
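The three sampling schemes described in this section can be sketched in Python. This is an illustration under stated assumptions, not the procedure of any particular study: the function names are hypothetical, the pseudo-random generator stands in for a printed random number table, repeated numbers are skipped (the text does not say how repeats are handled), and the odd/even rule is only a stand-in for real strata such as sex and age group.

```python
import random


def table_style_sample(population_size, sample_size, rng):
    """Mimic reading three-digit numbers from a random number table.

    A reading is kept only if it lies between 1 and population_size.
    Skipping repeated numbers is an assumption made for this sketch.
    """
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 999)  # one three-digit reading, 000 to 999
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen


def systematic_sample(items, interval, rng):
    """Random starting point within the first interval, then every interval-th item."""
    start = rng.randrange(interval)
    return items[start::interval]


def stratified_sample(items, stratum_of, per_stratum, rng):
    """Allot items to compartments, then draw a simple random sample from each."""
    strata = {}
    for item in items:
        strata.setdefault(stratum_of(item), []).append(item)
    return {
        key: rng.sample(members, min(per_stratum, len(members)))
        for key, members in strata.items()
    }


rng = random.Random(2024)       # fixed seed so the sketch is repeatable
subjects = list(range(1, 151))  # a population of 150 numbered subjects

simple = table_style_sample(150, 5, rng)
systematic = systematic_sample(subjects, 10, rng)
# Hypothetical strata: odd-numbered and even-numbered subjects.
by_parity = stratified_sample(subjects, lambda n: "odd" if n % 2 else "even", 3, rng)

print("simple random:", simple)
print("systematic:", systematic)
print("stratified:", by_parity)
```

Drawing the same number from each compartment, as done here, makes two samples comparable with each other but, as noted above, does not by itself reproduce the proportions found in the population.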
A quantity which can vary from one individual or object to another is called a variable. It is usually denoted by the last letters of the alphabet, x, y, z.