1.1- Objectives
1.2- Introduction
1.3- Definitions of Statistics and biostatistics
1.3.1- Definitions of statistics
1.3.2- Definitions of biostatistics
1.4- Statistical symbols
1.4.1- Statistical symbols
1.4.2- Scope of Biostatistics
1.4.3- Applications of biostatistics
1.5- Data and its types
1.5.1- Data
1.5.2- Types of data
1.5.2.1- Nominal data
1.5.2.2- Ordinal data
1.5.2.3- Ranked data
1.5.2.4- Discrete data
1.5.2.5- Continuous data
1.6- Data collection and related terms
1.6.1- Population, sample, sampling unit and sampling frame
1.6.1.1- Population
1.6.1.2- Sample
1.6.1.3- Sampling unit
1.6.1.4- Sampling frame
1.6.2- Principal steps in a sample survey
1.6.3- Sampling and non-sampling errors
1.6.3.1- Sampling errors
1.6.3.2- Non-sampling errors
1.7- Types of sampling schemes
1.7.1- Subjective or purposive or judgment sampling
1.7.2- Probability sampling
1.7.2.1- Simple random sampling
1.7.2.1.1- Selection of a simple random sample
1.7.2.2- Stratified random sampling
1.7.3- Mixed sampling
1.8- Organization and representation of data
1.8.1- Organization of data
1.8.2- Representation of data
1.8.2.1- Bar diagram
1.8.2.2- Pie chart
1.8.2.3- Histogram
1.8.2.4- Frequency polygon and frequency curve
1.8.2.5- Pictograms
1.8.2.6- Line chart
1.8.2.7- Cumulative frequency curve or ogive
1.8.2.8- Scatter diagram
1.9- Summary
1.10- Glossary
1.11- Self-assessment questions
1.12- References
1.13- Terminal questions
1. Parents of a child with a genetic defect consider whether or not they should have another child. They will base their decision on the chance that the next child will have the same defect.
2. To choose the best therapy, a physician must compare the prognosis, or future course, of a patient under several therapies. A therapy may be a success, a failure, or something in between; the evaluation of the chance of each outcome necessarily enters into the decision.
Statistics is the science which deals with the collection, classifying, presenting, comparing and interpreting numerical data collected to throw light on any sphere of enquiry- Lovitt.
The science of statistics is a most useful servant, but only of great value to those who understand its proper use- W.I. King.
Statistics provides tools and techniques for research workers- A.M. Mood.
Planning is the order of the day and without statistics planning is inconceivable- L.H.C. Tippett.
Statistics may be defined as a science of numerical information which employs the process of measurement and collection, classification, analysis, decision making and communication of results in a manner understandable and verifiable by others- Cecil H. Meyers
Biostatistics is the application of statistical methods to biological areas. Biological laboratory experiments, medical research (including clinical research) and health services research all use statistical methods, and many other biological disciplines rely on statistical methodology.
There are three reasons for focusing on biostatistics:
1. Some statistical methods are used more heavily in biostatistics than in other fields. For example, a general statistical textbook would not discuss the life-table method of analyzing survival data, which is important in many biostatistical applications. The topics in this book are adapted to the applications in mind.
2. Examples are drawn from the biological, medical and health care areas; this helps you maintain motivation and also helps you understand how to apply statistical methods.
3. A third reason for a book on biostatistics is to teach the material to an audience of health professionals. In this case, the interaction between students and teacher, and especially among the students themselves, is of great value in learning and applying the subject matter.
Some of the statistical symbols which are useful to biostatistics students are:
f: Frequency of the variate
x̄: Arithmetic mean of a given set of values or of a distribution
Mₑ: Median of a given set of values or of a distribution
Mₒ: Mode of a given set of values or of a distribution
σ: Standard deviation of a given set of values or of a distribution
σ²: Variance of a given set of values or of a distribution
Σ: Sum of all the values of a given set
Q.D.: Quartile deviation of a given set of values or of a distribution
M.D.: Mean deviation of a given set of values or of a distribution
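As an illustration of these symbols, the following minimal Python sketch computes each quantity for a small set of readings. The blood-pressure values are invented for the example, σ and σ² use the population form (divisor n), and the quartiles come from the standard library's `statistics.quantiles` default method, which is only one of several conventions for locating Q1 and Q3.

```python
import statistics
from collections import Counter

# Hypothetical data: systolic blood pressure (mmHg) of 9 patients
values = [118, 122, 125, 125, 130, 132, 135, 140, 150]

n = len(values)
total = sum(values)                       # Σx, the sum of all values
mean = total / n                          # x̄, arithmetic mean
median = statistics.median(values)        # Mₑ, median
mode = statistics.mode(values)            # Mₒ, mode (most frequent value)
variance = statistics.pvariance(values)   # σ², population variance (divisor n)
sd = statistics.pstdev(values)            # σ, population standard deviation

# Q.D. = (Q3 - Q1) / 2; statistics.quantiles uses its default 'exclusive'
# method, and other textbooks locate quartiles slightly differently
q1, _, q3 = statistics.quantiles(values, n=4)
qd = (q3 - q1) / 2

# M.D. about the mean = Σ|x - x̄| / n
md = sum(abs(x - mean) for x in values) / n

# f: frequency of each distinct value
f = Counter(values)

print(f"Σx = {total}, x̄ = {mean:.2f}, Mₑ = {median}, Mₒ = {mode}")
print(f"σ = {sd:.2f}, σ² = {variance:.2f}, Q.D. = {qd:.2f}, M.D. = {md:.2f}")
print("f:", dict(f))
```

Using the population divisor n keeps σ² consistent with σ as defined for "a given set of values"; a sample-based estimate would divide by n - 1 instead (`statistics.variance`).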
Biostatistics is the application of statistics in different fields of biology. The science of biostatistics includes the design of biological experiments, especially in medicine, pharmacy, agriculture, forestry, environmental science, fishery etc; the collection, summarization, and analysis of data from those experiments; and execute interpretation and
approaches. In biology most phenomena are affected by many causal factors, uncontrollable in their variation and often unidentifiable. Statistics is needed to measure such variable phenomena with a predictable error and to ascertain the reality of minute but important differences.
A biostatistics centre could jointly organize working groups, seminar series, computing infrastructure and possibly consulting and clinical-trial coordinating-centre services. The main objective of the centre would be to estimate, collaborate on, and circulate results of research in a particular subspecialty, as follows:
The information collected from censuses, surveys or other sources is called raw data. The word data means information. The adjective raw attached to data indicates that the information collected cannot be used directly. It has to be converted into a more suitable form before it makes sense and can be utilized gainfully. Raw data is like raw rice: raw rice has to be cooked properly and tastefully before it is eaten and digested. Similarly, raw data has to be converted into a proper form, such as a table or a frequency distribution, before any inference is drawn from it.
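A minimal sketch of converting raw data into a frequency distribution, assuming hypothetical patient ages and an assumed class width of 10 years starting at 20. The names `class_label`, `WIDTH` and `START` are illustrative only.

```python
from collections import Counter

# Hypothetical raw data: ages (years) of 15 patients, in the order recorded
raw_ages = [23, 45, 31, 52, 38, 27, 41, 36, 58, 33, 29, 47, 39, 44, 50]

WIDTH, START = 10, 20   # assumed class width and lower limit of the first class

def class_label(age):
    """Return the class 'lower-upper' containing age, e.g. 20-29."""
    lower = START + (age - START) // WIDTH * WIDTH
    return f"{lower}-{lower + WIDTH - 1}"

# Frequency distribution: number of patients in each age class
freq_table = Counter(class_label(a) for a in raw_ages)

print("Age class  Frequency")
for label in sorted(freq_table, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>9}  {freq_table[label]:>9}")
```

The unordered list of 15 ages becomes four classes whose frequencies can be read at a glance, which is the kind of "cooking" of raw data the paragraph describes.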
There are two types of statistical data:
Primary data: Data collected by a person or organization for its own use from a primary source. Examples are the census data collected by the central government, data collected by any agency for its own purpose, the Gazette of India, etc.
Secondary data: Data collected by some other person or organization for their own use, which the investigator then obtains for his own purpose. For example, data collected by a medical agency can be used by students of another medical institute.
In other words, primary data are collected by you to meet your own specific purpose, whereas secondary data are collected by somebody else. The same data can be primary for one purpose and secondary for another.
The primary data are of the following types:
1.5.2.1 Nominal Data
In the study of biostatistics, we meet many different types of numerical data. The different types have varying degrees of structure in the relationships among possible values. One of the simplest types of data is nominal data, in which the values fall into unordered categories or classes. In a certain study, for instance, males might be assigned the value 1 and females the value 0. Numbers are used mainly for the sake of convenience; numerical values allow us to use computers to perform complex analysis of the data. Nominal data that take on one of two distinct values, such as male and female, are said to be dichotomous or binary, depending on whether the Greek or the Latin root for two is preferred. However, not all nominal data need be dichotomous. Often there are three or more possible categories into
1.5.2.4 Discrete Data
For discrete data, both ordering and magnitude are important. In this case, the numbers represent actual measurable quantities rather than mere labels. In addition, discrete data are restricted to taking on only specified values, often integers or counts, that differ by fixed amounts; no intermediate values are possible. Examples of discrete data include the number of motor vehicle accidents in Dehradun in a particular month, the number of times a woman has given birth, the number of new cases of tuberculosis reported in India during a one-year period, and the number of beds available in a particular hospital. Note that for discrete data a natural order exists among the possible values. If we are interested in the number of times a woman has given birth, for instance, a larger number indicates that a woman has had more children. Furthermore, the difference between one and two births is the same as the difference between four and five births. Finally, the number of births is restricted to the nonnegative integers; a woman cannot give birth 3.4 times. Because it is meaningful to measure the distance between possible data values, arithmetic operations can be applied to discrete data.
1.5.2.5 Continuous Data
Data that represent measurable quantities but are not restricted to taking on certain specified values (such as integers) are known as continuous data. In this case, the difference between any two possible data values can be arbitrarily small. Examples of continuous data include time, the serum cholesterol level of a patient, the concentration of a pollutant, and temperature. In all instances, fractional values are possible. Since we are able to measure the distance between two observations in a meaningful way, arithmetic operations can be applied. The only limiting factor for a continuous observation is the degree of accuracy with which it can be measured; consequently, we often see time rounded off to the nearest second and weight to the nearest pound or gram. The more accurate our measuring instruments, however, the greater the amount of detail that can be achieved in our recorded data.
Continuous measurements are sometimes grouped into ordered categories, which turns them into ordinal data. In a study of the effects of maternal smoking on newborns, for example, we might first record the birth weights of a large number of infants and then categorize the infants into three groups: those who weigh less than 1500 grams, those who weigh between 1500 and 2500 grams, and those who weigh more than 2500 grams. Although we have the actual measures of birth weight, we are not concerned with whether a particular child weighs 1560 grams or 1580 grams; we are only interested in the number of infants who fall into each category. From prior experience, we may not expect substantial differences among children within the very low birth weight, low birth weight and normal birth weight groupings. Furthermore, ordinal data are often easier to handle than continuous data and thus simplify the analysis. There is, however, a consequent loss of detail in the information about the infants. In general, the degree of precision required in a given set of data depends on the questions being studied.
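The grouping described above can be sketched in Python as below. The birth weights are hypothetical, and placing exactly 1500 g and 2500 g in the middle group is an assumption, since the text does not say how boundary values are assigned.

```python
from collections import Counter

def birth_weight_group(grams):
    """Map a birth weight (g) to an ordered category.

    Assumption: 1500 g and 2500 g are both placed in the middle group.
    """
    if grams < 1500:
        return "very low"   # less than 1500 g
    elif grams <= 2500:
        return "low"        # 1500 to 2500 g
    else:
        return "normal"     # more than 2500 g

# Hypothetical birth weights (g), for illustration only
weights = [1420, 1560, 1580, 2300, 2650, 3100, 3400, 2480, 1490, 2900]

counts = Counter(birth_weight_group(w) for w in weights)

# Print the categories in their natural order, lowest weight first
for group in ["very low", "low", "normal"]:
    print(f"{group:>8}: {counts[group]}")
```

Once grouped, 1560 g and 1580 g both simply count toward "low"; the exact weights are no longer used, which is the loss of detail the paragraph mentions.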
As we progressed from nominal to continuous data, the relationships among possible data values became increasingly complex. Distinctions must be made among the various types of data because different techniques are used to analyze them. It does not make sense to speak of an average blood type of 1.8; it does make sense, however, to refer to an average temperature of 4.55°C.
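To make the contrast concrete, the following sketch uses hypothetical data and an arbitrary numeric coding of blood types. It shows that a computer will happily average nominal codes, while the meaningful summaries are frequencies (and the modal category) for the nominal variable and a mean for the continuous one.

```python
from collections import Counter
from statistics import mean

# Nominal data: blood types recorded for 8 donors (hypothetical)
blood_types = ["O", "A", "A", "B", "O", "O", "AB", "O"]

# Numeric codes are only labels (an arbitrary, hypothetical coding)
code = {"A": 1, "B": 2, "AB": 3, "O": 4}
codes = [code[t] for t in blood_types]

# The arithmetic runs, but an "average blood type" has no meaning
meaningless = mean(codes)

# Appropriate summary for nominal data: frequencies and the modal category
freq = Counter(blood_types)
modal_type, modal_count = freq.most_common(1)[0]

# Continuous data: an average temperature (°C) is meaningful
temperatures = [4.1, 4.6, 4.9, 4.6]
avg_temp = mean(temperatures)

print(f"'average' code = {meaningless} (meaningless)")
print(f"frequencies = {dict(freq)}, modal type = {modal_type}")
print(f"average temperature = {avg_temp:.2f} °C")
```

Changing the coding (say, O = 1 instead of 4) would change the "average blood type" but not the frequencies, which is a quick way to see that only the latter describe nominal data.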
Population is a collection of units or objects of which some property is defined for every unit or object. Population may consist of finite or infinite number of units. Population is also called universe by a number of statisticians and scientists.
The inhabitants of a region, the wheat fields in a state or district, the fruit plants in a city, the students in an institution, the insects in a field, the persons suffering from a particular disease, the workers in an institution, the persons in a city, the households in a city and the students in a university are a few examples of finite populations. All real numbers and all stars in the sky are examples of infinite populations. Generally, a population consists of a large number of animate or inanimate units. Moreover, the units or subjects constituting the
Figure 1.1: Population vs. sample
The main steps in the planning and execution of a survey are as follows:
1. Objective of the survey. The first step is to define clearly the objective of the survey. It is generally found that even the sponsoring agency is not quite clear in its thinking as to what it wants and how it is going to use the results. The sponsors of the survey should take care that the objectives can be fulfilled with the available resources of time, money and manpower.
2. Defining the population to be sampled. The population from which the sample is to be taken should be defined clearly. For example, in sampling of farms, clear-cut rules must be framed to define a farm in respect of shape, size, etc., keeping in mind the borderline cases, so as to enable the investigator to decide whether or not to include a particular farm in the population.
3. The frame and sampling unit. The sampling units must cover the entire population, and they must be distinct, obvious and non-overlapping, in the sense that every element of the population belongs to one and only one sampling unit. For example, in a socio-economic survey for selecting people in a town, the sampling unit might be an individual person, a family, a household or a block in a locality.
In order to cover the population decided upon, there should be a list, map or some other acceptable material, called the frame, which serves as a guide to the population to be covered. The construction of the frame is one of the main problems, since it is the frame which determines the structure of the sample survey. If the frame is not up to date, it should be brought up to date before it is used.
4. Data to be collected. The data should be collected keeping in view the objective of the survey. One should avoid collecting too much data, some of which is never subsequently examined and analyzed. A practical method is to chalk out an outline of the tables that the survey should produce. This helps in eliminating the collection of irrelevant information and ensures that no necessary data are omitted.
5. The questionnaire or schedule. Having decided on the type of data to be collected, the next important step in the sample survey is the construction of the questionnaire (to be filled in by the respondent) or the schedule of enquiry (to be filled in by the interviewer), which requires skill, special techniques and familiarity with the subject matter. The questions should be clear, brief, inoffensive, polite in tone and to the point, so that little scope for guessing is left to the respondent or interviewer. Suitable and detailed instructions for filling in the questionnaire or schedule should also be prepared.
6. Method of collecting information. The two methods commonly used to collect the sample data are:
(i) Interview method. In this method, the investigator goes from house to house and interviews the individuals personally. He asks the questions one by one and fills in the schedule on the basis of the information obtained from the individuals.
(ii) Mailed questionnaire method. In this method, the questionnaire is mailed to the individuals, who are required to fill it in and return it duly completed.
(ii) Tabulation of data. Before carrying out the tabulation of the data, we should decide the procedure for tabulating data that are incomplete due to non-response. The method of tabulation, by hand or by machine, should be chosen depending upon the size of the data.
(iii) Statistical analysis. The properly scrutinized, edited and tabulated data are now ready for statistical analysis. There are different methods of estimation; therefore a suitable formula should be used for the estimation of the parameters.
(iv) Information for future surveys. Any completed survey helps in providing a note of caution for designing future surveys. Information such as the means and standard deviations of the data, the nature of the variability, and the cost and time involved is helpful for future surveys. Any completed sample survey is a lesson for future surveys in recognizing and rectifying the mistakes committed in the past survey.
Figure 1.2: Sampling design process
The errors in the collection, processing and analysis of the data are of two types:
1.6.3.1 Sampling errors
Sampling errors arise because only a part of the population, the sample, is used to estimate the population parameters. They are therefore absent in a complete enumeration. The main reasons for these errors are:
1.6.3.2 Non-sampling errors
Unlike sampling errors, which are due to the inductive process of inferring about the population on the basis of a sample, non-sampling errors arise at the stages of observation, ascertainment and processing of the data, and so are present in both the complete enumeration and the sample survey. The reasons for these errors are:
(iii) Where the chance of selection of a unit is proportional to its size.
Some techniques which are commonly used in sampling are as follows:
1.7.2.1 Simple random sampling
This is the basic and most commonly used method of sampling. In this method each unit of the population has an equal chance of selection in the sample.
In this method, an equal probability is attached to each unit of the population at the first draw. It also implies an equal probability of selection for each of the remaining units at every subsequent draw.
For example, to draw a simple random sample from an outpatient register of the department of obstetrics and gynecology, each entry would first need to be numbered consecutively. If you want to draw a sample of size 700 out of 3500, a list of 700 random numbers between 1 and 3500 would need to be prepared using one of the known procedures (described later). The 700 entries in the register corresponding to the 700 random numbers in the prepared list would make up the required sample.
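Using a computer's pseudo-random number generator in place of a printed table, the register example could be sketched as follows. The seed value is arbitrary; Python's `random.sample` selects without replacement, so each of the 3500 entries is equally likely to appear and none appears twice.

```python
import random

N = 3500   # entries in the register, numbered consecutively 1..N
n = 700    # required sample size

rng = random.Random(2024)   # seeded so the same sample can be redrawn and checked
# random.sample draws without replacement: every entry has the same chance
selected = sorted(rng.sample(range(1, N + 1), n))

print(len(selected), selected[:10])
```

Sorting the selected numbers is only a convenience for locating the entries in the register; it does not affect which entries were chosen.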
There are two ways of carrying out simple random sampling. If each unit drawn is replaced before the next unit is drawn, the technique is called simple random sampling with replacement (SRSWR); if the units drawn are not replaced, so that each subsequent draw is made only from the units not yet selected, the technique is called simple random sampling without replacement (SRSWOR).
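The difference between the two schemes can be sketched with a hypothetical population of 10 numbered units: with replacement a unit may be drawn more than once, while without replacement each selected unit is set aside before the next draw.

```python
import random

population = list(range(1, 11))   # hypothetical population: N = 10 numbered units
n = 6                             # sample size
rng = random.Random(7)            # arbitrary seed, for reproducibility

# SRSWR: every draw is made from all N units, so a unit may be selected again
with_replacement = [rng.choice(population) for _ in range(n)]

# SRSWOR: the selected unit is set aside before the next draw
remaining = population.copy()
without_replacement = []
for _ in range(n):
    unit = rng.choice(remaining)
    remaining.remove(unit)
    without_replacement.append(unit)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

The loop mirrors the verbal description draw by draw; in practice `rng.sample(population, n)` gives an SRSWOR sample in one call.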
1.7.2.1.1 Selection of a simple random sample
Mainly two approaches are used to draw a simple random sample: (a) the lottery system method and (b) the mechanical randomization or random numbers method.
(a) Lottery system. This is the simplest method of selecting a random sample. The process is as follows. Suppose for a survey we want to select n students out of a class of N students. We assign the numbers 1 to N, one number to each student, and write these numbers on N identical chits of the same size, shape and color. These chits are put in a bag and thoroughly shuffled, and then n chits are drawn one by one. The n students corresponding to the numbers on these chits make up the required sample.
This method is quite independent of the properties of the population. Cards are often used in place of chits. This is one of the most reliable methods of selecting a random sample.
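The lottery procedure maps directly onto shuffling a list. In this sketch, `lottery_sample` is a hypothetical helper name, and a pseudo-random shuffle stands in for physically mixing the chits in the bag.

```python
import random

def lottery_sample(N, n, seed=None):
    """Lottery method (hypothetical helper): N numbered chits, shuffled, n drawn."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    chits = list(range(1, N + 1))   # one chit per student, numbered 1..N
    rng = random.Random(seed)
    rng.shuffle(chits)              # thorough shuffling of the chits in the bag
    return chits[:n]                # the first n chits play the role of n draws

print(lottery_sample(40, 5, seed=1))
```

Taking the first n chits of a shuffled bag is equivalent to drawing n chits one by one without putting any back.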
(b) Mechanical randomization or random numbers method. The lottery method is time-consuming if the population is large. In the random numbers method, a table of randomly generated digits, known as a random number table, is used to draw the required sample. Many such tables have been prepared by statisticians. These tables are constructed so that each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 appears the same number of times, independently of the others. If we want to select a sample from a population of size N ≤ 99, the digits can be combined two at a time to give pairs from 00 to 99. Similarly, for N ≤ 999 the digits are combined three at a time, for N ≤ 9999 four at a time, and so on. Since each of the digits 0, 1, 2, ..., 9 occurs an equal number of times independently of the others, so does each of the pairs 00 to 99, the triplets 000 to 999, the quadruplets 0000 to 9999, and so on.
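The pairing rule can be sketched as follows. Because no particular published table is reproduced here, `digits_from_table` is a hypothetical stand-in that generates random digits. The hypothetical helper `sample_from_digits` reads them in groups of k digits (k = number of digits in N), discards 00 and numbers greater than N, and skips repeats so that the sample is drawn without replacement; discarding unusable numbers in this way is an assumed convention.

```python
import random

def digits_from_table(count, seed=0):
    """Hypothetical stand-in for a printed random number table:
    returns a string of `count` pseudo-random digits."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(count))

def sample_from_digits(digits, N, n):
    """Read digits in groups of k = len(str(N)) and keep numbers 1..N,
    skipping 0, values above N and repeats (sampling without replacement)."""
    k = len(str(N))                 # 2 digits for N <= 99, 3 for N <= 999, ...
    chosen = []
    for i in range(0, len(digits) - k + 1, k):
        number = int(digits[i:i + k])
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits; continue reading the table")

table = digits_from_table(2000, seed=11)
sample = sample_from_digits(table, N=85, n=10)
print(sample)
print(sample_from_digits("0512870512339", N=85, n=3))  # [5, 12, 33]
```

For instance, the digit string 0512870512339 with N = 85 is read as the pairs 05, 12, 87, 05, 12, 33; 87 exceeds N and the second 05 and 12 are repeats, so the rule keeps 5, 12 and 33.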
The steps of drawing the random sample are as follows:
(i) Identify the N units in the population with the numbers 1 to N.
(ii) Select at random any page of the random number table and pick up the numbers row-wise, column-wise or diagonally at random.
(iii) The population units corresponding to these selected numbers form the required sample.
Some commonly used random number tables are: