Introduction to Statistics: Data Types, Sources, and Presentation, Lecture notes of Statistics

This lesson introduces the basics of statistics.

Typology: Lecture notes

2020/2021

Available from 03/30/2023

orlando-balana
COURSE CODE: MC-106
LESSON: 01
AUTHOR: SURINDER KUNDU
AN INTRODUCTION TO BUSINESS STATISTICS
OBJECTIVE: The aim of the present lesson is to enable the students to understand
the meaning, definition, nature, importance and limitations of statistics.
“A knowledge of statistics is like a knowledge of foreign
languages or of algebra; it may prove of use at any time under
any circumstances.” (Bowley)
STRUCTURE OF THE NOTE:
1.1 Introduction
1.2 Meaning and Definitions of Statistics
1.3 Types of Data and Data Sources
1.4 Types of Statistics
1.5 Scope of Statistics
1.6 Importance of Statistics in Business
1.7 Limitations of statistics
1.8 Summary
1.9 Self-Test Questions
1.10 Types of Data, Important Sources of Secondary Data; Collection and Presentation of
Data: Different Methods of collecting Primary Data: Text, Tabular and graphical
Methods of Data presentation; Frequency Distribution, Diagrammatic Presentation of
Frequency data.
1.1 INTRODUCTION
For a layman, ‘Statistics’ means numerical information expressed in quantitative
terms. This information may relate to objects, subjects, activities, phenomena, or
regions of space. As a matter of fact, data have no limits as to their reference,
coverage, and scope. At the macro level, these are data on gross national product and
shares of agriculture, manufacturing, and services in GDP (Gross Domestic Product).
SUBJECT: BUSINESS STATISTICS
BBA 2nd Semester
BBAHC-3: BUSINESS STATISTICS & MATHEMATICS
(The note is compiled from several sources, including websites. In
case of any difficulty in understanding, one may contact the instructor. Usual disclaimers apply)
Unit-I
Introduction: Definition of Statistics, Importance and scope of statistics, Limitations of Statistics,
Types of Data, Important Sources of Secondary Data; Collection and Presentation of Data:
Different Methods of collecting Primary Data: Text, Tabular and graphical Methods of Data
presentation; Frequency Distribution, Diagrammatic Presentation of Frequency data.


At the micro level, individual firms, howsoever small or large, produce extensive statistics on their operations. The annual reports of companies contain a variety of data on sales, production, expenditure, inventories, capital employed, and other activities. These data are often field data, collected by employing scientific survey techniques. Unless regularly updated, such data are the product of a one-time effort and have limited use beyond the situation that called for their collection.

A student knows statistics more intimately as a subject of study, like economics, mathematics, chemistry, or physics. It is a discipline that deals scientifically with data, and is often described as the science of data. In dealing with statistics as data, the discipline has developed appropriate methods of collecting, presenting, summarizing, and analysing data, and thus consists of a body of these methods.

1.2 MEANING AND DEFINITIONS OF STATISTICS

At the outset, it may be noted that the word ‘statistics’ is used in two senses: plural and singular. In the plural sense, it refers to a set of figures or data. In the singular sense, statistics refers to the whole body of tools that are used to collect data, organise and interpret them and, finally, to draw conclusions from them. Both aspects of statistics are important if quantitative data are to serve their purpose. If statistics, as a subject, is inadequate and consists of poor methodology, we cannot know the right procedure to extract from the data the information they contain. Similarly, if our data are defective, inadequate, or inaccurate, we cannot reach the right conclusions even though our subject is well developed.

A.L. Bowley has defined statistics as: (i) statistics is the science of counting, (ii) statistics may rightly be called the science of averages, and (iii) statistics is the science of measurement of social organism regarded as a whole in all its mani-

(iii) Statistics must be reasonably accurate. Wrong figures, if analysed, will lead to erroneous conclusions. Hence, conclusions must be based on accurate figures.
(iv) Statistics must be collected in a systematic manner. If data are collected in a haphazard manner, they will not be reliable and will lead to misleading conclusions.
(v) Statistics must be collected in a systematic manner for a pre-determined purpose.
(vi) Lastly, statistics should be placed in relation to each other. If one collects data unrelated to each other, such data will be confusing and will not lead to any logical conclusions. Data should be comparable over time and over space.

1.3 TYPES OF DATA AND DATA SOURCES

Statistical data are the basic raw material of statistics. Data may relate to an activity of our interest, a phenomenon, or a problem situation under study. They arise from the process of measuring, counting, and/or observing. Statistical data, therefore, refer to those aspects of a problem situation that can be measured, quantified, counted, or classified. Any object, subject, phenomenon, or activity that generates data through this process is termed a variable. In other words, a variable is one that shows a degree of variability when successive measurements are recorded.

In statistics, data are classified into two broad categories: quantitative data and qualitative data. This classification is based on the kind of characteristics that are measured. Quantitative data are those that can be quantified in definite units of measurement. These refer to characteristics whose successive measurements yield quantifiable observations. Depending on the nature of the variable observed for measurement, quantitative data can be further categorized as continuous and discrete data.

Correspondingly, a variable may be a continuous variable or a discrete variable.

(i) Continuous data represent the numerical values of a continuous variable. A continuous variable is one that can assume any value between any two points on a line segment, thus representing an interval of values. The values are quite precise and close to each other, yet distinguishably different. Characteristics such as weight, length, height, thickness, velocity, temperature, and tensile strength represent continuous variables. Thus, the data recorded on these and similar characteristics are called continuous data. It may be noted that a continuous variable assumes the finest unit of measurement, finest in the sense that it enables measurements to the maximum degree of precision.

(ii) Discrete data are the values assumed by a discrete variable. A discrete variable is one whose outcomes are measured in fixed numbers. Such data are essentially count data. These are derived from a process of counting, such as the number of items possessing or not possessing a certain characteristic. The number of customers visiting a departmental store every day, the incoming flights at an airport, and the defective items in a consignment received for sale are all examples of discrete data.

Qualitative data refer to qualitative characteristics of a subject or an object. A characteristic is qualitative in nature when its observations are defined and noted in terms of the presence or absence of a certain attribute in discrete numbers. These data are further classified as nominal and rank data.

(i) Nominal data are the outcome of classification into two or more categories of items or units comprising a sample or a population according to some quality characteristic. Classification of students according to sex (as males and

simplifying data, which are otherwise quite unwieldy and voluminous. It seeks to achieve this in a manner that meaningful conclusions can be readily drawn from the data. Descriptive statistics may thus be seen as comprising methods of bringing out and highlighting the latent characteristics present in a set of numerical data. It not only facilitates an understanding of the data and systematic reporting thereof, but also makes them amenable to further discussion, analysis, and interpretation.

The first step in any scientific inquiry is to collect data relevant to the problem in hand. When the inquiry relates to the physical and/or biological sciences, data collection is normally an integral part of the experiment itself. In fact, the very manner in which an experiment is designed determines the kind of data it would require and/or generate. The problem of identifying the nature and kind of the relevant data is thus automatically resolved as soon as the design of the experiment is finalized. This is possible in the case of the physical sciences. In the case of the social sciences, where the required data are often collected through a questionnaire from a number of carefully selected respondents, the problem is not so simply resolved. For one thing, designing the questionnaire itself is a critical initial problem. For another, the number of respondents to be accessed for data collection and the criteria for selecting them have their own implications and importance for the quality of results obtained.

Further, once the data have been collected, they are assembled, organized, and presented in the form of appropriate tables to make them readable. Wherever needed, figures, diagrams, charts, and graphs are also used for better presentation of the data. A useful tabular and graphic presentation of data requires that the raw data be properly classified in accordance with the objectives of the investigation and the relational analysis to be carried out.

A well thought-out and sharp data classification facilitates easy description of the hidden data characteristics by means of a variety of summary measures. These include measures of central tendency, dispersion, skewness, and kurtosis, which constitute the essential scope of descriptive statistics. These form a large part of the subject matter of any basic textbook on the subject, and thus they are discussed in that order here as well.

Inferential statistics, also known as inductive statistics, goes beyond describing a given problem situation by means of collecting, summarizing, and meaningfully presenting the related data. Instead, it consists of methods that are used for drawing inferences, or making broad generalizations, about a totality of observations on the basis of knowledge about a part of that totality. The totality of observations about which an inference may be drawn, or a generalization made, is called a population or a universe. The part of the totality that is observed for data collection and analysis to gain knowledge about the population is called a sample.

The desired information about a given population of interest may also be collected by observing all the units comprising the population. This total coverage is called a census. Getting the desired value for the population through a census is not always feasible and practical, for various reasons. Apart from time and money considerations making census operations prohibitive, observing each individual unit of the population with reference to any data characteristic may at times even involve destructive testing. In such cases, obviously, the only recourse available is to employ the partial or incomplete information gathered through a sample for the purpose. This is precisely what inferential statistics does. Thus, obtaining a particular value from the sample information and using it for drawing an inference about the entire population underlies the subject matter of inferential statistics. Consider a

degree of reliability of these inferences.
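The idea of judging a population from a sample, and of attaching a degree of reliability to that judgement, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 sales figures is invented, the sample size of 100 is arbitrary, and the standard error is used here as one rough indicator of reliability, not as the only possible one.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly sales (in units) of 10,000 retail outlets.
population = [random.gauss(500, 80) for _ in range(10_000)]

# Census: observe every unit (often infeasible in practice).
population_mean = statistics.fmean(population)

# Sample: observe only 100 randomly chosen units.
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean: a rough guide to how far the
# sample estimate is likely to be from the population value.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Population mean (census): {population_mean:.1f}")
print(f"Sample mean (n=100):      {sample_mean:.1f}")
print(f"Standard error:           {standard_error:.1f}")
```

The sample mean will generally differ a little from the census value; a smaller standard error suggests that the inference drawn from the sample is more reliable.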

1.5 SCOPE OF STATISTICS

Apart from the methods comprising the scope of the descriptive and inferential branches of statistics, statistics also consists of methods of dealing with a few other issues of a specific nature. Since these methods are essentially descriptive in nature, they have been discussed here as part of descriptive statistics. These are mainly concerned with the following:

(i) It often becomes necessary to examine how two paired data sets are related. For example, we may have data on the sales of a product and the expenditure incurred on its advertisement for a specified number of years. Given that sales and advertisement expenditure are related to each other, it is useful to examine the nature of the relationship between the two and quantify the degree of that relationship. As this requires the use of appropriate statistical methods, these fall under the purview of what we call regression and correlation analysis.

(ii) Situations occur quite often when we require averaging (or totalling) of data on prices and/or quantities expressed in different units of measurement. For example, the price of cloth may be quoted per meter of length and that of wheat per kilogram of weight. Since ordinary methods of totalling and averaging do not apply to such price/quantity data, the special techniques needed for the purpose are developed under index numbers.

(iii) Many a time, it becomes necessary to examine the past performance of an activity with a view to determining its future behaviour. For example, when engaged in the production of a commodity, monthly product sales are an important measure for evaluating performance. This requires compilation and analysis of relevant sales data over time. The more complex the activity, the more varied the data requirements. For profit maximisation and future sales planning, a forecast of the likely sales growth rate is crucial. This needs careful collection and analysis of past sales data. All such concerns are taken care of under time series analysis.

(iv) Obtaining the most likely future estimates on any aspect(s) relating to a business or economic activity has long engaged the minds of all concerned. This is particularly important when it relates to product sales and demand, which serve as the necessary basis of production scheduling and planning. Regression, correlation, and time series analyses together help develop the basic methodology for this purpose. Thus, the study of methods and techniques of obtaining the likely estimates of business/economic variables comprises the scope of what we do under business forecasting.

Keeping in view the importance of inferential statistics, the scope of statistics may finally be restated as consisting of statistical methods which facilitate decision-making under conditions of uncertainty. While the term statistical methods is often used to cover the subject of statistics as a whole, in particular it refers to methods by which statistical data are analysed and interpreted, and inferences drawn for decision-making.

Though generic in nature and versatile in their applications, statistical methods have come to be widely used, especially in all matters concerning business and economics. These are also being increasingly used in biology, medicine, agriculture, psychology, and education. The scope of application of these methods has started opening and expanding in a number of social science disciplines as well. Even a political scientist finds them of increasing relevance for examining political behaviour and it is, of course, no surprise to find even historians using statistical data, for history is essentially past

production, inspection, packaging and shipping, sales and complaints, inventory and maintenance, costs, management control, industrial engineering and research. Statistical problems arising in the course of business operations are multitudinous. As such, one may do no more than highlight some of the more important ones to emphasise the relevance of statistics to the business world.

In the sphere of production, for example, statistics can be useful in various ways. Statistical quality control methods are used to ensure the production of quality goods. This is achieved by identifying and rejecting defective or substandard goods. Sales targets can be fixed on the basis of sales forecasts, which are made using various methods of forecasting. Analysis of sales effected against the targets set earlier would indicate the deficiency in achievement, which may be on account of several causes: (i) targets were too high and unrealistic, (ii) salesmen's performance has been poor, (iii) competition has emerged or increased, (iv) the quality of the company's product is poor, and so on. These factors can be further investigated.

Another sphere in business where statistical methods can be used is personnel management. Here, one is concerned with the fixation of wage rates, incentive norms, and performance appraisal of individual employees. The concept of productivity is very relevant here. On the basis of the measurement of productivity, a productivity bonus is awarded to the workers. Comparisons of wages and productivity are undertaken in order to ensure increases in industrial productivity.

Statistical methods could also be used to ascertain the efficacy of a certain product, say, a medicine. For example, a pharmaceutical company has developed a new medicine for the treatment of bronchial asthma. Before launching it on a commercial basis, it wants to ascertain the effectiveness of this medicine. It undertakes an experiment involving the formation of two comparable groups of asthma patients. One group is given the new medicine for a specified period and the other is treated with the usual medicines. Records are maintained for the two groups for the specified period. These records are then analysed to ascertain if there is any significant difference in the recovery of the two groups. If the difference is statistically significant, the new medicine is commercially launched.
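One way such a comparison of two groups might be carried out is a two-proportion z-test on the recovery rates. The lesson does not say which test the company would use, so the choice of test, the function name `two_proportion_z_test`, and the patient counts below are all assumptions made purely for illustration.

```python
from statistics import NormalDist


def two_proportion_z_test(recovered_a, n_a, recovered_b, n_b):
    """Return (z, two-sided p-value) for the difference of two recovery rates."""
    p_a = recovered_a / n_a
    p_b = recovered_b / n_b
    # Pooled recovery rate under the hypothesis of no difference.
    pooled = (recovered_a + recovered_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only:
# group A receives the new medicine, group B the usual medicines.
z, p_value = two_proportion_z_test(recovered_a=78, n_a=100,
                                   recovered_b=62, n_b=100)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is significant at the 5% level")
else:
    print("Difference is not significant at the 5% level")
```

With these invented counts the difference would be judged significant at the conventional 5 per cent level; with smaller groups or a smaller gap in recovery rates, the same difference might well not be.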

1.7 LIMITATIONS OF STATISTICS

Statistics has a number of limitations. The pertinent ones are as follows:

(i) There are certain phenomena or concepts where statistics cannot be used. This is because these phenomena or concepts are not amenable to measurement. For example, beauty, intelligence, and courage cannot be quantified. Statistics has no place in cases where quantification is not possible.

(ii) Statistics reveal the average behaviour, the normal or the general trend. Applying the 'average' concept to an individual or a particular situation may lead to a wrong conclusion, and sometimes a disastrous one. For example, one may be misguided when told that the average depth of a river from one bank to the other is four feet, when there may be some points in between where its depth is far more than four feet. On this understanding, one may enter those points having greater depth, which may be hazardous.

(iii) Since statistics are collected for a particular purpose, such data may not be relevant or useful in other situations or cases. For example, secondary data (i.e., data originally collected by someone else) may not be useful for another person.

(iv) Statistics are not 100 per cent precise, as Mathematics or Accountancy is. Those who use statistics should be aware of this limitation.
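The river example in limitation (ii) can be made concrete with a small Python sketch. The depth readings below are invented so that their average is exactly four feet; they are an assumption for illustration, not measurements from any source.

```python
import statistics

# Hypothetical depth readings (in feet), taken at equal intervals
# from one bank of the river to the other.
depths_ft = [1, 2, 3, 5, 8, 6, 4, 2, 5]

average_depth = statistics.fmean(depths_ft)
deepest_point = max(depths_ft)

# Points where the river is deeper than the average suggests.
points_above_average = [d for d in depths_ft if d > average_depth]

print(f"Average depth: {average_depth} ft")
print(f"Deepest point: {deepest_point} ft")
print(f"Readings deeper than average: {points_above_average}")
```

Although the average depth is four feet, several readings exceed it and the deepest is twice the average, which is exactly why an average alone should not be applied to a particular point.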

(ii) Defective data: Another misuse is that sometimes one gives defective data. This may be done knowingly in order to defend one's position or to prove a particular point. Apart from this, the definition used to denote a certain phenomenon may be defective. For example, in the case of data relating to unemployed persons, the definition may include even those who are employed, though partially. The question here is how far it is justified to include partially employed persons amongst unemployed ones.

(iii) Unrepresentative sample: In statistics, one often has to conduct a survey, which necessitates choosing a sample from the given population or universe. The sample may turn out to be unrepresentative of the universe. One may choose a sample just on the basis of convenience, collecting the desired information from friends or nearby respondents in the neighbourhood even though such respondents do not constitute a representative sample.

(iv) Inadequate sample: Earlier, we have seen that using a sample that is unrepresentative of the universe is a major misuse of statistics. Apart from this, at times one may conduct a survey based on an extremely inadequate sample. For example, in a city we may find that there are 1,00,000 households. When we have to conduct a household survey, we may take a sample of merely 100 households, comprising only 0.1 per cent of the universe. A survey based on such a small sample may not yield the right information.

(v) Unfair comparisons: An important misuse of statistics is making unfair comparisons from the data collected. For instance, one may construct an index of production choosing a base year in which production was much lower. One may then compare the subsequent year's production against this low base. Such a comparison will undoubtedly give a rosy picture of production though in reality it is not so. Another source of unfair comparisons is making absolute comparisons instead of relative ones. An absolute comparison of two figures, say, of production or export, may show a good increase, but in relative terms it may turn out to be negligible. Another example of unfair comparison is when the population of two cities is different, but a comparison of overall death rates and deaths by a particular disease is attempted. Such a comparison is wrong. Likewise, when data are not properly classified or when changes in the composition of the population in the two years are not taken into consideration, comparisons of such data would be unfair as they would lead to misleading conclusions.

(vi) Unwarranted conclusions: Another misuse of statistics may be on account of unwarranted conclusions. This may be a result of making false assumptions. For example, while making projections of population for the next five years, one may assume a lower rate of growth though the past two years indicate otherwise. Sometimes one may not be sure about the changes in the business environment in the near future. In such a case, one may use an assumption that may turn out to be wrong. Another source of unwarranted conclusions may be the use of the wrong average. Suppose a series contains extreme values, one too high and the other too low, such as 800 and 50. The use of an arithmetic average in such a case may give a wrong idea. Instead, the harmonic mean would be proper in such a case.

(vii) Confusion of correlation and causation: In statistics, one often has to examine the relationship between two variables. A close relationship between the two variables may not establish a cause-and-effect relationship in the sense that one

1.10 Types of Data, Important Sources of Secondary Data; Collection and Presentation of Data: Different Methods of collecting Primary Data: Text, Tabular and graphical Methods of Data presentation; Frequency Distribution, Diagrammatic Presentation of Frequency data

1.10.a Types of Data, Important Sources of Secondary Data; Collection and Presentation of Data: Different Methods of collecting Primary Data

Data can be defined as a collection of facts or information from which conclusions may be drawn. Data may be qualitative or quantitative. Once we know the difference between them, we can know how to use them.

Qualitative Data: They represent some characteristics or attributes. They depict descriptions that may be observed but cannot be computed or calculated. For example, data on attributes such as intelligence, honesty, wisdom, cleanliness, and creativity collected using the students of your class as a sample would be classified as qualitative. They are more exploratory than conclusive in nature.

Quantitative Data: These can be measured and not simply observed. They can be numerically represented and calculations can be performed on them. For example, data on the number of students playing different sports from your class gives an estimate of how many of the total students play which sport. This information is numerical and can be classified as quantitative.

Discrete Data: These are data that can take only certain specific values rather than a range of values. For example, data on the blood group of a certain population or on their genders is termed discrete data. A usual way to represent this is using bar charts.

Continuous Data: These are data that can take values between a certain range with the highest and lowest values. The difference between the highest and lowest value is called the range of data. For example, the age of persons can take values even in decimals, and so can the heights and weights of the students of your school. These are classified as continuous data. Continuous data can be tabulated in what is called a frequency distribution. They can be graphically represented using histograms.
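As a rough sketch of how continuous data might be tabulated into a frequency distribution, the following Python snippet groups a set of heights into class intervals and prints a simple text histogram. The heights, the class width of 5 cm, the starting point of 145 cm, and the helper name `frequency_distribution` are all assumptions chosen for illustration.

```python
from collections import Counter


def frequency_distribution(values, start, width, n_classes):
    """Count the values falling in each class interval [lower, upper)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical heights (in cm) of 15 students, invented for illustration.
heights_cm = [148.5, 152.0, 155.3, 157.8, 158.1, 160.0, 161.4, 162.2,
              163.9, 165.0, 166.7, 168.3, 170.1, 172.6, 176.4]

table = frequency_distribution(heights_cm, start=145, width=5, n_classes=7)

# Range of the data: highest value minus lowest value.
data_range = max(heights_cm) - min(heights_cm)

print(f"Range: {data_range:.1f} cm")
for lower, upper, freq in table:
    # One asterisk per observation gives a crude text histogram.
    print(f"{lower}-{upper}: {'*' * freq} ({freq})")
```

Each class includes its lower limit and excludes its upper limit, so a height of exactly 160.0 cm is counted in the 160-165 class. Other conventions for class limits are equally possible.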

Depending on the source, data can be classified as primary data or secondary data. Let us take a look at them both.

Primary Data: These are the data that are collected for the first time by an investigator for a specific purpose. Primary data are ‘pure’ in the sense that no statistical operations have been performed on them and they are original. An example of primary data is the Census of India.

Secondary Data: They are the data that are sourced from someplace that has originally collected it. This means that this kind of data has already been collected by some researchers or investigators in the past and is available either in published or unpublished form. This information is impure, as statistical operations may have been performed on it already. An example is information available on the Government of India, Department of Finance’s website or in other repositories, books, journals, etc.

Collection of Primary Data

Primary data is collected in the course of doing experimental or descriptive research by doing experiments, performing surveys, or by observation or direct communication with respondents. Several methods for collecting primary data are given below:

1. Observation Method

It is commonly used in studies relating to behavioural science. Under this method, observation becomes a scientific tool and the method of data collection for the researcher when it serves a formulated research purpose and is systematically planned and subjected to checks and controls.

(a) Structured (descriptive) and Unstructured (exploratory) observation: When an observation is characterized by careful definition of the units to be observed, the style of the observer, the conditions for observation, and the selection of pertinent data of observation, it is a structured observation. When these characteristics are not thought of in advance or are not present, it is an unstructured observation.

(b) Participant, Non-participant and Disguised observation: When the observer observes by making himself, more or less, a member of the group he is observing, it is participant observation; when the observer observes by detaching himself from the group under observation, it is non-participant observation. If the observer observes in such a manner that his presence is unknown to the people he is observing, it is disguised observation.

(c) Controlled (laboratory) and Uncontrolled (exploratory) observation: If the observation takes place in the natural setting, it is an uncontrolled observation; when the observation takes place according to some pre-arranged plans involving experimental procedure, it is a controlled observation.

Advantages
• Subjective bias is eliminated
• Data is not affected by past behaviour or future intentions
• Natural behaviour of the group can be recorded

Limitations
• Expensive methodology
• Information provided is limited
• Unforeseen factors may interfere with the observational task

2. Interview Method

This method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. It can be achieved in two ways:

(A) Personal Interview: It requires a person known as the interviewer to ask questions, generally in face-to-face contact with the other person. It can be:

Direct personal investigation: The interviewer has to collect the information personally from the sources concerned.

Indirect oral examination: The interviewer has to cross-examine other persons who are supposed to have knowledge about the problem.

Structured interviews: Interviews involving the use of pre-determined questions and of highly standardised techniques of recording.

Unstructured interviews: Characterized by flexibility of approach to questioning. It does not follow a system of pre-determined questions and is