Business Statistics Study Notes, Study notes of Business Statistics

This document covers the basics of statistics, data, types of data, etc.

Typology: Study notes

2019/2020

Available from 01/20/2022

preeti-sarda

Mrs. Preeti Sarda
Bachelor of Business Administration (BBA)
Paper Code: BB205
Business Statistics
UNIT 1
Statistics
Statistics is the study of the collection, analysis, interpretation, presentation, and
organization of data. Statistics is a branch of mathematics that transforms data into useful
information for decision makers.
Statistics is the science of data. It follows the scientific method:
• Formulate a theory
• Collect data to test the theory
• Analyze the results
• Interpret the results, and make decisions
A.L. Bowley has defined statistics as:
(i) Statistics is the science of counting,
(ii) Statistics may rightly be called the science of averages, and
(iii) Statistics is the science of measurement of social organism regarded as a
whole in all its manifestations.
Boddington defined statistics as the science of estimates and probabilities.
Agresti & Finlay (1997): Statistics consists of a body of methods for collecting and
analyzing data.
Statistics is much more than just the tabulation of numbers and the graphical presentation of
these tabulated numbers. Statistics is the science of gaining information from numerical and
categorical data.
Statistical methods can be used to find answers to questions like:
• What kind and how much data need to be collected?
• How should we organize and summarize the data?
• How can we analyze the data and draw conclusions from it?
• How can we assess the strength of the conclusions and evaluate their uncertainty?
That is, statistics provides methods for:
1. Design: Planning and carrying out research studies.
2. Description: Summarizing and exploring data.
3. Inference: Making predictions and generalizing about phenomena represented by the
data.

Example (statistics in practice): Consider the following problems:

  • Agricultural problem: Is a new grain seed or fertilizer more productive?
  • Medical problem: What is the right dosage of a drug for a treatment?
  • Political science: How accurate are Gallup and other opinion polls?
  • Economics: What will the unemployment rate be next year?
  • Technical problem: How can the quality of a product be improved?

Population and Sample

Population and sample are two basic concepts of statistics.

A population can be characterized as the set of individual persons or objects in which an investigator is primarily interested during his or her research. Sometimes the desired measurements are obtained for all individuals in the population, but often only a subset of the individuals of that population is observed; such a subset constitutes a sample.

Population is the collection of all individuals or items under consideration in a statistical study. (Weiss, 1999)

Sample is that part of the population from which information is collected. (Weiss, 1999)
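The difference between describing a whole population and describing a sample drawn from it can be sketched in a few lines of Python. The heights, the sample size and the random seed below are hypothetical values chosen only for illustration; they do not come from these notes.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of all 10 students in a small class.
population = [150, 152, 155, 158, 160, 163, 165, 168, 170, 175]

random.seed(1)                         # fixed seed so the draw is repeatable
sample = random.sample(population, 4)  # observe only 4 of the 10 individuals

population_mean = mean(population)     # summarizes every individual
sample_mean = mean(sample)             # summarizes only the observed part

print("Population mean:", population_mean)
print("Sample:", sample, "sample mean:", sample_mean)
```

The sample mean will usually differ somewhat from the population mean, which is why conclusions drawn from a sample carry uncertainty.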

Example Finite population: In many cases the population under consideration is one which could be physically listed. For example:

  • The students of the University of Tampere,
  • The books in a library.

Example Hypothetical population: Also in many cases the population is much more abstract and may arise from the phenomenon under consideration. Consider e.g. a factory producing light bulbs. If the factory keeps using the same equipment, raw materials and methods of production in the future as well, then the bulbs that will be produced in the factory constitute a hypothetical population. That is, sample of light bulbs

to make more thorough analysis of the subject under investigation. Furthermore, the preliminary descriptive analysis of a sample often reveals features that lead to the choice of the appropriate inferential method to be later used. Sometimes it is possible to collect the data from the whole population. In that case it is possible to perform a descriptive study on the population as well as usually on the sample. Only when an inference is made about the population based on information obtained from the sample does the study become inferential.

Characteristics of statistics

The major characteristics of statistics:

(i) Statistics are the aggregates of facts. It means a single figure is not statistics. For example, national income of a country for a single year is not statistics but the same for two or more years is statistics.

(ii) Statistics are affected by a number of factors. For example, sale of a product depends on a number of factors such as its price, quality, competition, the income of the consumers, and so on.

(iii) Statistics must be reasonably accurate. Wrong figures, if analysed, will lead to erroneous conclusions. Hence, it is necessary that conclusions be based on accurate figures.

(iv) Statistics must be collected in a systematic manner. If data are collected in a haphazard manner, they will not be reliable and will lead to misleading conclusions.

(v) Statistics must be collected for a pre-determined purpose.

(vi) Lastly, statistics should be placed in relation to each other. If one collects data unrelated to each other, then such data will be confusing and will not lead to any logical conclusions. Data should be comparable over time and over space.

Scope of statistics

Apart from the methods comprising the scope of descriptive and inferential branches of statistics, statistics also consists of methods of dealing with a few other issues of a specific nature. Since these methods are essentially descriptive in nature, they have been discussed here as part of descriptive statistics. These are mainly concerned with the following:

(i) It often becomes necessary to examine how two paired data sets are related. For example, we may have data on the sales of a product and the expenditure incurred on its advertisement for a specified number of years. Given that sales and advertisement expenditure are related to each other, it is useful to examine the nature of the relationship between the two and quantify the degree of that relationship. As this requires the use of appropriate statistical methods, these fall under the purview of what we call regression and correlation analysis.

(ii) Situations occur quite often when we require averaging (or totaling) of data on prices and/or quantities expressed in different units of measurement. For example, the price of cloth may be quoted per meter of length and that of wheat per kilogram of weight. Since ordinary methods of totaling and averaging do not apply to such price/quantity data, the special techniques needed for the purpose are developed under index numbers.

(iii) Many a time, it becomes necessary to examine the past performance of an activity with a view to determining its future behavior. For example, when engaged in the production of a commodity, monthly product sales are an important measure for evaluating performance. This requires compilation and analysis of relevant sales data over time. The more complex the activity, the more varied the data requirements. For profit maximizing and future sales planning, a forecast of the likely sales growth rate is crucial. This needs careful collection and analysis of past sales data. All such concerns are taken care of under time series analysis.

(iv) Obtaining the most likely future estimates on any aspect(s) relating to a business or economic activity has indeed been engaging the minds of all concerned. This is particularly important when it relates to product sales and demand, which serve as the necessary basis of production scheduling and planning. The regression, correlation, and time series analyses together help develop the basic methodology to do the needful. Thus, the study of methods and techniques of obtaining the likely estimates of business/economic variables comprises the scope of what we do under business forecasting.

Importance of statistics in business

There are three major functions in any business enterprise in which statistical methods are useful. These are as follows:

(i) The planning of operations: This may relate to either special projects or to the recurring activities of a firm over a specified period.

(ii) The setting up of standards: This may relate to the size of employment, volume of sales, fixation of quality norms for the manufactured product, norms for the daily output, and so forth.

(iii) The function of control: This involves comparison of the actual production achieved against the norm or target set earlier. In case the production has fallen short of the target, remedial measures are suggested so that such a deficiency does not occur again.

Different authors have highlighted the importance of Statistics in business. For instance, Croxton and Cowden give numerous uses of Statistics in business such as project planning, budgetary planning and control, inventory planning and control, quality control, marketing, production and personnel administration. Within these they have also specified certain areas where Statistics is very relevant. Another author, Irving W. Burr, dealing with the place of statistics in an industrial organisation, specifies a number of areas where statistics is extremely useful. These are: customer wants and market research, development design and specification, purchasing, production, inspection, packaging and shipping, sales and complaints, inventory and maintenance, costs, management control, industrial engineering and research.

Classification condenses the data by dropping out unnecessary details. It facilitates comparison between different sets of data clearly showing the different points of agreement and disagreement. It enables us to study the relationship between several characteristics and make further statistical treatment like tabulation, etc.

E.g., during a population census, people in the country are classified according to sex (male/female), marital status (married/unmarried), place of residence (rural/urban), age (0–5 years, 6–10 years, 11–15 years, etc.), profession (agriculture, production, commerce, transport, doctor, others), state of residence (West Bengal, Bihar, Maharashtra, Delhi, etc.), etc.

Data: Data refers to the observations of variables. There are some key terms that we need to know before we can properly examine data. The objective of statistics is to extract information from data.

Types of data: Data can be either categorical (qualitative) or numerical (quantitative).

  1. Categorical (qualitative) data: the observations fall into categories.
  2. Numerical (quantitative) data: the observations are real numbers.

Measurements of data:

Nominal level: Nominal data are items that are differentiated by a naming system. The names refer to different characteristics that something can take on. Examples are things such as eye color, countries or names of people. Data at the nominal level are qualitative. It does not make sense to calculate something like the mean or standard deviation of nominal data.

Ordinal level: Ordinal data are data that have an order (nominal data do not). For example, placement in a race (first, second, etc.) has an order, but no meaning can be given to the difference in the placements; i.e., the placements cannot be used to answer the question of "how much more?"

Interval level: In the case of interval-scaled data, the idea of difference does have meaning, but there is no true starting point (zero). The most commonly cited example is temperature: 30 °C is 20 °C warmer than 10 °C, but 0 °C does not mean that there is no temperature.

Ratio level: For data at the ratio level the difference between two values makes sense and there is a starting point. For example, between 10 km and 30 km there is a difference of 20 km, and the idea of 0 km has meaning in the sense that it is the absence of distance.

  • Nominal is the lowest level. Only names are meaningful here.
  • Ordinal adds an order to the names.
  • Interval adds meaningful differences.
  • Ratio adds a zero so that ratios are meaningful.
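The four levels can be illustrated with a short Python sketch showing which summary makes sense at each level. All data values are hypothetical; the conversion from Celsius to kelvin (adding 273.15) is used only to show that a ratio of Celsius readings depends on where zero is placed.

```python
from statistics import median, mode

eye_colors = ["brown", "blue", "brown", "green"]  # nominal: names only
race_places = [1, 2, 3, 4, 5]                     # ordinal: order, no distances
temps_celsius = [10.0, 30.0]                      # interval: differences, no true zero
distances_km = [10.0, 30.0]                       # ratio: differences and a true zero

most_common_color = mode(eye_colors)              # counting categories is meaningful
middle_place = median(race_places)                # ordering makes the median meaningful
temp_difference = temps_celsius[1] - temps_celsius[0]  # 20 degrees warmer

# "Three times as hot" is NOT meaningful: the ratio changes with the scale's zero.
ratio_celsius = temps_celsius[1] / temps_celsius[0]
ratio_kelvin = (temps_celsius[1] + 273.15) / (temps_celsius[0] + 273.15)

# "Three times as far" IS meaningful, because 0 km is the absence of distance.
distance_ratio = distances_km[1] / distances_km[0]

print(most_common_color, middle_place, temp_difference)
print(ratio_celsius, round(ratio_kelvin, 4), distance_ratio)
```

The two temperature ratios disagree, while the distance ratio stays the same in any unit of length; that is the practical difference between the interval and ratio levels.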

Data Array

  • Data: Numbers or measurements that are collected as a result of observations.
  • Array: An array is a systematic arrangement of objects, usually in rows and columns.
  • Data Array: Observations that are systematically arranged. An arrangement of data in ascending or descending order is called an array.
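As a small illustration, raw observations can be turned into a data array by sorting them. The marks below are hypothetical values.

```python
# Hypothetical raw observations (marks of 8 students), in the order collected.
raw_data = [42, 17, 35, 28, 50, 17, 39, 28]

ascending_array = sorted(raw_data)                 # smallest to largest
descending_array = sorted(raw_data, reverse=True)  # largest to smallest

print(ascending_array)
print(descending_array)
```

Once the data are arrayed, the smallest and largest values and any repeated values are easy to read off.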

Modes of Classification

There are four types of classification, viz., (i) Qualitative; (ii) Quantitative; (iii) Temporal and (iv) Spatial.

(i) Qualitative classification: It is done according to attributes or non-measurable characteristics, like social status, sex, nationality, occupation, etc. For example, the population of the whole country can be classified into four categories as married, unmarried, widowed and divorced. When only one attribute, e.g., sex, is used for classification, it is called simple classification. When more than one attribute, e.g., deafness, sex and religion, is used for classification, it is called manifold classification.

(ii) Quantitative classification: It is done according to numerical size, like weights in kg or heights in cm. Here we classify the data by assigning arbitrary limits known as class-limits. The quantitative phenomenon under study is called a variable. For example, the population of the whole country may be classified according to different variables like age, income, wage, price, etc. Hence this classification is often called "classification by variables".

(a) Variable: A variable in statistics means any measurable characteristic or quantity which can assume a range of numerical values within certain limits, e.g., income, height, age, weight, wage, price, etc. A variable can be classified as either discrete or continuous.

(1) Discrete variable: A variable which can take only exact values, and not any fractional values, is called a "discrete" variable. Number of workmen in a factory, members of a family, students in a class, number of births in a certain year, number of telephone calls in a month, etc., are examples of discrete variables.
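The difference between simple and manifold qualitative classification can be sketched in Python by counting records on one attribute or on several at once. The six records below are hypothetical.

```python
from collections import Counter

# Hypothetical records: (sex, marital status) for 6 people.
people = [
    ("male", "married"),
    ("female", "unmarried"),
    ("female", "married"),
    ("male", "unmarried"),
    ("female", "married"),
    ("male", "married"),
]

# Simple classification: only one attribute (sex) is used.
by_sex = Counter(sex for sex, _ in people)

# Manifold classification: more than one attribute (sex and marital status).
by_sex_and_status = Counter(people)

print(by_sex)
print(by_sex_and_status)
```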

via an online poll. For the survey method, the response rate for the survey is crucial, as a low response rate can destroy the validity of any conclusion resulting from the statistical analysis.

Some advantages of using Primary data:

  1. The investigator collects data specific to the problem under study.
  2. There is no doubt about the quality of the data collected (for the investigator).
  3. If required, it may be possible to obtain additional data during the study period.

Some disadvantages of using Primary data (for reluctant/uninterested investigators):

  1. The investigator has to contend with all the hassles of data collection:
    • Deciding why, what, how, when to collect
    • Getting the data collected (personally or through others)
    • Getting funding and dealing with funding agencies
    • Ethical considerations (consent, permissions, etc.)
  2. Ensuring the data collected is of a high standard:
    • All desired data is obtained accurately, and in the format it is required in
    • There is no fake/cooked-up data
    • Unnecessary/useless data has not been included

Secondary Data: Secondary data refers to information that has already been collected by some other person or organization. Examples include:

  • Data collection from books
  • Newspapers

Published data

Published data refers to data from secondary sources that is readily available. This method of collecting data is preferred due to its convenience, relatively low cost, and its reliability (assuming it's been collected by a reputable organization). However, when using secondary data, care needs to be taken, as errors may have been introduced as a result of a false transcription or a misinterpretation.

Some advantages of using Secondary data:

  1. The data's already there: no hassles of data collection.
  2. It is less expensive.
  3. The investigator is not personally responsible for the quality of data ("I didn't do it").

Some disadvantages of using Secondary data:

  1. The investigator cannot decide what is collected (if specific data about something is required, for instance).
  2. One can only hope that the data is of good quality.
  3. Obtaining additional data (or even clarification) about something is not possible (most often).

Frequency Distribution

If the value of a variable, e.g., height, weight, etc. (continuous), or number of students in a class, readings of a taxi-meter, etc. (discrete), occurs twice or more in a given series of observations, then the number of occurrences of the value is termed the "frequency" of that value. The way of tabulating a pool of data of a variable and their respective frequencies side by side is called a "frequency distribution" of those data. Croxton and Cowden defined frequency distribution as "a statistical table which shows the sets of all distinct values of the variable arranged in order of magnitude, either individually or in groups, with their corresponding frequencies side by side".

  • Frequency: It is the number of observations falling in some class.
  • Frequency Distribution: It is a listing of classes and their frequencies.
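A frequency distribution in the sense defined above (distinct values in order of magnitude, with their frequencies side by side) can be built with a few lines of Python. The observations are hypothetical.

```python
from collections import Counter

# Hypothetical observations: number of children in 12 families (discrete variable).
children = [2, 0, 3, 2, 1, 2, 4, 1, 2, 3, 0, 2]

frequency = Counter(children)  # how many times each distinct value occurs

# Distinct values arranged in order of magnitude with their frequencies.
frequency_distribution = sorted(frequency.items())

for value, count in frequency_distribution:
    print(value, count)
```

The frequencies always add up to the total number of observations, which is a quick check on any frequency table.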

Classes:

  • There is no rule of thumb about the number of classes, but roughly it is suggested that the number of classes should be between 8 and 15.
  • Inclusive Classification: It is used for classes like 55-57, 58-60, ..., 68-70. Both end points of the classes are included in the respective classes.
  • Exclusive Classification: It is used for classes like 10-20, 20-30, ..., 50-60. The second (upper) end point is not included in the respective class.

Several Important Terms

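(a) Class-limits: The maximum and minimum values of a class-interval are called the upper class-limit and lower class-limit respectively.

(b) Class-mark, or mid-value: The class-mark, or mid-value, of a class-interval lies exactly at the middle of the class-interval and is given by:

Class-mark = (lower class-limit + upper class-limit) / 2

(c) Class boundaries: Class boundaries are the true limits of a class-interval. They are associated with a grouped frequency distribution where there is a gap between the upper class-limit of one class and the lower class-limit of the next class. They can be determined by using the formula:

Lower class-boundary = lower class-limit - d/2
Upper class-boundary = upper class-limit + d/2

where d = common difference between the upper class-limit of a class and the lower class-limit of the next class.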

Common width of a class-interval (when the class-intervals have equal widths)
= difference between two successive upper class-limits (or two successive lower class-limits)
= difference between two successive upper class-boundaries (or two successive lower class-boundaries)
= difference between two successive class-marks, or mid-values
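These terms can be checked numerically with a minimal Python sketch. It assumes the usual definitions: the class-mark is the average of the two class-limits, and each class boundary is obtained by moving the class-limit outward by half the gap between adjacent classes. The three inclusive classes are hypothetical.

```python
# Hypothetical inclusive classes given as (lower class-limit, upper class-limit).
classes = [(55, 57), (58, 60), (61, 63)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

class_marks = [(low + high) / 2 for low, high in classes]
class_boundaries = [(low - gap / 2, high + gap / 2) for low, high in classes]

# The common width can be read from successive lower limits, boundaries or marks.
width_from_limits = classes[1][0] - classes[0][0]
width_from_boundaries = class_boundaries[1][0] - class_boundaries[0][0]
width_from_marks = class_marks[1] - class_marks[0]

print(class_marks)
print(class_boundaries)
print(width_from_limits, width_from_boundaries, width_from_marks)
```

All three ways of computing the width agree, and the upper boundary of each class coincides with the lower boundary of the next, so the gap between classes disappears.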

where N = total no. of observations in the data (formula suggested by H.A. Sturges)
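The formula itself is not reproduced in this part of the notes. Sturges' rule is commonly stated as: number of classes = 1 + 3.322 log10(N). The Python sketch below assumes that form; rounding up to a whole number of classes and dividing the range by that number to get a class width are illustrative choices, and the function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Suggested number of classes for n observations: 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def approximate_class_width(values):
    """Range of the data divided by the suggested number of classes."""
    return (max(values) - min(values)) / sturges_classes(len(values))


print(sturges_classes(50))
print(sturges_classes(100))
print(approximate_class_width(list(range(1, 101))))
```

In practice the computed width is usually rounded to a convenient figure such as 5 or 10.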

Frequency Distribution Methods

  • Entry Table: By listing the actual observations
  • Tally Sheet: By using a tally column
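A tally sheet can be imitated in Python as a rough sketch. The observations are hypothetical, and a completed group of five is written here simply as `/////`, since plain text cannot show the usual diagonal stroke across four marks.

```python
from collections import Counter


def tally(count):
    """Render a count as tally marks in groups of five, e.g. 7 -> '///// //'."""
    groups, rest = divmod(count, 5)
    parts = ["/////"] * groups
    if rest:
        parts.append("/" * rest)
    return " ".join(parts)


# Hypothetical observations recorded one by one.
observations = [3, 1, 2, 3, 3, 2, 3, 1, 3, 3, 3]

counts = Counter(observations)
tally_sheet = {value: tally(count) for value, count in sorted(counts.items())}

for value, marks in tally_sheet.items():
    print(value, marks, counts[value])
```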

Types of Frequency Distributions

Frequency distributions are of several kinds, depending on the nature of the raw data. Much useful information can be inferred from a frequency distribution table, so the table should be presented in a proper and useful manner. The following are the various types of frequency distribution:

1. Frequency Distribution for Discrete Data

The class limits in discrete data are the true class limits, and there are no class boundaries because discrete data are not in fractions. For example, the following figures represent the number of children born to 50 women in a certain locality up to the age of 40 years.

The following table shows the frequency distribution for the discrete data, taking a class interval size of 1.

Number of Children    Tally Marks    Number of Women
0                     //
1                     //
2                     ////
3                     ////
4                     //// ///
5                     //// ////
6                     //// /
7                     ////
8                     ///
9                     ////
10                    /
Total

4. Relative Cumulative Frequency Distribution

The cumulative frequency of a class divided by the total frequency is called relative cumulative frequency. It is also called percentage cumulative frequency since it is expressed in percentage. The table showing relative cumulative frequencies is called the relative cumulative frequency distribution or percentage cumulative frequency distribution.
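The calculation can be sketched in Python as follows. The classes and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: exclusive classes and their frequencies.
classes = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [5, 15, 20, 10]

total_frequency = sum(frequencies)

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(frequencies))

# Cumulative frequency divided by total frequency, expressed as a percentage.
relative_cumulative = [100 * c / total_frequency for c in cumulative]

for name, f, cf, rcf in zip(classes, frequencies, cumulative, relative_cumulative):
    print(name, f, cf, rcf)
```

The last class always has a relative cumulative frequency of 100 percent, because by then every observation has been counted.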

Presentation of Statistical Data

Statistical data can be presented in three different ways: (a) Textual presentation, (b) Tabular presentation, and (c) Graphical presentation.

(a) Textual presentation: This is a descriptive form. The following is an example of such a presentation of data about deaths from industrial diseases in Great Britain in 1935–39 and 1940–44.

Example: Numerical data with regard to industrial diseases and deaths therefrom in Great Britain during the years 1935–39 and 1940–44 are given in a descriptive form:

“During the quinquennium 1935–39, there were in Great Britain 1,775 cases of industrial diseases made up of 677 cases of lead poisoning, 111 of other poisoning, 144 of anthrax, and 843 of gassing. The number of deaths reported was 20 p.c. of the cases for all the four diseases taken together, that for lead poisoning was 135, for other poisoning 25 and that for anthrax was 30.

During the next quinquennium, 1940–44, the total number of cases reported was 2,[…]. But lead poisoning cases reported fell by 351 and anthrax cases by 35. Other poisoning cases increased by 784 between the two periods. The number of deaths reported decreased by 45 for lead poisoning, but decreased only by 2 for anthrax from the pre-war to the post-war quinquennium. In the later period, 52 deaths were reported for poisoning other than lead poisoning. The total number of deaths reported in 1940–44 including those from gassing was 64 greater than in 1935–39”.

The disadvantages of textual presentation are: (i) It is too lengthy; (ii) There is repetition of words; (iii) Comparisons cannot be made easily; (iv) It is difficult to get an idea and take appropriate action.

(b) Tabular presentation, or Tabulation: “The process of arranging data into rows and columns is called tabulation.” A table is a systematic arrangement of data into vertical columns and horizontal rows. Data on the population of a country can be tabulated on the basis of religion, gender or marital status. Tabulation may be simple, double, triple or complex depending on the nature of the classification being used by the statistician. Tabulation may be defined as the systematic presentation of numerical data in rows and/or columns according to certain characteristics. It expresses the data in a concise and attractive form which can be easily understood and used to compare numerical figures.

The descriptive form of Example has been condensed below in the form of a Table.

Types of Tabulation

In general, tabulation is classified in two parts: simple tabulation and complex tabulation. Simple tabulation gives information regarding one or more independent questions. Complex tabulation gives information regarding two or more mutually dependent questions.

ONE-WAY TABLE

DIVISION      POPULATION (Millions)
Karachi
Hyderabad
Sukkur

This table gives information regarding one characteristic: the population in the different divisions of Sindh.

All questions that can be answered from a ONE-WAY TABLE are independent of each other. It is therefore an example of simple tabulation, since the information obtained from it concerns one independent question, namely the number of persons in the various divisions of Sindh, in millions.

Two-Way Table

This type of table gives information regarding two mutually dependent questions. For example, if the question is how many millions of persons are in the divisions, the One-Way Table will give the answer. But if we also want to know who is in the majority in the population, males or females, the Two-Way Table will answer the question by giving columns for male and female. Thus the table showing the real picture of the divisions sex-wise is as under:

TWO-WAY TABLE

DIVISION      POPULATION (Millions)
              Male    Female    Total
Karachi
Hyderabad
Sukkur

Three-Way Table

A Three-Way Table gives information regarding three mutually dependent and inter-related questions. For example, from the one-way table we get information about population, and from the two-way table we get information about the number of males and females in the various divisions. Now we can extend the same table to a three-way table by putting a question, “How many males and females are literate?” Thus the collected statistical data will answer the following three mutually dependent and inter-related questions:

  1. Population in various division.
  2. Their sex-wise distribution.
  3. Their position of literacy.

THREE-WAY TABLE

DIVISION      POPULATION (Millions)
              Male                           Female                         Total
              Literate  Illiterate  Total    Literate  Illiterate  Total    Literate  Illiterate  Total
Karachi
Hyderabad
Sukkur

This table gives information concerning the literacy of both males and females in the various divisions of Sindh. From the table we can tell which sex is more educated in each division, and we can also say whether literacy is lower in rural areas than in urban areas.
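The way one-way, two-way and three-way tables answer progressively more detailed questions can be sketched in Python by counting the same records on one, two or three characteristics. The six records below are hypothetical individuals, not real population figures for Sindh.

```python
from collections import Counter

# Hypothetical individual records: (division, sex, literacy).
records = [
    ("Karachi", "Male", "Literate"),
    ("Karachi", "Female", "Literate"),
    ("Karachi", "Male", "Illiterate"),
    ("Hyderabad", "Female", "Illiterate"),
    ("Hyderabad", "Male", "Literate"),
    ("Sukkur", "Female", "Literate"),
]

# One-way table: population by division.
one_way = Counter(division for division, _, _ in records)

# Two-way table: population by division and sex.
two_way = Counter((division, sex) for division, sex, _ in records)

# Three-way table: population by division, sex and literacy.
three_way = Counter(records)

print("DIVISION    Male  Female  Total")
for division in ["Karachi", "Hyderabad", "Sukkur"]:
    male = two_way[(division, "Male")]
    female = two_way[(division, "Female")]
    print(f"{division:<10} {male:>5} {female:>7} {one_way[division]:>6}")
```

Each extra characteristic adds one more element to the counting key, which is also how four-way and higher order tables arise.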

Higher Order Tables

Higher order tables are those which provide information about a large number of inter-related questions. Higher order tables may be four-way, five-way, six-way, etc. Such tables are called manifold tables.

Objectives of Tabulation

The main objectives of tabulation are stated below: (i) To carry out investigation; (ii) To do comparison; (iii) To locate omissions and errors in the data; (iv) To use space economically; (v) To study the trend; (vi) To simplify data; (vii) To use it as future reference.

Sorting: Sorting of data is the last process of tabulation. It is a time-consuming process when the data is too large.

After classification the data may be sorted using either of the following methods:

(i) Manual method: Here the sorting is done by hand by giving tally marks for the number of times each event has occurred. Next the total tally marks are counted. The method is simple and suitable for limited data.

(ii) Mechanical and electrical method: To reduce the sorting time, mechanical devices may be used. This is described as mechanical tabulation. For electrical tabulation, data should be codified first and then punched on cards. For each data item a separate card is used. The punched cards are checked by a machine called a "verifier". Next the cards are sorted into different groups as desired by a machine called a "sorter". Finally, the tabulation is done by using a tabulator. The same card may be sorted more than once for completing tables under different titles.