Data Terminology, Types, and Sampling Methods: A Comprehensive Guide, Exercises of Business Statistics

University of Economics Ho Chi Minh City (UEH), Business Statistics

A comprehensive overview of data terminology, types, and sampling methods. It covers key concepts such as variables, data sets, categorical and numerical data, time series and cross-sectional data, levels of measurement, and different sampling techniques. The document also includes examples and explanations to illustrate these concepts.

Typology: Exercises

2023/2024

Uploaded on 10/09/2024

huong-trinh-nguyet 🇻🇳

1 document


Chapter 2

2.1 VARIABLES AND DATA

Data Terminology

▪ An observation: a single member of a collection of items that we want to study, such as a person, firm, or region.
Ex: an employee, or an invoice mailed last month.

▪ A variable: a characteristic of the items that we want to study (e.g., student name, gender, date of birth).
Ex: an employee’s income or an invoice amount.

▪ Data set: all the values of all of the variables for all of the observations we chose. Data are usually entered into a spreadsheet or database as an n × m matrix (n rows of observations, m columns of variables).
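The n × m layout can be sketched in a few lines of Python. This is only an illustration: the variable names and values are invented, and it assumes the common convention of one row per observation and one column per variable.

```python
# Hypothetical data set: names and values are invented for illustration.
variables = ["name", "gender", "income"]   # m = 3 variables (columns)
observations = [                           # n = 4 observations (rows)
    ["An", "F", 52000],
    ["Binh", "M", 48000],
    ["Chi", "F", 61000],
    ["Dung", "M", 45000],
]

n = len(observations)   # number of rows (observations)
m = len(variables)      # number of columns (variables)


def column(name):
    """Return all n values of one variable, i.e. one column of the matrix."""
    j = variables.index(name)
    return [row[j] for row in observations]


print(f"{n} x {m} data set")
print("income:", column("income"))
```

Each inner list is one observation, and `column("income")` collects the values of a single variable across all observations.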

Categorical and Numerical Data

A data set may contain a mixture of data types. Two broad categories:

• Categorical (qualitative) data: values that are described by words rather than numbers (nonnumerical values, verbal labels). Values of a categorical variable might be represented using numbers (coded).

• Numerical (quantitative) data: values that arise from counting, measuring something, or some kind of mathematical operation. Two types: discrete (integers) and continuous (physical measurements, financial variables).
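A short sketch of coded categorical data follows. The code table (0 = Female, 1 = Male) and the values are hypothetical; the point is only that the numbers act as labels, so counting them is meaningful while arithmetic on them is not.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are labels, not quantities.
GENDER_CODES = {0: "Female", 1: "Male"}

coded_values = [0, 1, 1, 0, 1]                    # categorical data stored as codes
labels = [GENDER_CODES[c] for c in coded_values]  # decode back to verbal labels

# Counting categories is a valid operation on categorical data.
counts = Counter(labels)
print(counts["Female"], counts["Male"])
```

Averaging `coded_values` would produce a number, but it would depend entirely on the arbitrary choice of codes.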

Time Series Data and Cross-Sectional Data

• Time series data: each observation in the sample represents a different, equally spaced point in time (years, months, days). The periodicity is the time between observations.
→ trends and patterns over time
Ex: a firm’s sales, market share, debt/equity ratio, employee absenteeism, inventory turnover, and product quality ratings

• Cross-sectional data: each observation represents a different individual unit (e.g., a person, firm, geographic area) at the same point in time.
→ variation among observations and relationships
Ex: daily closing prices of a group of 20 stocks recorded on December 1, 2015.

Combine the two data types to get pooled cross-sectional and time series data.
Ex: monthly unemployment rates for the 13 Canadian provinces or territories for the last 60 months.

2.2 LEVEL OF MEASUREMENT

Four levels of measurement for data: nominal, ordinal, interval, and ratio.

Nominal Measurement

Nominal data: the weakest level of measurement and the easiest to recognize. Nominal data merely identify a category. “Nominal” data are the same as “qualitative”, “categorical” or “classification” data. The only permissible mathematical operations are counting (e.g., frequencies). ➔ No ordering

Ordinal Measurement

Ordinal data codes connote (imply) a ranking of data values. There is no clear meaning to the distance between values. Like nominal data, ordinal data lack the properties that are required to compute many statistics, such as the average. Ordinal data can be treated as nominal, but not vice versa. ➔ Ordering, but differences have no meaning.

Rule of Thumb: A population may be treated as infinite when the population size N is at least 20 times the sample size n (i.e., when N/n ≥ 20)

Parameters and Statistics

A parameter is a specific characteristic of a population.
A statistic is a specific characteristic of a sample.
From a sample of n items, chosen from a population, we compute statistics that can be used as estimates of parameters found in the population.
Population mean = μ
Population proportion = π
Sample mean = x̄
Sample proportion = p
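The distinction between parameters and statistics can be illustrated with a small simulation. Everything here is hypothetical: the population is 1,000 invented invoice amounts, the proportion of interest is "amount above 300", and the seed is fixed only so the sketch is repeatable.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 1000 invoice amounts (invented values).
population = [rng.uniform(50, 500) for _ in range(1000)]
N = len(population)

# Parameters: computed from the whole population.
mu = mean(population)                        # population mean
pi = sum(x > 300 for x in population) / N    # population proportion above 300

# Statistics: computed from a sample of n items drawn from the population.
n = 50
sample = rng.sample(population, n)
x_bar = mean(sample)                         # sample mean, estimates mu
p = sum(x > 300 for x in sample) / n         # sample proportion, estimates pi

print(f"mu = {mu:.1f}, x_bar = {x_bar:.1f}")
print(f"pi = {pi:.3f}, p = {p:.3f}")
```

The sample values `x_bar` and `p` will generally be close to, but not equal to, `mu` and `pi`.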

Target Population

The target population contains all the individuals in which we are interested
The sampling frame is the group from which we take the sample

2.4 SAMPLING METHODS

Two main categories: random sampling and non-random sampling

Random Sampling Methods

Simple Random Sample

We denote the population size by N and the sample size by n. In a simple random sample, every item in the population of N items has the same chance of being chosen in the sample of n items.
Ex: select one student at random from a list of 15 students.

Sampling without replacement: once an item has been selected to be included in the sample, it cannot be considered for the sample again. This is a problem when the sample size n is close to the population size N → bias (a tendency to overestimate or underestimate).
A finite population is effectively infinite if the sample is less than 5 percent of the population (if n/N < .05).

Sampling with replacement: the same random number could show up more than once. Duplicates are unlikely when n is much smaller than N.
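Simple random sampling with and without replacement can be sketched with Python's standard `random` module. The function names and the list of 15 student IDs are illustrative assumptions; `effectively_infinite` simply encodes the n/N < .05 rule stated above.

```python
import random


def simple_random_sample(population, n, replace=False, seed=None):
    """Draw n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be drawn more than once.
        return rng.choices(population, k=n)
    # Without replacement: a selected item cannot be drawn again.
    return rng.sample(population, n)


def effectively_infinite(N, n):
    """Rule from the text: treat the population as infinite when n/N < .05."""
    return n / N < 0.05


students = list(range(1, 16))  # hypothetical list of 15 student IDs
print(simple_random_sample(students, 1))                 # one student at random
print(simple_random_sample(students, 5))                 # 5 distinct students
print(simple_random_sample(students, 5, replace=True))   # duplicates possible
print(effectively_infinite(N=1000, n=40))
```

Passing a `seed` makes a draw repeatable, which helps when a sample has to be documented or audited.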

Systematic Sample

Systematic sample: choose every kth item from a sequence or list, starting from a randomly chosen entry among the first k items on the list.

Decide on the sample size: n
Divide the frame of N individuals into n groups of k individuals: k = N/n
Randomly select one individual, the xth, from the first group
Select every kth individual thereafter (the xth individual in each of the other groups)
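The four steps above can be sketched as follows. This is a minimal version that assumes N is at least n and uses integer division for k, so any leftover items at the end of the frame are never selected; the frame of 100 numbered individuals is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of k items, then every kth item."""
    N = len(frame)
    k = N // n                      # group size (integer division is an assumption)
    rng = random.Random(seed)
    start = rng.randrange(k)        # 0-based index of the xth item in the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 101))         # hypothetical frame of N = 100 individuals
sample = systematic_sample(frame, n=10)
print(sample)                       # 10 individuals spaced k = 10 apart
```

Only the starting point is random, so the method is convenient for lists but can mislead if the list has a periodic pattern that matches k.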

2.6 SURVEYS

SURVEY

Step 1: State the goals of the research
Step 2: Develop the budget (time, money, staff)
Step 3: Create a research design (target population, frame, sample size).
Step 4: Choose a survey type and method.
Step 5: Design a data collection instrument (questionnaire).
Step 6: Pretest the survey instrument and revise as needed.
Step 7: Conduct the survey.
Step 8: Code the data and analyze the data

Questionnaire Design

Begin with short, clear instructions.
State the survey purpose.
Assure anonymity.
Instruct on how to submit the completed survey.
Break survey into naturally occurring sections
Let respondents bypass sections that are not applicable (e.g., “if you answered no to question 7, skip directly to Question 15”).

Chapter 3

Graphical Presentation of Data

Data in raw form are usually not easy to use for decision making
Some type of organization like graph or table is needed
The type of graph to use depends on the variable being summarized

Tables and Charts for Categorical Data

Summary table

Column/Bar and Pie Chart

Column/Bar charts and Pie charts are often used for qualitative/categorical data (categories or nominal scale)
Pies or Bars/Columns represent categories
Height of bar or size of pie slice shows the frequency or percentage for each category

Column and Bar chart

A column chart is a vertical display of data.
A bar chart is a horizontal display of data.
A column chart is usually easier to read, but a bar chart can be useful when the axis labels are long or when there are many categories.

For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is the ones digit.

❖ Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
❖ Completed stem-and-leaf diagram: [diagram not reproduced]
❖ Using the 100’s digit as the stem, completed stem-and-leaf display: [display not reproduced]
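For the ordered array listed above, a stem-and-leaf display can be rebuilt with a short script. This is a sketch under the stated rule only (stem = everything but the last digit, leaf = the ones digit); the function name is an illustrative choice, and stems with no leaves are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integers by stem (tens and above) and leaf (ones digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 27 -> stem 2, leaf 7
        display[stem].append(leaf)
    return dict(display)


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 2 | 1 4 4 6 7 7
# 3 | 0 2 8
# 4 | 1
```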

Dot Plots

A dot plot is another simple graphical display of n individual values of numerical data. The basic steps in making a dot plot are to:

Make a scale that covers the data range
Mark axis demarcations and label them
Plot each data value as a dot above the scale at its approximate location

If more than one data value lies at approximately the same X-axis location, the dots are piled up vertically.

Easy to understand
Show variability
Show the center and where the midpoint lies
Reveal some things about the shape of the distribution
Not good for large samples (e.g., > 5,000).

Dot plots have limitations. They:

Don’t reveal very much information about the data set’s shape when the sample is small
Become awkward when the sample is large (what if you have 100 dots at the same point?)
Become awkward when you have decimal data

Tabulating Numerical Data

Frequency and Cumulative Distributions

A frequency distribution is a table formed by grouping n data values into k classes called bins (based on values of the data).
The bin limits are cutoff points that define each bin.
Bins have equal interval widths and their limits cannot overlap.

❖ The basic steps for constructing a frequency distribution

Sort the data in ascending order ➔ Find the smallest and largest data values
Choose the number of bins ➔ Sturges’ Rule: $k = 1 + 3.3 \log_{10}(n)$
Set the bin limits ➔ Bin width $\approx \dfrac{x_{\max} - x_{\min}}{k}$
Put the data values in the appropriate bin ➔ Count the data values in each bin
Create the table ➔ Show only the absolute frequencies, or include the relative frequencies and the cumulative frequencies
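The steps above can be sketched in Python. Several choices here are assumptions rather than part of the notes: k from Sturges' Rule is rounded to the nearest integer, the bin width is used exactly as computed (in practice it is often rounded up to a convenient number), each bin includes its lower limit, and the last bin also includes the maximum. The data must contain at least two distinct values.

```python
import math


def frequency_distribution(data):
    """Return rows of (lower, upper, frequency, relative freq, cumulative freq)."""
    values = sorted(data)                       # step 1: sort
    n = len(values)
    x_min, x_max = values[0], values[-1]
    k = round(1 + 3.3 * math.log10(n))          # step 2: Sturges' Rule (rounded)
    width = (x_max - x_min) / k                 # step 3: bin width
    counts = [0] * k
    for x in values:                            # step 4: count values per bin
        i = min(int((x - x_min) / width), k - 1)  # x_max goes in the last bin
        counts[i] += 1
    table, cumulative = [], 0
    for i, f in enumerate(counts):              # step 5: build the table
        cumulative += f
        lower = x_min + i * width
        table.append((lower, lower + width, f, f / n, cumulative))
    return table


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
for lower, upper, f, rel, cum in frequency_distribution(data):
    print(f"{lower:5.1f} to {upper:5.1f}  f={f}  rel={rel:.2f}  cum={cum}")
```

With n = 10 the rule gives k = 4 bins of width 5, and the cumulative frequency of the last bin equals n.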

Multivariate Categorical Data

Contingency table

Side-by-side bar charts

Scatter Plots

A scatter plot shows n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) as dots (or some other symbol) on an X-Y graph.

Investigate the relationship between two variables → association between two variables
Convey patterns in data pairs that would not be apparent from a table.

Time Series Plot

Used to study patterns in the values of a variable over time.
The time period is measured on the X axis.
The variable is measured on the Y axis.
Can display several variables at once.

Log Scales

Useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income). A log scale reveals whether the quantity is growing at an
increasing percent (concave upward),
constant percent (straight line), or
declining percent (concave downward).

Deceptive Graphs

Error 1: Nonzero Origin: A nonzero origin will exaggerate the trend.

Error 8: Complex Graphs: Complicated visual displays make the reader work harder.

Error 11: Area Trick: Simultaneously enlarging the width of the bars as their height increases → bar area misstates the true proportion.
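Returning to the log-scale point made earlier in this chapter, the claim that constant percent growth plots as a straight line can be checked numerically: on a log scale, equal percent changes become equal steps. The series below are invented, and the helper names are illustrative.

```python
import math


def grow(start, rates):
    """Build a series by applying one growth rate per period."""
    values = [start]
    for r in rates:
        values.append(values[-1] * (1 + r))
    return values


def log_steps(values):
    """Period-to-period changes in log10(value), i.e. slopes on a log scale."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]


constant = grow(100, [0.05] * 5)                          # 5% every period
accelerating = grow(100, [0.02, 0.04, 0.06, 0.08, 0.10])  # rising percent growth

print(log_steps(constant))       # equal steps: a straight line on a log scale
print(log_steps(accelerating))   # growing steps: concave upward on a log scale
```

Every step of the constant series equals log10(1.05), while the steps of the accelerating series increase from one period to the next.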

CHAPTER 4

4.1 NUMERICAL DESCRIPTION

Descriptive measures derived from:

a sample (n items): statistics
a population (N items or infinite): parameters

Three key characteristics: center, variability, and shape.

4.2 MEASURES OF CENTER

Geometric mean: used to measure the rate of change of a variable over time
$X_G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$
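As a quick check of the formula, the geometric mean can be computed directly as the nth root of the product. This sketch assumes all values are positive and uses `math.prod`, which requires Python 3.8 or later; the example numbers are invented.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)


print(geometric_mean([2, 8]))        # 4.0, since 2 * 8 = 16 and sqrt(16) = 4

# Hypothetical growth factors: up 50% in one period, then down 50% in the next.
factors = [1.5, 0.5]
print(geometric_mean(factors))       # below 1: the variable shrank overall
```

The arithmetic mean of the two growth factors is exactly 1, which would wrongly suggest no change; the geometric mean reflects that the value actually fell to 75% of its starting level.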
Geometric mean rate of return: measures the status of an investment over time
$R_G = [(1 + R_1)(1 + R_2)(1 + R_3) \cdots (1 + R_n)]^{1/n} - 1$
where $R_i$ is the rate of return in time period i.

SUMMARY TABLE

Quartiles

The quartiles (denoted Q1, Q2, Q3): scale points that divide the ordered data into four groups of approximately equal size; they are the 25th, 50th, and 75th percentiles.
Q1 is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than Q3

The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.

Find a quartile by determining the value in the appropriate position in the ordered data:

First quartile position: Position of Q1 = (N+1)/4

Second quartile position: Position of Q2 = (N+1)/2 (the median position)

Third quartile position: Position of Q3 = 3(N+1)/4

where N is the number of observed values. When the position is not a whole number, the value of Q1, Q2, or Q3 lies between the two neighboring ordered values.
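The position method can be sketched as below, taking the first and third quartile positions as (N+1)/4 and 3(N+1)/4. One assumption is added that the notes do not spell out: when a position falls between two ordered values, the quartile is found by linear interpolation between them. The sketch also assumes N ≥ 3 so that every position lies inside the data.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    lower = int(position)             # whole part of the position
    fraction = position - lower       # how far toward the next value
    below = ordered[lower - 1]
    if fraction == 0:
        return float(below)
    above = ordered[lower]
    return below + fraction * (above - below)   # linear interpolation (assumed)


def quartiles(data):
    """Return (Q1, Q2, Q3) using positions (N+1)/4, (N+1)/2 and 3(N+1)/4."""
    ordered = sorted(data)
    N = len(ordered)
    positions = ((N + 1) / 4, (N + 1) / 2, 3 * (N + 1) / 4)
    return tuple(value_at_position(ordered, p) for p in positions)


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
print(quartiles(data))   # positions 2.75, 5.5, 8.25 -> (24.0, 27.0, 33.5)
```

Other textbooks and software use slightly different rounding or interpolation rules, so small differences in Q1 and Q3 between tools are normal.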

Box and Whisker Plot

Box-and-Whisker Plot: a graphical display of data using the 5-number summary: $x_{\min}$, Q1, Q2, Q3, $x_{\max}$

A box plot shows

center (position of the median Q2)
variability (width of the “box” defined by Q1 and Q3 and the range between $x_{\min}$ and $x_{\max}$)
shape (skewness if the whiskers are of unequal length and/or if the median is not in the center of the box)
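The five numbers behind a box plot can be computed with Python's standard library. This sketch assumes Python 3.8 or later and uses `statistics.quantiles` with `method="exclusive"`, which places quantiles using N + 1 and interpolates between neighboring values; that choice is an assumption about which quartile rule to apply, not something the notes prescribe.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (x_min, Q1, Q2, Q3, x_max) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return (min(data), q1, q2, q3, max(data))


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
x_min, q1, q2, q3, x_max = five_number_summary(data)
print(x_min, q1, q2, q3, x_max)

box_width = q3 - q1            # variability shown by the box
lower_whisker = q1 - x_min     # whisker lengths hint at skewness
upper_whisker = x_max - q3
print(box_width, lower_whisker, upper_whisker)
```

Here the upper whisker is longer than the lower one and the median sits nearer the bottom of the box, which is the pattern the text associates with skewness.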
