Descriptive Statistics: Population and Sampling, Exams of Probability and Statistics

An introduction to descriptive statistics in the context of research, focusing on the importance of identifying the population, variables, and sampling methods. It covers the distinction between qualitative and quantitative variables, the concept of a sample, and the significance of previous research. The document also explains random sampling and its representation of the population, as well as the importance of sample size and the use of visual displays for data presentation.

Uploaded on 08/08/2009

koofers-user-x6r

Stat Handout 1 Descriptive Statistics Math 382
Any good research question should contain at least two important elements:
Who is being studied?
What characteristic is being investigated?
To get started, we need to define a few important statistical terms.
Population: The entire group of individuals or objects that we are interested in studying.
One or more characteristics may be required to describe a population, depending on how
specific the population is. Examples:
A. All adults in the U.S.
B. All registered voters in the U.S.
C. All female registered voters in the U.S.
D. All female registered voters who live in New Mexico.
E. All female registered voters who live in New Mexico who are aged 65 and
over
F. All female registered voters who live in Socorro, New Mexico who are
aged 65 and over.
Variable: A characteristic of the population that we wish to investigate. Examples of
variables include the following: height, weight, opinion on a political issue, gender, etc.
There are two types of variables: Qualitative and Quantitative.
A. A qualitative variable is any variable that involves a non-numerical characteristic
of the individual or object. Examples: gender, political affiliation, hair color,
opinion (agree/disagree).
B. A quantitative variable is a numerically meaningful characteristic of the individual
or object. It typically involves a measurement or a count taken on the individual.
Examples: height, weight, test score, number of children, etc.
Sample: A subset of individuals selected from the population we are interested in
studying. Examples of samples from each of the above populations:
A. 2000 adults in the U.S.
B. 1000 registered voters in the U.S.
C. 1200 female registered voters in the U.S.
D. 200 female registered voters who live in New Mexico.
E. 500 female registered voters who live in New Mexico who are aged 65
and over
F. 100 female registered voters who live in Socorro, New Mexico who are
aged 65 and over.

Where Do We Begin when Determining a Research Question?

Note that a good research question asks us to be very specific about who and what we are studying. But how do we get to the point where we can be this specific? Researchers typically start by conducting an initial exploration.

  1. What do we already know about this area?
  2. What does previous research suggest?
  3. If there is no previous research, then an initial, general exploration must be conducted by the researcher before a specific research question can be developed.

For example:

  1. Before researchers could determine which plants are useful for medicinal purposes, they first had to discover that some plants can be useful for medicinal purposes.
  2. When a researcher plans a study of how various additives increase the strength of a material, they should try different additives, or draw on existing knowledge of which ones were helpful in the past.

Random Sampling

A simple random sample of size n from a population is a subset of n individuals or items, chosen so that all individuals or items in the population have an equal chance of being selected (more precisely, so that every possible subset of size n is equally likely to be chosen).

How can we select a simple random sample?

  • Imagine that each individual in the population puts their name in a big basket.
      o Note: if we had objects instead of people, we could give each one a label with an ID number on it and put the ID numbers in the basket.
  • Taking a random sample is like drawing names or numbers at random from the basket. Note: you could also use a random number generator or a table of random numbers.
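
The random number generator route can be sketched in a few lines of Python. This is only an illustration: the population of 500 ID labels, the sample size of 10, the seed, and the helper name simple_random_sample are all assumptions made for the example, not part of the handout.

```python
import random


def simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(list(labels), n)


# Hypothetical population: 500 individuals labelled with ID numbers 1..500.
population_ids = range(1, 501)

# Equivalent to drawing 10 ID numbers from the basket without putting any back.
sample_ids = simple_random_sample(population_ids, n=10, seed=382)
print(sample_ids)
```

Because random.sample draws without replacement, no individual can appear in the sample twice, just as a name drawn from the basket is not returned to it.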

Why is a random sample a representative sample?

  • Because all individuals in the population have an equal chance of being selected for the sample, regardless of what their characteristics might be.
      o For instance, if 25% of the population is over 60 years old, then about 25% of the sample will be over 60. If 30% of the population has seen a flying saucer, then about 30% of the sample will have seen a flying saucer.
  • Of course, the percentage of individuals in the sample who have a given characteristic will not exactly match the population percentage, because of what is called sampling variability. However, "on average" a random sample will look pretty much like the population.
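
A small simulation can make sampling variability concrete. The sketch below assumes a made-up population of 10,000 people of whom exactly 25% are over 60, and a sample size of 200; both numbers are chosen only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 people, exactly 25% of whom are over 60.
population = [True] * 2500 + [False] * 7500  # True means "over 60"


def sample_proportion(n):
    """Proportion over 60 in one simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(sample) / n


# Draw 1,000 separate samples of size 200 and record each sample proportion.
proportions = [sample_proportion(200) for _ in range(1000)]
average = sum(proportions) / len(proportions)

print(f"smallest sample proportion: {min(proportions):.3f}")
print(f"largest sample proportion:  {max(proportions):.3f}")
print(f"average over all samples:   {average:.3f}")
```

Individual samples land above or below 25%, but the average over many samples should sit very close to the population value, which is what "on average" means here.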

Organizing Data

The data collected from a qualitative variable is called qualitative data. The data collected from a quantitative variable is called quantitative data. We organize qualitative and quantitative data somewhat differently, because qualitative data comes to us in categories (with no specific order, like male and female) and quantitative data comes to us as measurements (numbers whose order has meaning).

Organizing Qualitative Data

Suppose John has conducted a survey of the faculty at a local junior college. He selected a random sample of 81 faculty members from this college and asked them about their department, degree, rank when hired, master's degree when hired, age, age when hired, gender, and salary.

Since qualitative variables involve "natural" categories or groups, the organization of this data is usually straightforward. Simply list each category and count the number of individuals in each category. This table is called a frequency table. (Note that you can report the frequency (count), the relative frequency (count/total number sampled), or the percent frequency.)

For example, in John's data set the variable Degree is qualitative: the categories are bachelors, masters, masters+30hr, and Ph.D.

The frequency table for the variable "Degree" is shown below for John's data:

Degree          Count
bachelors          17
masters            48
masters+30hr       10
Ph.D                6
Total              81
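
As a sketch of how the three kinds of frequency relate, the Python snippet below rebuilds the 81 responses from the counts in the table and reports the count, relative frequency, and percent frequency for each category. The variable names are illustrative only.

```python
from collections import Counter

# Rebuild the 81 individual responses from the counts in the frequency table.
degrees = (["bachelors"] * 17 + ["masters"] * 48
           + ["masters+30hr"] * 10 + ["Ph.D"] * 6)

counts = Counter(degrees)     # frequency (count) of each category
total = sum(counts.values())  # total number sampled

print(f"{'Degree':<14}{'Count':>6}{'Rel. freq.':>12}{'Percent':>9}")
for degree, count in counts.items():
    rel = count / total       # relative frequency = count / total
    print(f"{degree:<14}{count:>6}{rel:>12.3f}{rel:>9.1%}")
```

The relative frequencies always add up to 1, and the percent frequencies to 100%, apart from rounding.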

A frequency or relative frequency table is one way to present a visual display in table form. There are a number of other visual displays that can be used. Two of the most common visual displays in graphical form are pie charts and bar charts.

John created a pie chart of faculty Degrees from the above table, see below. Note that the pie represents the entire group; it is divided into sections proportionate to the number of individuals in each group.

[Pie chart "Degree": bachelors 21%, masters 60%, masters+30hr 12%, Ph.D 7%]

John also created a bar chart to present a visual display of faculty Degrees, see below.

[Bar chart: Count (vertical axis, 0 to 60) for each Degree (horizontal axis: bachelors, masters, masters+30hr, Ph.D)]

Organizing Quantitative Data

Quantitative data involves measurements (or quantities). To organize quantitative data, the first step is to create a list of categories or groups and count the number of individuals in each group. While quantitative data can be ordered from smallest to largest, the cut-off values used to determine the groups are up to us. The frequency table for the variable "Salary" is shown below for John's data:

John made a histogram of the salary data:

[Histogram for Salary: Count (vertical axis, 0 to 25) for each Salary group (horizontal axis, cut-offs 19000, 24000, 29000, 34000, 39000, 44000, More)]

The salary groups appear along one axis (in this case the horizontal axis) and the frequencies/counts appear along the other axis (in this case the vertical axis). The tick marks along the salary axis correspond to the cut-offs (boundaries) for each salary group. The height of each bar reflects the number of faculty members in each salary group.
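
Grouping quantitative data by cut-off values can be sketched as follows. The cut-offs are the ones on the salary axis, but the salaries themselves are invented for the example (John's data is not reproduced here), and treating each cut-off as the upper boundary that belongs to its own group is an assumption about the convention used.

```python
from bisect import bisect_left

# Cut-offs taken from the tick marks on the salary axis.
cutoffs = [19000, 24000, 29000, 34000, 39000, 44000]
labels = [f"<= {c}" for c in cutoffs] + ["More"]

# Hypothetical salaries, NOT John's actual data.
salaries = [18500, 21000, 23500, 24000, 26750, 28000,
            31200, 33000, 35500, 38900, 41000, 47250]

counts = [0] * len(labels)
for salary in salaries:
    # bisect_left finds the first cut-off that is >= salary, so a salary
    # equal to a cut-off is counted in that cut-off's group (assumed rule).
    counts[bisect_left(cutoffs, salary)] += 1

# Text version of the histogram: one row per salary group.
for label, count in zip(labels, counts):
    print(f"{label:<10}{count:>3}  {'*' * count}")
```

Changing the cut-offs changes the counts and therefore the apparent shape, which is why the choice of groups deserves some thought.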

The overall pattern or shape of the histogram is one characteristic that is important to include when summarizing quantitative data.

How do we describe the pattern of a histogram?

If you cut the histogram in half and the halves are approximate mirror images of each other, we call the shape symmetric.

  • If it has a mound in the middle, and tails that look the same on each side, we call it bell-shaped.
  • If there is a mound of data off to the left, with a tail leading off to the right, the data is skewed right.
  • If the mound is to the right, with a tail leading off to the left, the data is skewed left.
  • Data that does not have a mound to it at all does not have any special name.

Note: Bell-shaped data is also symmetric. Skewed data is not symmetric.

When evaluating a visual data display, there are three main issues to consider:

  1. Does it achieve the intended purpose?
  2. Is it effective?
  3. Is it fair?

Before creating your own visual display, be sure to think about these three questions.

There should be a purpose for creating the visual display: either a point is being made or a question is being answered. If the purpose is to help answer a research question, be sure that the display matches the research question.

What constitutes an effective visual display? The most important element is clarity. The reader should quickly and easily see the point that is being made.

A fair visual display does not misrepresent the information to the reader. There are many examples of visual displays that mislead the reader, either unknowingly or purposefully.

Descriptive Statistics

Descriptive statistics are numbers that summarize important characteristics of a sample. They give us a quick and easy overview of the data. They are used to estimate population parameters such as the mean μ, the standard deviation σ, and the proportion p.

For quantitative data, some measure of center is an important characteristic to report.

There are two measures of center to consider:

  1. The sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, is the average of all the values in the data set.
  2. The median is the numerical value that divides the data set in half (after the data has been ordered from smallest to largest).
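
Both measures of center can be computed directly, as in the short Python sketch below. The seven salaries are hypothetical values chosen for the example, with one large value included on purpose.

```python
from statistics import mean, median

# Hypothetical salaries (not John's data); the last value is unusually large.
salaries = [21000, 23500, 24000, 26000, 28500, 31000, 56000]

n = len(salaries)
x_bar = sum(salaries) / n          # sample mean: add the values, divide by n
middle = sorted(salaries)[n // 2]  # n is odd here, so the median is the middle value

print("mean:  ", x_bar, mean(salaries))
print("median:", middle, median(salaries))
```

In this made-up data set the single large salary pulls the mean above the median, so it is often worth reporting both.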

The amount of variability is another characteristic that is important to include when summarizing quantitative data. It is a descriptive statistic that represents the amount of variety or spread that we see among the data.

Two measures of variability we might consider are:

  1. The range is the difference between the maximum and minimum values in the data set. (This is a very crude measure of variability.)
  2. The standard deviation, $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, is the "average distance from the mean." This is by far the most common measure of variability.
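
The two measures of variability can be worked through step by step in Python. This sketch uses a small hypothetical data set and assumes the usual sample formula, which divides the sum of squared deviations by n − 1; the result is checked against the standard library's statistics.stdev, which uses the same divisor.

```python
from math import sqrt
from statistics import stdev

# Hypothetical data set, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n                # sample mean

data_range = max(data) - min(data)   # range: maximum minus minimum

# Squared distance of each value from the mean, then "average" using n - 1.
squared_deviations = [(x - x_bar) ** 2 for x in data]
s = sqrt(sum(squared_deviations) / (n - 1))

print("range:", data_range)
print("standard deviation:", round(s, 3), round(stdev(data), 3))
```

The range uses only the two most extreme values, while the standard deviation uses every value in the data set, which is one reason the range is the cruder measure.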

Other descriptive statistics that we might include in our summary would be the minimum and maximum values in the data set, the first quartile, the third quartile, etc.
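
A fuller summary of this kind might be assembled as in the sketch below, again on hypothetical data. Note that quartiles can be computed by several slightly different conventions; this example simply uses the default method of Python's statistics.quantiles, which may not match the method used in a given course or spreadsheet.

```python
from statistics import median, quantiles

# Hypothetical data set (the same style of toy data as a hand calculation).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# n=4 splits the ordered data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

summary = {
    "minimum": min(data),
    "first quartile": q1,
    "median": q2,
    "third quartile": q3,
    "maximum": max(data),
}
for name, value in summary.items():
    print(f"{name:<15}{value}")
```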