

Qualitative data (also called Categorical or Attribute data) can be separated into different categories and don't use numbers. Examples: Grade (A, B, C, D, F), ...


Chapter 1
Key Ideas
Terms: Data (Quantitative vs. Qualitative), Discrete vs. Continuous Data, Statistics, Population, Census, Sample, Parameter, Statistic, Observational Study vs. Experiment
Bias: Voluntary Response, Small Samples, Misleading Graphs/Percentages, Loaded Questions, Nonresponse, Missing Data, Correlation vs. Causality, Self-Interest, Precision, Partial Pictures
Obs. Studies: Cross-Sectional Study, Retrospective (Case-Control) Study, Prospective (Cohort) Study
Experiments: Confounding Variables, Single/Double Blind, Blocking, Randomization, Replication
Sampling: Convenience, Simple Random, Stratified, Cluster, Systematic, Multistage, Sampling vs. Nonsampling Error
Section 1-1: Overview
Why Statistics?
Before beginning the study of statistics, it is important to understand why it is even needed in the first place. We need statistics because we want to know something about the world, but because of random processes, we can only make educated guesses.
For example, let’s say you want to start a new organization on campus, but you are not sure how much of the student body would even be interested in it. You ask some people who live on your residence hall floor, and they seem to be excited about it. However, when you ask some people in your classes, they don’t like the idea. It would be great if you could find a way to ask everyone on campus if they would be interested in your organization. Unfortunately, this would take too much time.
This is where statistics comes in. Even though you can’t know for sure what percentage of the population likes your idea, you could find a way to get a small group of people together who represent most of the variation in the student body (in terms of race, gender, sexual orientation, economic status, political affiliation, etc.). Then this group of people should be a rough microcosm of the entire campus, and their responses to your questions should generally match the campus as well.
This is how statistics is used. We want to know something about the world that is either too difficult or impossible to observe. Therefore, we use logical thinking and mathematical principles (often it’s common sense) to make an educated guess about what the true value is. The good thing about statistics is that we can even quantify how accurate and precise that measurement is (e.g. margin of error in polling). Since statistics is not tied to any one application, it is used in any situation where something is uncertain (business, medicine, aeronautics, physics, politics, athletics, weather forecasting, and so forth).
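To make the campus example concrete, here is a minimal Python sketch of estimating a population share from a sample. Everything in it is an assumption made for illustration: the campus size (10,000), the true share of interested students (30%), the sample size (400), and the function name `estimate_interest`. The margin of error uses the usual normal approximation for a sample proportion, which is one common choice and not something these notes prescribe.

```python
import math
import random

def estimate_interest(population, sample_size, rng):
    """Estimate the share of interested students from a simple random sample."""
    sample = rng.sample(population, sample_size)  # draw without replacement
    p_hat = sum(sample) / sample_size             # the sample statistic
    # Approximate 95% margin of error for a sample proportion.
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical campus: 1 = interested, 0 = not interested.
population = [1] * 3000 + [0] * 7000
true_share = sum(population) / len(population)  # the population parameter

p_hat, margin = estimate_interest(population, 400, rng)
print(f"true share: {true_share:.3f}")
print(f"estimate:   {p_hat:.3f} +/- {margin:.3f}")
```

In practice the true share is unknown, which is the whole point. It is included here only so the estimate can be compared against it.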
Here is some basic terminology we’ll be using in class:
Section 1-2: Types of Data
More terminology:
A note about the word “data”: A lot of people don’t realize it, but “data” is the plural of “datum”. So although people often say things like “the data shows that…”, they should really be saying “the data show that…”. Just thought you’d like to know.
Data come in two flavors: Quantitative and Qualitative.
You can probably already see that it can be difficult to stick some variables into certain data types without knowing the context. If you take something like “Age,” it could actually be in any of these groups. If Age is measured in years, but decimals are allowed (e.g. someone could be 14.236 years old), then it is continuous. If it is rounded off, then it is discrete. Furthermore, if you are doing a study on infants, say, and you only consider Age 0 (newborn) and 1 (one year old), then it could be thought of as qualitative data, since you have 2 categories. The point here is that these are just labels for data, and nothing is set in stone. You can use whatever terms you want to describe the data as long as you can make an argument for why the data should fall into that group.
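As a small illustration of the Age example, the Python sketch below stores the same measurements three ways. The ages, the cutoff of 18, and the labels "minor" and "adult" are made up for the example, and `int()` truncates to completed years, which is only one way of rounding off.

```python
# Hypothetical exact ages in years. Any value in a range is possible,
# so this representation is continuous.
exact_ages = [14.236, 0.4, 1.0, 37.91]

# Truncated to completed years, age takes only countable values: discrete.
whole_years = [int(age) for age in exact_ages]

def age_group(age):
    """Label an age with a category (cutoff chosen only for illustration)."""
    return "minor" if age < 18 else "adult"

# Grouped into labels, age is treated as qualitative (categorical) data.
groups = [age_group(age) for age in exact_ages]

print(whole_years)  # [14, 0, 1, 37]
print(groups)       # ['minor', 'minor', 'minor', 'adult']
```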
Section 1-3: Critical Thinking
One general method for finding something out about a population is to take a sample and use it to make an inference. This is done by first determining the appropriate statistic for that sample, and then using it as an estimate of the parameter in question. However, unless that sample is large enough and representative of the population, the estimate can be very inaccurate. Furthermore, when the results of a study are displayed, there are often ways of exaggerating or dampening the relationships between groups that lead to misleading conclusions. The textbook has a lot of great examples concerning these issues, which are paraphrased below. Bias, by the way, means that results are systematically skewed in one direction, rather than just being off by random chance.
Sampling Biases
Voluntary Response Samples (Self-Selected Samples) – This happens when, instead of the researcher drawing a sample from the population, people decide for themselves whether to be included in the sample. This often happens with internet polls on popular sports or political websites. What ends up happening is that only people who care enough about the issue (and visit that site) end up responding. This gives results that are heavily biased.
Small Samples – If a sample is too small, the estimates of the parameter are less accurate. As an extreme example, suppose that on election day someone runs an exit poll to see which presidential candidate received more votes, but only polls two people. That isn’t very likely to be accurate, is it?
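The effect of sample size on the exit poll can be simulated. The Python sketch below assumes a hypothetical election in which the winner really received 55% of the vote, then checks how often a random poll of a given size shows a majority for that winner. The 55% share, the sample sizes, the number of trials, and the function names are all illustrative choices.

```python
import random

def poll_picks_winner(true_share, sample_size, rng):
    """Return True if a random exit poll shows a majority for the real winner."""
    votes_for_winner = sum(rng.random() < true_share for _ in range(sample_size))
    return votes_for_winner > sample_size / 2  # a tie does not count

def accuracy(true_share, sample_size, trials, rng):
    """Fraction of simulated polls whose majority matches the real winner."""
    hits = sum(poll_picks_winner(true_share, sample_size, rng) for _ in range(trials))
    return hits / trials

rng = random.Random(7)  # fixed seed so the run is repeatable
TRUE_SHARE = 0.55       # hypothetical: the winner received 55% of the vote

results = {n: accuracy(TRUE_SHARE, n, 2000, rng) for n in (2, 25, 400)}
for n, acc in results.items():
    print(f"sample size {n:>3}: poll names the winner in {acc:.0%} of runs")
```

With only two respondents, the poll names the winner only when both happen to have voted for that candidate, so it misses most of the time. The larger polls do much better.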
Loaded Questions – Sometimes, the way a question is worded can change the way people answer it. Surveys about controversial topics are often places where this occurs. Asking “Do you support a woman’s right to choose an abortion?” is a lot different than asking “Do you support aborting a fetus which would otherwise grow into a healthy human being?” A researcher must be careful to present questions in a way that does not lead someone to a particular response. Similarly, the order in which questions are asked may influence how people respond.
Nonresponse – This occurs when someone is selected as part of a sample but does not respond, or gives incomplete information. Most nationwide polls have very high nonresponse rates, and this can be a problem if the nonresponse is a result of the survey design. It could be the case that most of the nonresponders have something in common, which then isn’t reflected in the results of the study.
Missing Data – This is the result of nonresponse, and it can arise from survey design (low-income people may not report their yearly salary) or from random chance (in a phone survey, someone may not answer their phone). There is an entire field of statistics concerning how to deal with missing data.
Misleading and/or Erroneous Results
Misleading Graphs – The scale on a graph can be stretched or compressed to make differences appear larger or smaller than they actually are. This is very common, because most people look at the pictures when they read about studies, and not the numbers reported.
Percentages – Percentages are often misused in a variety of contexts. In my personal experience, I once saw a sign at a clothing store claiming that clothes that were currently on sale for 25% off would now be discounted by an additional 50%, for a total of 75% off. This isn’t true. It’s really 62.5% off, since the additional 50% applies to the already reduced price (75% of the original), not to the original price. Half of 75% is 37.5%, so the customer pays 37.5% of the original price.
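The arithmetic behind the sale sign can be checked with a short Python sketch. The function name `combined_discount` is made up for this example, and it assumes each discount is applied to the price left after the previous one, as at the store.

```python
def combined_discount(*discounts):
    """Total fraction off when each discount applies to the already reduced price."""
    remaining = 1.0  # fraction of the original price still to be paid
    for d in discounts:
        remaining *= 1 - d
    return 1 - remaining

total = combined_discount(0.25, 0.50)
print(f"{total:.1%}")  # 62.5%
```

Stacked discounts multiply the remaining price, they do not add: 0.75 × 0.50 = 0.375 of the original price is still paid, so the total is 62.5% off, not 75%.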
Correlation vs. Causality – Often, people make the mistake of assuming that because two variables are correlated (for example, when one increases, the other tends to increase as well), one must cause the other. This does not follow. Consider a fictional study where someone records the amount of ice cream eaten each month, as well as the number of drowning deaths. These two variables would be highly correlated: as ice cream consumption increased, so would drownings. It is clear that eating ice cream does not cause drowning, however. The correlation appears because both variables are driven by a third variable, which is time of year. In the summer, the days are hotter, so more people eat ice cream and more people go swimming. In the winter, far fewer people do either.
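The ice cream example can be reproduced with made-up numbers. In the Python sketch below, monthly temperature plays the role of the third variable, and both ice cream consumption and drownings are generated from temperature alone, plus a small fixed wobble. Neither is computed from the other, yet they come out strongly correlated. All figures and the helper `pearson` are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up monthly average temperatures, January to December.
temperature = [30, 34, 45, 57, 68, 78, 84, 82, 73, 60, 47, 35]

# Fixed wobbles stand in for everything else that affects each outcome.
wobble_a = [3, -2, 1, -3, 2, 0, -1, 3, -2, 1, -3, 2]
wobble_b = [-1, 2, -2, 1, 0, 3, -3, 1, 2, -1, 0, -2]

# Each outcome depends on temperature only, never on the other outcome.
ice_cream = [2.0 * t + a for t, a in zip(temperature, wobble_a)]
drownings = [0.5 * t + b for t, b in zip(temperature, wobble_b)]

r = pearson(ice_cream, drownings)
print(f"correlation between ice cream and drownings: {r:.2f}")
```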
Precise Numbers – When very specific numbers are stated in results, the tendency is for the reader to assume there is a reason the number is not rounded off. Often they assume this reason is that the number is more accurate. However, this is not the case. It is still just an estimate, even though it does not appear to be that way.
Partial Pictures – If a television company told you “All of our TVs sold in the last 50 years are still in operation,” you would think the company made incredibly durable TVs. However, what would you think if that company had only been making and selling TVs for 6 months? While the claim is technically true, it isn’t as fantastic as it is made out to be.
Section 1-4: Design of Experiments
There are two broad categories of studies used to collect data:
Observational Study – Data are collected by observing characteristics of subjects, but the subjects are not modified in any way.
Experiment – Data are collected by modifying subjects (by applying a treatment) and observing the effects of that treatment.