Data Analysis Lecture: Evaluating and Comparing Data Sets - Prof. Benjamin Kerr, Study notes of Ecology and Environment

A lecture outline for a data analysis class focusing on evaluating and comparing data sets. Topics include processing and organizing data using Excel, visualizing data through graphs and charts, analyzing data through t-tests and chi-square tests, and understanding the assumptions and limitations of these statistical tests. The lecture also includes exercises and demos to help students gain practical experience.

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

koofers-user-tza


Data Analysis

E3: Lab Lecture

  • Suppose you hear that a high-protein diet during puberty leads to increased height as an adult.
    • The mean height in a high-protein treatment was 5’11” and the mean height in a control treatment was 5’5”.
    • What would you feed your kids? How do you gauge this?
  • The New York Times has just published an exposé about sexism in graduate admissions in a famous department of mathematics.
    • While the number of male and female applicants was equal, the number of males admitted was greater.
    • Should a formal inquiry take place? How do you evaluate the data?

How do we evaluate data?

  • Suppose that when you were a child, your father told you he would let you stay up late if a coin he flipped came up heads.
    • Suppose the coin comes up heads 25% of the time.
    • Is your Dad using a fair coin? How would you evaluate this?

Number of flips    Number of “heads”
4                  1
12                 3
100                25
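One way to gauge the coin example is an exact binomial test. The slides do not name this test, so treat the following as a sketch under that assumption; the function name is hypothetical. For a given number of heads and flips, it computes the two-sided probability of a result at least as lopsided as the one observed if the coin were fair.

```python
from math import comb


def two_sided_fair_coin_p(heads, flips):
    """Exact two-sided binomial p-value under the assumption P(heads) = 0.5."""
    # With a fair coin the distribution is symmetric, so use the smaller tail.
    k = min(heads, flips - heads)
    tail = sum(comb(flips, i) for i in range(k + 1)) / 2 ** flips
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2 * tail)


# The three (flips, heads) pairs from the table; each shows 25% heads.
for flips, heads in [(4, 1), (12, 3), (100, 25)]:
    p = two_sided_fair_coin_p(heads, flips)
    print(f"{heads} heads in {flips} flips: p = {p:.3g}")
```

All three cases show 25% heads, but 1 head in 4 flips is unremarkable for a fair coin (p = 0.625), while 25 heads in 100 flips would be very surprising (p well below 0.001). The proportion alone is not enough; the number of flips matters.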

[Figure residue: panels labeled Control and High-protein; groups labeled ♀a, ♂a, ♀e, ♂e; axis: frequency]

Data Analysis

Lecture Outline

  • Processing Data (using Excel)
  • Visualizing Data (using Excel)
  • Analyzing Data (using Excel)
    • Difference in means (t-test)
    • Difference in distributions (χ² test)

Processing Data (using Excel)

Handling Data

  • After a laboratory experiment or time out in the field, you will have several data points.
  • How should one process this (potentially voluminous) data?
    • A first step to dealing with your data is to organize it (spreadsheet programs, like Excel, can help)
    • A second step is to process your data
      1. Investigate portions of the data set
      2. Look at relevant descriptive statistics
      3. Transform data points in a well-defined way
      4. Combine data points in a well-defined way
    • A third step is to visualize your data
    • A fourth step is to subject your data to an appropriate statistical test
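The organizing and processing steps above can be sketched outside a spreadsheet as well. The example below is only an illustration: the counts are made up, and the choice of a log transform and an F/N ratio are assumptions, not methods stated in the lecture.

```python
from math import log10
from statistics import mean, stdev

# Organize: one row per measurement (arena, strain, count). All values are made up.
rows = [
    ("tube", "F", 120), ("tube", "N", 80),
    ("tube", "F", 100), ("tube", "N", 100),
    ("dish", "F", 60), ("dish", "N", 150),
    ("dish", "F", 40), ("dish", "N", 160),
]


def counts(arena, strain):
    """Return the counts for one arena and one strain, in row order."""
    return [c for a, s, c in rows if a == arena and s == strain]


# 1) Investigate a portion of the data set.
tube_f = counts("tube", "F")
tube_n = counts("tube", "N")

# 2) Look at relevant descriptive statistics.
tube_f_mean = mean(tube_f)
tube_f_sd = stdev(tube_f)

# 3) Transform data points in a well-defined way (here, log base 10).
log_tube_f = [log10(c) for c in tube_f]

# 4) Combine data points in a well-defined way (here, a ratio per replicate).
ratios = [f / n for f, n in zip(tube_f, tube_n)]

print(tube_f_mean, tube_f_sd, log_tube_f, ratios)
```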

Massaging? Dressing-up? Focusing

[Example plate: 3 F colonies, 4 N colonies]

[Figure: “Fitness Across Two Environments” (x-axis: Competition Arena, with Tube and Dish; y-axis: w(F,N))]

Worksheet

  • Go to our class website:

http://faculty.washington.edu/kerrb/biol481/

  • On the “Class Data” link, download the file labeled “Excel Practice Sheet”.
  • DEMO: Functions in Excel
  • GOALS
    • Get comfortable using the fx line in Excel
    • Understand the “translational” properties of Excel
    • Learn to fix rows and columns (or both) with $
  • Fill in the empty boxes under the “Simple Functions” tab of the “Excel Practice Sheet” file (you can check your work with your calculator).
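The “translational” behavior and the effect of $ can be mimicked in a few lines of code. This is a sketch only: the helper names are hypothetical, and it handles a single A1-style cell reference rather than a full formula. When a formula is copied some rows down and some columns across, each unfixed part of a reference shifts by that amount, while a part preceded by $ stays put.

```python
import re

# Optional $, column letters, optional $, row digits (e.g. A1, $A1, A$1, $A$1).
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def col_to_num(col):
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to letters."""
    letters = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters = chr(ord("A") + r) + letters
    return letters


def shift_ref(ref, d_rows, d_cols):
    """Shift a reference as if its formula were copied d_rows down and d_cols right."""
    m = REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_fixed, col, row_fixed, row = m.groups()
    new_col = col if col_fixed else num_to_col(col_to_num(col) + d_cols)
    new_row = row if row_fixed else str(int(row) + d_rows)
    return f"{col_fixed}{new_col}{row_fixed}{new_row}"


for ref in ["A1", "$A1", "A$1", "$A$1"]:
    print(ref, "->", shift_ref(ref, 1, 1))
```

Copied one row down and one column right, A1 becomes B2, $A1 becomes $A2, A$1 becomes B$1, and $A$1 is unchanged.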

Visualizing Data (using Excel)

Picture = Words × 1000

[Figure: “Grade Distribution” (categories: A, B, C, D, E)]

[Figure: “Understanding the Black Box” (x-axis: Number of Trials; y-axis: Accuracy of Prediction)]

[Figure: “Colony Distribution from the Luria-Delbruck Experiment” (x-axis: Colonies; y-axis: Frequency; series: Observed, Expected)]

  • We are visual animals and often can see patterns when data is presented visually
  • Examples:
    • Pie-chart illustrates the distribution of values of a single variable
    • X-Y plot illustrates the form of the relationship between two variables
    • Paired histograms illustrate the relationship between the distributions of two variables.
  • The most appropriate picture will often depend on the data:
    • Categorical or quantitative?
    • Frequencies, counts or measurements?
    • Relationship between data points?

Worksheet

  • DEMO: Graphing in Excel
  • GOALS
    • Get comfortable using the “Chart Wizard”
    • Pick the right graphical representation for your data
    • Label your axes and add a title
  • Graph both y = e^(rx) and y = e^(sx) on the same plot (you can check your work with your calculator if it graphs).
  • Label your x axis “x” and your y axis “y”, and title your graph “Exponential Growth”.
  • What happens when you change the value of r from 0.1 to −0.1?
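The last question can be checked numerically before graphing. The sketch below tabulates y = e^(rx) for r = 0.1 and r = −0.1; the x range of 0 to 10 and the function name are assumptions made for illustration, since the worksheet does not fix them here.

```python
from math import exp


def curve(rate, xs):
    """Return y = e^(rate * x) for each x in xs."""
    return [exp(rate * x) for x in xs]


xs = list(range(0, 11))   # assumed x values: 0, 1, ..., 10
growth = curve(0.1, xs)   # r = 0.1
decay = curve(-0.1, xs)   # r = -0.1

for x, g, d in zip(xs, growth, decay):
    print(f"x={x:2d}  e^(0.1x)={g:.3f}  e^(-0.1x)={d:.3f}")
```

Both curves start at y = 1 when x = 0. With r = 0.1 the values rise with x, and with r = −0.1 they fall toward zero, so the sign of r switches the curve from exponential growth to exponential decay.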

Analyzing Data (using Excel)

Student’s t-test

William Sealy Gosset

  • DEMO: Performing a t-test
    • Computing a p-value from a t-test
    • Distinguish the different types of t-tests:
      • Paired versus unpaired data
      • Equal versus unequal variance
      • One-tailed versus two-tailed tests
  • Gosset published a paper under the pseudonym “Student” that dealt with distinguishing the differences between means of small data sets.
  • The t-test uses the statistics from two groups of data (means and s.d.) to generate a third statistic (the t statistic).
  • If the two groups of data come from populations with the same mean, the t statistic itself has a characteristic distribution (note that its shape depends on the sample sizes).
  • If the computed t is extreme, such a value would rarely arise from two populations with equal means. The p-value from the test quantifies this: it is the probability of a t at least this extreme if the means were equal. By convention, the means are called significantly different if p < 0.05.
  • Assumptions
    • Each datum is independent
    • Data are normally distributed
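To make the t statistic less of a black box, here is a minimal sketch of how it is built from means and standard deviations. The data are made up, the function names are hypothetical, and only the equal-variance unpaired test and the paired test are shown. Turning t into a p-value additionally requires the t distribution with the returned degrees of freedom, which is left to a spreadsheet or statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def unpaired_t_equal_var(a, b):
    """Unpaired t statistic assuming equal variances; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(a, b):
    """Paired t statistic on the per-pair differences; returns (t, degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up measurements; position i in each list is treated as one matched pair.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 7]

print("unpaired:", unpaired_t_equal_var(group_a, group_b))
print("paired:  ", paired_t(group_a, group_b))
```

With these numbers the unpaired t is about −1.08 (8 degrees of freedom) while the paired t is −6 (4 degrees of freedom). When measurements are genuinely matched, pairing removes the variation shared within each pair, which is why the choice between the two tests matters.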

Worksheet

[Figure: experimental schematic. Labels: TUBE I, DISH I; 10 mL, 5 mL; t = 0, t = 24; F, N; serial 1/10 dilutions; 50 μL, 100 μL; TUESDAY, WEDNESDAY, THURSDAY]

  • Click on the “Tradeoff Data” tab. From the colony counts, write functions that will give the cell counts Fb, Nb, Fe, Ne. Then write functions giving w(F,N).
  • After finding average fitnesses (use the function “AVERAGE”), graph the average fitnesses from the TUBE and DISH environments. Label your graph.
  • Perform an unpaired and paired t-test on your data. Which test should you use for this data? What can you conclude?