Per Ola Kristensson
Research Methods
M.Phil. Advanced Computer Science
University of Cambridge
Michaelmas Term, 2009

Experimenting: experiment design

Scientific method in one minute

1. Use experience and observations to gain insight about a phenomenon
2. Construct a hypothesis
3. Use the hypothesis to predict outcomes
4. Test the hypothesis by experimenting
5. Analyse the outcome of the experiment
6. Go back to step 1

Typical computer science scenario

  • A particular task needs to be solved by a software system
  • This task is currently solved by an existing system (a baseline)
  • You propose a new system that is, in your opinion, better
  • You argue why your proposed system is better than the baseline
  • You support your arguments by providing evidence that your system indeed beats the baseline

Running example in this lecture

• Text entry on a Tablet PC
  A. Handwriting recognition
  B. Software keyboard

Why experiments?

  • Substantiate claims
    • A research paper needs to provide evidence to convince other researchers of the paper’s main points
  • Strengthen or falsify hypotheses
    • “My system/technique/algorithm is [in some aspect] better than previously published systems/techniques/algorithms”
  • Evaluate and improve/revise/reject models
    • “The published model predicts users will type at 80 wpm on average after 40 minutes of practice with a thumb keyboard. In our experiment no one surpassed 25 wpm after several hours of practice.”
  • Gain further insights, stimulate thinking and creativity

Back to our example

  • Why this experiment?
    • Despite decades of research there is no empirical data on the text entry performance of handwriting recognition
    • An inappropriate study of handwriting (sans recognition) from 1967 keeps getting cited in the literature, often through secondary or tertiary sources (handbooks, etc.)
    • Based on these numerous citations in research papers, handwriting recognition is perceived to be rather slow
    • However, there is no empirical evidence that supports this claim

Different kinds of experiments

• Surveys
• Field studies
• Simulations and computational experiments
• Controlled experiments
• … and quasi-experiments, and many more

Controlled experiments and hypotheses

  • A controlled experiment tests the validity of one or more hypotheses
  • Here we will consider the simplest case:
    • One method vs. another method
    • Each method is referred to as a condition
  • The null hypothesis H0 states there is no difference between the conditions
  • Our hypothesis H1 states there is a difference between the conditions
  • To show a statistically significant difference, the null hypothesis H0 needs to be rejected
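For the simplest within-subjects case (two conditions, one measurement per participant in each), rejecting H0 can be illustrated with an exact sign-flip permutation test. This is a swapped-in, standard non-parametric technique chosen so that the sketch needs only the Python standard library; it is not necessarily the analysis used in the running example, and the function name and entry rates below are hypothetical.

```python
import itertools
import math


def paired_permutation_test(a, b):
    """Exact two-sided sign-flip permutation test for paired samples.

    Under H0 (no difference between the two conditions) each participant's
    difference a[i] - b[i] is equally likely to be positive or negative, so
    every assignment of signs is an equally likely outcome. The p-value is the
    fraction of sign assignments whose absolute mean difference is at least as
    large as the one actually observed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty samples of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    # n is fixed, so comparing absolute sums is equivalent to comparing means.
    observed = abs(math.fsum(diffs))
    extreme = 0
    total = 0
    # 2**n sign assignments: fine for small n (12 participants -> 4096).
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = abs(math.fsum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-9:  # tolerance guards against rounding
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical entry rates (wpm) for six participants, one value per condition.
keyboard = [19.8, 21.4, 18.9, 22.7, 20.1, 19.5]
handwriting = [17.2, 20.9, 16.4, 21.0, 18.8, 17.7]

p = paired_permutation_test(keyboard, handwriting)
print(f"p = {p:.4f}")  # reject H0 at the 5% level if p < 0.05
```

All six hypothetical differences are positive, so only the observed signs and their mirror image are as extreme, giving p = 2/64 ≈ 0.031. That is also the smallest p-value six participants can produce with this test, a small illustration of why sample size matters. Enumerating all 2^n sign patterns becomes expensive for large samples, where randomly sampling sign patterns is a common fallback.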

Experimental design

• Dependent and independent variables
• Within-subjects vs. between-subjects
• Mixed designs
• Single session vs. longitudinal experiments

Dependent and independent variables

• Dependent variable:
  • What is measured
  • Typical examples (in CS): time, accuracy, memory usage
• Independent variable:
  • What is manipulated
  • Typical examples (in CS): the system used by participants, feedback to participants (e.g. a beep versus a visual flash)

Deciding what to manipulate and what to measure

• This is a key issue in research
• It boils down to your hypothesis:
  • What do you believe?
  • How can you substantiate your claim by taking measurements?
  • What can you measure?
  • Is it possible to protect internal validity without sacrificing external validity?

Our example

• We let participants write phrases using either:
  • Software keyboard (baseline)
  • Handwriting recognition
  • That is, we manipulate the input method
• We measure:
  • Entry rate in words per minute (wpm)
  • Error rate: the number of written characters that do not match the stimulus
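The two dependent variables can be computed per trial. The sketch below assumes two common text entry conventions that the slide does not spell out: a "word" is five characters including spaces, and the character error count is the minimum string distance (Levenshtein distance) between stimulus and transcribed text, normalised by the longer string to give a percentage. Function names and the example phrase are hypothetical.

```python
def entry_rate_wpm(transcribed: str, seconds: float) -> float:
    """Entry rate in words per minute.

    Assumed convention: a "word" is five characters including spaces, and
    timing starts when the first character is entered, so only
    len(transcribed) - 1 characters count towards the rate.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) - 1) / seconds * 60 / 5


def min_string_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    turning a into b (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(stimulus: str, transcribed: str) -> float:
    """Erroneous characters as a percentage of the longer string."""
    longest = max(len(stimulus), len(transcribed))
    if longest == 0:
        return 0.0
    return 100 * min_string_distance(stimulus, transcribed) / longest


# Hypothetical trial: one character of the stimulus phrase was dropped.
stimulus = "the quick brown fox"
transcribed = "the quick brwn fox"
print(entry_rate_wpm(transcribed, seconds=9.0))
print(min_string_distance(stimulus, transcribed))
print(char_error_rate(stimulus, transcribed))
```

Using an edit distance rather than a position-by-position comparison means a single dropped character counts as one error instead of misaligning, and penalising, every character after it.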

Between-subjects design

  • Each participant is exposed to only one condition
  • One of the simplest experimental designs
  • Advantages:
    • No risk of confounds or skill transfer carrying over from one condition to the other
    • Therefore no need to counterbalance or check for asymmetrical skill-transfer effects
  • Disadvantages:
    • Variance is not controlled within the participant
    • Therefore requires more participants than a within-subjects design to show a statistically significant difference
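Random assignment is what makes a between-subjects comparison fair: each participant lands in exactly one condition by chance rather than by choice. A minimal sketch, assuming hypothetical participant IDs, condition labels and function name:

```python
import random


def assign_between_subjects(participants, conditions, seed=None):
    """Randomly assign every participant to exactly one condition.

    Shuffling first and then dealing participants out in turn keeps the group
    sizes as equal as possible (they differ by at most one). `conditions`
    must be an indexable sequence such as a list or tuple.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups


participants = [f"P{n:02d}" for n in range(1, 13)]  # hypothetical IDs P01..P12
groups = assign_between_subjects(participants, ["keyboard", "handwriting"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```

Because each participant contributes data to only one group, 12 participants yield only 6 per condition, which is one way to see why this design needs more participants.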

Within-subjects design

  • Each participant is exposed to all conditions
  • One of the most common experimental designs in practice
  • Advantages:
    • Variance is controlled within the participant
    • Therefore requires fewer participants than a between-subjects design
  • Disadvantages:
    • More involved: the start condition must be counterbalanced so that transfer effects are balanced across conditions
    • Risk of asymmetrical skill transfer
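Counterbalancing the start condition can be generated mechanically. A sketch under these assumptions: conditions are rotated per participant (a simple Latin square), the helper names are hypothetical, and with two conditions this reduces to alternating AB and BA orders.

```python
from collections import Counter


def rotated_orders(conditions, n_participants):
    """Condition order for each participant, rotating the start condition.

    With two conditions this alternates AB, BA, AB, ... so each condition is
    the start condition for half the participants (when n_participants is
    even). For more conditions this is a simple Latin square: each condition
    appears in each position equally often if n_participants is a multiple of
    len(conditions), but it does not balance which condition follows which.
    """
    k = len(conditions)
    return [[conditions[(i + j) % k] for j in range(k)] for i in range(n_participants)]


def position_counts(orders):
    """How often each condition occurs at each position (a balance check)."""
    return [Counter(order[pos] for order in orders) for pos in range(len(orders[0]))]


orders = rotated_orders(["keyboard", "handwriting"], 12)
print(orders[:2])  # [['keyboard', 'handwriting'], ['handwriting', 'keyboard']]
print(position_counts(orders))
```

Counterbalancing balances order effects across conditions; it does not remove them, so asymmetrical skill transfer still has to be checked, for example by comparing the two order groups.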

Mixed designs

  • It is also possible to combine within- and between-subjects experimental designs
  • Such designs are called mixed designs
  • These are difficult to design because they are harder to control
  • A mixed design can be a symptom of lacking a clear set of hypotheses, or of being unable to prioritise among them
  • Often a mixed design can be broken down into smaller studies that investigate isolated phenomena separately

Single session vs. longitudinal

• Do you believe participants will improve significantly over time?
• If so, how much will they improve?
• How are previous related studies set up in the literature?
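If pilot data or a first longitudinal run is available, "how much will they improve?" can be quantified by averaging the dependent variable per session and comparing the first and last sessions. A minimal sketch; the tuple layout of the log, the function names and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def session_means(trials):
    """Mean entry rate per session for each condition.

    trials: iterable of (participant, condition, session, wpm) tuples.
    Returns {condition: {session: mean wpm}} with sessions in ascending order.
    """
    rates = defaultdict(lambda: defaultdict(list))
    for _participant, condition, session, wpm in trials:
        rates[condition][session].append(wpm)
    return {
        condition: {s: mean(values) for s, values in sorted(by_session.items())}
        for condition, by_session in rates.items()
    }


def improvement(means_by_session):
    """Absolute (wpm) and relative (%) change from first to last session."""
    sessions = sorted(means_by_session)
    first = means_by_session[sessions[0]]
    last = means_by_session[sessions[-1]]
    return last - first, 100 * (last - first) / first


# Hypothetical log: two participants, two sessions, one condition.
trials = [
    ("P01", "handwriting", 1, 18.0), ("P02", "handwriting", 1, 20.0),
    ("P01", "handwriting", 2, 22.0), ("P02", "handwriting", 2, 24.0),
]
means = session_means(trials)
print(means)  # {'handwriting': {1: 19.0, 2: 23.0}}
print(improvement(means["handwriting"]))
```

A large change between early sessions that has not levelled off by the last one suggests a single-session study would capture mostly learning, not steady-state performance.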

Participants, our example

We recruited 12 volunteers from the university campus. We intentionally sought a broad sample and recruited participants from many different departments with many different backgrounds. Six were men and six were women. Their ages ranged from 22 to 37 (mean = 27, sd = 4). Participants were screened for dyslexia and repetitive strain injury (RSI). Seven participants were native English speakers and five participants had English as their second language. No participant had used a handwriting recognition interface before. One participant had used a software keyboard before, but no participant had used one regularly. Participants were compensated £10 per session.

Apparatus

  • Which equipment and which software?
    • Needs to be described in sufficient detail to enable other researchers to replicate your experiment
  • Typical information:
    • Physical and logical screen size
    • Sensor device characteristics
    • CPU clock speed
    • Computer brand/model
  • Choices that are not obvious need to be motivated

Apparatus, our example

We used a Dell Latitude XT Tablet PC running Windows Vista Service Pack 1. The 12.1" color touch-screen had a resolution of 1280 × 800 pixels and a physical screen size of 261 × 163 mm. Participants used a capacitance-based pen to write directly onto the screen in both conditions.

Both the handwriting recognizer and the software keyboard were docked to the lower part of the screen. The software keyboard measured 1266 × 244 pixels (257 × 50 mm), and the handwriting recognizer's writing area measured 1266 × 264 pixels (257 × 55 mm).
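The apparatus figures can be sanity-checked with a little arithmetic: logical and physical sizes should imply roughly the same pixel density everywhere, and the physical width and height should reproduce the stated 12.1" diagonal. The helper name is hypothetical; the sizes are the ones reported in the apparatus description.

```python
import math

MM_PER_INCH = 25.4

# Logical (pixels) and physical (mm) sizes from the apparatus description.
SIZES = {
    "screen": ((1280, 800), (261, 163)),
    "software keyboard": ((1266, 244), (257, 50)),
    "handwriting area": ((1266, 264), (257, 55)),
}


def pixel_density(px, mm):
    """Horizontal and vertical pixel density in pixels per mm."""
    return px[0] / mm[0], px[1] / mm[1]


for name, (px, mm) in SIZES.items():
    h, v = pixel_density(px, mm)
    print(f"{name}: {h:.2f} x {v:.2f} px/mm")

# The physical width and height should reproduce the stated 12.1" diagonal.
diagonal_in = math.hypot(*SIZES["screen"][1]) / MM_PER_INCH
print(f"diagonal: {diagonal_in:.1f} in")  # diagonal: 12.1 in
```

The densities agree to within a few percent, which is the level of consistency one would expect if the millimetre figures were rounded (an assumption; the slides do not say how they were measured).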

Apparatus, motivating your choices

The handwriting recognizer was configured to learn and adapt to participants’ handwriting style (the default setting on Windows Vista). Each participant performed the experiment in a separate user account on the machine to ensure handwriting adaptation was carried out on an individual basis. There was a potential confound in enabling handwriting adaptation since it caused the system, as well as the user, to learn as a function of usage. In the interest of external validity we enabled adaptation since in actual use users would most likely have adaptation turned on.

Procedure

• Describes how the experiment was carried out
• Needs to be described in sufficient detail for other researchers to be able to replicate your experiment
• Again, choices need to be motivated

After the experiment

• Results

• Limitations and implications

Our example

[Figure: entry rate (wpm, y-axis from 15 to 30) by session (x-axis, sessions 1 to 10) for the two conditions: Software keyboard and Handwriting recognition]

Summary

  • A well-designed controlled experiment provides you with empirical evidence that your new method is better [in some aspects] than some previous method in the literature (a baseline)
  • It is important to consider the experimental design early:
    • Within vs. between
    • Dependent and independent variables
    • Internal and external validity
  • A pilot study is often a good idea (perhaps your method has a fatal flaw)
  • It is important to point out limitations and implications
  • Experiments must be reproducible