Per Ola Kristensson

Research Methods

M.Phil. Advanced Computer Science

University of Cambridge

Michaelmas Term, 2009

Experimenting: experiment design

Scientific method in one minute

1. Use experience and observations to gain insight about a phenomenon
2. Construct a hypothesis
3. Use hypothesis to predict outcomes
4. Test hypothesis by experimenting
5. Analyse outcome of experiment
6. Go back to step 1

Typical computer science scenario

• A particular task needs to be solved by a software system
• This task is currently solved by an existing system (a baseline)
• You propose a new, in your opinion, better system
• You argue why your proposed system is better than the baseline
• You support your arguments by providing evidence that your system indeed beats the baseline

Running example in this lecture

• Text entry on a Tablet PC
A. Handwriting recognition
B. Software keyboard

Why experiments?

Substantiate claims
- A research paper needs to provide evidence to convince other researchers of the paper’s main points
Strengthen or falsify hypotheses
- “My system/technique/algorithm is [in some aspect] better than previously published systems/techniques/algorithms”
Evaluate and improve/revise/reject models
- “The published model predicts users will type at 80 wpm on average after 40 minutes of practice with a thumb keyboard. In our experiment no one surpassed 25 wpm after several hours of practice.”
Gain further insights, stimulate thinking and creativity

Back to our example

Why this experiment?
- Despite decades of research there is no empirical data on the text entry performance of handwriting recognition
- An inappropriate study of handwriting (sans recognition) from 1967 keeps getting cited in the literature, often through secondary or tertiary sources (handbooks, etc.)
- Based on these numerous citations in research papers, handwriting recognition is perceived to be rather slow
- However, there is no empirical evidence that supports this claim

Different kinds of experiments

• Surveys

• Field studies

• Simulations and computational experiments

• Controlled experiments

• … and quasi-experiments, and many more…

Controlled experiments and hypotheses

A controlled experiment tests the validity of one or more hypotheses
Here we will consider the simplest case:
- One method vs. another method
- Each method is referred to as a condition
The null hypothesis H0 states there is no difference between the conditions
Our hypothesis H1 states there is a difference between the conditions
To show a statistically significant difference the null hypothesis H0 needs to be rejected
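
One way to decide whether H0 can be rejected when every participant tried both conditions is an exact sign-flip permutation test on the per-participant differences. The sketch below is an illustration under that assumption, not the analysis used in the lecture's example; the function name `paired_permutation_test` is hypothetical and only the Python standard library is used.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(a, b):
    """Two-sided exact sign-flip permutation test for paired samples.

    Under H0 (no difference between the conditions) the sign of each
    per-participant difference is arbitrary. We enumerate every possible
    sign assignment and count how often the absolute mean difference is
    at least as large as the one actually observed.
    Returns the p-value (a fraction between 0 and 1).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    if not a:
        raise ValueError("samples must not be empty")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance so floating-point noise does not drop ties.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With n participants the loop visits 2^n sign assignments, so it is only practical for small samples (12 participants give 4096 assignments). H0 is rejected when the returned p-value falls below the significance level chosen before the experiment, for example 0.05.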

Experimental design

• Dependent and independent variables

• Within-subjects vs. between-subjects

• Mixed designs

• Single session vs. longitudinal experiments

Dependent and independent variables

• Dependent variable
- What is measured
- Typical examples (in CS): time, accuracy, memory usage
• Independent variable
- What is manipulated
- Typical examples (in CS): the system used by participants, feedback to participant (e.g. a beep versus a visual flash)

Deciding what to manipulate and what to measure

• This is a key issue in research
• Boils down to your hypothesis:
- What do you believe?
- How can you substantiate your claim by taking measurements?
- What can you measure?
- Is it possible to protect internal validity without sacrificing external validity?

Our example

• We let participants write phrases using either:
- Software keyboard (baseline)
- Handwriting recognition
• That is, we manipulate the input method
• We measure:
- Entry rate in words per minute (wpm)
- Error rate, as the number of written characters that do not match the stimulus
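
The two dependent variables can be computed roughly as follows. The slides do not give the exact formulas, so this sketch makes two assumptions: a "word" is counted as five characters including spaces (a common text entry convention), and mismatching characters are counted with the Levenshtein edit distance so that a single inserted or omitted character does not make every later character count as an error. All function names are illustrative.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry rate in wpm, assuming one word = five characters (spaces included)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) / 5.0) / (seconds / 60.0)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(stimulus: str, transcribed: str) -> float:
    """Mismatching characters as a fraction of the longer of the two strings."""
    longest = max(len(stimulus), len(transcribed))
    if longest == 0:
        return 0.0
    return edit_distance(stimulus, transcribed) / longest
```

For example, a 50-character phrase written in 60 seconds gives `words_per_minute` of 10.0, and writing "abxd" for the stimulus "abcd" gives an `error_rate` of 0.25.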

Between-subjects design

Each participant is exposed to only one condition
One of the simplest experimental designs
Advantages:
- No risk of carry-over confounds or skill transfer from one condition to the other
- Therefore no need for counter-balancing or checks for asymmetrical skill-transfer effects
Disadvantages:
- Variance is not controlled within the participant
- Therefore demands more participants than a within-subjects design to show a statistically significant difference
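
In a between-subjects design, participants are usually assigned to conditions at random so that individual differences are spread evenly across groups. A minimal sketch of such an assignment, assuming participants are identified by arbitrary labels; the function name `assign_between_subjects` and the `seed` parameter are illustrative choices, not part of the lecture.

```python
import random


def assign_between_subjects(participants, conditions, seed=None):
    """Randomly split participants into one group per condition.

    Participants are shuffled and then dealt out round-robin, so group
    sizes differ by at most one. Pass a seed to make the assignment
    reproducible (useful when documenting the procedure).
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups
```

With 12 participants and two conditions this yields two groups of six, each of which experiences only one input method.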

Within-subjects design

Each participant is exposed to all conditions
One of the most common experimental designs in practice

Advantages:
- Variance is controlled within the participant
- Therefore requires fewer participants than a between-subjects design
Disadvantages:
- More involved, requires counter-balancing of start condition to avoid transfer effects
- Risk of asymmetrical skill transfer
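
Counter-balancing the start condition can be sketched as below: every possible order of the conditions is used equally often, and which participant gets which order is decided at random. This is an illustration only; the name `counterbalance` and the `seed` parameter are assumptions, and the slides do not state how orders were assigned in the example study.

```python
import random
from itertools import permutations


def counterbalance(participants, conditions, seed=None):
    """Assign each participant an order in which to experience all conditions.

    All orderings of the conditions are cycled through, so each ordering
    is used equally often whenever the number of participants is a
    multiple of the number of orderings.
    """
    orders = list(permutations(conditions))
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {
        participant: orders[index % len(orders)]
        for index, participant in enumerate(shuffled)
    }
```

With two conditions and 12 participants, six start with each condition. The number of orderings grows factorially with the number of conditions, so full counter-balancing like this is only practical for a few conditions.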

Mixed designs

It is also possible to combine within- and between-subjects experimental designs
Such designs are called mixed designs
These are difficult to design because they are more difficult to control
A mixed design can be a symptom of no clear set of hypotheses, or lack of ability to prioritise among them
Often a mixed design can be broken down into smaller studies that study isolated phenomena separately

Single session vs. longitudinal

• Do you believe participants will improve significantly over time?
• If so, how much will they improve?
• How are previous related studies set up in the literature?

Participants, our example

We recruited 12 volunteers from the university campus. We intentionally wanted a rather broad sample and recruited participants from many different departments with many different backgrounds. Six were men and six were women. Their ages ranged from 22 to 37 (mean = 27, sd = 4). Participants were screened for dyslexia and repetitive strain injury (RSI). Seven participants were native English speakers and five participants had English as their second language. No participant had used a handwriting recognition interface before. One participant had used a software keyboard before. No participant had regularly used a software keyboard before. Participants were compensated £10 per session.

Apparatus

Which equipment and which software?
- Needs to be described in sufficient detail to enable other researchers to replicate your experiment
Typical information:
- Physical and logical screen size
- Sensor device characteristics
- CPU clock speed
- Computer brand/model
Choices that are not obvious need to be motivated

Apparatus, our example

We used a Dell Latitude XT Tablet PC running Windows Vista Service Pack 1. The 12.1" color touch-screen had a resolution of 1280 × 800 pixels and a physical screen size of 261 × 163 mm. Participants used a capacitance-based pen to write directly onto the screen in both conditions.

Both the handwriting recognizer and the software keyboard were docked to the lower part of the screen. The dimensions of the software keyboard were 1266 × 244 pixels and 257 × 50 mm. The dimensions of the handwriting recognizer writing area measured 1266 × 264 pixels and 257 × 55 mm.
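
Reporting both pixel and millimetre dimensions lets a reader who replicates the setup on different hardware convert between the two. A minimal sketch that derives the pixel density from the figures quoted above; the names `pixels_per_mm`, `REPORTED` and `densities` are illustrative and not from the original study.

```python
def pixels_per_mm(pixels: int, millimetres: float) -> float:
    """Pixel density along one axis."""
    if millimetres <= 0:
        raise ValueError("millimetres must be positive")
    return pixels / millimetres


# (width_px, height_px, width_mm, height_mm) as reported in the text above.
REPORTED = {
    "screen": (1280, 800, 261, 163),
    "software keyboard": (1266, 244, 257, 50),
    "handwriting area": (1266, 264, 257, 55),
}


def densities(reported=REPORTED):
    """Return {name: (horizontal px/mm, vertical px/mm)}."""
    return {
        name: (pixels_per_mm(w_px, w_mm), pixels_per_mm(h_px, h_mm))
        for name, (w_px, h_px, w_mm, h_mm) in reported.items()
    }


if __name__ == "__main__":
    for name, (dx, dy) in densities().items():
        print(f"{name}: {dx:.2f} x {dy:.2f} px/mm")
```

All three reported elements come out at roughly 4.8 to 4.9 pixels per millimetre, which suggests the pixel and millimetre figures describe the same display; the small differences are plausibly due to the millimetre values being rounded.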

Apparatus, motivating your choices

The handwriting recognizer was configured to learn and adapt to participants’ handwriting style (the default setting on Windows Vista). Each participant performed the experiment in a separate user account on the machine to ensure handwriting adaptation was carried out on an individual basis. There was a potential confound in enabling handwriting adaptation since it caused the system, as well as the user, to learn as a function of usage. In the interest of external validity we enabled adaptation since in actual use users would most likely have adaptation turned on.

Procedure

• Describes how the experiment was carried out
• Needs to be described in sufficient detail for other researchers to be able to replicate your experiment
• Again, choices need to be motivated

After the experiment

• Results

• Limitations and implications

Our example

[Figure: entry rate (wpm) by session, sessions 1 to 10, for the software keyboard and handwriting recognition conditions]

Summary

A well-designed controlled experiment can provide you with empirical evidence that your new method is better [in some aspects] than some previous method in the literature (a baseline)
Important to consider the experimental design early
- Within vs. between
- Dependent and independent variables
- Internal and external validity
Pilot study often a good idea (perhaps your method has a fatal flaw)
Important to point out limitations and implications
Experiments must be reproducible
