User Interface Evaluation: Expert Heuristic Evaluation and Measurement Methods, Lecture notes of Human-Computer Interaction Design

to develop the interface of your design

Typology: Lecture notes

2021/2022

Uploaded on 04/07/2022

garcia-kyla-mae-a

MODULE No: 8
IT86A-HUMAN COMPUTER INTERACTION
ISABELA STATE UNIVERSITY ILAGAN CAMPUS
USER INTERFACE EVALUATION
Topic:
1. Evaluation Criteria
2. Evaluation Methods
a. Focus Interview/Enactment/Observation Study
b. Expert Heuristic Evaluation
c. Measurement
d. Safety and ethics evaluation
Objective:
• Discuss various methods for evaluating interfaces at different phases of the development process.
• Employ various methods to evaluate interface designs.
• Conduct a user interface evaluation on an existing user interface design.
• Analyze possible compromises made in the interface design based on the evaluation method used.
Introduction
The final stage in the development cycle of interactive UI software is evaluation. Even if the developers have strived to adhere to various HCI principles, guidelines, and rules, and have applied the latest toolkits and implementation methodologies, the resulting UI or software is most probably not problem-free. Frequently, careful consideration of interaction and interface design may not even have been carried out in the first place. Aside from the things the developer may have overlooked or failed to consider, the overall development process is meant to be one of gradual refinement from the start, where each refinement stage is based on the evaluation results of the previous round. In this chapter, we present several methods and examples of evaluation for user interfaces.

Content

1. Evaluation Criteria

When evaluating the interaction model and interface, there are largely two criteria: usability and user experience (UX). Simply put, usability refers to the ease of use and learnability of the user interface (we come back to UX later in this section) [1]. Usability can be measured in two ways: quantitatively or qualitatively. Quantitative assessment often involves task-performance measurements. That is, we assume that an interface is “easy to use and learn” (good usability) if the subject (or a reasonable pool of subjects) is able to show some (absolute) minimum user performance on typical application tasks. A new interface is better assessed comparatively, against some nominal or conventional interface (in terms of its relative performance edge). Popular performance measures are task completion time, amount of the task completed in unit time (e.g., score), and task error rate.

For example, suppose we would like to test a new motion-based interface for a smartphone game. We could have a pool of subjects play the game using both the conventional touch-based interface and the newly proposed motion-based one, and then compare the scores to assess the comparative effectiveness of the new interface.

The underlying assumption is that task performance is closely correlated with usability (ease of use and learnability). However, this assumption is quite debatable. Task-performance measures, while quantitative, reveal only the aspect of efficiency (or merely ease of use), not necessarily usability as a whole. Learnability can and should be assessed more explicitly by measuring the time and effort (e.g., memory) users need to learn the interface. The problem is that it is difficult to gather a homogeneous pool of subjects with similar backgrounds (in order to make the evaluation fair). Measuring learnability is generally likely to introduce many more biasing factors, such as differences in educational, experiential, or cultural background, age, gender, etc. Finally, quantitative

User experience (UX) is the other important aspect of interface evaluation. There is no precise definition of UX. It is generally accepted that the notion of user experience is “total” in the sense that it concerns not just the interface but the whole product/application, and even extends to the product family (such as Apple® products or MS Office). It is also deeply related to the user’s emotions and perceptions that result from the use or anticipated use of the application (through the given interface). Such an affective response depends heavily on the context of use. Thus, UX evaluation involves a more comprehensive assessment of the emotional response, under a variety of usage contexts and across a family of products/applications/interfaces (see Figure 8.3). A distinction can be made between usability methods, which aim to improve human performance, and user experience methods, which aim to improve user satisfaction by achieving both pragmatic and hedonic goals [5]. Note that the notion of UX includes usability; i.e., high UX usually translates to high usability and high emotional attachment.

2. Evaluation Methods

Whether the target is user experience or the narrower notion of usability, and whether it is qualitative feelings or quantitative performance, there is a variety of evaluation methods. A given method may be general and applicable to many different situations and objectives, or it may be more specific and suited to a particular criterion or usage situation. Overall, an evaluation method can be characterized by the following factors:
✓ Timing of analysis (e.g., throughout the application development stage: early, middle, late/after)
✓ Type and number of evaluators (e.g., several HCI experts vs. hundreds of domain users)

Note that the interview/simulation method, due to its simplicity, can be used not only for evaluation but also for interaction modeling and exploration of alternatives at the early design stage. In Chapters 3 and 4, we have already seen design tools such as storyboards, wireframing, and GOMS (Goals, Operators, Methods, and Selection rules), which can be used in conjunction with users or experts for simultaneous analysis and design. The user interviewing/observation technique, being somewhat free-form, is easy to administer but is not structured to be comprehensive. Table 8.1 summarizes the characteristics of the interview/simulation/observation approach.

Expert Heuristic Evaluation

Expert heuristic evaluation is very similar to the interview method. The difference is that the evaluators are HCI experts and that the analysis is carried out against a prepared HCI guideline, hence the term heuristics. For instance, the guideline can be general or more specific (Chapter 2) with respect to application genre (e.g., games), cognitive/ergonomic load, corporate UI design style (e.g., the Android™ UI guidelines), etc. The directions or particular themes of the heuristics are chosen by the underwriter. The following lists Nielsen’s ten general UI heuristics [8]. Note that these guidelines are almost the same as the general principles/guidelines introduced in Chapters 1 and 2 and used for interaction/interface design.

  1. Visibility of system status : The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
  2. Match between system and the real world : The system should speak the users’ language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
  3. User control and freedom : Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
  4. Consistency and standards : Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
  5. Error prevention : Even better than good error messages is a careful design that prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.
  6. Recognition rather than recall : Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
  7. Flexibility and efficiency of use : Accelerators—unseen by the novice user—may often speed up the interaction for the expert user such that the system can cater to both
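
The lab activity at the end of this module asks evaluators to rate each heuristic on a Likert scale and then tally the results. The following is a minimal Python sketch of such a tally, assuming a 5-point scale (1 = poorly satisfied, 5 = fully satisfied), three evaluators, and a hypothetical cut-off of 3.0 below which a heuristic is flagged for review. The ratings are invented for illustration, not data from any real evaluation, and only the first seven heuristics are used.

```python
from statistics import mean, median

# Hypothetical 5-point Likert ratings (1 = poorly satisfied, 5 = fully satisfied)
# given by three evaluators to one website for each of the first seven heuristics.
ratings = {
    "Visibility of system status": [4, 5, 4],
    "Match between system and the real world": [3, 3, 4],
    "User control and freedom": [2, 3, 2],
    "Consistency and standards": [5, 4, 4],
    "Error prevention": [2, 2, 3],
    "Recognition rather than recall": [4, 3, 4],
    "Flexibility and efficiency of use": [3, 4, 3],
}


def summarize(ratings, threshold=3.0):
    """Return (heuristic, mean, median, flagged) rows, weakest heuristic first.

    A heuristic is flagged when its mean rating is below `threshold`
    (an assumed cut-off for "needs attention")."""
    rows = []
    for heuristic, scores in ratings.items():
        avg = mean(scores)
        rows.append((heuristic, round(avg, 2), median(scores), avg < threshold))
    rows.sort(key=lambda row: row[1])  # stable sort keeps ties in input order
    return rows


for heuristic, avg, med, flagged in summarize(ratings):
    note = "  <- review" if flagged else ""
    print(f"{heuristic:42} mean={avg:.2f} median={med}{note}")
```

Because Likert responses are ordinal, reporting the median alongside the mean (as here) is a common, more conservative choice when interpreting the tally.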

The expert heuristic evaluation is one of the most popular methods of UI evaluation because it is quick and dirty and relatively cost-effective (Table 8.3). Only a few (typically three to five) UI and domain experts are brought in to evaluate the UI implementation in the late stage of development, or even a finished product. The disadvantage of the expert review is that feedback from actual users is absent, as the HCI experts may not understand the needs of those users. On the other hand, the small size of the evaluator pool is compensated for by the expertise of the participants.

Measurement

In contrast to interviews and observation, measurement methods attempt to quantify the goodness of the interaction/interface design indirectly, with a score obtained through representative task performance (quantitative) or through quantified answers to carefully prepared subjective surveys (qualitative). Typical indicators of quantitative task performance are task completion time, score (or amount of the task performed in unit time), and errors (produced in unit time). For example, for a mobile game, a representative task might be to “invoke the given game, log in, and reach the main screen.” Another example task, for No Sheets, would be to “invoke the application, load the music file, and set the tempo” (Figure 8.6). Task-performance measurement is only meaningful when compared to the nominal/reference case. Thus, measurements must be made for both the nominal and the new design, and statistical analysis is then applied to determine whether there are meaningful and significant differences between the two (Figure 8.7). For physical/cognitive task-performance measurement, it is generally accepted that a sufficiently homogeneous, yet relatively small, subject pool can feasibly be gathered to minimize bias and variation.
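
As a minimal sketch of the comparative statistical analysis described above, the Python code below compares completion times for the nominal and new designs using Welch's t statistic (a two-sample t-test that does not assume equal variances). The timing data are invented for illustration, and the choice of Welch's test is an assumption, not a method prescribed by this module.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical completion times (seconds) for one representative task, measured
# on two independent groups of subjects: one using the nominal design, one the new design.
nominal = [41.2, 38.5, 45.0, 39.8, 43.1, 40.6, 44.3, 42.0]
new_design = [35.4, 33.9, 37.2, 36.8, 34.1, 38.0, 35.5, 36.3]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples whose variances may differ."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t(nominal, new_design)
print(f"mean nominal = {mean(nominal):.2f} s, mean new = {mean(new_design):.2f} s")
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

Turning t and df into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(nominal, new_design, equal_var=False)` performs the same Welch test and returns the p-value directly. If the same subjects used both designs (a within-subjects setup, as in the touch vs. motion game example), a paired test such as `scipy.stats.ttest_rel` is the more appropriate choice.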

Similarly to the task-performance case, a comparative survey against the nominal case is recommended. Both types of measurement experiments can optionally be run over a long period of time, especially when memory performance and familiarity with the task are involved. For instance, to assess the ease of learning an interface, task performance can be measured over weeks to see how quickly users recall how to operate the interface and achieve higher performance.

Another variation concerns the place of evaluation. When testing a finished product, it is best to conduct the usage test at the actual place of usage, outside the laboratory (e.g., at the office, at home, on the street). However, as expected, it is often very difficult to conduct measurement or testing at the actual place of interaction. Even when it is possible, many uncontrollable factors might affect the outcome of the testing (e.g., having to test in front of other people). To isolate and prevent these possible biases, testing is often conducted in a laboratory setting as well, with a carefully selected pool of homogeneous subjects. With the advent and ubiquity of smartphones, in situ field testing is gaining great popularity. Applications can collect user interaction information in the background when particular interaction events occur, and this information can then be analyzed in a batch process. While the same danger exists with respect to environmental biases, these can often be mitigated by the large number of subjects (e.g., smartphone and app users). Some research has shown that there is very little difference in the analysis/evaluation results between controlled laboratory studies and in situ field studies. However, this result depends on the nature of the application (especially for applications whose typical usage situations cannot easily be re-created in the laboratory). In fact, in addition to the need to carefully construct the survey, measurement experiments require meticulous operational logistics to be as fair and bias-free as possible, starting from the recruitment, screening, and pretraining of subjects, compensating subjects and obtaining their consent, choosing the right independent and dependent variables, and applying the right statistical analysis methods to the resulting data. The details of such design of experiments (DEX) are beyond the scope of this book, and we refer you to the related literature. Despite the higher reliability of the evaluation results, a significant amount of effort is needed to prepare and administer the measurement interface evaluation method (Table 8.5).
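
The in situ approach described above (logging interaction events in the background and analyzing them later in a batch) can be sketched as follows. The log format, the event names (`task_start`, `error`, `task_end`), and the function name `batch_analyze` are hypothetical; a real app would define its own events and storage. The sketch derives two of the task-performance indicators mentioned earlier, completion time and error count, for each user.

```python
from collections import defaultdict

# Hypothetical background interaction log: (user_id, timestamp_s, event_name).
# Event names are illustrative assumptions, not a real logging API.
log = [
    ("u1", 0.0, "task_start"), ("u1", 3.1, "error"), ("u1", 12.4, "task_end"),
    ("u2", 14.0, "task_end"), ("u2", 5.0, "task_start"),  # arrived out of order
    ("u3", 2.0, "task_start"), ("u3", 4.5, "error"), ("u3", 6.0, "error"),
    ("u3", 21.5, "task_end"),
    ("u4", 1.0, "task_start"), ("u4", 7.5, "error"),  # task never finished
]


def batch_analyze(events):
    """Group logged events by user and derive completion time and error count."""
    per_user = defaultdict(list)
    for user, ts, name in events:
        per_user[user].append((ts, name))

    results = {}
    for user, user_events in per_user.items():
        user_events.sort()  # order each user's events by timestamp
        start = next((ts for ts, name in user_events if name == "task_start"), None)
        end = next((ts for ts, name in user_events if name == "task_end"), None)
        finished = start is not None and end is not None
        results[user] = {
            "completion_time": end - start if finished else None,
            "errors": sum(1 for _, name in user_events if name == "error"),
        }
    return results


for user, stats in sorted(batch_analyze(log).items()):
    print(user, stats)
```

Unfinished tasks (no `task_end` event) are reported as `None` rather than dropped, so the completion rate can also be examined alongside completion time and errors.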

Self-Assessment Task: Lab Activity

  1. Divide your class into two groups.
  2. Each group must visit four different government websites and assess their user interface design using: a. Expert Heuristic Evaluation Criteria.
  3. Use the following Likert scale for each element of the evaluation criteria.
  4. Tally and interpret the evaluation results.

References
✓ Wikipedia. 2014. Usability. http://en.wikipedia.org/wiki/Usability.
✓ Hart, Sandra G., and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Human Mental Workload 1 (3): 139–83.
✓ NASA. 2013. NASA Task Load Index. http://humansystems.arc.nasa.gov/groups/tlx/downloads/TLXScale.pdf.
✓ ISO. 2009. Ergonomics of human-system interaction, Part 210: Human-centred design for interactive systems. ISO DIS 9241-210:2010. Geneva, Switzerland: International Organization for Standardization.
✓ Bevan, N. 2008. UX, usability and ISO standards. Paper presented at Values, Value and Worth workshop, CHI 2008, Florence, Italy. http://www.cs.tut.fi/ihte/CHI08_workshop/papers/Bevan_UXEM_CHI08_06April08.pdf.
✓ Lewis, James R. 1995. IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human–Computer Interaction 7 (1): 57–78.
✓ Wikipedia. 2013. Wizard of Oz experiment. http://en.wikipedia.org/wiki/Wizard_of_Oz_experiment.
✓ Nielsen, Jakob. 1994. Enhancing the explanatory power of usability heuristics. In Proceedings of the SIGCHI conference on human factors in computing systems, 152–58. New York: ACM Press.
✓ Likert, Rensis. 1932. A technique for the measurement of attitudes. Archives of Psychology 22 (140): 1–55.
✓ Rowley, D. E. 1994. Usability testing in the field: Bringing the laboratory to the user. In Proceedings of the SIGCHI conference on human factors in computing systems, 252–57. New York: ACM Press.
✓ Kaikkonen, A., T. Kallio, A. Kekalainen, A. Kankainen, and M. Cankar. 2005. Usability testing of mobile applications: A comparison between laboratory and field testing. Journal of Usability Studies 1 (1): 4–16.
✓ Kjeldskov, J., M. B. Skov, B. S. Als, and R. T. Høegh. 2004. Is it worth the hassle? Exploring the added value of evaluating the usability of context-aware mobile systems in the field. Lecture Notes in Computer Science 3160: 61–73.