






















ISSN 0260-2938 (print)/ISSN 1469-297X (online)/05/040387– © 2005 Taylor & Francis Group Ltd DOI: 10.1080/
Assessment & Evaluation in Higher Education, August 2005. Original Article. DOI: 10.1080/0260293042000318136. © 2005 Taylor & Francis Ltd. John Richardson, Institute of Educational Technology, The Open University, Walton Hall, Milton Keynes, MK7 [email protected]
This paper reviews the research evidence concerning the use of formal instruments to measure students’ evaluations of their teachers, students’ satisfaction with their programmes and students’ perceptions of the quality of their programmes. These questionnaires can provide important evidence for assessing the quality of teaching, for supporting attempts to improve the quality of teaching and for informing prospective students about the quality of course units and programmes. The paper concludes by discussing several issues affecting the practical utility of the instruments that can be used to obtain student feedback. Many students and teachers believe that student feedback is useful and informative, but for a number of reasons many teachers and institutions do not take student feedback sufficiently seriously.
Introduction
The purpose of this article is to review the published research literature concerning the use of formal instruments to obtain student feedback in higher education. My primary emphasis will be on sources that have been subjected to the formal processes of independent peer review, but there is also a ‘grey’ literature consisting of conference proceedings, in-house publications and technical reports that contain relevant information even if they are lacking in academic rigour. The first part of my review will cover the predominantly North American literature that is concerned with students’ evaluations of their teachers. I shall briefly refer to attempts to measure student satisfaction and then turn to the predominantly Australian and British literature that is concerned with students’ perceptions of the quality of their programmes. The final part of my review will deal with more practical issues: why collect student feedback? Why use formal instruments? What should be the subject of the feedback? What kind of feedback should be collected? When should feedback be collected? Would a single questionnaire be suitable for all
388 J. T. E. Richardson
students? Why are response rates important? And how seriously is student feedback taken?
Students’ evaluations of teaching
In North America, the practice of obtaining student feedback on individual teachers and course units is widespread. Marsh and Dunkin (1992) identified four purposes for collecting students’ evaluations of teaching (SETs):
● Diagnostic feedback to teachers about the effectiveness of their teaching.
● A measure of teaching effectiveness to be used in administrative decision making.
● Information for students to use in the selection of course units and teachers.
● An outcome or process description for use in research on teaching.
Marsh and Dunkin noted that the first purpose was essentially universal in North America, but the other three were not:
At many universities systematic student input is required before faculty are even considered for promotion, while at others the inclusion of SETs is optional or not encouraged at all. Similarly, in some universities the results of SETs are sold to students in university bookstores as an aid to the selection of courses or instructors, whereas the results are considered to be strictly confidential at other universities. (1992, p. 143)
The feedback in question usually takes the form of students’ ratings of their level of satisfaction or their self-reports of other attitudes towards their teachers or their course units. The feedback is obtained by means of standard questionnaires, the responses are automatically scanned, and a descriptive summary of the responses is returned to the relevant teacher and, if appropriate, the teacher’s head of department. The process is relatively swift, simple and convenient for both students and teachers, and in most North American institutions it appears to have been accepted as a matter of routine. It has, however, been described as a ‘ritual’ (Abrami et al., 1996, p. 213), and precisely for that reason it may not always be regarded as a serious matter by those involved. In many institutions, the instruments used to obtain student feedback have been constructed and developed in-house and may never have been subjected to any kind of external scrutiny. Marsh (1987) described five instruments that had received some kind of formal evaluation and others have featured in subsequent research. The instrument that has been most widely used in published work is Marsh’s (1982) Students’ Evaluations of Educational Quality (SEEQ). In completing this questionnaire, students are asked to judge how well each of 35 statements (for instance, ‘You found the course intellectually stimulating and challenging’) describes their teacher or course unit, using a five-point scale from ‘very poor’ to ‘very good’. The statements are intended to reflect nine aspects of effective teaching: learning/value, enthusiasm, organization, group interaction, individual rapport, breadth of coverage, examinations/grading, assignments and workload/difficulty. The evidence using this and other similar questionnaires has been summarized in a series of reviews (Marsh, 1982, 1987; Arubayi, 1987; Marsh & Dunkin, 1992; Marsh & Bailey, 1993).
390 J. T. E. Richardson
unit where different groups of students are taught by different instructors but are subject to the same form of assessment. In these circumstances, there is a clear relationship between SETs and academic attainment, even when the grades are assigned by an independent evaluator, although some aspects of teaching are more important in predicting attainment than others (Cohen, 1981; Marsh, 1987). The relationship between SETs and academic attainment is stronger when students know their final grades, though there is still a moderate correlation if they provide their ratings before their final grades are known (Cohen, 1981). Greenwald and Gilmore (1997a, b) noted that in the latter case the students can acquire expectations about their final grades from the results of midterm tests. They found a positive relationship between students’ expected grades and their overall ratings of their teaching but a negative relationship between students’ expected grades and their estimated workload. They argued that students reduced their work investment in order to achieve their original aspirations when faced with lenient assessment on their midterm tests. The latter research raises the possibility that SETs might be biased by the effects of extraneous background factors, a possibility that is often used to foster scepticism about the value of SETs in the evaluation of teaching in higher education (Husbands & Fosh, 1993). Marsh (1987) found that four variables were potentially important in predicting SETs: the students’ prior interest in the subject matter; their expected grades; their perceived workload; and their reasons for taking the course unit in question. Nevertheless, the effects of these variables upon students’ ratings were relatively weak and did not necessarily constitute a bias.
For instance, course units that were perceived to have a higher workload received more positive ratings, and the effect of prior interest was mainly on what students said they had learned from the course unit rather than their evaluation of the teaching per se (see Marsh, 1983). Marsh (1987) acknowledged in particular that more positive SETs might arise from the students’ satisfaction at receiving higher grades (the grading satisfaction hypothesis) or else from other uncontrolled characteristics of the student population. The fact that the relationship between SETs and academic attainment is stronger when the students know their final grades is consistent with the grading satisfaction hypothesis. However, Marsh pointed out that, if students are taught in different groups on the same course unit, they may know how their attainment compares with that of the other students in their group, but they have no basis for knowing how their attainment compares with that of the students in other groups. Yet the correlation between SETs and academic attainment arises even when it is calculated from the average SETs and the average attainment across different groups, and even when the different groups of students do not vary significantly in terms of the grades that they expect to achieve. Marsh argued that this was inconsistent with the grading satisfaction hypothesis and supported the validity of SETs. Although the SEEQ has been most widely used in North America, it has also been employed in investigations carried out in Australia, New Zealand, Papua New Guinea and Spain (Marsh, 1981, 1986; Clarkson, 1984; Marsh et al., 1985; Watkins et al., 1987; Marsh & Roche, 1992). The instrument clearly has to be adapted (or
Instruments for obtaining student feedback 391
translated) for different educational settings and in some of these studies a different response scale was used. Even so, in each case both the reliability and the validity of the SEEQ were confirmed. In a trial carried out by the Curtin University of Technology Teaching Learning Group (1997), the SEEQ was found to be far more acceptable to teachers than the existing in-house instrument. Coffey and Gibbs (2001) arranged for a shortened version of the SEEQ (containing 24 items from six scales) to be administered to students at nine universities in the UK. The results confirmed the intended factor structure of this inventory and also showed a high level of internal consistency. Because cross-cultural research tended to confirm the factor structure of the SEEQ, Marsh and Roche (1994) argued that it was especially appropriate for the increasingly multicultural student population attending Australian universities. In a further study, Coffey and Gibbs (in press) asked 399 new teachers from eight countries to complete a questionnaire about their approaches to teaching. They found that those teachers who adopted a student-focused or learning-centred approach to teaching received significantly higher ratings from their students on five of the six scales in the shortened SEEQ than did those teachers who adopted a teacher-focused or subject-centred approach to teaching. In the case of teachers who had completed the first semester of a training programme, Coffey and Gibbs (2000) found that their students gave them significantly higher ratings on four of the six scales in the shortened SEEQ at the end of the semester than they had done after four weeks. Nevertheless, this study suffered from a severe attrition of participants, and it is possible that the latter effect was simply an artefact resulting from sampling bias. Equally, the students may have given more positive ratings simply because they were more familiar with their teachers.
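Internal consistency of the kind reported above for the shortened SEEQ is commonly summarized by Cronbach's coefficient alpha. The following is only a minimal sketch of that statistic, assuming complete responses with no missing items; the function name `cronbach_alpha` and the data layout (one list of item scores per respondent) are illustrative choices and are not taken from any of the studies cited.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate Cronbach's alpha for one scale.

    `responses` is a list of respondents, each a list of item scores
    (hypothetical layout; every respondent must answer every item).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Three items that rise and fall together give the maximum value.
    print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Values near 1 indicate that the items of a scale vary together across respondents; the statistic says nothing about test-retest reliability, which requires repeated administration.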
SETs are most commonly obtained when teaching is face-to-face and is controlled by a single lecturer or instructor. It has indeed been suggested that the routine use of questionnaires to obtain students’ evaluations of their teachers promotes an uncritical acceptance of traditional conceptions of teaching based on the bare transmission of knowledge and the neglect of more sophisticated conceptions concerned with the promotion of critical thinking and self-expression (Kolitch & Dean, 1999). It should be possible to collect SETs in other teaching situations such as the supervision of research students, but there has been little or no research on the matter. A different situation is that of distance education, where students are both physically and socially separated from their teachers, from their institutions, and often from other students too (Kahl & Cropley, 1986). To reduce what Moore (1980) called the ‘transactional distance’ with their students, most distance-learning institutions use various kinds of personal support, such as tutorials or self-help groups arranged on a local basis, induction courses or residential schools, and teleconferencing or computer conferencing. This support seems to be highly valued by the students in question (Hennessy et al., 1999; Fung & Carr, 2000). Nevertheless, it means that ‘teachers’ have different roles in distance education: as authors of course materials and as tutors. Gibbs and Coffey (2001) suggested that collecting SETs in distance
Instruments for obtaining student feedback 393
A similar approach has been adopted in in-house satisfaction surveys developed in the UK, but most of these have not been adequately documented or evaluated. Harvey et al. (1997) described a general methodology for developing student satisfaction surveys based upon their use at the University of Central England. First, significant aspects of students’ experience are identified from the use of focus groups. Second, these are incorporated into a questionnaire survey in which larger samples of students are asked to rate their satisfaction with each aspect and its importance to their learning experience. Finally, the responses from the survey are used to identify aspects of the student experience that are associated with high levels of importance but low levels of satisfaction. According to Harvey (2001), this methodology has been adopted at a number of institutions in the UK and in some other countries, too. Descriptive data from such surveys have been reported in institutional reports (see Harvey, 1995), but no formal evidence with regard to their reliability or validity has been published.
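The final step of the methodology described above (finding aspects that are rated as important but unsatisfactory) can be illustrated with a short sketch. This is not Harvey et al.'s own procedure: the function name `priority_aspects`, the five-point rating scales and the two cut-off values are all assumptions made purely for illustration.

```python
from statistics import mean


def priority_aspects(ratings, importance_cut=4.0, satisfaction_cut=3.0):
    """Flag aspects with high mean importance but low mean satisfaction.

    `ratings` maps an aspect name to a list of (satisfaction, importance)
    pairs, one pair per student. Both cut-offs are hypothetical thresholds
    on an assumed 1-5 scale, not values from the published methodology.
    """
    flagged = []
    for aspect, pairs in ratings.items():
        mean_satisfaction = mean(s for s, _ in pairs)
        mean_importance = mean(i for _, i in pairs)
        if mean_importance >= importance_cut and mean_satisfaction <= satisfaction_cut:
            flagged.append((aspect, mean_importance, mean_satisfaction))
    # Largest gap between importance and satisfaction first.
    return sorted(flagged, key=lambda row: row[1] - row[2], reverse=True)


if __name__ == "__main__":
    survey = {
        "library": [(2, 5), (3, 5)],
        "catering": [(2, 2), (1, 3)],
        "feedback": [(4, 5), (5, 4)],
    }
    for aspect, importance, satisfaction in priority_aspects(survey):
        print(aspect, importance, satisfaction)
```

In this made-up example only the library is flagged: catering is unsatisfactory but unimportant to the respondents, and feedback is important but already rated as satisfactory.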
Students’ perceptions of academic quality
From the perspective of an institution of higher education seeking to maintain and improve the quality of its teaching, it could be argued that the appropriate focus of assessment would be a programme of study rather than an individual course unit or the whole institution, and this has been the dominant focus in Australia and the UK. In an investigation into determinants of approaches to studying in higher education, Ramsden and Entwistle (1981) developed the Course Perceptions Questionnaire (CPQ) to measure the experiences of British students in particular degree programmes and departments. In its final version, the CPQ contained 40 items in eight scales that reflected different aspects of effective teaching. It was used by Ramsden and Entwistle in a survey of 2208 students across 66 academic departments of engineering, physics, economics, psychology, history and English. A factor analysis of their scores on the eight scales suggested the existence of two underlying dimensions: one reflected the positive evaluation of teaching and programmes, and the other reflected the use of formal methods of teaching and the programmes’ vocational relevance. The CPQ was devised as a research instrument to identify and to compare the perceptions of students on different programmes, and Ramsden and Entwistle were able to use it to reveal the impact of contextual factors on students’ approaches to learning. However, the primary factor that underlies its constituent scales is open to a natural interpretation as a measure of perceived teaching quality, and Gibbs et al. (1988, pp. 29–33) argued that the CPQ could be used for teaching evaluation and course review. Even so, the correlations obtained by Ramsden and Entwistle between students’ perceptions and their approaches to studying were relatively weak. Similar results were found by other researchers (Parsons, 1988) and this led to doubts being raised about the adequacy of the CPQ as a research tool (Meyer & Muller, 1990).
Ramsden (1991a) developed a revised instrument, the Course Experience Questionnaire (CEQ), as a performance indicator for monitoring the quality of teaching
on particular academic programmes. In the light of preliminary evidence, a national trial of the CEQ was commissioned by a group set up by the Australian Commonwealth Department of Employment, Education and Training to examine performance indicators in higher education (Linke, 1991). In this national trial, usable responses to the CEQ were obtained from 3372 final-year undergraduate students at 13 Australian universities and colleges of advanced education (see also Ramsden, 1991b). The instrument used in this trial consisted of 30 items in five scales which had been identified in previous research as reflecting different dimensions of effective instruction: good teaching (8 items); clear goals and standards (5 items); appropriate workload (5 items); appropriate assessment (6 items); and emphasis on independence (6 items). The defining items of the five scales (according to the results of the national trial) are shown in Table 1. In addition, three of the items in the Appropriate Assessment scale could be used as a subscale to monitor the perceived importance of rote memory as opposed to understanding in assessment. The respondents were instructed to indicate their level of agreement or disagreement (along a scale from ‘definitely agree’, scoring five, to ‘definitely disagree’, scoring one) with each statement as a description of their programme of study. Half of the items referred to positive aspects, whereas the other half referred to negative aspects and were to be scored in reverse. This means that the instrument as a whole controlled for any systematic response biases either to agree with all of the items or to disagree with all of the items. (Unfortunately, the items to be scored in reverse were not distributed equally across the five CEQ scales.)
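The reverse scoring described above can be made concrete with a small sketch. It assumes the five-point agreement scale given in the text (five for ‘definitely agree’, one for ‘definitely disagree’); the function name `score_scale`, the item labels and the use of a mean rather than a sum are illustrative assumptions, not the scoring rules of the published CEQ.

```python
def score_scale(responses, reversed_items, points=5):
    """Mean score on one scale after reverse-scoring negatively worded items.

    `responses` maps an item label to a rating from 1 to `points`;
    `reversed_items` is the set of labels to be scored in reverse.
    Item labels and the averaging step are hypothetical.
    """
    if not responses:
        raise ValueError("no responses to score")
    total = 0
    for item, rating in responses.items():
        if not 1 <= rating <= points:
            raise ValueError(f"rating out of range for {item}: {rating}")
        # A reversed item maps 5 -> 1, 4 -> 2, ..., 1 -> 5 on a five-point scale.
        total += (points + 1 - rating) if item in reversed_items else rating
    return total / len(responses)


if __name__ == "__main__":
    # A respondent who agrees with everything ends up at the scale midpoint
    # when half of the items are reverse-scored.
    print(score_scale({"positive_item": 5, "negative_item": 5}, {"negative_item"}))
```

The usage line shows why balanced wording controls for acquiescence: indiscriminate agreement cancels out, provided that positive and negative items are evenly balanced within the scale being scored, which the text notes was not the case for every CEQ scale.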
As a result of this national trial, it was determined that the Graduate Careers Council of Australia (GCCA) should administer the CEQ on an annual basis to all new graduates through the Graduate Destination Survey, which is conducted a few months after the completion of their degree programmes. The survey of the 1992 graduates was carried out in 1993 and obtained usable responses to the CEQ from more than 50,000 graduates from 30 institutions of higher education (Ainley & Long, 1994). Subsequent surveys have covered all Australian universities and have typically obtained usable responses to the CEQ from more than 80,000 graduates, reflecting
Table 1. Defining items of the scales in the original Course Experience Questionnaire
Scale: Defining item
Good teaching: Teaching staff here normally give helpful feedback on how you are going.
Clear goals and standards: You usually have a clear idea of where you’re going and what’s expected of you in this course.
Appropriate workload: The sheer volume of work to be got through in this course means you can’t comprehend it all thoroughly.
Appropriate assessment: Staff here seem more interested in testing what we have memorized than what we have understood.
Emphasis on independence: Students here are given a lot of choice in the work they have to do.
396 J. T. E. Richardson
research carried out in individual universities in Australia (Trigwell & Prosser, 1991) and Britain (Richardson, 1994). Evidence concerning the psychometric properties of the 23-item version of the CEQ has been obtained in the GCCA surveys and in the study by Wilson et al. (1997); the latter also provided evidence concerning the psychometric properties of the 36-item version of the CEQ. The internal consistency of the scales as measured by Cronbach’s (1951) coefficient alpha is generally satisfactory, although there is no evidence on their test–retest reliability. The composition of the scales according to the results of factor analyses conducted on the responses to individual items is broadly satisfactory. In the 23-item version, all of the items tend to load on distinct factors reflecting their assigned scales (see Byrne & Flood, 2003). The application of Rasch’s (1960) measurement analysis confirms the multidimensional structure of the CEQ (Waugh, 1998; Ainley, 1999). In the 30-item and the 36-item versions, most items load on factors reflecting their assigned scales, but there is a consistent tendency for a few items on the Good Teaching scale and the Emphasis on Independence scale to load on other factors. Two recent studies have identified a possible problem with the Good Teaching scale. Broomfield and Bligh (1998) obtained responses to the version of the CEQ devised by Ainley and Long (1994) from 180 medical students. A factor analysis confirmed the scale structure of this instrument, except that the Good Teaching scale was reflected in two separate factors: one was defined by three items concerned with the classroom instruction; the other was defined by two items concerned with feedback given to the students on their work. Kreber (2003) obtained similar results when she asked 1,080 Canadian students to evaluate particular course units.
Of course, the quality of instruction is likely to depend on the competence of individual teachers, but the quality of feedback on students’ work is likely to depend more on institutional practices. The construct validity of the CEQ according to factor analyses on respondents’ scores on the constituent scales is also broadly satisfactory. The modal solution is a single factor on which all of the scales show significant loadings. The Appropriate Workload scale shows the lowest loadings on this factor, and there is debate over whether it should be taken to define a separate dimension (Ainley, 1999; Richardson, 1997). The criterion validity of the CEQ as an index of perceived quality can be tested by examining the correlations between respondents’ scale scores and their responses to the additional item concerned with their overall satisfaction. Typically, all of the CEQ’s scales show statistically significant correlations with ratings of satisfaction (see also Byrne & Flood, 2003), but the Appropriate Workload scale shows the weakest associations. The discriminant validity of the CEQ is shown by the fact that the respondents’ scores on the constituent scales vary across different academic disciplines and across different institutions of higher education offering programmes in the same discipline. In particular, students produce higher scores in departments that pursue student-centred or experiential curricula through such models as problem-based learning (see also Eley, 1992; Sadlo, 1997). Conversely, Ainley and Long (1995) used results from the 1994 GCCA survey to identify departments of psychology in which there was ‘the
possible need for review of teaching and assessment practices’ (p. 50). Long and Hillman (2000, pp. 25–29) found in particular that ratings on the Good Teaching scale as well as students’ overall level of satisfaction varied inversely with the size of their institution. As mentioned earlier, Ramsden and Entwistle (1981) were originally concerned to demonstrate a connection between students’ perceptions of their programmes and the approaches to learning that they adopted on those programmes. The weak relationships that they and other researchers found cast doubt upon the concurrent validity of the CPQ. In contrast, investigations carried out at the Open University have shown an intimate relationship between the scores obtained on the CEQ by students taking different course units and their self-reported approaches to studying, such that the students who evaluate their course units more positively on the CEQ are more likely to adopt a deep approach to learning (Lawless & Richardson, 2002; Richardson, 2003, in press; Richardson & Price, 2003). Typically, the two sets of measures share between 45% and 80% of their variance. Similar results were obtained by Trigwell and Ashwin (2002), who used an adapted version of the CEQ to assess perceptions of the tutorial system among students at an Oxford college, and by Sadlo and Richardson (2003), who used the original CEQ to assess perceptions of academic quality among students taking subject-based and problem-based programmes in six different schools of occupational therapy. Wilson et al. (1997) demonstrated that students’ scores on the 36-item version of the CEQ were significantly correlated with their cumulative grade point averages. The correlation coefficients were highest for the Good Teaching scale and the Clear Goals and Standards scale, and they were lowest for the Generic Skills and Appropriate Workload scales. Of course, these data do not imply a causal link between good teaching and better grades.
As mentioned above, Marsh (1987) pointed out that more positive student ratings could result from students’ satisfaction at receiving higher grades or from uncontrolled characteristics of the student population. In both these cases, however, it is not clear why the magnitude of the relationship between CEQ scores and academic attainment should vary across different scales of the CEQ. Finally, Lizzio et al. (2002) constructed a theoretical model of the relationships between CEQ scores, approaches to studying and academic outcomes. They interpreted scores on the Generic Skills scale and students’ overall ratings of satisfaction as outcome measures, as well as grade point average. In general, they found that students’ scores on the other five scales of the CEQ were positively correlated with all three outcome measures. Students’ perceptions of their academic environment according to the CEQ had both a direct influence upon academic outcomes and an indirect influence that was mediated by changes in the students’ approaches to studying. In contrast, students’ academic achievement before their admission to university had only a weak influence on their grade point average and no effect on their overall satisfaction. Although the CEQ has been predominantly used in Australia, it has also been used in other countries to compare graduates and current students from different programmes. For instance, Sadlo (1997) used the CEQ to compare students taking
Instruments for obtaining student feedback 399
In response, a separate instrument, the Postgraduate Research Experience Questionnaire, was developed (Johnson, 1999, p. 11). Initial results with this instrument indicated that it had reasonable internal consistency and a consistent structure based on six dimensions: supervision, skill development, intellectual climate, infrastructure, thesis examination, and goals and expectations. This instrument is now employed across the Australian university system, and the findings are returned to institutions but are not published. Nevertheless, further research demonstrated that the questionnaire did not discriminate among different universities or among different disciplines at the same university (Marsh et al., 2002). As a result, there is considerable scepticism about whether it provides an adequate basis for benchmarking universities or disciplines within universities. One difficulty is the lack of a coherent research base on the experiences of postgraduate research students, and this has encouraged the use of totally ad hoc instruments to measure their perceptions of quality. Another difficulty is that evaluations of research training typically confound the overall quality of the research environment with the practice of individual supervisors. It is only very recently that researchers and institutions have recognized the need to distinguish institutional monitoring from enhancing supervisory practice (Chiang, 2002; Pearson et al., 2002). The GCCA surveys also embrace students who have studied by distance education, for whom items referring to ‘lecturers’ or ‘teaching staff’ might be inappropriate. As mentioned earlier, academic staff in distance-learning institutions have two rather different roles: as the authors of course materials and as course tutors.
Richardson and Woodley (2001) adapted the CEQ for use in distance education by amending any references to ‘lecturers’ or to ‘teaching staff’ so that the relevant items referred either to teaching materials or to tutors, as appropriate. The amended version was then used in a postal survey of students with and without a hearing loss who were taking course units by distance learning with the Open University. A factor analysis of their responses confirmed the intended structure of the CEQ, except that the Good Teaching scale split into two scales concerned with good materials and good tutoring. Similar results were obtained by Lawless and Richardson (2002) and by Richardson and Price (2003), suggesting that this amended version of the CEQ is highly robust in this distinctive context. The CEQ was intended to differentiate among students taking different programmes of study, but the GCCA surveys have also identified apparent differences related to demographic characteristics of the respondents, including gender, age, first language and ethnicity. However, the authors of the annual reports from the GCCA surveys have been at pains to point out that these effects could simply reflect the enrolment of different kinds of student on programmes in different disciplines with different teaching practices and different assessment requirements. In other words, observed variations in CEQ scores might arise from respondents taking different programmes rather than from inherent characteristics of the respondents themselves. Indeed, in research with Open University students taking particular course units (Richardson, 2005; Richardson & Price, 2003), demographic characteristics
such as gender and age did not show any significant relationship with students’ perceptions of the academic quality of their courses. One potential criticism of the CEQ is that it does not include any items relating to the pastoral, physical or social support of students in higher education. It is entirely possible to include additional items concerned with institutional facilities, such as computing and library resources. In fact, some institutions involved in the Australian graduate surveys have included extra items regarding administrative matters, student services and recreational facilities, but these additional items were not considered in the published analysis of results from the CEQ (Johnson et al., 1996, p. 3). An initial analysis suggested that students’ satisfaction with their facilities was a much weaker predictor of their overall satisfaction than the original scales in the CEQ (Wilson et al., 1997). As Johnson et al. (1996, p. 5) noted, the CEQ does not claim to be comprehensive but seeks information about dimensions of teaching and learning that appear to be central to the majority of academic subjects taught in institutions of higher education. Nevertheless, further developments were motivated by discussions in focus groups with stakeholders as well as analyses of students’ responses to open-ended questions included in the CEQ. McInnis et al. (2001) devised six new scales, each containing five items, to measure the domains of student support, learning resources, course organization, learning community, graduate qualities and intellectual motivation. The Course Organization scale proved not to be satisfactory, but McInnis et al. suggested that the other five scales could be used by institutions in annual surveys of their graduates. This would yield an extended CEQ containing 50 items. McInnis et al.
found that students’ scores on the new scales were correlated with their scores on the five original scales of the 23-item CEQ, and they concluded that the inclusion of the new scales had not affected their responses to the original scales (p. x; see also Griffin et al., 2003).
However, McInnis et al. did not examine the constituent structure of their extended instrument in any detail. They have kindly provided a table of correlation coefficients among the scores of 2316 students on the five original scales of the 23-item CEQ and their six new scales. A factor analysis of the students’ scores on all 11 scales yields a single underlying dimension, but this is mainly dominated by the new scales at the expense of the original scales. This might suggest that the extended 50-item version of the CEQ is perceived by students as being mainly concerned with informal aspects of higher education (such as resources and support systems). Like the Generic Skills scale, the new scales were introduced for largely pragmatic reasons and are not grounded in research on the student experience. Hence, although the extended CEQ taps a broader range of students’ opinions, it may be less appropriate for measuring their perceptions of the more formal aspects of the curriculum that are usually taken to define teaching quality.
As with students’ evaluations of teaching, there is little evidence that the collection of student feedback using the CEQ in itself leads to any improvement in the perceived quality of programmes of study. Even so, the proportion of graduates who agreed that they were satisfied with their programmes of study in the GCCA surveys has
intractable with larger samples unless there are a limited number of response alternatives to each question that can be encoded in a straightforward way. The use of quantitative inventories to obtain student feedback has therefore been dictated by organizational constraints, particularly given the increasing size of classes in higher education. The content of such instruments could, of course, be based on results from qualitative research, as was the case for the CEQ, or from focus groups, as in Harvey et al.’s (1997) student satisfaction methodology.
In addition, informal feedback is mainly available when teachers and learners are involved in face-to-face situations. In distance education, as mentioned earlier, students are both physically and socially separated from their teachers and their institutions, and this severely constrains the opportunities for obtaining student feedback. In this situation, the use of formal inventories has been dictated by geographical factors as much as by organizational ones (Morgan, 1984). It can be argued that it is not appropriate to compare the reports of students at institutions (such as the Open University) which are wholly committed to distance education with the reports of students at institutions which are wholly committed to face-to-face education. Nevertheless, it would be both appropriate and of theoretical interest to compare the reports of distance-learning and campus-based students taking the same programmes at the large number of institutions that offer both modes of course delivery, bearing in mind, of course, the obvious differences in the educational context and the student population (see Richardson, 2000).
What should be the subject of the feedback?
Student feedback can be obtained on teachers, course units, programmes of study, departments and institutions. At one extreme, one could envisage a teacher seeking feedback on a particular lecture; at the other extreme, one might envisage obtaining feedback on a national system of higher education, especially with regard to controversial developments such as the introduction of top-up fees. Nevertheless, it is clearly sensible to seek feedback at a level that is appropriate to one’s basic goals. If the aim is to assess or improve the quality of particular teachers, they should be the subject of feedback. If the aim is to assess or improve the quality of particular programmes, then the latter should be the subject of feedback. Logically, there is no reason to think that obtaining feedback at one level would be effective in monitoring or improving quality at some other level (nor any research evidence to support this idea, either). Indeed, identifying problems at the programme or institutional level might have a negative impact on the quality of teaching by demotivating the staff who are actually responsible for delivering the programmes.
What kind of feedback should be collected?
Most of the research evidence has been concerned with students’ perceptions of the quality of the teaching that they receive or their more global perceptions of the academic quality of their programmes. Much less evidence has been concerned with
Instruments for obtaining student feedback 403
students’ level of satisfaction with the teaching that they receive or with their programmes in general. Consumer theory maintains that the difference between consumers’ expectations and perceptions determines their level of satisfaction with the quality of provision of a service. This assumption is embodied in American instruments such as the Noel-Levitz Student Satisfaction Inventory and also in Harvey et al.’s (1997) student satisfaction methodology. (Indeed, one could also modify the CEQ to measure students’ expectations when embarking on a programme in addition to their subsequent perceptions of its academic quality.) This theoretical approach was extended by Narasimhan (2001) to include the expectations and perceptions of teachers in higher education as well as those of their students.
One fundamental difficulty with this approach is that it privileges satisfaction as a notion that is coherent, homogeneous and unproblematic. In fact, the limited amount of research on this topic suggests that student satisfaction is a complex yet poorly articulated idea that is influenced by a wide variety of contextual factors that are not intrinsically related to the quality of teaching (Wiers-Jenssen et al., 2002). On theoretical grounds, it is not at all clear that satisfaction should be a desirable outcome of higher education, let alone that it should be likened to a commodity or service. Indeed, the discomfort that is associated with genuine intellectual growth has been well documented in interview-based research by Perry (1970) and Baxter Magolda (1992). In the case of research using the CEQ, in contrast, students’ ratings of overall satisfaction with their courses or programmes are simply used as a way of validating students’ perceptions of academic quality.
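The comparison of expectations with perceptions that consumer theory proposes can be made concrete with a small sketch. The item names, the 1 to 5 rating scale and the function `gap_scores` are all hypothetical and chosen only for illustration; they are not taken from the Noel-Levitz inventory, from Harvey et al.’s methodology or from the CEQ, and the sketch does not address the conceptual objections raised above.

```python
from statistics import mean


def gap_scores(expectations, perceptions):
    """Return perception minus expectation for each item.

    A negative gap means the experience fell short of what was expected;
    a positive gap means it exceeded expectations.
    """
    return {item: perceptions[item] - expectations[item] for item in expectations}


# Hypothetical ratings on a 1-5 scale for one student (illustrative only).
expectations = {"teaching": 4, "feedback": 5, "workload": 3}
perceptions = {"teaching": 4, "feedback": 3, "workload": 4}

gaps = gap_scores(expectations, perceptions)
overall_gap = mean(gaps.values())

for item, gap in gaps.items():
    print(f"{item}: {gap:+d}")
print(f"mean gap: {overall_gap:+.2f}")
```

On this reading, a single summary figure such as the mean gap treats satisfaction as one homogeneous quantity, which is precisely the assumption that the research cited above calls into question.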
A different issue is whether student feedback should be concerned solely with curricular matters or whether it should also be concerned with the various facilities available at institutions of higher education (including computing, library, recreational and sporting facilities). It cannot be denied that the latter considerations are important in the wider student experience. However, students’ ratings of these facilities are not highly correlated with their perceptions of the quality of teaching and learning (Yorke, 1995), and they are less important as predictors of their overall satisfaction than their perceptions of the academic features of their programmes (Wilson et al., 1997). As noted in the case of the CEQ, including additional scales about the wider institutional environment might tend to undermine feedback questionnaires as indicators of teaching quality. It would be preferable to evaluate institutional facilities as an entirely separate exercise, and in this case an approach orientated towards consumer satisfaction might well be appropriate.
When should feedback be collected?
It would seem sensible to collect feedback on students’ experience of a particular educational activity at the completion of that activity, since it is presumably their experience of the entire activity that is of interest. In other words, it would be most appropriate to seek student feedback at the end of a particular course unit or programme of study. However, other suggestions have been made. Narasimhan (2001) noted that obtaining feedback at the end of a course unit could not benefit the
has been successfully used with other problem-based programmes in higher education (Sadlo & Richardson, 2003; Trigwell & Prosser, 1991).
In the GCCA surveys, the CEQ seems to be appropriate for assessing the experience of students on both undergraduate and postgraduate programmes. (Students taking joint degree programmes are asked to provide responses for each of their disciplines separately.) However, it does not seem to be useful for assessing the experiences of students working for postgraduate research degrees, and no suitable alternative has yet been devised. It may prove necessary to evaluate the quality of postgraduate research training using a different methodology from the CEQ.
In distance education, it has proved necessary to amend the wording of many of the items in the CEQ, and the constituent structure of the resulting questionnaire reflects the different roles of staff as the authors of course materials and as tutors. The wording and the structure of any instrument adopted for use in a national survey of graduates would have to accommodate the different practices in campus-based and distance education. More generally, it would have to be able to accommodate variations in practice in higher education that might arise in the future.
Why are response rates important?
Some might argue that the purpose of feedback surveys was simply to provide students with an opportunity to comment on their educational experience. On this argument, students who do not respond do not cause any difficulty because they have chosen not to contribute to this exercise. Nevertheless, most researchers assume that the purpose of feedback surveys is to investigate the experience of all the students in question, and in this case those who do not respond constitute a serious difficulty insofar as any conclusions have to be based on data contributed by a sample.
Inferences based upon samples may be inaccurate for two reasons: sampling error and sampling bias. Sampling error arises because, even if a sample is chosen entirely at random, properties of the sample will differ by chance from those of the population from which the sample has been drawn. In surveys, questionnaire responses generated by a sample will differ from those that would be generated by the entire population. The magnitude of the sampling error is reduced if the size of the sample is increased, and so efforts should be made to maximize the response rate. Sampling bias arises when a sample is not chosen at random from the relevant population. As a result, the properties of the sample may be misleading estimates of the corresponding properties of the population as a whole. In surveys, sampling bias arises if relevant characteristics of the people who respond are systematically different from those of the people who do not respond, in which case the results may be at variance with those that would have been found if responses had been obtained from the entire population.
Research has shown that people who respond to surveys are different from those who do not respond in terms of demographic characteristics such as age and social class (Goyder, 1987, chapter 5). In market research, a common strategy is to weight the responses of particular groups of respondents to compensate for these
demographic biases. However, respondents differ from nonrespondents in their attitudes and behaviour in ways that cannot be predicted on the basis of known demographic characteristics (Goyder, 1987, chapter 7). In particular, students who respond to surveys differ from those who do not respond in terms of their study behaviour and academic attainment (Astin, 1970; Nielsen et al., 1978; Watkins & Hattie, 1985). It is therefore reasonable to assume that students who respond to feedback questionnaires will be systematically different from those who do not respond in their attitudes and experience of higher education. This kind of bias is unavoidable, and it cannot be addressed by a simple weighting strategy. Nevertheless, its impact can be reduced by minimizing the number of non-respondents.
In social research, a response rate of 50% is considered satisfactory for a postal survey (Babbie, 1973, p. 165; Kidder, 1981, pp. 150–151). As mentioned earlier, the Australian GCCA surveys require that this response rate be achieved by individual institutions if their average ratings are to be published. Indeed, the vast majority of participating institutions do achieve this response rate, and at a national level the GCCA surveys regularly achieve response rates of around 60%. In other words, this is the kind of response rate that can be achieved in a well-designed postal survey, although it clearly leaves ample opportunity for sampling bias to affect the results. The position of the Australian Vice-Chancellors’ Committee (2001) is that an overall institutional response rate for the CEQ of at least 70% is both desirable and achievable.
Student feedback at the end of course units is often collected in a class situation, and this could be used to obtain feedback at the end of entire programmes (in both cases, presumably, before the assessment results are known).
This is likely to yield much higher response rates and hence to reduce the impact of sampling error and sampling bias. There is an ethical issue as to whether students should be required to contribute feedback in this manner. In a class situation, students might feel under pressure to participate in the process, but the guidelines of many professional bodies stipulate that participants should be able to withdraw from a research study at any time. It will be important for institutions to clarify whether the collection of feedback is a formal part of the teaching-learning process or whether it is simply tantamount to institutional research.
With the increasing use of information technology in higher education, institutions may rely less on classroom teaching and more upon electronic forms of communication. This is already the case in distance learning, where electronic means of course delivery are rapidly replacing more traditional correspondence methods. Information technology can also provide a very effective method of administering social surveys, including the direct electronic recording of responses (see Watt et al., 2002). It would be sensible to administer feedback surveys by the same mode as that used for delivering the curriculum (classroom administration for face-to-face teaching, postal surveys for correspondence courses and electronic surveys for on-line courses). Little is known about the response rates obtained in electronic surveys, or whether different modes of administration yield similar patterns of results. It is, however, good practice to make feedback questionnaires available in a variety of formats for use by students