







An insightful theoretical overview of the role and origins of intelligence tests in personnel selection. It discusses the limitations of interviews as a means of assessing aptitudes and abilities and highlights the significance of intelligence tests in providing a reliable, standardized way to evaluate an applicant's problem-solving abilities. The document also explores the history of intelligence testing, from its origins in the late 1800s to its use during World War I, and introduces the concepts of crystallized and fluid intelligence. Crystallized intelligence is assessed through tests of verbal and numerical reasoning ability, while fluid intelligence is assessed through tests of abstract reasoning ability. The document also discusses the importance of test reliability and validity in ensuring accurate results.








While much useful information can be gained from the standard job interview, the interview nonetheless suffers from a number of serious weaknesses. Perhaps the most important of these is that the interview has been shown to be a very unreliable way to judge a person’s aptitudes and abilities. This is because it is an unstandardised assessment procedure that does not directly assess mental ability, but rather assesses a person’s social skills and reported past achievements and performance. Clearly, the interview provides a useful opportunity to probe each applicant in depth about their work experience, and explore how they present themselves in a formal social setting. Moreover, behavioural interviews can be used to assess an applicant’s ability to ‘think on their feet’ and explain the reasoning behind their decision-making processes. Assessment centres can provide further useful information on an applicant by assessing their performance on a variety of work-based simulation exercises. However, interviews and assessment centre exercises do not provide a reliable, standardised way to assess an applicant’s ability to solve novel, complex problems that require the use of logic and reasoning ability. Intelligence tests, on the other hand, do just this, providing a reliable, standardised way to assess an applicant’s ability to use logic to solve complex problems. As such, tests of general mental ability are likely to play a significant role in the selection process. Schmidt & Hunter (1998), in their seminal review of the research literature, note that over 85 years of research has clearly demonstrated that general mental ability (intelligence) is the single best predictor of job performance. From the perspective of assessing a respondent’s intelligence, the unstandardised, idiosyncratic nature of interviews makes it impossible to directly compare one applicant’s ability with another’s.
Not only do interviews fail to provide an objective baseline against which to contrast interviewees’ differing performances, but different interviewers also typically come to radically different conclusions about the same applicant. Not only do applicants respond differently to different interviewers asking ostensibly the same questions, but what applicants say is often interpreted quite differently by different interviewers. In such cases we have to ask which interviewer has formed the ‘correct’ impression of the candidate, and to what extent any given interviewer’s evaluation of the candidate reflects the interviewer’s preconceptions and prejudices rather than the candidate’s performance. There are similar limitations on the range and usefulness of the information that can be gained from application forms or CVs. Whilst work experience and qualifications may be prerequisites for certain occupations, in and of themselves they do not predict whether a candidate is likely to perform well or badly in a new position. Moreover, a person’s educational and occupational achievements are likely to be limited by the opportunities they have had, and as such may not reflect their true potential. Intelligence tests enable us to avoid many of these problems, not only by providing an objective measure of a person’s ability, but also by assessing the person’s potential, and not just their achievements to date.
The assessment of mental ability, or intelligence, is one of the oldest areas of research interest in psychology. Gould (1981) has traced attempts to scientifically measure mental acuity, or ability, to the work of Galton in the late 1800s. Prior to Galton’s (1869) pioneering research, the assessment of mental ability had focussed on phrenologists’ attempts to assess intelligence by measuring the size of people’s heads! Reasoning tests, in their present-day form, were first developed by Binet (1910), a French educationalist who published the first test of mental ability in 1905. Binet was concerned with assessing the intellectual development of children, and to this end he invented the concept of mental age. Questions assessing academic ability were graded in order of difficulty, according to the average age at which children could successfully answer each question. From the child’s performance on this test, it was possible to derive the child’s mental age. If, for example, a child performed at the level of the average 10-year-old on Binet’s test then that child was classified as having a mental age of 10, regardless of the child’s chronological age. The concept of the Intelligence Quotient (IQ) was developed by Stern (1912), from Binet’s notion of mental age. Stern defined IQ as mental age divided by chronological age multiplied by 100.
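The ratio definition of IQ (mental age divided by chronological age, multiplied by 100) can be illustrated with a short calculation. This is a minimal sketch; the function name ratio_iq and the example ages are illustrative assumptions, not part of the manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# A child with a mental age of 10, at three hypothetical chronological ages.
for age in (8, 10, 12):
    print(f"chronological age {age}: IQ = {ratio_iq(10, age):.1f}")
```

A child whose mental age matches their chronological age scores 100; a mental age ahead of chronological age gives a score above 100, and one behind gives a score below 100.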
The idea of general mental ability, or general intelligence, was first conceptualised by Spearman in 1904. He reflected on the popular notion that some people are more academically able than others, noting that people who tend to perform well in one intellectual domain (e.g., science) also tend to perform well in other domains (e.g., languages, mathematics, etc.). He concluded that an underlying factor termed general intelligence, or ‘g’, accounted for this tendency for people to perform well across a range of areas, while differences in a person’s specific abilities or aptitudes accounted for their tendency to perform marginally better in one area than in another (e.g., to be marginally better at French than they are at Geography). Spearman, in his 1904 paper, outlined the theoretical framework underpinning factor analysis, the statistical procedure that is used to identify the shared factor (‘g’) that accounts for a person’s tendency to perform well (or badly) across a range of different tasks. Subsequent developments in the mathematics underpinning factor analysis, combined with advances in computing, meant that after the Second World War psychologists were able to begin exploring the structure of human mental abilities using these new statistical procedures. Because Raymond B. Cattell is most famous for his work on personality, and in particular the development of the 16PF, his pioneering work on the structure of human intelligence (Cattell, 1967) has often been overlooked. Through an extensive research programme, Cattell and his colleagues identified that ‘g’ (general intelligence) could be decomposed into two highly correlated subtypes of mental ability, which he termed fluid and crystallised intelligence. Fluid intelligence is reasoning ability in its most abstract and purest form. It is the ability to analyse novel problems, identify the patterns and relationships that underpin these problems and extrapolate from these using logic. This ability is central to all logical problem solving and is crucial for solving scientific, technical and mathematical problems. Fluid intelligence tends to be relatively independent of a person’s educational experience and has been shown to be strongly determined by genetic factors. Because fluid intelligence is the ‘purest’ form of intelligence, or ‘innate mental ability’, tests of fluid intelligence are often described as culture fair. Crystallised intelligence, on the other hand, consists of fluid ability as it is evidenced in culturally valued activities. High levels of crystallised intelligence are evidenced in a person’s good level of general knowledge, their extensive vocabulary and their ability to reason using words and numbers. In short, crystallised intelligence is the product of cultural and educational experience in interaction with fluid intelligence. As such it is assessed by traditional tests of verbal and numerical reasoning ability, including critical reasoning tests.
Normative data allows us to compare an individual’s score on a standardised scale against the typical score obtained from a clearly defined group of respondents (e.g., graduates, the general population, etc.). To enable any respondent’s score on the ART to be meaningfully interpreted, the test was standardised against a population similar to that on which it has been designed to be used (e.g., people in technical, managerial, professional and scientific roles). Such standardisation ensures that the scores obtained from the ART can be interpreted by relating them to a relevant distribution of scores.
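Relating a raw score to a norm group can be sketched as follows. The example assumes the norm group's scores are approximately normally distributed, and the norm mean and standard deviation used here are invented for illustration; they are not the ART norms.

```python
from statistics import NormalDist


def standardise(raw_score: float, norm_mean: float, norm_sd: float):
    """Return (z-score, percentile) of a raw score relative to a norm group.

    Assumes the norm group's scores are roughly normally distributed.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Hypothetical norm group: mean 20, standard deviation 5 (illustrative only).
z, pct = standardise(raw_score=25, norm_mean=20, norm_sd=5)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

The same raw score would map to a different percentile against a different norm group, which is why the choice of a relevant comparison group matters.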
The reliability of a test assesses the extent to which the variation in test scores is due to true differences between people on the characteristic being measured (in this case fluid intelligence) rather than to random measurement error. Reliability is generally assessed using one of two different methods: one assesses the stability of the test’s scores over time; the other assesses the internal consistency, or homogeneity, of the test’s items.
Also known as test-retest reliability, this method for assessing a test’s reliability involves determining the extent to which a group of people obtain similar scores on the test when it is administered at two points in time. In the case of reasoning tests, where the ability being assessed does not change substantially over time (unlike personality), the two occasions when the test is administered may be many months apart. If the test were perfectly reliable, that is to say test scores were not influenced by any random error, then respondents would obtain the same score on each occasion, as their level of intelligence would not have changed between the two points in time when they completed the test. In this way, the extent to which respondents’ scores are unstable over time can be used to estimate the test’s reliability.
Stability coefficients provide an important indicator of a test’s likely usefulness. If these coefficients are low, then this suggests that the test is not a reliable measure and is therefore of little practical use for assessment and selection purposes.
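A stability coefficient of this kind is typically estimated as the correlation between respondents' scores on the two occasions. The sketch below assumes the Pearson correlation is used; the scores are invented for six hypothetical respondents and are not ART data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores for the same six people, several months apart.
first_sitting = [12, 18, 25, 30, 21, 15]
second_sitting = [14, 17, 27, 29, 20, 16]
print(f"test-retest r = {pearson_r(first_sitting, second_sitting):.2f}")
```

A coefficient close to 1 means respondents keep roughly the same rank order across the two sittings; a low coefficient means scores are being moved around by something other than the ability being measured.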
Also known as item homogeneity, this method for assessing a test’s reliability involves determining the extent to which, if people score well on one item, they also score well on the other test items. If each of the test’s items were a perfect measure of intelligence, that is to say the score the person obtained on the items was not influenced by any random error, then the only factor that would determine whether a person was able to answer each item correctly would be the item’s difficulty. As a result, each person would be expected to answer all the easier test items correctly, up until the point at which the items became too difficult for them to answer. In this way, the extent to which respondents’ scores on each item are correlated with their scores on the other test items can be used to estimate the test’s reliability. The most commonly used internal consistency measure of reliability is Cronbach’s (1960) alpha coefficient. If the items on a scale have high intercorrelations with each other, then the test is said to have a high level of internal consistency (reliability) and the alpha coefficient will be high. Thus, a high alpha coefficient indicates that the test’s items are all measuring the same thing, and are not greatly influenced by random measurement error. A low alpha coefficient, on the other hand, suggests either that the scale’s items are measuring different attributes, or that the test’s scores are affected by significant random error. If the alpha coefficient is low, this indicates that the test is not a reliable measure, and is therefore of little practical use for assessment and selection purposes.
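The alpha coefficient can be computed from a table of item scores using the standard formula: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch; the response matrix is invented (1 = correct, 0 = incorrect) and the function name is an illustrative assumption.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix of scores.

    item_scores: list of rows, one row per respondent, one column per item.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents x 4 items, easiest item first.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

When items rise and fall together across respondents, the variance of the total scores is large relative to the summed item variances and alpha approaches 1.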
The fact that a test is reliable only means that the test is consistently measuring a construct; it does not indicate what construct the test is consistently measuring. The concept of validity addresses this issue. As Kline (1993) notes, ‘a test is said to be valid if it measures what it claims to measure’. An important point to note is that a test’s reliability sets an upper bound for its validity. That is to say, a test cannot be more valid than it is reliable because if it is not consistently measuring a construct it cannot be consistently measuring the construct it was developed to assess. Therefore, when evaluating the psychometric properties of a test, its reliability is usually assessed before addressing the question of its validity. There are two principal ways in which a test can be said to be valid.
Construct validity assesses whether the characteristic which a test is measuring is psychologically meaningful and consistent with how that construct is defined. Typically, the construct validity of a test is assessed by demonstrating that the test’s results correlate with other major tests which measure similar constructs and do not correlate with tests that measure different constructs. (This is sometimes referred to as a test’s convergent and discriminant validity.) Thus, demonstrating that a test which measures fluid intelligence is more strongly correlated with an alternative measure of fluid intelligence than it is with a measure of crystallised intelligence would be evidence of the measure’s construct validity.
This method for assessing the validity of a test involves demonstrating that the test meaningfully predicts some real-world criterion. For example, a valid test of fluid intelligence would be expected to predict academic performance, particularly in science and mathematics. There are two types of criterion validity: predictive validity and concurrent validity. Predictive validity assesses whether a test is capable of predicting an agreed criterion which will be available at some future time, e.g., can a test of fluid intelligence predict future GCSE maths results? Concurrent validity assesses whether the scores on a test can be used to predict a criterion which is available at the time the test was completed, e.g., can a test of fluid intelligence predict a scientist’s current publication record?
The ART was standardised on a sample of 651 adults of working age, drawn from a variety of managerial, professional and graduate occupations. The mean age of the standardisation sample was 30.4 years, with 32% of the sample being women. 24% of the sample identified themselves as being of non-white (European) ethnic origin. Of these respondents 22% identified themselves as being of Black African ethnic origin, 12% of Indian origin, 9% of Black Caribbean origin and 8% of Pakistani origin.
Gender differences on the ART were examined by comparing the scores that a group of male and a group of female graduate level bankers obtained on the ART. These data are presented in Table 1 and clearly indicate no significant sex difference in ART scores, suggesting that this test is not sex biased.
Table 2 presents alpha coefficients for the ART on a number of different samples. Inspection of this table indicates that all these coefficients are above .8, indicating that the ART has good levels of internal consistency reliability.
As noted above, test-retest reliability estimates the test’s reliability by assessing the temporal stability of the test’s scores. As such, test-retest reliability provides an alternative measure of reliability to internal consistency estimates of reliability, such as the alpha coefficient. Theoretically, test-retest and internal consistency estimates of a test’s reliability
be predicted, fluid intelligence was found to be modestly correlated (r=.29, p<.001) with a more abstract rather than a more concrete learning style, and (r=.23, p<.001) with a more holistic (i.e., focussing on the ‘big picture’ rather than fine details) rather than a more serial learning style, providing further support for the construct validity of the ART.
Intellectance is a meta-cognitive variable that assesses a person’s perception of their general mental ability. While it is a personality factor, rather than an ability factor, intellectance has nonetheless consistently been found to correlate with objective assessments of mental ability. As such it would be expected to be modestly correlated with fluid intelligence. A sample of 132 applicants for managerial and professional posts completed the 15FQ+ and the ART as part of an assessment process. The ART was found to be correlated (r=.32, p<.001) with the 15FQ+ Factor Intellectance (ß), providing further support for the construct validity of the ART.
Table 3 presents the mean scores (and the associated 95% confidence intervals) on the ART obtained by three occupational groups of differing ability levels: retail managers, bankers and graduate engineers. Given the nature of their work, engineers would be expected to have the highest level of fluid intelligence, followed by bankers, then retail managers. Inspection of Table 3 indicates that these means conform to this pattern, with the 95% confidence intervals indicating that these differences are unlikely to be due to chance effects. These data therefore provide further support for the construct validity of the ART.
Table 3. Mean ART scores for each occupational group and associated 95% confidence intervals
Sample               Sample Size   Mean and 95% Confidence Interval
Graduate Engineers   n=40          24.03 ±3.
Graduate Bankers     n=209         21.20 ±1.
Retail Managers      n=105         15.2 ±2.
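A 95% confidence interval around a group mean, of the kind reported with Table 3, can be sketched as follows. The manual does not state how its intervals were calculated, so this example assumes the common normal approximation (mean ± 1.96 standard errors); the scores are invented and are not ART data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(scores):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Uses the normal approximation: half_width = 1.96 * sd / sqrt(n).
    For small samples a t-based critical value would give a wider interval.
    """
    if len(scores) < 2:
        raise ValueError("need at least 2 scores")
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return mean(scores), half_width


def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals share any values."""
    (m1, h1), (m2, h2) = a, b
    return abs(m1 - m2) <= h1 + h2


# Hypothetical raw scores for two small groups (illustrative only).
group_a = mean_ci95([26, 24, 22, 27, 23, 25, 21, 24])
group_b = mean_ci95([15, 17, 13, 16, 14, 18, 12, 15])
print(f"group A: {group_a[0]:.2f} ± {group_a[1]:.2f}")
print(f"group B: {group_b[0]:.2f} ± {group_b[1]:.2f}")
print("intervals overlap:", intervals_overlap(group_a, group_b))
```

Intervals that do not overlap suggest that the difference between the group means is unlikely to be a chance effect, which is the reasoning applied to the three occupational groups above.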
yourselves, but ask me if anything is not clear. Please ensure that any mobile telephones, pagers or other potential distractions are switched off completely. We shall be doing the Abstract Reasoning Test which is timed for 30 minutes. During the test I shall be checking to make sure you are not making any accidental mistakes when filling in the answer sheet. I will not be checking your responses to see if you are answering correctly or not.
Put candidates at their ease by giving information about yourself, the purpose of the test, the timetable for the day, whether or not the questionnaire is being completed as part of a wider assessment programme, how the results will be used and who will have access to them. Ensure that you and any other administrators have asked for all mobile phones and similar devices to be switched off. The instructions below should be read out verbatim and the same script should be followed each time the ART is administered to one or more candidates. Instructions for the administrator are printed in ordinary type. Instructions designed to be read aloud to candidates have lines marked above and below them, are in bold and are enclosed by speech marks. If this is the first or only questionnaire being administered, give an introduction as per or similar to the following example:
Rectify any omissions, then say:
If biographical information is required, ask respondents to complete the biodata section. If answer sheets are to be scanned, explain and demonstrate how the ovals are to be completed, emphasising the importance of fully blackening the oval. Walk around the room to check that the instructions are being followed.
WARNING: It is vitally important that test booklets do not go astray. They should be counted out at the beginning of the session and counted in again at the end.
WARNING: It is most important that answer sheets do not go astray. They should be counted out at the beginning of the test and counted in again at the end.
Remembering to read slowly and clearly, go to the front of the group and say:
the instructions for this test as I read them aloud.
Then ask:
some rough paper and an answer sheet.
instructed to do so.
the lines provided. Indicate your preferred title by checking the title box, then note your gender and age. Please insert today’s date which is [ ] in the Date of Testing box.
relationships between abstract shapes and figures. The test consists of 35 questions and you will have 30 minutes in which to attempt them.
Binet, A. (1910). Les idées modernes sur les enfants. Paris: E. Flammarion.
Cattell, R. B. (1967). The theory of fluid and crystallised intelligence. British Journal of Educational Psychology, 37, 209-224.
Cronbach, L.J. (1960). Essentials of Psychological Testing (2nd Edition). New York: Harper.
Galton, F. (1869). Hereditary Genius. London: Macmillan.
Gould, S.J. (1981). The Mismeasure of Man. Harmondsworth, Middlesex: Pelican.
Heim, A.H., Watt, K.P. and Simmonds, V. (1974). AH2/AH3 Group Tests of General Reasoning; Manual. Windsor: NFER Nelson.
Kline, P. (1993). Personality: The Psychometric View. London: Routledge.
Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.
Spearman, C. (1904). ‘General intelligence,’ objectively determined and measured. American Journal of Psychology, 15, 201-292.
Stern, W. (1912). Psychologische Methoden der Intelligenz-Prüfung. Leipzig, Germany: Barth.
Terman, L.M. et al. (1917). The Stanford Revision of the Binet-Simon scale for measuring intelligence. Baltimore: Warwick and York.
Yerkes, R.M. (1921). Psychological examining in the United States army. Memoirs of the National Academy of Sciences, 15.