Towards a Sociology of Assessment

It is necessary to begin by arguing that assessment is a proper topic for sociological inquiry, even though it is usually ignored, or left to the rival specialism of psychology, or to specialist assessors. 

Assessing and grading people is one of the most important functions of education, and it is implicit in many of the more general functionalist approaches. Thus a meritocratic 'contest' system really depends upon the talented people being correctly identified, despite any unpromising social aspects they may display (low status skin colours or genders, or any of the signs of a 'lower class' upbringing), although this implication is rarely developed very far. Other approaches open up more possible critical perspectives too: the individualised grading and ranking of students would seem to be at the heart of Marxist or Weberian critiques of 'rationalisation', and we can also try it out as an example of hailing or interpellation (one not actually discussed explicitly by Althusser, however).

Interactionist work has addressed some aspects of assessment, including students' responses to its pressures, as we shall see, but there is another potential area too: we know that students are labelled by schools in important ways, and that these labels can come to have an effect as teachers respond to adverse labels with low expectations (see Rist in Hammersley (1986)). Much of this work focuses on personal interactions in classrooms, however: on the ways in which teachers talk to students, ask them questions, or group them for teaching. Again, formal assessment, the awarding of grades, which is arguably the most powerful form of labelling that a school can offer, is not examined.

One problem is that assessment can become the province of the specialist statistician or administrator. However, the measurement problems in assessing or grading students should be ones in which sociologists too can intervene. Grading essays, for example (still a common assessment device in the UK at least), really offers a familiar methodological problem, similar to those encountered when coding open-ended responses on a questionnaire - somehow, a mass of utterances has to be reduced to a simpler form, a score. Multiple-choice questions can be seen as similar to fixed-choice research questions, of course. Following this analogy, assessing by reference to published criteria, marking schemes or model answers is just like using established codes of responses, while double-grading or moderation is a mechanism to establish inter-coder reliability.
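To make the analogy concrete, agreement between two markers can be quantified in the same way as agreement between two coders. What follows is only a minimal sketch in Python, using Cohen's kappa, a standard chance-corrected agreement statistic. The function name and the grades are invented for illustration, and nothing here is drawn from an actual moderation scheme.

```python
from collections import Counter


def cohen_kappa(marker_a, marker_b):
    """Chance-corrected agreement between two markers who graded the same scripts."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("both markers must grade the same non-empty set of scripts")
    n = len(marker_a)
    # Proportion of scripts on which the two markers gave the same grade.
    observed = sum(a == b for a, b in zip(marker_a, marker_b)) / n
    # Agreement expected by chance, given how often each marker uses each grade.
    freq_a = Counter(marker_a)
    freq_b = Counter(marker_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both markers used one and the same grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades awarded by two markers to the same eight essays.
first_marker = ["A", "B", "B", "C", "A", "C", "B", "A"]
second_marker = ["A", "B", "C", "C", "A", "B", "B", "A"]
kappa = cohen_kappa(first_marker, second_marker)
print(round(kappa, 3))
```

A value near 1 would indicate close agreement and a value near 0 agreement no better than chance. Note that a statistic of this kind can only register disagreement between individual markers: a 'bias' shared by both would leave it untouched.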

Now the point of this analogy is clear - sociologists know that coding is a controversial yet crucial process, with substantive effects on the 'hard data' that are produced, long before any statistician gets their hands on them. We know that Durkheim's data on suicide rates have been effectively criticised, or that Garfinkel has shown how coroners have to use judgements and fairly inexplicit rules to decide what category an unexpected death falls into, or that Cicourel has shown that statistics on juvenile crime are produced in a strong sense by the decisions and recording practices of police officers or clerks in the juvenile justice system.

However, academics are often strangely reluctant to turn the same critical eye on their own coding practices when they assess students. Similarly, the normal practices to counteract 'bias' in assessment can be oddly limited - at best, there might be routine moderation practices to counteract individual marker biases, as we know, but little can be done to counter collective 'biases'. Again, while it is routine to pilot research instruments, the piloting of assessment devices is probably rare. Finally, while it is well known that respondents can try to assist researchers by giving them 'right answers', the collective manufacture of 'correct answers' by students still seems to be an embarrassment that is largely undiscussed. We shall return to these oddities of practice at the end, and try to locate them in various institutional constraints.

However, let us first examine some of the classic work that has been done on a sociology of assessment. Bourdieu's 1986 work will serve to introduce the matter of the production of judgements that we raised above. He researched the ways in which French academics actually judged their students. 'Judgement' here really refers to two stages of assessment - the formal matter of awarding grades for assignments, and the secondary stage of using those grades, and other vital sources of information, to arrive at an overall judgement about the worth of a person. There are often two stages in the practices of UK universities too, perhaps - one where student work is graded according to the regulations, and another area of discretion, often at final examination boards, where those grades are combined, interpreted, and modified in the light of any other information known about candidates (such as their health records, career intentions or general abilities).

Bourdieu argues that social judgements are involved, inevitably, at both stages. Most students will be vaguely aware that 'style' is important in written work, or will have noticed comments or criteria that mention matters like 'critical analysis'. These terms belong to a whole organised outlook or 'aesthetic' according to Bourdieu, one which is often held unconsciously by people who have grown up with it as 'second nature'. These are people with large amounts of cultural capital who know how to manage arguments with style and with a certain aloof calmness (see the file on Bourdieu). To be very succinct, his analysis shows that particularly frank and negative judgements are reserved for those from the 'lowest' social origins, and rarely applied to 'those with the richest cultural capital', while those in between receive a curious mixture of euphemism, coded comment, and faint praise. Actual grades awarded are more evenly distributed, but those judgements remain on file. The judgements themselves are based on a 'whole collection of disparate criteria, never clarified, hierarchized or systematized' (1988: 200) which are listed as 'handwriting', 'appearance', 'style', 'general culture', '"external" criteria' such as 'accent, elocution and diction', and 'finally and above all the bodily "hexis"' which includes 'manners and behaviour, which are often designated, very directly, in the remarks' (1988: 200).

Students who come from different social backgrounds will not share these unconscious aesthetics, and will tend to find academic life mysterious, and assessment a lottery. Not all those from such backgrounds will fail, of course. Some will be able to learn about and adopt the forms of judgement they encounter. Keddie's pioneering work (in Young 1971) showed how teachers respond more positively to those who adopt the teachers' own stances towards 'classroom knowledge', by taking over their vocabulary and interests. Bourdieu himself writes about people who can acquire cultural capital from schooling itself. They do so in a way which resembles the so-called 'primitive accumulation' theories of the growth of capitalism - gradually acquiring nuggets of knowledge and know-how by sheer hard work and careful regulation or organisation of their lives. I have argued (in Evans and Murphy 1993) that this sort of approach tends to dominate much of the work on study skills, and is detectable in many other forms of pedagogy. Bourdieu remarks that such people will still never be able to perform as unconsciously, and apparently effortlessly, as can those born into inherited cultural wealth, and will encounter the same patronising reactions as do the nouveau riche in grand society.

Of course, there is another option too for those who have neither cultural capital nor the desire to patiently acquire it. We can fit together the work of Bourdieu and that in an interactionist classic - Becker's study of how students cope with the demands of continuous assessment schemes by 'playing the game' (see Rowntree 1987). Shrewd and well-organised American students apparently used the resources of their fraternities for such activities as filing successful assignments for future use, keeping notes on tutors' preferences, and introducing newcomers to the old hands so they could be 'wised up' on matters such as which courses or tutors were 'easier'. Any residual guilt could be explained away by the 'realists'. In a wider context, these look like the traditional resources of more proletarian groups, who have always survived by techniques like 'poaching, tricking...' (see Frow 1991) helped by circuits of 'popular cultural capital', as Fiske (1987) puts it. As many teachers suspect, the widespread availability of academic materials on the Net could well provide even more opportunities for the enterprising game-player.

In my own view, this kind of orientation to academic life is likely to be widespread among mature students entering higher education (the source of much recent growth in the UK) - such students have only managed to enter universities after much strategic management of their lives, families, incomes and obligations, and are therefore probably skilled 'game players' already. Certainly, even universities can no longer rely on fully committed applicants willing to suspend their lives for three years, live away from home, and submerge themselves uncritically in the strange alien culture of academic life. I am not suggesting that this is any other than a most welcome development, of course - the old sponsored system achieved its social solidarity at the price of a deep elitism, even if a few outsiders were successfully integrated.

Another dimension to this discussion of what might be called the social production of assessable worth arises with work like Miller and Parlett's on 'cue-seeking' (in Hammersley and Woods 1976). There is an unmistakable hint in the article of the complicit behaviour of staff in such matters - some faculty are obviously spraying 'cues' for students to detect, while others, perhaps, are less concerned to do so. This leads on to the still undeveloped matters which emerge occasionally in reports of recent staff behaviour in schools: some teachers are suspected of 'teaching to the test' with the periodic SATs they have to administer, sometimes having opened the sealed packs of questions a week or two beforehand. Much remains to be researched in terms of the role of 'revision sessions' in higher education, the nature of 'supervision' for project work, or the ways in which coursework and examination papers can 'overlap'.

Staff collusion is one way, perhaps, in which to show how actual practice can affect the kind of testing and analysis of assessment scores that goes on. It would be easy enough in theory to think of ways to test how far actual assignment answers really did reflect the 'ability' of the individual student, simply by controlling the extraneous variables. 'Surprise' tests, check questions (maybe in the form of oral exams), heavily supervised and controlled testing, genuinely 'double-blind' moderation, a sustained (electronic?) search across a set of assignment answers for evidence of patterns - all these could be developed in theory, although it is quite unlikely that they will be in practice. It really is in no-one's material interest to probe the validity of grading too closely (or to develop any other precise measures of quality either, I shall be arguing). Teachers want students to succeed as much as students do, especially where institutions are increasingly assessed themselves in various kinds of 'league tables', connected to funding. As consumerism grows in the UK system, moreover, so will the pressures on teachers to try and give customers what they want - or they will go elsewhere.

At another level of generality, we can explain this sort of behaviour as the legacy of contradictory social imperatives (rather than as something like the personal hypocrisy of professionals). Educational institutions must 'regulate the ambitions' of students against rapidly changing demands for different types of labour, for example, or, indeed, against rapid expansions of places but not resources, in the (UK) higher education system. Marxist analysis points to contradictions between the need to demonstrate formal equality and the need to develop a rational hierarchy among students ready for the divisions in the labour market. On a more specifically political level, schools and universities have to demonstrate their success in teaching larger and larger proportions of students, but they also have to keep the confidence of the public by 'maintaining standards' (which must mean failure for some).

These structural tensions, between the 'educational', and the 'social/organisational' aspects of assessment, lie behind the more specific problems faced by actual teachers. As a recent internal policy document of mine puts it: 

In general, assessment needs to meet at least two concerns: 

    As far as the student is concerned, assessment needs to be pitched at the right level to offer stimulation and challenge as well as 'fairness' or 'transparency' (and, sometimes, 'relevance'). 'Busy work' and low-level 'rote' tasks need to be avoided, as do poorly-designed tasks with little connection with the course material. There is much recent research on the tendency for such inadequately designed assessment to produce a 'surface' or an 'instrumental' approach to learning, to an extent which completely contradicts the intentions in good curriculum design. All these points have implications for the overall loading, weighting and variety of assessment tasks.
    As far as professional teachers are concerned, assessment needs to be fair, rationally defensible and consistent (both valid and reliable) in the ways in which it produces overall distributions of grades. There are many undesirable unintended consequences of assessment schemes which can be introduced by a combination of good intentions and inadequate analysis of effects. As obvious examples, students should not be assessed in so diverse a number of ways as to encourage a 'regression towards the mean', nor should they be exposed to assessment that significantly rewards qualities other than those provided by the course alone. They should not stand to gain significant advantage merely from having been allocated to particular tasks, sections of courses or tutors.
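
The worry about 'regression towards the mean' in the second point can be illustrated with a small simulation. This is only a sketch, and it rests on a strong assumption introduced purely for illustration: each student's marks on different tasks are drawn independently from the same distribution (mean 60, standard deviation 10), as if the tasks measured quite unrelated qualities. The function name and all the numbers are hypothetical.

```python
import random
import statistics


def overall_marks(n_students, n_tasks, rng):
    """Overall mark per student: the plain average of n_tasks task marks."""
    marks = []
    for _ in range(n_students):
        # Assumption: task marks are independent draws, mean 60, s.d. 10.
        task_scores = [rng.gauss(60, 10) for _ in range(n_tasks)]
        marks.append(sum(task_scores) / n_tasks)
    return marks


rng = random.Random(0)  # fixed seed so the sketch is repeatable
spread_one_task = statistics.stdev(overall_marks(500, 1, rng))
spread_eight_tasks = statistics.stdev(overall_marks(500, 8, rng))
print(round(spread_one_task, 1), round(spread_eight_tasks, 1))
```

Under that assumption the spread of overall marks shrinks roughly with the square root of the number of tasks, so overall grades bunch towards the middle. Where performance across tasks is strongly correlated, the effect would be much weaker.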

The problem is that assessment designed to meet one function can lead to problems with the others. To take an obvious example, one option is to design a series of open-ended tasks with the students having great freedom to choose topics and work at their own pace. This might help retain interest and motivation and lead to genuinely educational gains for individuals - but such assignments also routinely produce 'bunched' distributions of grades, since they are not good ways to discriminate between students. On the other hand, those devices which are good at discrimination, and which will produce nice neat distributions of grades, such as multiple-choice testing, can often reward pretty trivial skills and lead to an instrumental approach.

The technical literature is full of these and other dilemmas (e.g. Rowntree 1987), but this is more than a technical matter, and it follows that there are no simple technical solutions. No matter how impeccably designed, assessment devices will always have to be interpreted and managed by students, and, of course, their friends and families, unless we are to head towards a thoroughly sinister system of constant surveillance, perhaps of an electronic kind. Similarly, the tensions between the apparently technical functions of assessment will have to be solved, in practice, by further familiar social processes - by the use of power, for example, as assessment boards decide whether they can 'allow' so many good grades that year, or whether a particular student 'deserves' a good grade regardless of her actual scores.

As with other uses of power, these decisions will have to be legitimated in one of the familiar ways - by a board claiming to act 'in the interests of us all', for example (what Habermas (1976) would call 'distorted communication'), or by obfuscating matters behind a 'technical veil' of statistical expertise, procedures and reports, or by simply insisting on confidentiality. A thoroughgoing critical and sociological analysis of this routine occurrence is still to be done, but it would begin with: 

    A refusal to accept the apparent facticity and inevitability of assessment, and a clear desire to see the historical, social and political aspects of this common and unremarked process. Can the huge increase in assessment be traced to an increasing surveillance of the workforce? Or to a sharpening 'credentialist turn'? 
    A willingness to penetrate the technical details, in the same spirit as one deconstructs the statistics in a research report. Assessment scores are treated as hard data, as 'reified', and they are combined, manipulated and reconstituted as if they were unproblematic - yet a little gentle probing usually reveals the assumptions and value-judgements which are inherent. It is one of the classic functions of a sociology of education to restore transparency to these matters. 

