Notes on two chapters in: Hammersley, M. (Ed) (1986). Case Studies in Classroom Research. Open University Press: Milton Keynes

Dave Harris

Hammersley, M. Measurement in ethnography: the case of Pollard on teaching style, pages 49 to 60

Measurement can be found in ethnography if we define it suitably. The 'standard model' involves the development of specific categories which are exhaustive, exclusive, and can be located on a particular scale. The rules should be 'explicitly stated, should specify concrete indicators, should be unambiguous, and should be applied in the same way to all of the data' (49). However, there is a more general issue: 'the linking of abstract concepts to particular data; and this problem faces ethnographers as much as it does any other social researcher'.

There is often 'some vagueness' here, and problems arise, for example in the recent article on teaching styles by Pollard. This is an 'exemplary piece', with an explicit theory and a wide range of data. The theory is actually a complex one, presented in 'softened' terms, using words like 'influence', 'implication' and 'reinforcement' rather than causality or determination, although he does argue that the working consensus in a class is 'determined by the coping strategies of teachers and pupils'.

Classroom regimes and teaching styles (51) are described as located on a 'progressive–traditional dimension', and individual teachers are allocated to positions on it using different sorts of evidence. This ranges from documentary information about teachers and classes, through observer description (often as a summary over time), more 'frequency-specified' observer description, and time-specific observer description, to quotations from participants' accounts used to document perspectives and describe events. These are 'probably a representative sample of the kinds of information ethnographers usually employ' (53), but there are problems.

Accuracy means asking whether teachers and pupils actually did do what is described, whether transcriptions are accurate, and whether verbalisations are faithfully reproduced. Possible inaccuracies obviously increase when these are recorded by field notes rather than tape-recording. Non-verbal elements, however, are 'more difficult'. We require 'accurate portrayal of patterns of physical movement', and again field notes, the limitations of recording rapid interaction, and memory limitations present problems. Ethnomethodologists point to some particular problems, especially when researchers 'attribute intentions and attitudes on the basis of what people say or do' (54). Observers might disagree, and it is not usual to reveal the grounds for these interpretations. Other evidence is easier to assess, as when conversations of participants support researcher interpretations. However, participants' accounts as descriptions of events are particularly open to 'threats to accuracy', since here we are relying on others and not just the researcher. Using multiple accounts, as in triangulation, is an acceptable strategy, 'but it is no panacea' (55).

There are further problems with generalisability, which involve researcher judgements about typicality. However, 'informal estimates of frequency are open to… large errors' (55). Providing frequencies can help, but we still need to know exactly how these were obtained, over what sort of period, and how data (contacts between teachers and pupils in this case) were actually identified. In turn this requires some specification of identification criteria and also of how they were applied, e.g. by live coding or from video recording. In other words we have the usual problems in devising category schemes, and we need information about how the most common variations ('definition, coding procedure and practice' (56)) were overcome.

Content validity refers to the extent to which evidence exhausts all the components of the definition (in this case, of traditional or progressive). We need 'adequate definitions of key concepts'. If the researcher does not provide them, we could try to explicate them ourselves or borrow from other accounts, but the issue is how the researcher has defined them.

Construct validity is 'the extent to which an indicator accurately measures the concept or component of the concept it is supposed to measure'. Do variations in the indicator reflect actual variations in the variable? Definitions again are required. Empirical frequencies may not be very valid, for example if one aspect of an overall style (learning the tables, say) is frequently seen but the other components are not. Other factors may also influence particular frequencies, for example a forthcoming visit by the inspectors 'or even the presence of the observer' (57). Another problem is ad hoc use of indicators, and we might need to look at how stable interpretations of data actually are, across observers, for example. There might be random variations as well as systematic error.

Overall we need to ask separate questions: whether descriptions and explanations are correct; whether the researcher has taken the best precautions and made the best checks to maximise validity; and whether the researcher has provided us with the necessary information about these precautions and checks. Usually, the researcher is not in a position to offer certainty on any of these. The point is that they need to be addressed in future research.

Scarth, J. and Hammersley, M. Some problems in assessing the closedness of classroom tasks, pages 70 to 84

There have been a variety of ways to define teachers' questions and examination questions. Usually, most of the questions turn out to be closed, requiring 'low-level cognitive operations, notably memory' (70). There are methodological problems in measuring closedness, however, although they have not been given much attention; many researchers seem to underestimate them. The problems turn on identifying the task; specifying what 'closed' means; the reliability and validity of categories; and the weighting of tasks.

We have to identify tasks first, specifying rules which help decide whether a task has occurred and what those tasks might be. There are particular problems with oral questions: if a teacher asks a question, the resulting exchange can offer different possibilities [described in a diagram on 72, ranging from pupils' answers, silences, or requests for clarification or challenge, which in turn produce a variety of teacher responses ranging from overt acceptance to answer guidance and negotiation]. Questions run into other questions and go through cycles: is every teacher elicitation and answer a separate question? There is a problem in identifying elicitations, or matters such as acceptance of answers; 'words and syntax are an imperfect guide to pragmatic function' (73), for example questions might be rhetorical, and silences can elicit answers. Sometimes teacher and pupil talk overlaps. Some questions might elicit a sequence of answers, so it is not clear if this is one task or multiple tasks. Sometimes one question is embedded within another. Repetition and clarification requests are not easy to code. Lots of other questions and answers go on off-task.

Overall there might be considerable inconsistency in what is recorded [an example follows], and different rules will produce different numbers of questions and answers [in the example cited there could be either 20 or three questions, depending on definitions]. Clearly these problems carry over into attempts to classify questions as open or closed, and into judging how significant any differences in scores might be. The same goes for attempts to classify written work [with an example]: there could be either 42 or seven tasks.

Closedness is difficult to define. Does it reflect teacher expectations about answers, or the cognitive strategies pupils use? These do not always match, although they are quite often assumed to. Even taking teacher expectations, a closed task can be one where the teacher assumes there is a single right answer, or one where the task is intended to demand lower-level cognitive activity, or even one where a clear indication is given to pupils about what would count as a right answer.

With category systems, the coding tasks are equally difficult. How many categories should be used? Should they be scaled or just used in classification? Are the categories 'clearly defined, mutually exclusive and exhaustive, so that each and every task is assignable, unambiguously, to one and only one category' (78)? Otherwise reliability suffers. Each task has to be represented in an accurate way too, as in construct validity.

'Few coding schemes for classroom tasks approximate these ideals' (79). Getting information about teacher intentions or the cognitive operations of pupils is clearly difficult, and neither can be 'read off unproblematically' from what is observed. Contextual information might be important: for example, in a revision session, teachers might not explicitly require remembering but will imply it. There is sometimes confusion between the psychological operations which pupils perform and the logical status of the information elicited; for example, if closed questions require recall and open ones require explanation, the problem is that explanations can also be recalled. Surface forms tell us nothing about psychological processes. We could ask teachers and pupils to comment and indicate their intentions, although this 'has rarely been used' (79).

There is always 'indeterminacy' with things like teacher intentions. They might actually change in response to pupils, especially if, as in Doyle and Carter, pupils try to simplify tasks and make them less risky. Demands are perhaps best seen as negotiated, but if this is so they are difficult to code, because they might change. Actual interaction might be emergent. Teachers themselves might not apply their own categories consistently, and their categories might differ from those of the researcher. If we are looking at what pupils do, there is another indeterminacy, because they might be answering questions or completing tasks in different ways. It is rare for these matters of construct validity even to be addressed in classroom research.

Tasks might need to be weighted rather than just counted. In examinations, they might attract different marks. Ranking them in terms of different time allocations is more problematic. Oral questions are particularly difficult, and problems of weighting them have not typically been addressed. Nor do pupils and teachers necessarily see the importance of tasks adequately reflected in their official weight. However, to ignore these problems 'is to weight all tasks equally' (81).

More attention needs to be given to these problems, even if 'effective solutions to most of them' are not available. Nevertheless we need 'systematic investigation of these problems, and of strategies for dealing with them' (82).
