Notes on: Byrne, C.  (1976) 'Computerized question banking systems: 1—the state of the art' in British Journal of Educational Technology 7(2): 44-64

Dave Harris

[Apparently, this and a more critical part two consist of a report for a national development programme in computer-assisted learning]

It's possible to develop question banks of particularly reliable items which can be indexed according to lists of topics and/or levels of difficulty.  Subdivisions inside the bank are important: topics might be divided into subtopics and so on [this implies single-task questions, of course, which is the default position for educational technologists].  Knowledge of the distribution of answers might be required too, such as the mean and standard deviation for a given student population.  All this can be conveniently computerized [which needed saying in 1976].
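[A minimal sketch, in Python, of what such an indexed, statistics-bearing bank might look like; the field names and the selection function are my own illustration, not anything described by Byrne:]

```python
from dataclasses import dataclass

@dataclass
class BankItem:
    """One question in the bank, indexed by topic path and tagged with statistics."""
    text: str
    topic_path: tuple        # e.g. ("algebra", "linear equations") for topic/subtopic
    difficulty: float        # 0 = easy, 1 = hard (a subjective or empirical rating)
    mean_score: float = 0.0  # answer-distribution statistics for a student population
    std_dev: float = 0.0

def select_items(bank, topic_prefix, max_difficulty):
    """Return all items under a given (sub)topic, no harder than the threshold."""
    return [item for item in bank
            if item.topic_path[:len(topic_prefix)] == topic_prefix
            and item.difficulty <= max_difficulty]

# usage: pull the easier items from the 'algebra' branch of the bank
bank = [
    BankItem("Solve 2x + 3 = 11", ("algebra", "linear equations"), 0.4),
    BankItem("Factorise x^2 - 9", ("algebra", "factorising"), 0.7),
]
easy_algebra = select_items(bank, ("algebra",), max_difficulty=0.5)
```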

One example can be found in the Educational Testing Service in the USA, which produces national standardized tests and has done so since 1966.  There is complex indexing, including an 'ability-process dimension', which is a 'pragmatic adaptation of the Bloom taxonomy' (47).  Other dimensions include abstract/concrete.  Ultimately, the aim is to make these items matchable to the 'individual school's behavioural objectives' (48).

Another example is found in the Classroom Teacher Support System for high school teachers in Los Angeles.  The bank holds 8,000 multiple-choice questions in history.  There is some subjective appraisal, for example of the difficulty of these items.  Routine statistics are gathered for each question, including difficulty, discrimination, the frequencies of student responses and the 'teacher rejection rate'.  The system can even score tests if the teacher wishes it to do so.  In practice, it tends to be used for frequent class tests.
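[These routine statistics are standard classical item-analysis measures.  A rough sketch of how two of them, difficulty (facility) and discrimination, might be computed; this is my own illustration of the conventional calculations, not the Los Angeles system's actual code:]

```python
def item_statistics(responses, total_scores, group_fraction=0.27):
    """Classical item analysis for a single question.

    responses    -- 0/1 per student: did they answer this item correctly?
    total_scores -- each student's total test score, in the same order.
    Returns (facility, discrimination): facility is the proportion correct;
    discrimination compares the top- and bottom-scoring groups of students.
    """
    n = len(responses)
    facility = sum(responses) / n

    # rank students by total score, then compare the top and bottom groups
    ranked = sorted(zip(total_scores, responses), reverse=True)
    k = max(1, int(n * group_fraction))
    top = [r for _, r in ranked[:k]]
    bottom = [r for _, r in ranked[-k:]]
    discrimination = sum(top) / k - sum(bottom) / k
    return facility, discrimination

# usage: six students' answers to one item, alongside their overall test scores
facility, disc = item_statistics([1, 1, 0, 1, 0, 0], [48, 45, 30, 40, 22, 25])
```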

A third example is the Comparative Achievement Monitoring system (CAM), also in the USA.  This bank uses similar classifiers, but makes sure that every question is 'associated with an objective'; the objectives are supplied too, to help teachers select appropriate tests.  Teacher selection solves the problem of matching assessment tasks and principles, but the designers also discuss certain functions of assessment, including the extent of discrimination among students, links between formative and summative evaluation, the results of student feedback, and the use of assessment to test the effectiveness of courses.  There is a lot of user guidance in this particular bank.  Banks in the past have sometimes found it difficult 'to mesh well with developmental conceptions... and the curriculum' (51).  CAM trains teachers and offers an evaluation service.  It also produces diagrams showing connections between scores and questions.

There are some anticipated problems, including a simplification of the forms of student response and limits to the types of question.  However, one bank, a reading program, offers 55 types of question!  Student feedback is available and is gathered in various kinds of detail.  The computers can help to design tests, for example by supplying distractors at random [for multiple-choice tests, that is], and can supply concrete numbers for problems in algebra.  The computer can provide 'incorrect' solutions based on known wrong student strategies (54) [so you can actually penalize incorrect answers, to help students avoid them, of course].  However, setting up banks and running them is still expensive [a lot cheaper now, I bet].  They actually require sufficiently rich instructional materials [including lots of confusion and error as well as the right approach?].  Teachers will accept them more readily if they are involved from the beginning.  Different subjects present different difficulties: these devices are very good for elementary arithmetic, for example.  For other subjects, the dangers of trivialization are acknowledged [briefly] (59), although it is claimed that computer-generated items are as reliable as those made by human instructors [probably right; see below].
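[A sketch of how generating an algebra item with random concrete numbers and 'incorrect' solutions built from known wrong strategies might work; the particular wrong strategies and the formatting are my own illustration, not anything from these banks:]

```python
import random

def generate_linear_item(rng=random):
    """Generate 'solve a*x + b = c' with distractors built from common wrong strategies."""
    a = rng.randint(2, 9)
    x = rng.randint(2, 9)          # the intended answer
    b = rng.randint(1, 9)
    c = a * x + b

    correct = x
    wrong_strategies = {             # each models a known student error
        "added b instead of subtracting it":   (c + b) / a,
        "forgot to subtract b at all":         c / a,
        "multiplied by a instead of dividing": (c - b) * a,
    }
    # keep only distractors that differ from the correct answer
    distractors = sorted({round(v, 2) for v in wrong_strategies.values() if v != correct})

    options = [correct] + distractors
    rng.shuffle(options)
    return f"Solve {a}x + {b} = {c}", options, correct

question, options, answer = generate_linear_item()
```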

Future developments might include all the routine work being done by computer, while the more subtle material is provided by humans.  For example, computers were already used at the University of Michigan for correcting simple errors of spelling, or calculating readability [on a journalism course] (60).  This saves time and enables a tutor to focus on higher skills.  [There is a lovely example of how issues have to be operationalized so that computers can deal with them, page 60—poor writing is defined in terms of 'sentence length, the number of polysyllabic words, readability, the spelling of key words, names and facts, and the number of clichés…  The variety of sentence structures associated with dull, indirect and verbose writing include the overuse of articles, adjectives and passive verbs'.  Presumably this is assembled as an expert system, and as in all cases like this, I feel rather ambivalent.  It is clearly reductive of the notion of poor or good writing, but I suspect it is less reductive than the working criteria used by ordinary academics when they mark student work—and it is at least explicit].
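[A crude sketch of how such indicators of 'poor writing' might be operationalized; the cliché list, the syllable counter and the passive-voice pattern are my own rough approximations, not anything from the Michigan system, and they illustrate exactly the reduction discussed above:]

```python
import re

CLICHES = ("at the end of the day", "in this day and age", "last but not least")

def count_syllables(word):
    """Very rough syllable estimate: count groups of adjacent vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def writing_indicators(text):
    """Crude, reductive indicators of 'poor writing' of the kind described above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    avg_sentence_length = len(words) / max(1, len(sentences))
    polysyllabic = sum(1 for w in words if count_syllables(w) >= 3)
    # very rough passive-voice proxy: a form of 'to be' followed by an -ed/-en word
    passives = len(re.findall(r"\b(?:was|were|is|are|been|being)\s+\w+(?:ed|en)\b",
                              text, flags=re.IGNORECASE))
    cliches = sum(text.lower().count(c) for c in CLICHES)

    return {"avg_sentence_length": avg_sentence_length,
            "polysyllabic_words": polysyllabic,
            "passive_constructions": passives,
            "cliches": cliches}

print(writing_indicators("At the end of the day, the report was written by the committee."))
```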

There is a question bank for trainee medics in England.  The costs of development are difficult to estimate, however.
