Trevor Fisher writes:

The OFQUAL proposals released on April 3rd (Guardian report 2nd April) intensify fundamental and unresolved problems with GCSE (and A Level) reform. The headline reports picked up on international benchmarking, though the proposal to reinstate APU-style sampling on the old pattern is a more useful feature and not controversial. Other issues are highly controversial, and it is a sign of the times that OFQUAL is staging a ‘conversation’ – not a consultation – to end June 31st. It would be cynical to argue this will have no impact. Much is tied into decisions already made, but it is clear OFQUAL is uncertain about much of what it is doing, so the activity is worth taking seriously.

However, having read the document and scrutinised key aspects, it is not possible to be sure how well grounded their proposals are. This note is thus partly about how to interpret the OFQUAL document. A “Conversation on how standards should be set for new GCSEs” does not display total confidence about what they are proposing, though on the key issues they accept the government line. Indeed, the fact that OFQUAL accepted DFE money to do the international benchmarking is itself problematical. OFQUAL is designed to be independent, not a paid consultancy of the DFE.

However, this is not the immediate question. The problems arise from two issues – what system to use in setting grade boundaries, and how they can be internationally benchmarked. Though linked, they are best seen separately.

Setting Grade Boundaries.

Either a norm referenced system or a criterion referenced system is possible: grades are awarded either to fixed percentages of candidates or on the basis of grade criteria or descriptors. Neither poses real problems for teachers, though the difference is not clear to employers and HE. Norm referencing avoids the question of standards. Criterion referencing, while more focussed on standards, relies on people knowing what the descriptors mean. The document is at its strongest when arguing grade descriptors are unreliable (e.g. paras 2.15-2.19). However, the document then seems to argue for a mixture, which is really problematical. Note that the system will run from grade 1 to grade 9, which is immensely confusing (and designed only to distinguish ‘old’ GCSE from ‘new’ GCSE).
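To make the distinction concrete, here is a rough sketch in Python. It is an illustration only, not OFQUAL's procedure: the function names, the marks and the 60-mark cut are all hypothetical. Under norm referencing the boundary moves with the cohort so that a fixed proportion reaches the grade. Under criterion referencing the boundary is fixed in advance and the proportion reaching it moves instead.

```python
def norm_referenced_boundary(marks, proportion):
    """Lowest mark that still falls inside the top `proportion` of candidates.

    Hypothetical helper: ties at the boundary can push the real proportion
    slightly above the target.
    """
    ranked = sorted(marks, reverse=True)
    k = round(proportion * len(ranked))  # number of candidates to let through
    if k == 0:
        return ranked[0] + 1  # nobody reaches the grade
    return ranked[k - 1]


def proportion_at_or_above(marks, boundary):
    """Share of candidates reaching a fixed (criterion-style) boundary."""
    return sum(1 for m in marks if m >= boundary) / len(marks)


if __name__ == "__main__":
    # Invented marks: the second cohort scores ten marks lower throughout.
    stronger = [35, 42, 48, 55, 61, 64, 70, 73, 80, 88]
    weaker = [m - 10 for m in stronger]

    # Norm referencing: 60% get the grade in both years, the boundary shifts.
    print(norm_referenced_boundary(stronger, 0.6))
    print(norm_referenced_boundary(weaker, 0.6))

    # Criterion referencing: the boundary stays at 60, the pass rate shifts.
    print(proportion_at_or_above(stronger, 60))
    print(proportion_at_or_above(weaker, 60))
```

The sketch reduces a grade descriptor to a single cut mark, which is a simplification. In practice a descriptor has to be interpreted by examiners, and that is where the reliability problem arises.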

For the crucial equivalent to C grade, which teachers are familiar with and can teach to, the system is to be norm referenced. Thus in notes to editors (executive summary), the document states “We are proposing that the same proportion of candidates will achieve the bottom of the grade 4 as currently achieve the bottom of the C grade”. It adds that “The standard for a grade 5 will be set in line with the performance of students from the higher performing countries in international tests” (only the PISA study is actually considered, a problem for later consideration).

OFQUAL are trying to keep teachers on board with the old ‘pass’ grade (theoretically tied to the old O Level Pass) while keeping with government/Gove edicts to make the exams more demanding. There is much to discuss here, but the immediate issue is the mix of norm and criterion referencing. Thus “the same proportion of candidates should get 7 or above as get A or above”, so two grades, 5 and 6, float, with 5 tied to international benchmarks, very problematically, and “a smaller proportion of candidates should be awarded the 9 than currently get the A*”. Thus A* is now 8, but because how it is fixed is not determined proportionally, it has to be criterion referenced to be logically awarded. 9 becomes A** in the old GCSE format.

International Benchmarking

The real elephant in the room comes with grade 5 which is to be internationally benchmarked. OFQUAL states that this will be done thus:

“3.16 We propose that the standard… required for a grade 5 should be at about that implied by the international statistics,…. half to two thirds of a grade higher than that required for a current grade C.

“3.17 We have collected and reviewed performance descriptors (ie grade descriptors, or criterion referencing: the terms refer to the same approach) in those countries (see (b) below) whose students tend to perform well in international tests (though only PISA is referred to in the document). We will use these descriptors to help us check whether or not our expectations about the standard required for a grade 5 is correct and inform users accordingly”.

Apart from the issue of whether this makes any sense without piloting and actual student performance, even if outside actual summer exam conditions, this is illogical. If performance descriptors are regarded as unreliable and impossible to operationalise, as stated in paras 2.15 to 2.19, then I cannot see how they become reliable just because they are used abroad. PISA is not relevant for benchmarking, and grade descriptors alone cannot be. Actual papers, questions, examiners' reports and actual work done by students under exam conditions would be needed.

If OFQUAL is to rely only on grade descriptors, then their own argument suggests this is unreliable. I cannot see how they can do international benchmarking on this basis, and thus the new harder pass grade 5 must be dubious in the extreme.

These are some of the key issues to be discussed in the Conversation which is about to begin.

Wider issues.

It remains the case that no overwhelming case for the reforms has been made. The failure to address the Oxford Report on GCSEs, which undermined the case for reform, is deeply worrying. For the OFQUAL proposals, the following are major concerns.

(a) How is the conversation to be monitored and its results taken on board? More pertinently, how have the conversation questions been devised? The only reference immediately available is Stacey stating in the introduction “to implement this policy we have met with groups of teachers and school leaders, spoken with exam boards and other assessment experts, considered how things work in some other (high performing) countries* and surveyed employers as well to inform our thinking”. This is all.

The actual discussions must be made available and analysed. The use of the incorrect term ‘countries’ is worrying. Whether the OECD tests are comparing like with like is one of the major arguments over them.

(b) *The jurisdictions normally referred to are small cities with affluent populations, notably Shanghai, Singapore and Massachusetts, making comparing like with like a crucial issue. A medium sized country like England cannot easily be compared to cities, particularly the ones noted, which are untypical across the world.