The chemist’s guide to concept inventories

Synthetic chemists prepare chemicals. As a synthetic chemist you have a choice. You can follow the literature procedure to the letter every time and expect to obtain the same product in the same yield with the same purity, or you can be the innovator who is constantly seeking to improve the yield and purity and turns each new synthesis into an experiment. Chemists can use a balance to determine the yield definitively and a myriad of methods to assess the purity. The target thresholds they set may be arbitrary, but they know definitively when they have met them.

The characterisation techniques routinely applied to the teacher of chemistry are the module evaluation and the summative assessment. The module evaluation can be frustratingly uninformative when trying to determine whether any real improvement has taken place, barely altering despite radical overhauls of teaching practice.[1] The summative assessment is a very problematic means of determining the impact of teaching innovation. To what extent is the assessment really measuring the learning objectives? If the teacher sets the assessment, how can we eliminate conscious or unconscious bias? How do we separate learning from examination technique? Has the student acquired an enduring conceptual understanding, or have they merely crammed some content that will be forgotten once the examination is over?

Ideally, we would confidently assess a change between the beginning and end of a module, not in what a student knows but in what they understand. Concept inventories are designed for exactly this purpose. Determining conceptual understanding requires a thorough investigation of the challenges students face and the misconceptions they may harbour. This insight can only be obtained through extensive interviews with representative students. The results of these interviews then inform the construction of a multiple-choice question instrument. Subject experts are consulted to ensure validity, meaning that the questions address the intended concepts. Finally, the whole instrument is subjected to rigorous reliability testing on a large and diverse panel of students.

The development of a concept inventory and its initial application is typically the major part of an education-focused doctoral studentship. As a consequence, the number of such inventories is very limited and development can only take place in countries such as the US, where the national funding councils support discipline-based pedagogical research.

In Chemistry at UEA, we were fortunate to be granted permission to employ a bonding representations inventory[2] developed by Luxford and Bretz. Their permission was contingent on the inventory not being openly published or presented electronically. This reflects the value of these instruments and the harm that easy access would do to their reliability. We chose the instrument because it was the most appropriate for our module. In seeking informed consent, we made it very clear that the role of the test was purely formative for our students and that its aim was to improve teaching. The test was administered both at the beginning and very near the end of the teaching year. The answer papers for the first sit were kept in a filing cabinet and not marked until after the second sit. No attempt was made to focus the teaching on the questions in the instrument. Students had no opportunity to prepare explicitly for either test and were given no notice that would have allowed them to revise.

It was with some trepidation that we compared the results of the pre- and post-teaching inventory marks! Calculating Hake’s normalized gain metric,[3] we found a positive learning gain of 0.19. Regression analysis revealed that the students who scored lowest the first time around saw the greatest improvements. These are already fascinating insights, but in the next stage of the project we will see whether they correlate with results from other indicators such as self-efficacy and traditional assessments.
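For readers unfamiliar with the metric, Hake's normalized gain is the improvement actually achieved divided by the maximum improvement that was possible: (post − pre) / (100 − pre), with both class-average scores expressed as percentages. The short Python sketch below illustrates the arithmetic only. The function name and the scores are hypothetical, chosen for illustration, and are not our students' data or the code used in the project.

```python
from statistics import mean


def normalized_gain(pre_percent, post_percent):
    """Hake's normalized gain: achieved gain over maximum possible gain.

    Both arguments are scores expressed as percentages (0-100).
    """
    if not 0 <= pre_percent < 100:
        # A pre-test score of 100% leaves no room for improvement,
        # so the gain is undefined (division by zero).
        raise ValueError("pre-test score must be at least 0 and below 100")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


# Hypothetical class scores in percent, for illustration only.
pre_scores = [40.0, 50.0, 60.0]
post_scores = [50.0, 60.0, 70.0]

# The class-average gain is computed from the class-average scores.
class_gain = normalized_gain(mean(pre_scores), mean(post_scores))
print(f"Class-average normalized gain: {class_gain:.2f}")
```

In this made-up example the class average rises from 50% to 60%, so the class has closed one fifth of the gap between its starting point and a perfect score.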

Concept inventories will never become routine instruments of assessment. Not only is their coverage very limited but their costs are prohibitive. Instead their value to projects such as ours lies in benchmarking other more practical proxies of learning gain such as self-efficacy. In making these comparisons we are illuminating the relationship between different facets of learning gain and the impact of teaching strategy.


[1] last accessed 16/12/16

[2] last accessed 16/12/16

[3] last accessed 16/12/16