Progress testing as a pattern of excellence for the assessment of medical students’ knowledge: concepts, history, and perspective

The progress test was created out of the need for an assessment method aligned with problem-based learning. Although it was specifically designed to overcome the limitations of traditional assessment in problem-based curricula, it is nowadays used across different types of curricula. In this paper, we first present the basic assumptions, history, benefits and challenges of the progress test. The progress test overcomes many limitations of traditional assessment, such as poor validity and reliability. However, its implementation is a logistical challenge. In addition, we discuss the limitations of the progress test when used as a summative assessment, in which form it may not always be aligned with constructivist theory. When feedback and methods of analysis that account for multiple testing are added, the progress test becomes aligned with constructivist theory. Finally, the progress test's subscores may lack validity because of the low number of items per subdomain; thus, pass/fail decisions should not be based on subscores, but only on overall scores.


INTRODUCTION
Over the past decades, assessment in medical training has been transformed. Assessment used to focus mainly on knowledge, even when it took place at the bedside. Attitudes, communication with the patient and other skills were mostly assessed with an overall, largely heuristic grade based on heterogeneous criteria. Assessment and feedback thus focused on students' knowledge of pathology, disregarding communication skills, teamwork and other competencies that are now valued as part of the curriculum and its assessment. The inclusion of these other aspects of medical training has led to changes in the curriculum, from discipline-based towards competency-based structures, as well as to the inclusion of different types of assessment. Assessment now focuses on competencies, although the assessment of knowledge remains crucial. Although several new methods of assessment have been proposed (from knowledge to competencies and entrustable professional activities), in this article we discuss the importance of knowledge assessment and the use of different methods for assessing knowledge.
Knowledge acquisition and assessment are perhaps the most crucial aspects of medical training, since all competencies rest on a base of knowledge. For example, in CanMEDS, the Canadian competency framework, medical expertise is central and connected with all the other competencies 1. Despite this central position, a vast base of knowledge does not necessarily imply competence. However, it is impossible to be competent without the necessary knowledge. This is why, even after the shift from knowledge-based to competency-based education, knowledge remains a crucial aspect of the learning of students, residents and graduates.
Generally speaking, undergraduate medical training can be divided into a preclinical and a clinical phase. Whereas the preclinical phase focuses mainly on knowledge acquisition, in the clinical phase students develop the application of their knowledge, along with their skills and attitudes, in patient care. In theory, students should acquire knowledge before they are able to apply it. Bloom developed a taxonomy of knowledge acquisition with a hierarchical structure of six levels, in which mastery of the lower levels is necessary to achieve the higher ones 2-4. At the first level, students need to remember knowledge, followed by a minimal understanding of that knowledge. These two levels are considered lower levels of cognitive processing 5. At the third level, students should be able to apply their knowledge, which refers to using it in a new situation. This level is considered transitional by some authors 5 and a higher level of cognitive processing by others 6. Subsequently, students should be able to synthesize, evaluate and create new knowledge, which is considered the highest level of cognitive processing 7. Undergraduate medical training seems to align with this development from lower to higher cognitive processing. In a study using the Dutch progress test, Cecilio-Fernandes et al. demonstrated that students in the preclinical phase correctly answered more questions requiring lower levels of cognitive processing, whereas students in the clinical phase correctly answered more questions requiring higher levels of cognitive processing 8.
Despite the difference between preclinical and clinical students, the literature recommends using questions that require higher levels of cognitive processing even for basic and factual knowledge 9,10. This recommendation rests on the fact that such questions support the consolidation of knowledge, since students must also remember basic and factual knowledge to answer them. For example, Jensen et al. demonstrated that students who answered only questions requiring higher levels of cognitive processing performed better on a retention test containing both types of questions than students who answered only questions requiring lower levels of cognitive processing 10.
Although there are many different methods of knowledge assessment, closed- or open-ended questions predominate. The use of closed-ended questions, especially multiple-choice questions, is related to the perception that they are fairer, more objective, and easier to administer and grade for large numbers of students. However, a multiple-choice test is as subjective as open-ended questions or case presentations, especially when a single teacher writes it. This subjectivity arises because a test produced by one teacher is restricted to his/her experience, writing style and way of thinking, which limits the validity of the test. Validity refers to whether the test assesses what it is supposed to assess 11. Furthermore, reliability, which relates to the amount of error and how accurately we can replicate the same results 11, is often low because of the low number of questions and of students taking the test. Interestingly, the validity and reliability of multiple-choice questions and of more subjective methods of assessment are similar. Swanson (1987) compared the reliability of different methods of assessment, from multiple-choice questions to oral examinations 12. The results demonstrated a relation between reliability, testing time and the number of questions. For example, a test with 200 multiple-choice questions that takes around four hours has the same reliability as an oral examination that takes four hours 12. Based on this relation between reliability and testing time, we argue that other methods, such as exercise lists, group work and case studies, should be incorporated into the assessment of students' knowledge. Adding different methods of knowledge assessment captures different aspects of students' knowledge through different lenses and is fairer than relying on a single type of assessment. A case study is probably more adequate for measuring higher levels of cognitive processing than a multiple-choice question, which reduces the amount of information and thereby oversimplifies real patient cases.
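This relation between test length and reliability is commonly described by the Spearman-Brown prophecy formula. The sketch below is only an illustration of that general relation, with assumed numbers rather than Swanson's data:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened (or shortened)
    to `length_factor` times its original number of items."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Assumed values for illustration: a 40-item test with reliability 0.70.
print(spearman_brown(0.70, 3.0))   # tripled to 120 items -> ~0.88
print(spearman_brown(0.70, 0.25))  # cut to 10 items      -> ~0.37
```

The formula makes the trade-off explicit: reliability grows with test length (and hence testing time), regardless of the item format used.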
The progress test is a longitudinal assessment that measures students' knowledge against the level expected at the end of training. From the first to the last year of medical training, all students answer the same test and receive feedback on their performance. The progress test is based on a blueprint, which differs between contexts, with questions requiring both lower and higher levels of cognitive processing. The use of a progress test may solve the problems of validity and reliability, and it often includes quality control, with all questions discussed and revised. Furthermore, progress test questions are often written by different teachers at different universities, which adds different perspectives and a broader range of knowledge. For all these reasons, we argue that all multiple-choice tests throughout undergraduate medical training should be replaced by the progress test, which in turn would free time in the curriculum for other methods of assessment.
Replacing multiple-choice tests with the progress test is not an easy task, especially because it requires rethinking our current system of assessment, moving from assessment centered on the teacher to assessment centered on the curriculum. Before discussing such a system of knowledge assessment, we present an overview of the advantages and disadvantages of the progress test.

History of the Progress Test
The progress test was created to meet the need for an integrated knowledge assessment arising from the implementation of problem-based curricula 13. Traditional discipline-based methods of assessment did not align with this new teaching philosophy. The progress test originated in the United States of America, at the University of Missouri, where it was initially named the Quarterly Profile Examination, and soon afterwards in the Netherlands, at Maastricht 14. Subsequently, McMaster University, in Canada, also adopted the progress test upon implementing a new problem-based curriculum.
Besides the need to align assessment with the implementation of a new curriculum, the progress test has also been used to compare knowledge acquisition between traditional and modern curricula. This comparison arose from the assumption that problem-based learning would be less effective for knowledge acquisition. Although studies indicated that this was true in the first two years, at the end of undergraduate training students from both types of curricula had similar levels of knowledge 14,15.
Thirty years after its creation, the progress test is used in medical schools on all continents except Antarctica, which has no medical school. The progress test has changed since its beginning. Initially, in the Netherlands, it consisted of 250 true/false questions with a question-mark option with which students could indicate that they did not know the answer. A student who answered a question incorrectly received a negative mark, a student who answered correctly received a positive mark, and a student who marked the "I do not know" option received neither 14. Currently, the Dutch progress test consists of 200 multiple-choice questions with varying numbers of alternatives. Although the "I do not know" option still exists alongside negative marking, the marking system has changed from one point to a proportional system: for example, if a question has four alternatives, the negative mark is -0.25. The use of the "I do not know" option was based on two assumptions. The first assumption is psychometric: the option increases the reliability of the progress test for students in the first two years 16. However, when Item Response Theory is used to calculate reliability, this advantage vanishes 17. The second assumption, an educational one, is that students should recognize what they do not know. Although this is important to teach, the effect seems to occur only at the beginning of medical school; more advanced students answer all questions indiscriminately. In that sense, students seem to use the best strategies for answering progress test questions 8. Moreover, the "I do not know" option may add bias to the progress test, such as gender differences: females tend to mark the "I do not know" option more often than males, which compromises the validity of the test 18. Finally, research is currently investigating whether the progress test could be administered as a computerized adaptive test instead of in the traditional pen-and-paper format. A computerized adaptive test uses an algorithm that chooses the next question based on the student's previous performance (for more information, see Collares and Cecilio-Fernandes 19; Cecilio-Fernandes 20).
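A minimal sketch of the proportional marking scheme just described (the scoring rule is as stated in the text; the function and variable names are ours):

```python
def score_item(answer, correct_answer, n_alternatives):
    """Score one progress-test item under the proportional marking scheme:
    +1 for a correct answer, -1/n_alternatives for an incorrect one,
    and 0 for the "I do not know" option (represented here as None)."""
    if answer is None:                 # "I do not know"
        return 0.0
    if answer == correct_answer:
        return 1.0
    return -1.0 / n_alternatives       # e.g. -0.25 for a four-alternative item

# Example: incorrect on a 4-alternative item, "I do not know" on a
# 3-alternative item, correct on a 5-alternative item.
responses = [("B", "A", 4), (None, "C", 3), ("D", "D", 5)]
print(sum(score_item(a, c, n) for a, c, n in responses))  # -0.25 + 0 + 1 = 0.75
```

The proportional penalty sets the expected score of blind guessing to zero, which is the usual rationale for this kind of formula scoring.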
Most progress tests worldwide consist of multiple-choice questions without the "I do not know" option, ranging from 125 to 200 questions. The high number of questions reflects the extensive blueprint, which covers the entire content of a medical course. Furthermore, the number of administrations ranges from one to four times a year. Although the progress test always has a formative function, it has also been used summatively, with pass/fail decisions based on students' progress test scores.

Progress test in Brazil
In Brazil, the use of progress tests started in the late 1990s at the Faculty of Medicine of the University of São Paulo and the State University of Londrina. In 2005, other medical schools, especially in the state of São Paulo, founded the first consortium for the administration of progress tests, from the elaboration of the test to data analysis. This consortium (NIEPAEM - Núcleo Interinstitucional de Estudos e Práticas de Avaliação em Educação Médica) comprises ten medical schools and has served as a model for the creation of other consortia in Brazil. In 2015, the Brazilian Medical Education Association (Associação Brasileira de Educação Médica) initiated a project to implement the progress test in several medical schools. Brazil also held its first national progress test, with 58 medical schools and more than 22,000 students 21. Nowadays, there are around 12 consortia in Brazil that use the progress test regularly. Overall, these consortia administer one progress test a year consisting of 120 multiple-choice questions divided among Clinical Medicine, Pediatrics, Surgery, Tocogynecology and Public Health. Basic sciences are part of the blueprint of some consortia, but not all. Most of these progress tests are formative, using the outcomes only for feedback to students and medical schools, not for pass/fail decisions.

Progress test for the student
Since the progress test is a longitudinal measurement, it offers different opportunities for feedback to students and medical schools compared with traditional tests. First, we discuss feedback for students, then for medical schools. With a single progress test, students receive feedback on different fields of knowledge, usually based on the blueprint, and on the gap between their knowledge and that of a sixth-year student. A higher number of progress tests also makes it possible to verify whether students' knowledge is growing as expected and to identify students who need remediation. For example, progress test results can be classified with a traffic-light system: green, yellow and red. A retrospective study demonstrated that students who had a green light on all progress tests performed better on summative tests than students who had even one yellow or red light 22. Furthermore, there is a positive and significant correlation between progress test scores and scores on the residency entry test 23. Students can also compare their performance, both total scores and scores per field of knowledge, with the average of their class and of the consortium. These comparisons allow students to visualize where they stand and how they compare with their colleagues. In that sense, students can use their progress test scores to identify their knowledge gaps and to predict how they would perform on a summative test.
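One way such a traffic-light classification could be operationalized is relative to the cohort distribution; the z-score cut-offs below are hypothetical, as the study cited does not specify how the categories were defined:

```python
# Illustrative only: cut-offs are assumed, not taken from the cited study.
def traffic_light(score: float, cohort_mean: float, cohort_sd: float) -> str:
    """Classify a student's progress-test score relative to the cohort."""
    z = (score - cohort_mean) / cohort_sd
    if z >= -0.5:
        return "green"   # at or near the expected level
    if z >= -1.5:
        return "yellow"  # below expectation, monitor
    return "red"         # well below expectation, consider remediation

print(traffic_light(48.0, 55.0, 10.0))  # z = -0.7 -> "yellow"
```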
The progress test may also reduce students' test anxiety, since one poor performance will not undo a series of good performances 24,25. Blake et al. reported that only 27% of students perceived moderate to high stress, with 39% reporting limited stress and 39% little stress 26. Another advantage is reducing reliance on end-of-semester or end-of-year tests, which are themselves logistically complicated. Moreover, end-of-semester/year tests lead students to study only for that single pass/fail test. The progress test may thus discourage study strategies that hamper learning. For example, when a test covers only one discipline, students can study all of its content the day before; several studies have demonstrated that this is a good strategy for the test but not for long-term retention 27. This strategy is not feasible with the progress test, whose breadth of content leads students to study regularly.

Progress test for the medical school
In addition to the advantages for students, the progress test also offers many advantages for the medical school. Using the progress test, medical schools can identify disciplines or blocks that need improvement, for example when a cohort of students shows no improvement on subsequent progress tests after completing a discipline between test administrations. Furthermore, it is possible to validate the curriculum by comparing performance on questions from certain fields with the corresponding disciplines: students who have been taught certain content are expected to improve their scores on that content. Cecilio-Fernandes et al. compared students who were taught oncology in a single block with students taught oncology spaced throughout the bachelor's programme 28. Interestingly, knowledge acquisition differed between the groups: whereas students in the spaced group acquired knowledge gradually throughout the programme, students in the block group showed a step-wise gain after the block. Although still little used for this purpose, the progress test can also be used to understand the impact of different teaching strategies on students' knowledge acquisition. Another study by Cecilio-Fernandes et al. compared the teaching of oncology in four medical schools 29. They found that teaching oncology in a block format rather than spaced throughout the programme, and offering a refresher course before the clinical phase, benefited knowledge retention at the end of the course. Another line of research has compared traditional with more modern curricula 30,31 or assessed knowledge at the end of a specific discipline 32. Finally, the progress test can be used to investigate curriculum changes, which is extremely difficult to do without it. If there is a major change in a discipline or block, its impact on students' knowledge, positive or negative, can be assessed using the progress test 32.

Challenges and limitations of the progress test
Although the progress test has many advantages, there are also many limitations and challenges to overcome. Organizing a progress test is a labor-intensive and challenging task, considering that all students from all institutions of the consortium have to sit the test on the same day and at the same time. The organizational process requires a schedule that fits all institutional agendas and the booking of enough classrooms for the students of all years. Moreover, printing the test and the answer sheets is not trivial, especially considering the risk of question leakage. Preventing leakage is particularly difficult because not all teachers at every institution are committed to the progress test. Furthermore, the quality of questions may vary largely between institutions, which makes it difficult to maintain an equal proportion of questions per medical school.
Fundamentally, the progress test faces two major challenges. The first relates to alignment with educational principles, especially constructivist alignment, which concerns the alignment between learning and teaching activities and assessment. Good alignment of the progress test depends on how the outcomes are used and on whether the progress test provides feedback, independently of whether it is summative or formative. It is also important to monitor how reliable the progress test is and whether it measures the expected level of knowledge. When a progress test is used only summatively, without feedback and without a method that accounts for measurement error, the reliability of the different tests and the uncertainty of the scores, the alignment is considered poor 33.
The second challenge relates to the blueprint and decision-making. When the blueprint is very broad with high item specificity, the number of items will not suffice to provide a reliable and valid measurement of the subscores, since many content areas on the blueprint receive only one or two items, which is not sufficient for a sound decision about students' performance on that topic 34. It is recommended that decisions about students consider the overall score or, at most, areas with many items, depending on the reliability of each area. Therefore, a pass/fail decision for a specific discipline or block is not recommended.
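The Spearman-Brown sketch given earlier makes the subscore problem concrete; the numbers below are hypothetical, not taken from the cited study:

```python
def spearman_brown(reliability, length_factor):
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical example: a 120-item progress test with overall reliability 0.90,
# of which only 5 items cover a given subdomain (length factor 5/120).
print(round(spearman_brown(0.90, 5 / 120), 2))  # 0.27 -> far too low for pass/fail use
```

Even on a highly reliable overall test, a five-item subscore is close to noise, which is why decisions should rest on the overall score.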

Changing the system of assessment
Traditionally, the teachers of each discipline are responsible for writing the test questions, grading, giving feedback and making students' pass/fail decisions. In other cases, when the curriculum is divided into blocks, different fields and teachers are integrated; a coordinator is then responsible for the test, requesting questions from the teachers, making sure all the content is covered, grading and making pass/fail decisions. Feedback, when it happens, is usually the duty of the teachers of the respective field. Notably, administering a knowledge test is not a simple task. Add to this the frequent lack of minimal psychometric standards, such as validity and reliability. Also, many teachers repeat questions from the previous year, decreasing the reliability of the test even further. Finally, there is often no quality control of the test.
One way of overcoming these limitations is to centralize knowledge assessment 35: all knowledge assessment would fall under a central agency responsible for organizing, revising and handling the logistics of all knowledge tests. Teachers would be responsible for writing items, while the central agency would ensure that the blueprint is covered by the test, revise the items, grade them and provide opportunities for feedback. The central agency would also be responsible for pass/fail decisions, which ideally would take all of a student's assessments into account.
Philosophically speaking, a central agency could ease the tension in the teacher-student relationship, since in the traditional system students often see the teacher as responsible for the pass/fail decision. From the students' perspective, this biases the relationship, evoking the idea that decisions are personal. With a central agency, the teacher becomes a figure concerned only with students' learning, because he/she is not responsible for the pass/fail decision. Moreover, a centralized system seems fairer from the students' perspective. It also allows quality control of all tests: a committee can review all questions, check the alignment between what is taught and what is tested, analyze the quality of questions and tests, and maintain an item bank that records whether and when a question was used. As the centralized system relieves teachers of decision-making about students, of assembling tests and of scheduling them within their disciplines, teachers gain time that can be used to implement other methods of assessment more aligned with their teaching.
Usually, a progress test consortium has all the elements of a central agency: a review committee, teachers from multiple institutions writing questions, qualitative and psychometric analyses, and an item bank. It therefore sounds logical to use the progress test as the only multiple-choice assessment for all students. However, three to four progress tests a year would be necessary to provide reliable information on students' knowledge 36. Having more progress tests, and thus sufficient data points, would provide a more reliable, fair and valid assessment of students' knowledge. The progress test, combined with other methods of assessment, would give more meaningful information about students' knowledge acquisition than focusing mainly on end-of-semester/year tests. There is abundant evidence that the progress test is a more reliable predictor of summative and residency entry test results than any single stand-alone assessment. Therefore, if we do not replace all multiple-choice tests with the progress test, we should at least consider administering more progress tests and using them summatively without losing the feedback for students and medical schools.

CONCLUSIONS
The progress test offers a testing framework with high psychometric standards, including validity, reliability and standardization. It also offers different opportunities for feedback, both for students and for medical schools. Although the progress test should not be the only source of knowledge assessment, it provides important features that traditional assessment lacks. Finally, the progress test overcomes many of the pitfalls of traditional assessment and decreases the tension between teachers and students by centralizing the pass/fail decision.