PILOT CASE STUDY · MEDICINE

How the Faculty of Medicine at the University of Bergen uses Lectora for targeted manual scoring across twelve final clinical exam sittings

The Faculty of Medicine at the University of Bergen has now run Lectora across twelve final clinical exam sittings, covering 889 candidate-exam aggregate pairs and roughly 43,700 item-pair comparisons in total. The pilot answers two questions side by side. First, is the AI's draft close enough to the course teacher's judgment to be worth reviewing? On the published MED12 validation sitting (895 candidates, single six-hour exam), Lectora agreed with the course teacher at R² = 0.81, a closer agreement than the two independent human graders reached with each other (R² = 0.64). Second, how much manual scoring does the workflow actually save? Across the twelve sittings, the targeted manual scoring loop (stratified calibration, prediction-interval screening, and manual review at the pass/fail boundary) cuts examiner workload by 60–75% on a typical cohort while keeping every boundary candidate under human review.
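To make the screening step concrete, here is a minimal sketch of how prediction-interval screening at the pass/fail boundary could work. It is not Lectora's implementation: the `Draft` type, the symmetric interval, the pass mark of 60, and the four sample candidates are all assumptions made up for illustration. The idea is simply that a draft score goes to an examiner whenever its prediction interval straddles the pass mark.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    """Hypothetical record for one AI-drafted exam score (not a Lectora type)."""
    candidate_id: str
    predicted_score: float  # AI draft score on the exam's scale
    half_width: float       # assumed symmetric prediction-interval half-width


def needs_manual_review(draft: Draft, pass_mark: float) -> bool:
    """True when the prediction interval contains the pass mark."""
    lower = draft.predicted_score - draft.half_width
    upper = draft.predicted_score + draft.half_width
    return lower <= pass_mark <= upper


def screen(drafts: list[Draft], pass_mark: float) -> tuple[list[str], list[str]]:
    """Split candidate ids into (manual review, draft accepted)."""
    review = [d.candidate_id for d in drafts if needs_manual_review(d, pass_mark)]
    accepted = [d.candidate_id for d in drafts if not needs_manual_review(d, pass_mark)]
    return review, accepted


# Illustrative data only; these are not pilot results.
drafts = [
    Draft("A", 72.0, 5.0),  # interval 67.0 to 77.0, clear pass
    Draft("B", 58.0, 4.0),  # interval 54.0 to 62.0, straddles 60
    Draft("C", 61.5, 3.0),  # interval 58.5 to 64.5, straddles 60
    Draft("D", 41.0, 6.0),  # interval 35.0 to 47.0, clear fail
]

review, accepted = screen(drafts, pass_mark=60.0)
print("manual review:", review)
print("draft accepted:", accepted)
```

In this toy cohort, candidates B and C are routed to an examiner and A and D keep their draft scores. How wide the intervals are, and therefore how many candidates land in review, would depend on the calibration step, which this sketch does not model.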