Higher Ed · Playbook

How to prevent AI cheating in online exams: a practical guide for universities.

A 90-day rollout. Threat model, architectural controls, faculty change management, and what to actually tell students.

TL;DR

  • The four cheating modes that matter — chatbot text generation, real-time dictation through earpieces, paid impersonation, and autonomous AI agents — each defeat a different set of controls. No single layer addresses all four.
  • Architecture beats policy. A good academic-integrity policy without device-level controls is a sign on the door of an unlocked house.
  • The 90-day rollout: weeks 1–4 scope a single faculty pilot; weeks 5–8 run a low-stakes diagnostic exam; weeks 9–12 run one real high-stakes exam end-to-end and debrief.
  • The hardest part is faculty change management, not technology. Lecturers who feel surveilled or de-skilled will sandbag the rollout. Bringing a small group in early as design partners is the move.

The threat model: four cheating modes

Before talking about controls, name the modes. Universities defend against four distinct AI-era cheating channels, each with different mechanics:

1. Chatbot text generation. Student types the question into ChatGPT, Claude, or Gemini and pastes the answer. Defeated by: an OS-level lock-screen that blocks access to AI tools. Still gets through when: the assessment is a take-home essay or an asynchronous online assessment and the device isn't sandboxed.

2. Real-time dictation through hidden audio. Student wears a bone-conduction earpiece or a hidden Bluetooth device, dictates the question to an offsite collaborator (or to a phone running an AI agent), receives the answer back through audio, and types or transcribes it. Defeated by: continuous audio analysis (multiple voices, anomalous input) combined with dual-camera workspace coverage. Still gets through when: the candidate is well-equipped and well-rehearsed.

3. Paid impersonation. Student pays a graduate or a third-party service to take the exam in their place. Defeated by: continuous identity verification, not a one-time check at the start. Still gets through when: the institution relies on a single facial check at minute one.

4. Autonomous AI agents. Student installs an AI agent that watches the screen, reads the questions, and types answers via simulated input — without the student typing anything. Defeated by: a hardened lock-screen that blocks third-party automation tools, with sandboxed input handling. Still gets through when: the institution relies on browser-only proctoring tools.

A serious AI-cheating prevention strategy addresses all four. A weak one addresses one or two and quietly hopes the others aren't being used at scale. They are.
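
To make the mapping auditable, here is a minimal sketch (Python, with illustrative names; not any particular vendor's API) of a coverage check: given the controls an institution actually runs, which of the four modes stay open?

```python
# Illustrative mapping of cheating mode -> primary control, per the list
# above. Control names are assumptions for this sketch, not product terms.
PRIMARY_CONTROL = {
    "chatbot_text_generation":  "os_level_lockscreen",
    "realtime_audio_dictation": "continuous_audio_analysis",
    "paid_impersonation":       "continuous_identity_verification",
    "autonomous_ai_agents":     "hardened_lockscreen_with_input_sandboxing",
}

def uncovered_modes(deployed: set[str]) -> set[str]:
    """Return the cheating modes no deployed control addresses."""
    return {mode for mode, control in PRIMARY_CONTROL.items()
            if control not in deployed}

# A browser-only proctoring stack deploys none of the four primary
# controls, so all four modes come back:
print(uncovered_modes({"browser_tab_lock"}))
```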

Architecture beats policy

Most universities have an academic-integrity policy that is updated every few years and now includes a paragraph about AI. The policy is necessary. It is also not sufficient.

A policy is a sign on the door. The architecture is whether the door is locked. Students who use AI in violation of policy do so because the cost of doing it (low) is below the benefit (high), and they correctly perceive the probability of being caught (also low) as the binding constraint.

Reducing cheating works through two levers: raise the cost (architectural controls that make cheating physically harder) and raise the probability of being caught (monitoring that produces defensible evidence when cheating happens despite the controls). Both levers are technical, not matters of policy wording.
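
As a toy model of that calculus (all numbers illustrative, not empirical):

```python
# Toy expected-value model of the cheating decision described above.
# Architecture raises `effort`; monitoring raises `p_caught`.
def cheating_is_rational(benefit: float, effort: float,
                         p_caught: float, penalty: float) -> bool:
    expected_gain = benefit * (1 - p_caught)
    expected_cost = effort + penalty * p_caught
    return expected_gain > expected_cost

# Policy-only posture: cheating is cheap and rarely caught.
print(cheating_is_rational(benefit=10, effort=1, p_caught=0.05, penalty=20))  # True
# Architectural controls plus evidence-grade monitoring flip the inequality.
print(cheating_is_rational(benefit=10, effort=8, p_caught=0.60, penalty=20))  # False
```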

This doesn't mean throw out the policy. The policy provides the lawful basis for the technical controls and the disciplinary procedure. It just means: the policy is the supporting framework, not the load-bearing wall.

The architectural controls

For a university operating remote or hybrid online exams, the controls that actually prevent the four cheating modes above are these:

OS-level lock-screen

The student's device runs the exam in a hardened native shell that blocks process-switching, third-party apps, screen mirroring, paste from outside, and automation frameworks. Browser-only solutions don't qualify; they leave the OS exposed. The lock has to work on the operating systems your students actually use — at South African institutions, that's macOS, Windows, and increasingly Android tablets, with iPadOS coming.
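
For a sense of what a hardened shell has to watch for, here is a minimal sketch of just one such check, polling for disallowed processes with Python's psutil. A real lock-screen enforces far more deeply (kiosk mode, input sandboxing, clipboard isolation), and the process names below are assumptions for illustration:

```python
# One small piece of a lock-screen: a watchdog that flags disallowed
# processes. A real shell enforces at OS level; this only shows the
# shape of the check.
import time
import psutil

# Hypothetical blocklist: AI clients, remote-desktop and automation tools.
DISALLOWED = {"chatgpt", "anydesk", "teamviewer", "autohotkey", "obs"}

def scan_once() -> set[str]:
    """Names of running processes that match the blocklist."""
    running = {(p.info["name"] or "").lower()
               for p in psutil.process_iter(["name"])}
    return {name for name in running
            if any(bad in name for bad in DISALLOWED)}

def watchdog(interval_s: float = 2.0) -> None:
    while True:
        hits = scan_once()
        if hits:
            # A real shell would freeze the exam and log a timestamped flag.
            print(f"flag: disallowed processes {sorted(hits)}")
        time.sleep(interval_s)
```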

Continuous identity verification

Not one face check. Repeated face checks throughout the exam, on every break, on every reconnect. The mechanism doesn't have to be intrusive — a brief check every 60–90 seconds is sufficient to defeat most impersonation scenarios — but it has to be continuous, not one-shot.
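
The scheduling logic is simple. A minimal sketch, with the vision pipeline passed in as hypothetical callbacks and the distance threshold assumed for illustration:

```python
# Sketch of a continuous-verification loop: compare a fresh face embedding
# against the enrolled one every 60-90 seconds. The callbacks stand in for
# a real vision pipeline; the threshold is an assumed, model-specific value.
import random
import time
from typing import Callable
import numpy as np

THRESHOLD = 0.35  # assumed cosine-distance cutoff

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verification_loop(
    enrolled: np.ndarray,                          # embedding from enrolment
    embed_current_face: Callable[[], np.ndarray],  # hypothetical capture + embed
    raise_flag: Callable[[float], None],           # packages a clip for human review
    exam_running: Callable[[], bool],
) -> None:
    while exam_running():
        dist = cosine_distance(enrolled, embed_current_face())
        if dist > THRESHOLD:
            raise_flag(dist)                # human review, never auto-sanction
        time.sleep(random.uniform(60, 90))  # jittered, so checks can't be timed
```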

Dual-camera coverage

The device's built-in camera can't see the workspace. Adding a second camera — typically a phone running a side-camera app, mounted on a clip — covers face, hands, screen, and tabletop. The geometry alone defeats most physical cheating channels (hidden notes, second devices, in-room coaching).

On-device behavioural flagging

Models running on the student's device (not streaming to a vendor server) flag anomalies in real time: second voices, multiple faces, prolonged off-screen gaze, atypical typing patterns. The flags are packaged with timestamped clips, not hours of streamed video. Reviewer load drops by 1–2 orders of magnitude.
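
A sketch of the evidence-package idea: each flag carries a short clip window around the event instead of the whole session. Field names and the 0.8 confidence cutoff are illustrative.

```python
# Sketch: package high-confidence flags as short timestamped clips rather
# than streaming hours of video. Names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Flag:
    kind: str          # e.g. "second_voice", "off_screen_gaze"
    t_seconds: float   # offset into the exam
    confidence: float  # model score in [0, 1]

def clip_window(flag: Flag, pad_s: float = 10.0) -> tuple[float, float]:
    """(start, end) of the evidence clip around one flag."""
    return (max(0.0, flag.t_seconds - pad_s), flag.t_seconds + pad_s)

def package(flags: list[Flag], min_conf: float = 0.8) -> list[dict]:
    """Keep only high-confidence flags: minutes of clips, not hours of video."""
    return [{"kind": f.kind, "confidence": f.confidence, "clip": clip_window(f)}
            for f in flags if f.confidence >= min_conf]
```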

Writing-pattern analysis

For written-answer assessments, the platform builds a per-student writing profile from earlier work and flags submissions that diverge sharply. This is consistency analysis, not AI-text detection — different mechanism, different failure mode, far higher signal-to-noise ratio. (See our piece on why AI-text detection is the wrong frame.)
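
A minimal sketch of the consistency idea, assuming a per-student baseline of earlier work. The features and the z-score cutoff are illustrative; a production profile would use far richer stylometry.

```python
# Sketch of consistency analysis (not AI-text detection): score a submission
# against the student's own baseline. Features and cutoff are illustrative.
import statistics

def features(text: str) -> dict[str, float]:
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "avg_sentence_len": len(words) / max(1, len(sentences)),
        "avg_word_len": sum(len(w) for w in words) / max(1, len(words)),
        "type_token_ratio": len({w.lower() for w in words}) / max(1, len(words)),
    }

def divergence(baseline_texts: list[str], submission: str) -> float:
    """Max z-score of the submission's features against the baseline
    (needs at least two baseline texts)."""
    base = [features(t) for t in baseline_texts]
    sub = features(submission)
    scores = []
    for key in sub:
        vals = [f[key] for f in base]
        mu = statistics.mean(vals)
        sd = statistics.stdev(vals) or 1e-9  # guard zero spread
        scores.append(abs(sub[key] - mu) / sd)
    return max(scores)

# e.g. route to human review when divergence(...) > 3.0 (assumed cutoff)
```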

The 90-day rollout

For a Vice-Chancellor or Dean reading this and trying to sequence the work, here's a workable 90-day plan.

Weeks 1–4: scope a single-faculty pilot

Choose one faculty — ideally with a sympathetic Dean and a manageable cohort (500–2,000 students). Define the scope: one exam window, one or two modules, mixed question types if possible. Identify the lecturers who will be design partners, not just users. Identify the IT staff who will own integration. Procure devices if needed (often shared tablets for in-faculty exam writing). Sign the operator agreement, run the DPIA, communicate the pilot to the Senate or equivalent.

Weeks 5–8: low-stakes diagnostic

Run one practice exam — a low-stakes formative assessment — using the new platform end-to-end. The goals are operational: does the lock-screen work on student devices, does identity verification handle the cohort, does the network handle the load, does the evidence get packaged correctly, does the student support pathway hold up under load? Capture every bug, every UX friction point, every staff complaint. Iterate. Bring lecturers into the post-mortem; their concerns will surface controls you missed.

Weeks 9–12: real high-stakes exam

Run a real exam — a final, a mid-term, a controlled assessment. This is the validation. Brief students explicitly: what's monitored, what's not, what happens if a flag fires, what their appeal rights are. Communicate the policy update before the exam, not after. After the exam, run a structured debrief: students, lecturers, IT, registrar's office. Document everything. The debrief becomes the artefact you take to the rest of the institution.

By day 90, you have one faculty running real online exams with prevention-level controls, an internal case study, a list of bugs and improvements, and the political capital to expand. By day 180, you have a second faculty.

Faculty change management

The hardest part of an AI-cheating prevention rollout is not the software. It's the lecturers.

Lecturers come into a rollout with a mix of concerns: that the platform will surveil them, not just students; that it will de-skill their assessment work; that it will flag honest students at unacceptable rates; that the disciplinary follow-up will fall on them; that it will make their teaching feel like an airport security line.

These concerns are legitimate. The institutions that handle the rollout well do three things:

First, they bring a small cohort of lecturers in as design partners early — before the procurement is locked, before the rollout plan is published. Lecturers who shape the system feel ownership; lecturers who are handed it feel surveilled.

Second, they communicate clearly what the system does not do. It does not record the lecturer. It does not stream classroom video to a vendor. It does not replace professional judgment in disciplinary findings. It does not flag students automatically without human review.

Third, they invest in reducing the disciplinary follow-up load, not just shifting it. On-device flagging that produces 5 high-confidence segments instead of 500 low-confidence ones is the lecturer-friendly architecture. A platform that floods the academic-integrity committee with noise will get sandbagged within one term.

The same architecture, deployed two ways, produces opposite cultures.

What to actually tell students

Student communication is where most rollouts are undercooked. A short, plain-language briefing — written, video, or both — sent before the first proctored exam, covers:

  • what is monitored during the exam, and what is not;
  • what happens when a flag fires (human review, never an automatic sanction);
  • what the student's appeal rights are;
  • where to get technical support before and during the exam.

Done well, the briefing actually increases trust in the system, including among honest students who might otherwise resent the surveillance. Done poorly, it creates a movement.

Conclusion

Preventing AI cheating in online exams is a tractable problem in 2026 — but only if the institution treats it as architecture, not policy. The controls exist. The rollout is sequencing, not invention. The change management is the hard part, and the institutions that get that right move quickly and at scale.

The institutions that are still searching for the perfect AI-text detector, or relying on policy revision alone, are losing ground every term to the institutions that have already moved to prevention. The credential value of the degree depends on which side of that line your institution lands on. There isn't a third option.

Crux

Pilot AI-cheating prevention in one faculty.

Crux runs a 90-day pilot programme designed for university change-management. One faculty, one exam window, one debrief — then expand.

Request a demo