Abstract
Large language models are rapidly changing how
learners acquire and demonstrate cybersecurity skills. However,
when human-AI collaboration is allowed, educators still lack validated
competition designs and evaluation practices that remain
fair and evidence-based. This paper presents a cross-regional
study of LLM-centered Capture-the-Flag competitions built on
the Cyber Security Awareness Week competition system. To understand
how autonomy levels and participants' knowledge backgrounds
influence problem-solving performance and learning-related
behaviors, we formalize three autonomy levels: human-in-the-loop,
autonomous agent frameworks, and hybrid. To enable
verification, we require traceable submissions including conversation
logs, agent trajectories, and agent code. We analyze multi-region
competition data covering an in-class track, a standard
track, and a year-long expert track, each targeting participants
with different knowledge backgrounds. Using data from the 2025
competition, we compare solve performance across autonomy
levels and challenge categories, and observe that autonomous
agent frameworks and hybrid approaches achieve higher completion rates
on challenges requiring iterative testing and tool interactions.
In the in-class track, we classify participants' agent designs and
find a preference for lightweight, tool-augmented prompting and
reflection-based retries over complex multi-agent architectures.
Our results offer actionable guidance for designing LLM-assisted
cybersecurity competitions as learning technologies, including
autonomy-specific scoring criteria, evidence requirements that
support solution verification, and track structures that improve
accessibility while preserving reliable evaluation and engagement.