Teaching activity: LLM-driven interactive simulation for complex sociotechnical systems

Toolkit: Complex Systems Toolkit.

Author: Dr. Stuart Grey, SFHEA (University of Glasgow).

Topic: Student-created interactive simulation for complex sociotechnical systems.

Title: LLM-driven interactive simulation for complex sociotechnical systems.

Resource type: Teaching activity.

Relevant disciplines: Any.

Keywords: Available soon.

Licensing: This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. It is based upon the author’s article “Enhancing Ethical Reasoning in Engineering Education through Student-Created Interactive Ethical Scenarios Using Generative AI,” 2025 IEEE Global Engineering Education Conference (EDUCON), London, United Kingdom, 2025, pp. 1-5, doi: 10.1109/EDUCON62633.2025.11016531. 

Downloads: Available soon.

Related INCOSE Competencies: Toolkit resources are designed to be applicable to any engineering discipline, but educators might find it useful to understand their alignment with competencies outlined by the International Council on Systems Engineering (INCOSE). The INCOSE Competency Framework defines 37 Systems Engineering competencies within a tailorable framework, providing guidance for practitioners and stakeholders on the knowledge, skills, abilities and behaviours crucial to Systems Engineering effectiveness. A free spreadsheet version of the framework can be downloaded.

This resource relates to the Systems Thinking, Life Cycle, Configuration Management, Requirements Definition, Verification, and Validation INCOSE Competencies. 

AHEP mapping: This resource addresses several of the themes from the UK’s Accreditation of Higher Education Programmes fourth edition (AHEP4):  Analytical Tools and Techniques (critical to the ability to model and solve problems), and Integrated / Systems Approach (essential to the solution of broadly-defined problems). In addition, this resource addresses AHEP themes of Ethics and Communication. 

Education level: Intermediate.

 

Learners have the opportunity to: 

  • Debug their simulation through playtesting, documenting issue → fix → retest cycles and demonstrating how changes improve coherence. 
  • Explore trade-offs and justify decisions in ethics (e.g. consequences and equity) and complex systems (e.g. resilience vs cost vs emissions). 
  • Evidence learning with transparent artefacts: initial prompt, changes via tracked changes or before/after snippets, tester feedback, and final prompt. 
  • Reflect critically on validity, bias and the limitations of LLMs as simulators, including how to handle unsafe/poor choices by surfacing realistic consequences. 
  • Communicate findings clearly to technical and non-technical audiences. 

Teachers have the opportunity to: 

  • Use this as either a studio activity (3–5 sessions) or a compact assessment-only task (1–2 sessions), with clear rubrics for each. 
  • Standardise scope by offering a predefined scenario (e.g., Urban Heatwave Response, UK city), or permit student-proposed topics. 
  • Scale marking via artefact-based evidence (prompt, change log, feedback, final prompt) rather than long reports. 
  • Deliver with institutional Microsoft Copilot licences or any free web LLM; require students to disclose model and version used. 
  • Adapt quickly to different disciplines by swapping the scenario pack (microgrids, water networks, medical device supply chains, etc.). 

 

Overview:

This resource enables engineering students to create, run, and debug a text-based, interactive simulation of a complex sociotechnical system using a Large Language Model (LLM). It is intentionally flexible and may be delivered as a multi-session studio activity (including assessment) or used solely as a compact assessment.

  

Purpose and use:

In both modes, students design a robust text prompt, test it with a user, document changes, and submit auditable artefacts that evidence learning. The core activity is students interrogating their own thinking about how complex systems should be modelled, making judgements about what their game does and does not capture of the system's dynamics. 

 

Why and how: 

The approach aims to give students hands-on experience in putting systems thinking into practice. Concepts such as stakeholders, feedback loops, delays, uncertainty, and emergent behaviour can be implemented and interrogated without heavy tooling.  

The submission is a text LLM prompt with tracked changes, which allows students to demonstrate system design and debugging, produce transparent process evidence, and scale to large cohorts with minimal infrastructure. 

 

Delivery options at a glance:

  • Audience: Undergraduate Years 2–4 and taught MSc, any engineering discipline. 
  • Modes: Studio activity (3–5×2 h + independent study) or assessment-only (prompt-only; 1–2×2 h + 4–6 h independent work). 
  • Teams: 3–4 students (solo permitted for assessment-only). 
  • Assessment: Portfolio (studio) or prompt plus change log (assessment-only). 
  • Platforms: Institutional Copilot licences have been used successfully; exploration of free tools is encouraged (students record the model and version used). 

 

Materials and software:

  • LLM access: institutional Microsoft Copilot licences (proven) or any reputable free web-based tool. Students disclose the model and version. 

 

Delivery modes:

Mode A — Studio activity (3–5 sessions) 

  • Session 1: Frame the system — boundary, stakeholders, conflicting goals; sketch a Causal Loop Diagram (CLD) with at least two reinforcing and two balancing loops. 
  • Session 2: Make it playable — define 4–8 state variables and KPIs; draft the prompt (based on Appendix A); specify commands, turn length and stop conditions; add debug controls (`trace`, `why`, `show variables`, `revert`). 
  • Between sessions: Prototype v1 — run 10–15 turns; capture a transcript; log defects (e.g. inconsistent updates, missing delays, moralising responses); a minimal reference sketch for such consistency checks follows this list. 
  • Session 3: Playtest and iterate — exchange prototypes across teams or test with an external user; record issue → fix → retest cycles with evidence (make sure edits are captured in tracked changes). 
  • Session 4: Present and reflect — short demo (6–8 turns); explain how feedback/delays manifest; discuss surprises and limits. 
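The sketch below is a minimal, illustrative reference model for such consistency checks; the variable names, coefficients and `step` function are assumptions, not part of the brief. Teams can run it alongside their transcript to check whether the LLM's reported updates show the same qualitative behaviour: a balancing loop, a reinforcing loop, and a one-turn delay.

```python
def step(state, invest):
    """Advance one turn of a toy heat-response model."""
    new = dict(state)
    # Delay: adoption responds to LAST turn's trust, not this turn's decisions.
    new["adoption"] = min(100.0, state["adoption"] + 0.1 * state["trust"])
    # Reinforcing loop: visible adoption builds further trust, which raises adoption next turn.
    new["trust"] = min(100.0, state["trust"] + 0.05 * new["adoption"])
    # Balancing loop: the risk reduction is proportional to current risk, so risk decays towards zero.
    new["risk"] = max(0.0, state["risk"] - 0.02 * invest * state["risk"])
    new["budget"] = state["budget"] - invest
    return new

state = {"risk": 80.0, "trust": 40.0, "adoption": 10.0, "budget": 100.0}
for turn in range(1, 6):
    state = step(state, invest=10.0)
    print(turn, {k: round(v, 1) for k, v in state.items()})
```

If the LLM's dashboard moves in the opposite direction to a simple model like this without an explanation, that is usually a defect worth logging.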

Mode B — Assessment-only (prompt-only; 1–2 sessions) 

  • Session 1: Brief and rapid scoping — select a scenario (student-chosen or predefined); write a one-paragraph boundary and stakeholders note; draft the initial prompt (based on Appendix A) with role choices, 4–6 state variables, simple commands, and a 12–15 turn cap. 
  • Independent work: Debugging loop — run the prompt; identify faults; edit the prompt (make sure edits are captured in tracked changes); rerun and capture short snippets demonstrating fixes; test with one peer and collect written feedback. 
  • Session 2: Submission — students submit a single document with the initial prompt, change log (before/after snippets), tester feedback, the final prompt, and a short rationale of innovative choices. 

In both modes, module leaders may supply one or more predefined scenarios to standardise scope and simplify marking. A ready-to-use example is provided in Appendix C. 

 

Assessment:

Studio portfolio — rubric (suggested weighting):

Each criterion is shown with its suggested weighting, followed by band descriptors from the highest band down to D–E:

  • Complexity modelling (25%): Clear boundary; rich stakeholders; ≥4 correct loops; delays explicit; coherent KPIs | Mostly sound | Basic map | Superficial (D–E). 
  • Simulation design and prompt quality (25%): Consistent state logic; visible feedbacks/delays; nonlinearity; negative choices allowed with consequences; clear commands | Mostly coherent | Playable but brittle | Confusing/linear (D–E). 
  • Debugging evidence (20%): Systematic playtests; clear issue → fix → retest artefacts | Some iteration | Minimal | None (D–E). 
  • Insight and reflection (20%): Deep analysis of emergence, trade-offs, equity, uncertainty, and LLM limits | Good | Descriptive | Vague (D–E). 
  • Communication and referencing (10%): Clear, concise, correct Harvard referencing | Minor issues | Adequate | Disorganised (D–E). 

 

Assessment‑only (prompt‑only) — compact rubric: 

  • Novelty and engagement (role‑play/game elements; authentic decision‑making) – 35% 
  • Ability to provoke ethical and/or complex systems trade‑offs – 30% 
  • Debugging quality (change log with before/after snippets; tester feedback) – 25%  
  • Clarity of prompt and rationale; Harvard referencing where used – 10% 

 

Scenario options: 

Students may propose their own topic or the module leader may supply a predefined scenario. Options suited to UK engineering contexts include: 

  • Community microgrid integration — resilience vs affordability vs emissions vs public buy-in. 
  • Urban heatwave response — emergency measures vs long-term urban design; uneven impacts. 
  • Water network nitrates — compliance, farm livelihoods, treatment costs, ecological outcomes. 
  • Critical medical device supply chain — redundancy vs cost; equitable allocation. 

 

Appendix A — Prompt template (simulation + debug-ready): 

Title: Complex Systems Simulator — [Scenario] 

Purpose: Run a turn-based interactive simulation of a complex sociotechnical system. Track named state variables, apply feedback and delays, and let the player’s decisions drive nonlinear outcomes. 

Setup: 

  1) Offer three roles (distinct authority/constraints). 

  2) Introduce 3–5 NPCs with clear goals and plausible interventions. 

  3) Show a dashboard of [STATE_VARIABLES] each turn with short context. 

State rules: 

      • Track only these variables (with units/ranges): [list 5–8]. 
      • Maintain at least two feedback loops and one delay; keep hidden rule notes consistent across turns. 
      • Each turn: recap; propose 3–5 options (plus free-text input); explain updates; show dashboard; request the next action. 
      • Time step: 5 minutes to 1 week; end after 20–30 turns or on stop conditions. 

Commands: status, talk [npc], inspect [asset], implement [policy], pilot [intervention], advance time, review log. 

Debug commands (for testing): trace on/off (print update logic), why (state which loops/delays drove the change), show variables (print current state table), revert (roll back one turn), reseed (apply a slight exogenous shock). A minimal sketch of this turn-log behaviour follows the template. 

Realism and ethics: Allow all plausible actions and report their consequences neutrally. If an action would be unsafe in the real world, refuse it, propose safer alternatives, and continue with plausible systemic effects. 

LLM pitfalls to avoid: Do not invent new variables; ask clarifying questions rather than guessing; keep outputs concise; summarise trajectory every five turns. 

Begin: Greet the player, state the scenario, ask for a role, and wait. 
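To show the bookkeeping the debug commands ask the LLM to emulate, here is a minimal sketch; the `TurnLog` class and its method names are illustrative assumptions, not a required implementation.

```python
class TurnLog:
    """Toy turn log mirroring the show variables / why / revert debug commands."""

    def __init__(self, initial_state):
        self.history = [dict(initial_state)]   # one state snapshot per turn
        self.explanations = []                 # one "why" note per turn

    def record(self, new_state, why):
        """Store the post-turn state and the loops/delays that drove the change."""
        self.history.append(dict(new_state))
        self.explanations.append(why)

    def show_variables(self):
        return self.history[-1]                # current state table

    def why(self):
        return self.explanations[-1] if self.explanations else "no turns yet"

    def revert(self):
        """Roll back exactly one turn, as the revert command requires."""
        if len(self.history) > 1:
            self.history.pop()
            self.explanations.pop()
        return self.history[-1]

log = TurnLog({"trust": 40, "budget": 100})
log.record({"trust": 45, "budget": 90}, "cooling pilot raised trust; budget spent on delivery")
print(log.show_variables(), "|", log.why())
print(log.revert())
```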

 

Appendix B — Debugging and playtest checklist: 

Functional coherence 

      • Do state variables update consistently with declared logic? 
      • Are delays visible (policy today → trust gradually → adoption later)? 
      • Are reinforcing and balancing feedback identifiable in play? 

Robustness 

      • Does the simulation permit negative choices with realistic consequences? 
      • Do trace/why explanations match outcomes? 
      • Are stop conditions respected? 

User experience and clarity 

      • Are commands clear? Is turn pacing appropriate? 
      • Are dashboards concise and informative? 

Report 

      • Provide three concrete defects with turn numbers, the prompt edits that fixed them, and evidence of the rerun; a minimal log structure is sketched below. 
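One possible way to structure that issue → fix → retest evidence is sketched below; the field names and example content are illustrative assumptions, and any equivalent tabular format is acceptable.

```python
# Hypothetical defect log: one entry per playtest defect, matching the checklist above.
defect_log = [
    {
        "turn": 7,
        "issue": "Public trust rose even though the dashboard reported a missed commitment",
        "prompt_edit": "Added rule: trust falls by 5–15 points when a commitment is missed",
        "retest": "Re-ran turns 5–9; trust now drops after the missed commitment (snippet attached)",
    },
    # ...two further defects, as the report requires
]

for entry in defect_log:
    print(f"Turn {entry['turn']}: {entry['issue']} -> fixed by: {entry['prompt_edit']}")
```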

 

Appendix C — Predefined scenario (Urban Heatwave Response, UK city): 

Boundary: One UK local authority area during the July–August heatwave period. Focus on public health, energy demand, and community resilience. 

Roles: (1) Local Authority Resilience Lead; (2) NHS Trust Capacity Manager; (3) Distribution Network Operator (DNO) Duty Engineer. 

Stakeholders: Residents (with a focus on vulnerable groups), care homes, schools, SMEs, DNO, local NHS Trust, emergency services, voluntary/community groups, Met Office (for alerts), and local media. 

State variables (examples): Heat-health alert level (0–4); Emergency Department occupancy (%); Electricity demand/capacity (% of peak); Indoor temperature exceedance hours (hrs > 27 °C); Public trust (0–100); Budget (£); Equity index (0–100). 

Events/shocks: Red heat alert; substation fault; procurement delay; misinformation spike on social media; transport disruption; community centre cooling failure. 

KPIs and stop conditions: Heat-related admissions; unserved energy; cost variance; equity gap across wards. Stop if alert level 4 persists >3 days, budget overspends >10%, or trust <25. 
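Once the state variables are tracked, the stop conditions become an unambiguous per-turn check; a minimal sketch follows, with field names that are assumptions based on the variables listed above.

```python
def should_stop(state):
    """Stop if alert level 4 persists >3 days, budget overspend >10%, or trust <25."""
    return (
        (state["alert_level"] == 4 and state["days_at_alert_4"] > 3)
        or state["overspend_pct"] > 10
        or state["trust"] < 25
    )

print(should_stop({"alert_level": 4, "days_at_alert_4": 4, "overspend_pct": 2, "trust": 60}))  # True
print(should_stop({"alert_level": 3, "days_at_alert_4": 0, "overspend_pct": 5, "trust": 55}))  # False
```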

Notes for assessors: Using a standard, predefined scenario simplifies marking and ensures comparable complexity across teams, while still allowing for diverse strategies and outcomes. 

 

Any views, thoughts, and opinions expressed herein are solely that of the author(s) and do not necessarily reflect the views, opinions, policies, or position of the Engineering Professors’ Council or the Toolkit sponsors and supporters.  
