COMP90083 Lecture 2 - Nathan's Vault

## Introduction to Computational Modelling and Simulation: Week 2

This document provides a comprehensive and exhaustive explanation of the second lecture on computational modelling and simulation. It expands upon the concepts introduced in the lecture, ensuring that every detail is explained in a self-contained and causally linked manner. The goal is to build a complete mental model of the topics discussed, requiring no prior knowledge beyond fundamental reasoning.

### Administrative and Housekeeping Matters

Before delving into the core academic content, several administrative points were addressed to ensure students are fully informed about course logistics, resources, and assessment requirements.

#### Course Resources and Project Release

The lecture commenced with an important update regarding a previously unavailable resource. The recording of the lecture from the preceding Monday, which had been temporarily lost, has been successfully recovered. This recording is now accessible through the university's learning management system, located within the "Modules" section and also directly linked under the "Lecture Capture" area. This ensures that all students have the opportunity to review the material from that session.

Furthermore, the specifications for the first major assessment, referred to as "the project" or "the first assignment," have been officially released. Students can now access the detailed project description. A critical detail is the submission deadline: the project is due in **Week 5, specifically on Sunday, the 31st of the month, at midnight**.

The primary objective of this first project is not to have students invent an entirely new computational model from scratch. Instead, the focus is on skill development. Students will be provided with an existing model or a pre-defined environment. Their task is to extend this existing system. The evaluation will not be based on the complexity or sophistication of the extension itself.
Rather, the assessment will focus on the student's process and communication. This includes demonstrating a clear thought process for how a new behavior is conceptualized and developed, the ability to explain the model's purpose and structure in a concise and well-organized manner, and the capacity to design and present basic experiments that validate the new behavior and demonstrate that it functions as intended.

The project specification document contains some reference materials, but students are strongly encouraged to seek out additional sources to inform their work. The document also includes further suggestions to guide the development process.

#### Policies and Submission Guidelines

The lecture revisited the subject's policy on the use of generative Artificial Intelligence (AI) tools. The use of these tools is permitted but is subject to specific, clearly defined restrictions. Students may use generative AI for activities such as brainstorming ideas, identifying relevant academic sources, and for editing their written work to improve clarity and grammar. However, there are strict prohibitions against simply copying and pasting large blocks of AI-generated text directly into the report. Similarly, students cannot use AI to generate the entirety of their programming code. The use of AI for smaller, specific tasks, such as fixing bugs or refining small code snippets, is permissible.

To ensure academic integrity and transparency, any use of generative AI must be formally declared. At the end of the submitted report, students must include a declaration section. This section does not contribute to the final word count of the report, so it can be as long as necessary. Within this declaration, students are required to provide the exact prompts they used to interact with the AI tool. This allows instructors to understand how the tool was used as a supplement to the student's own work.
Students are also directed to read the official university policy on generative AI tools and technologies for a complete understanding of their responsibilities.

A reminder was also issued regarding the policy for late submissions. While an automated system for requesting an extension is available, it is preferred that students first communicate their intention to request one to the lecturer. It is crucial to understand that extensions are granted only for valid, substantiated reasons. These typically include medical issues or family emergencies, for which documentary evidence must be provided. An extension will not be granted for reasons such as personal travel or holidays, as has unfortunately been requested in the past.

The submission format for the assignment is highly specific. Students must upload **three separate documents**:

1. **The Report:** The main written document detailing the project.
2. **The ODD Document:** A specific type of model description that will be the main topic of this lecture.
3. **The Code:** The source code for the model implementation.

It is imperative that these three components are not bundled into a single file (e.g., a zip archive). Failure to submit them as separate files will result in a **two-mark deduction**. The rationale behind this strict rule is practical: separate files make it significantly easier for the teaching staff to use automated systems for checking plagiarism and academic integrity.

#### Report and Document Specifications

The assignment specification provides a detailed breakdown of the marks allocated to each section of the project, clarifying what the assessors are looking for in each part. A strong emphasis is placed on the quality of presentation. Students are expected to include high-quality, clearly labeled tables and figures, ensuring the final report is as professional and polished as possible.
A recommended structure for the report is provided, outlining key sections and offering a rough guideline for their length (e.g., approximately 350 words for certain sections). This is intended as a "ballpark" figure to guide students, not a rigid requirement. However, the overall length of the document is monitored. Reports that are excessively long will be penalized, so conciseness is valued. To facilitate this, students must include a word count in their report.

For creating the report, the use of **LaTeX** is highly recommended. A specific LaTeX template is provided for students to use, which is available on the collaborative online platform **Overleaf**. The link to this template is included in the project description. The template is based on a simple journal article format. The reason for recommending a specific template is to foster consistency across submissions, which helps the teaching staff to navigate and assess the work more efficiently. While the template may include fields like "Corresponding author" and "Abstract," these are not necessary for the assignment. However, it is mandatory that each student includes their **full name and student ID** clearly on the document.

Finally, the specific length requirements for the documents were clarified. The main **model report** should be approximately **1,200 words** in length. (Note: The lecturer misspoke and said "12,000 words," but the context of the report structure and the length of the other document make it clear that 1,200 is the intended figure). The **ODD description document** should be about **three pages**, which corresponds to roughly **1,000 words**.

### The Importance of Describing Models

With the administrative details covered, the lecture transitioned to its primary topic: the standardized description of computational models. The central question addressed is how we can communicate the intricate details of a model in a way that is clear, unambiguous, and useful to a diverse audience.
This audience could include policymakers who need to understand the model's implications, scientific colleagues who wish to replicate the work, or a future team member who must take over the project.

The importance of good documentation is highlighted through an analogy to computer programming. A piece of code with no documentation or comments is often referred to as "spaghetti code": a tangled, incomprehensible mess that is incredibly difficult for anyone, including the original author, to work with or modify. Just as commenting and documentation are vital for software engineering, a structured description is vital for computational modeling.

To address this need, the lecture introduces a standardized methodology known as the **Overview, Design Concepts, and Details (ODD) protocol**. This protocol is a framework specifically designed to describe computational models, particularly agent-based models, in a structured and comprehensive way. The goal for the coming week is for students to not only understand the components of the ODD protocol but also to become proficient in its application. This will involve practicing how to write an ODD for an existing model during tutorials, and ultimately, how to create a new ODD for the model they develop in their first assignment.

### Recap of Core Modelling Principles

Before introducing the ODD protocol in detail, the lecture included a short, interactive quiz to reinforce key concepts from the previous week. These questions serve to check understanding and re-emphasize foundational principles of the modelling process.

#### On the Virtue of Simplicity in Models

The first question asked for the reasons why simple models are generally preferred over complex ones. The correct answer identified two primary reasons:

1. **They are easier to implement and understand.** A model with fewer components and simpler interactions has less code to write, debug, and decipher.
This clarity makes it easier for the creator to verify its correctness and for others to understand its logic.

2. **They are quicker to run.** Many computational models, especially those in this course, are **stochastic**, meaning they incorporate elements of randomness. To get reliable results from a stochastic model, it's necessary to run it many times and analyze the distribution of outcomes. A model that runs quickly allows for thousands or even millions of simulations to be performed, perhaps in parallel on a computing cluster, which is essential for robust analysis. A complex, slow model would make this process prohibitively time-consuming.

The quiz also clarified a common misconception. Simple models do **not** focus on minor details. In fact, the essence of good modeling is **abstraction**: the process of deliberately omitting irrelevant or "minor" details to focus on the essential elements that are believed to drive the core behavior of the system under investigation.

#### The Iterative Nature of the Modelling Cycle

The second question explored why the modeling cycle is an inherently iterative process. The question was framed to ask which of the given options was *not* a valid reason. The option that is not a valid reason, and therefore the correct answer to the question, was "so we can charge the client more if we do everything twice." This is obviously an unethical and unprofessional motivation. The valid reasons, which highlight the fluid and exploratory nature of modeling, are:

1. **The research question may need modification.** Often, the process of trying to build a model reveals that the initial question was poorly specified, too broad, or not actually answerable with the available data or tools. The modeler might realize they are not addressing the true underlying problem. This discovery forces them to go back and refine the question itself.

2. **Implementation errors may exist.** It is a near-certainty that the first implementation of any non-trivial model will contain bugs or logical errors.
The process of testing, validation, and analyzing initial results will uncover these errors, necessitating a return to the implementation phase for correction.

3. **The model may not produce the expected behavior.** A model is often a formal representation of a hypothesis about how a system works. For instance, a modeler might hypothesize that the population of a certain animal grows based primarily on the availability of a specific food source. After building and running the model, they might find that the simulated population dynamics do not match real-world observations at all. This mismatch indicates that the initial hypothesis was likely wrong or incomplete, forcing the modeler to reconsider the model's structure, add new factors, or reformulate the core question.

This iterative loop of questioning, building, testing, and refining continues until the model is deemed a valid and useful representation for its intended purpose.

#### The Guiding Power of a Research Question

The final question asked why it is so important to begin the modeling process with a well-designed question. The most accurate answer is that **a clear question helps to keep the model focused**. This central idea has several important consequences:

* **It provides a guiding principle.** The question acts as a north star for the entire project. Every decision about what to include or exclude from the model can be judged against the criterion: "Does this help me answer my question?"
* **It facilitates refinement.** A clear starting question can be progressively refined and sharpened as the modeler's understanding of the problem deepens.
* **It helps reduce scope and complexity.** By focusing only on the elements necessary to answer the question, the model's scope is naturally constrained. This reduction in complexity has a direct and beneficial impact on reducing the potential for bugs; a model with fewer components has fewer places where errors can occur.

While it is true that the lecturer repeats this point often, the pedagogical reason is secondary to the fundamental, scientific importance of the principle itself. The key takeaway from this recap is that this course values the ability to create simple, purposeful models and to derive meaningful insights from them, placing more emphasis on the thought process than on sheer coding complexity.

---

## The ODD Protocol: A Standard for Describing Agent-Based Models

The core of the lecture is dedicated to introducing the **Overview, Design Concepts, and Details (ODD)** protocol. This framework provides a standardized structure for documenting agent-based models (ABMs), ensuring that they can be understood, replicated, and evaluated by others.

### The Rationale for a Standardized Protocol

The ODD protocol was developed around 20 years ago by a group of researchers led by Professor Volker Grimm in Germany. It has since been refined and has become a widely accepted standard, particularly in fields like ecology and the social sciences where ABMs are common.

The need for such a protocol arises from the inherent complexity of models. A model can be thought of as a complex machine. To illustrate this, the lecturer used the analogy of a **Rube Goldberg machine**, a comically over-engineered contraption designed to perform a very simple task through a long and convoluted chain reaction. Without a detailed diagram or description, it would be nearly impossible to understand the purpose or function of such a machine just by looking at it. The ODD protocol serves as this essential diagram or documentation for a computational model, mapping out its components and logic so its behavior can be understood.

Using a standardized protocol like ODD provides several key benefits:

1. **It Structures the Thinking Process:** The act of filling out the ODD forces the modeler to move from a vague, brainstorming phase to a more rigorous and organized consideration of the model's components.
It compels them to explicitly define every element, from the agents' behaviors to the environment's properties, forcing early consideration of what is essential (e.g., "Do I really need to model temperature for this mushroom-foraging problem?") and what can be omitted.

2. **It Facilitates Clear Communication:** The modeling cycle is not complete until its results are communicated. ODD provides a universally understood format for presenting a model in a report, whether for a client, a manager, or a scientific publication.

3. **It Guides Implementation:** The ODD document, written in plain language, serves as a blueprint or pseudo-code for the actual programming of the model. It provides a clear, step-by-step description that can be translated into code.

4. **It Enables Comparison and Replication:** A standard ensures consistency. When different models are described using the same framework, it becomes much easier to compare them, to understand their differences, and to identify how their designs lead to different outcomes. This is particularly relevant for the first assignment, where students will be comparing different implementations of similar models.

Furthermore, a complete ODD description is a cornerstone of **reproducibility**, a critical challenge in modern science. While sharing code on platforms like GitHub is helpful, it is not sufficient. Not everyone can read or execute code written in a specific language (e.g., NetLogo, Python, MATLAB). The ODD provides a human-readable, language-agnostic description that allows another researcher to understand the model's logic and potentially re-implement it in a different programming language.

### The Three-Tiered Structure of ODD

The ODD protocol is organized into three main sections, which represent progressively deeper levels of detail: **Overview**, **Design Concepts**, and **Details**.
This hierarchical structure allows a reader to start with a high-level summary and then drill down into the technical specifics as needed.

To explain these components, the lecture continually refers back to the **mushroom foraging model** example, whose central question is: *What is an efficient strategy for searching for mushrooms in a forest?* Ideas brainstormed for this problem included random walking (exploration), systematically searching an area where a mushroom was found (exploitation), following animals, or looking for specific environmental cues like moisture and darkness. The ODD protocol helps to organize these disparate ideas into a coherent model description.

### Component 1: The Overview

The Overview provides a high-level, "big picture" summary of the model. It is broken down into three sub-sections.

#### 1.1 Purpose

This is the first and most fundamental part of the description. It answers two key questions:

* **What is the model *of*?** This describes the system being represented. For our example: "The model is *of* one or more agents (foragers) searching for and gathering mushrooms within a spatially explicit forest environment, which is represented as a grid."
* **What is the model *for*?** This describes the objective or the question the model is designed to answer. For our example: "The model is *for* investigating and identifying an effective search strategy for finding mushrooms in a forest."

The purpose statement acts as the ultimate filter. It is the guiding principle that determines the scope of the model, helping the designer to decide which elements are necessary to include and which can be safely omitted.

#### 1.2 Entities, State Variables, and Scales

This section catalogues the fundamental building blocks of the model.

* **Entities:** These are the distinct objects or components in the model. They typically include the **agents** and the **environment**.
  * **Agents:** The active, decision-making entities.
In our example, the primary agents are the **foragers**. A crucial design choice is the number of foragers. A model with a single forager is sufficient to test individual strategies, but a model with multiple foragers would be necessary to explore cooperative or competitive behaviors.
  * **Environment:** The context in which the agents operate. This is often divided into a **global environment** (the overall world, e.g., the entire forest) and a **local environment** (the immediate surroundings of an agent, e.g., small patches or cells of the forest).
* **State Variables:** These are the properties or attributes that define the state of each entity at any given time.
  * **Forager Variables:** What do we need to know about a forager? Examples include its `location` (e.g., x, y coordinates), its `heading` or direction of movement, its `speed`, its `carrying_capacity` (the maximum number of mushrooms it can hold in its basket), and its current `search_state` (e.g., 'exploring' or 'exploiting').
  * **Environment Variables:** What defines a patch of the forest? Examples from the brainstorming session include `moisture_level`, `temperature`, and `light_level`. A crucial decision here concerns the mushrooms themselves. Are they agents? The lecture argues it is far more efficient and logical to model them as a **state variable of the environment**. Each patch of the forest could have a variable called `mushroom_density`. Modeling mushrooms as agents would require giving them their own behaviors and states (e.g., 'growing', 'picked'), which is unnecessarily complex for the model's purpose.
* **Scales:** This defines the model's temporal and spatial dimensions.
  * **Temporal Scale & Resolution:** This addresses time. The *scale* is the total duration the simulation represents. Is it a single afternoon, or an entire year? This choice is critical. A year-long simulation would need to account for seasonal changes in mushroom growth, whereas an afternoon simulation would not.
The *resolution* is the duration of a single time step. Does the model advance in steps of one minute, one hour, or one day? For our example, we might choose a scale of "one afternoon" (a few hours) with a resolution of "one minute," meaning every agent takes an action once per simulated minute.
  * **Spatial Scale & Resolution:** This addresses space. The *scale* is the size of the overall environment. Is the forest 1 square kilometer or 20 square kilometers? A larger forest will take an agent longer to traverse, making its speed a more critical variable. The *resolution* is the size of the smallest unit of space, the grid cell or patch. For example, each patch could represent a 1x1 meter square.

The model can also be **spatially discrete** (a grid, like in our example) or **spatially continuous** (where agents can have any real-numbered coordinate). Some models are **non-spatial**, meaning there is no physical environment. An example would be a model of information spreading on a social network, where the connections between people (the network structure) matter, but their physical locations do not.

#### 1.3 Process Overview and Scheduling

This section describes the model's dynamics.

* **Process Overview:** This is a summary of the actions and behaviors that agents can perform, and how these actions cause the state variables to change over time. It provides a narrative of the model's flow. For the mushroom forager, it would describe the algorithms for movement, searching, and gathering. For instance: "The forager moves through the forest. If it finds a patch containing mushrooms, it switches to an 'exploiting' behavior, staying in the local area for a set number of time steps to gather mushrooms. Once the patch is depleted or a time limit is reached, it switches back to an 'exploring' behavior to find a new patch."
* **Scheduling:** This defines the precise order in which actions are executed within a single time step.
The order can have a profound impact on the model's outcome. For example, does a forager: (a) first check its current patch for mushrooms, and *then* decide to move? Or (b) does it first move to a new patch, and *then* check that new patch for mushrooms? These two schedules could lead to very different amounts of mushrooms being collected. The importance of scheduling will be demonstrated in future lectures using examples like the Game of Life, where changing the update order can completely alter or destroy the emergent patterns.

### Component 2: The Design Concepts

This section moves to a deeper level of specification, detailing the theoretical foundations and design principles that underpin the model's behaviors. The lecture provides a brief overview of these concepts, noting that they will be explored in greater detail in subsequent weeks. Key design concepts include:

* **Sensing:** How do agents perceive their environment? This must be explicitly defined. Can a forager only sense the patch it is currently on, or can it see adjacent patches or even patches several steps away?
* **Interaction:** How do agents interact with each other and with the environment?
* **Stochasticity:** Where is randomness introduced into the model? For example, is the initial distribution of mushrooms random? Is the forager's movement random during the exploration phase?
* **Objectives:** Does an agent have a formal goal it is trying to optimize? For example, is it explicitly trying to maximize the number of mushrooms collected within a time limit?
* **Learning:** This concept is often misunderstood. In the context of ODD, "learning" typically refers to adaptive processes that occur *across multiple simulation runs*. For example, an evolutionary algorithm might be used to find the best foraging strategy by running many simulations, evaluating the success of different strategies, and "breeding" the most successful ones.
It does not usually refer to an agent learning and changing its strategy *within* a single simulation run (though that is also a possible, more complex model feature).
* **Emergence:** What higher-level patterns or behaviors arise from the simple, local interactions of the agents? The goal of many ABMs is to understand how complex global patterns "emerge" from simple local rules.

### Component 3: The Details

This final section provides the most concrete and implementation-specific information required for another researcher to replicate the model exactly.

#### 3.1 Initialization

This describes the state of the model at time step zero. It specifies how the world and the agents are set up before the simulation begins. Key questions to answer include:

* Where are the agents placed initially? Do they all start at a central point (a "house")? Or are they placed randomly across the map?
* What is the initial state of the environment? How are the mushrooms initially distributed? Is it a uniform random distribution, or are they clustered in specific areas?

Initialization is critical because the starting conditions can have a massive impact on the simulation's results. An agent that starts by chance in a highly dense mushroom patch will perform well regardless of its strategy. A model's robustness to different initial conditions is often an important part of its analysis, but for reproducibility, the exact initialization procedure must be documented.

#### 3.2 Input Data

This section is for describing any external data that the model uses *during* a simulation run. This is distinct from data used to set up the initial conditions. For example, if the mushroom model were to run for a full year, it might read from an external file that specifies the average temperature and hours of sunlight for each day of the year. This section would describe that data file, its format, and how the model uses it to influence variables like mushroom growth.
This does not refer to data used to train a machine learning component of a model, but rather data that dynamically drives the simulation.

#### 3.3 Submodels

This is where the detailed algorithms and mathematical equations that govern the model's processes are described in full. It is the "recipe book" for the model's behavior. This section would contain the precise step-by-step logic for processes described in the "Process Overview," such as:

* The algorithm for the forager's movement.
* The exact conditions under which a forager switches from an "exploration" state to an "exploitation" state.
* The mathematical formula determining how many mushrooms are gathered per time step.

### Summary and Conclusion

The lecture concluded by summarizing the key takeaways. The act of documenting a model is not a tedious afterthought; it is an essential part of the scientific process. Good documentation, particularly through a standardized framework like the **ODD protocol**, is crucial for several reasons: it helps to clarify the modeler's own thinking, it enables clear communication of the model's design and purpose to others, and it provides a vital guide for both the implementation and replication of the work. The ODD protocol, with its hierarchical structure of Overview, Design Concepts, and Details, provides a robust and widely accepted approach to achieving these goals.
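
To make the link between an ODD description and code more concrete, here is a minimal Python sketch of the mushroom foraging example. It is not the lecture's model or the assignment's starting code. The grid size, number of time steps, clustering procedure, wrap-around edges, step lengths, and every function and constant name (`initialise`, `move`, `gather`, `run`, `EXPLOIT_STEPS`, and so on) are assumptions chosen purely for illustration. The sketch shows mushrooms stored as a patch state variable rather than as agents, a forager with a `search_state`, a documented initialization step, and a `schedule` switch that swaps the order of gathering and moving within a time step.

```python
import random
from dataclasses import dataclass

GRID_SIZE = 20      # spatial scale: 20 x 20 patches (assumed value)
N_STEPS = 240       # temporal scale: 240 one-minute steps (assumed value)
EXPLOIT_STEPS = 5   # steps spent searching locally after a find (assumed value)


@dataclass
class Forager:
    """Agent state variables, loosely following the names used in the notes."""
    x: int
    y: int
    carrying_capacity: int = 50
    collected: int = 0
    search_state: str = "exploring"
    exploit_timer: int = 0


def initialise(rng, n_clusters=6):
    """Initialization: clustered mushrooms; the forager starts at the centre."""
    density = [[0] * GRID_SIZE for _ in range(GRID_SIZE)]  # mushroom_density per patch
    for _ in range(n_clusters):
        cx, cy = rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                px, py = (cx + dx) % GRID_SIZE, (cy + dy) % GRID_SIZE
                density[py][px] += rng.randint(1, 3)
    return density, Forager(GRID_SIZE // 2, GRID_SIZE // 2)


def move(forager, rng):
    """Submodel: short random steps when exploiting, longer ones when exploring.
    Positions wrap around the grid edges, which is a simplification."""
    reach = 1 if forager.search_state == "exploiting" else 3
    forager.x = (forager.x + rng.randint(-reach, reach)) % GRID_SIZE
    forager.y = (forager.y + rng.randint(-reach, reach)) % GRID_SIZE


def gather(forager, density):
    """Submodel: pick one mushroom from the current patch and update search_state."""
    has_room = forager.collected < forager.carrying_capacity
    if density[forager.y][forager.x] > 0 and has_room:
        density[forager.y][forager.x] -= 1
        forager.collected += 1
        forager.search_state = "exploiting"
        forager.exploit_timer = EXPLOIT_STEPS
    elif forager.exploit_timer > 0:
        forager.exploit_timer -= 1
    else:
        forager.search_state = "exploring"


def run(seed, schedule="gather_then_move"):
    """One stochastic replicate; the seed makes the run repeatable."""
    rng = random.Random(seed)
    density, forager = initialise(rng)
    for _ in range(N_STEPS):
        if schedule == "gather_then_move":
            gather(forager, density)
            move(forager, rng)
        else:  # "move_then_gather"
            move(forager, rng)
            gather(forager, density)
    return forager.collected


if __name__ == "__main__":
    # Because the model is stochastic, compare schedules over many seeds.
    for schedule in ("gather_then_move", "move_then_gather"):
        results = [run(seed, schedule) for seed in range(30)]
        print(schedule, sum(results) / len(results))
```

Passing an explicit seed to `random.Random` is one simple way to make each replicate repeatable, which fits the reproducibility goal discussed above. Averaging over many seeds reflects the point that a stochastic model has to be run many times before its results can be trusted.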