## Introduction to Computational Modelling and Simulation
This document provides a comprehensive exploration of the foundational principles of computational modelling and simulation. We will begin by defining the core terminology, including the concepts of a model, a simulation, and the specific methodology of agent-based modelling. Following this theoretical foundation, we will engage in a practical exercise to deconstruct the process of designing a simple model from first principles. This exercise will serve as a bridge to understanding the formal, methodological framework known as the modelling cycle. This cycle is an iterative process that provides a structured pathway for developing, analyzing, and refining computational models. The aim is to build a complete and deep mental model of these concepts, ensuring that every step is logically connected and thoroughly explained, leaving no conceptual gaps. Throughout this exploration, we will use practical examples to ground the theory in tangible applications.
For those participating in associated tutorials, it is highly recommended to install the NetLogo software environment prior to the sessions. The tutorials are designed to be hands-on, and having the software ready will maximize the time available for engaging with the exercises and asking conceptual questions. Tutors will be supporting a significant number of students, so pre-installation helps ensure the sessions run efficiently for everyone. All questions, whether administrative or conceptual, can be addressed through the designated online message boards or via email.
## Core Concepts: Models, Simulations, and Agent-Based Modelling
To begin our journey into computational modelling, we must first establish a clear and robust understanding of the fundamental terms that form the bedrock of this field. We will meticulously define what constitutes a "model," what a "simulation" entails, and how "agent-based modelling" offers a unique and powerful approach to studying complex systems.
### What is a Model?
A model, in its most fundamental sense, is a purposeful representation of a real-world system. The critical aspect of this definition lies in the word "purposeful." We do not construct models for their own sake; we build them to understand a specific aspect of a system, to answer a particular question, or to test a hypothesis. This purpose is the guiding principle that dictates the level of complexity and detail embedded within the model. A model is an abstraction, a simplification of reality, designed to capture the essential characteristics of a system relevant to our inquiry.
Models can manifest in a multitude of forms. For instance, in the realm of machine learning, one might fit a neural network to a dataset; this network then becomes a data-based model of the underlying process that generated the data. In architecture or engineering, physical models are common, such as a scaled-down version of a building created to visualize its final appearance, or a 3D rendering used for the same purpose. A model can also be a simple verbal or visual description of a phenomenon. In more quantitative fields like physics and engineering, models often take the form of mathematical equations. A prime example is the set of Navier-Stokes equations, which are partial differential equations (PDEs) that model the motion of viscous fluids. For the purposes of this subject, our focus will be on models that are computer programs. We will learn to construct software that represents a system, allowing us to study its behavior computationally. A video game that simulates a city or a battle is a familiar example of a computer program acting as a model of a system.
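To make the idea of an equation-based model concrete, the incompressible Navier-Stokes equations can be written in a standard textbook form, with velocity field u, pressure p, constant density ρ, dynamic viscosity μ, and body force f:

```latex
% Momentum balance and incompressibility (mass conservation)
\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right)
  = -\nabla p + \mu\,\nabla^{2}\mathbf{u} + \mathbf{f},
\qquad
\nabla\cdot\mathbf{u} = 0
```

The point here is not the fluid mechanics but what the equations illustrate: a compact, purposeful representation of how a fluid's velocity and pressure evolve, with everything irrelevant to that purpose left out.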
#### The Principle of Simplification
A universal characteristic of all models is that they are simplifications. No model is as complex as the reality it represents. The art and science of modelling lie in skillfully extracting only the most important characteristics—those that are essential for observing the behavior we are interested in—while omitting extraneous details. This point is crucial and often a source of confusion for students. The objective is not to create the most sophisticated, all-encompassing model imaginable. Instead, the goal is to construct the simplest possible model that still allows us to observe and understand the phenomenon of interest.
This principle of "parsimony," or simplicity, is advantageous for several reasons. Firstly, a simple model is far easier to debug. When a model behaves unexpectedly, a simpler structure with fewer interacting parts makes it more straightforward to identify the source of the error. Secondly, a simple model is easier to explain and understand. If we can trace the model's behavior back to its core rules and parameters, we can generate genuine insight into why the system behaves the way it does. Finally, a simple model is easier to expand upon. Once we have a working, understandable base model, we can incrementally add new features or complexity and clearly observe what difference, if any, each new addition makes.
This approach stands in contrast to some other modelling paradigms, such as deep neural networks in machine learning. In those cases, a model might have millions of free parameters and many layers, making it a "black box." While it may be highly predictive, it is often impossible to have a precise idea of what each individual parameter is doing or why the model produces a specific output. Our approach is the opposite; we strive for transparency and a clear understanding of the causal mechanisms at play within our model.
#### Examples of Models in Practice
To make the concept of a purposeful, simplified representation more concrete, let's consider the Earth as our real-world system. This system is infinitely complex, encompassing land masses, the atmosphere, political divisions, transportation networks, and more. A model must select which aspects to represent based on its purpose.
A map is a classic example of a model of the Earth. An old map from the 1500s, created during the Age of Exploration, served the purpose of providing a rough idea of the planet's structure as it was known at the time. It was not perfectly accurate by modern standards, but it was useful for its intended function. Consider a modern topographical map. Its purpose is to represent the elevation of the land. It clearly shows mountains and flat areas, but it deliberately omits other information like country borders, roads, or rivers, as those details are irrelevant to its primary purpose of showing topography. Furthermore, this map might use a projection, such as the Mercator projection, to represent the spherical Earth on a flat plane. This process inevitably introduces distortions; for example, the Mercator projection stretches regions near the poles, making Greenland appear roughly as large as Africa, even though Africa is in fact about fourteen times larger. Despite this distortion, the model is still highly useful for its specific purpose, such as navigation or geological study, because the projection preserves angles locally.
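The distortion is visible directly in the projection's equations. In its simple spherical form, the Mercator projection maps a point at longitude λ and latitude φ (measured from a central meridian λ₀, on a sphere of radius R) to plane coordinates:

```latex
x = R\,(\lambda - \lambda_{0}),
\qquad
y = R\,\ln\!\left[\tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)\right]
```

Because y grows without bound as φ approaches the poles, areas at high latitudes are stretched enormously, while angles between directions are preserved locally; that trade-off is exactly what makes the projection good for navigation and poor for comparing areas.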
Another type of model could represent the internal structure of the Earth, showing the core, mantle, and crust. This is a highly simplified diagram; the mountains on the surface are not drawn to the same scale as the distance to the Earth's core. The model's purpose is not to be a perfect scale replica, but to provide a conceptual framework that helps us explain and understand the planet's physics.
This also highlights another crucial aspect: models are not static; they evolve as our knowledge and observations improve. For centuries, the geocentric model, which placed the Earth at the center of the solar system, was the accepted representation. Astronomers and mathematicians developed increasingly elaborate constructions, such as epicycles, to fit the observed movements of the planets into this framework. Eventually, evidence mounted that this model was overly complicated and likely incorrect. This led to the development of the heliocentric model, with the sun at the center. This new model was simpler and, as subsequent physical observations confirmed, a more accurate representation of reality. This historical shift underscores that models are tools for understanding, and we must be willing to revise or discard them when new information becomes available.
Finally, consider the 2017 Oroville Dam crisis in the United States, where a spillway failed, leading to massive flooding. To understand how to prevent such a disaster in the future, engineers built a large-scale physical model of the dam and its surrounding area. They could then run experiments on this model, changing its structure to see how it would affect water flow during a similar flood event. This physical model was a simplified representation used for the explicit purpose of experimental testing and problem-solving.
### What is a Simulation?
Having defined what a model is, we can now define a simulation. A simulation is the process of operating a model to observe how the corresponding system behaves over time. While a model is a static representation, a simulation brings that model to life, showing its dynamics and how its state changes from one moment to the next. We are fundamentally concerned with the behavior that emerges from the system's operation.
Simulations can be performed on physical models. In the Oroville Dam example, flowing water through the physical scale model to see what happens is a physical simulation. However, in contemporary science and industry, it is far more common to use computational models. Building large physical models is expensive and time-consuming. Instead, we can build a software representation—a computational model—and run the simulation on a computer. For example, engineers can use fluid dynamics software to simulate the flow of water around a ship's hull, testing different designs virtually before building a prototype.
Simulations are ubiquitous across many domains. Computer games are a prominent example; driving a Formula One car in a game is a simulation of the real-world system of racing. In education, medical students use mannequins that simulate human patients to practice procedures like CPR, where the mannequin can model specific ailments or physiological responses. In science, simulations are used to model phenomena that are too large, too slow, or too complex to observe directly, such as the formation of galaxies over billions of years or the evolution of species. In decision-making, simulations are used for forecasting. Weather prediction models simulate the behavior of the atmosphere to forecast tomorrow's weather. During epidemics, epidemiological models simulate the transmission of a disease through a population to predict its spread and test the potential impact of interventions like lockdowns.
### Understanding Agent-Based Models (ABM)
Agent-Based Modelling (ABM) is a specific type of computational modelling that employs what is known as a "bottom-up" approach. This is a crucial distinction from other modelling types. Instead of defining equations that govern the system as a whole (a "top-down" approach), in ABM, we focus on defining the individual components of the system and the rules that govern their local interactions. The core idea is that global, system-level patterns and behaviors are not explicitly programmed into the model. Instead, they *emerge* from the multitude of interactions between the individual agents at the local level.
The concepts behind ABM have been around since the 1940s, but they only became widely practical with the advent of sufficient computational power in the 1980s and 1990s. In the last two decades, sophisticated tools have been developed that allow for the simulation of millions of agents in large, complex environments. This has made ABM a valuable tool in many fields. For example, it has been reported that Australian public health authorities used large-scale agent-based models, with over a million individual agents representing people, to simulate the effects of different lockdown strategies during the COVID-19 pandemic. This bottom-up approach is fundamentally different from a purely data-driven approach, as it seeks to explain phenomena based on micro-level behaviors and interactions.
#### The Three Components of an ABM
An agent-based model is generally composed of three primary components (a short code sketch after the list shows how they fit together):
1. **The Environment:** This is the context or space within which the agents exist and interact. The environment is not necessarily just a passive background; it can have its own properties and dynamics. In a model of a stock market, the environment would be the market itself, containing information about the current prices of all stocks and securities. In a model designed to study pest control in agriculture, the environment might be a geographical map representing the area of interest. Each patch of this map could have properties like its elevation, the type of crop grown there, soil moisture, temperature, and precipitation levels.
2. **The Agents:** These are the active, individual entities within the model. They are the "actors" in the simulation. Agents can be of different types or classes. In the pest control model, agents could include farmers, the pests themselves (if modelled individually), buyers, and sellers of crops. Each agent is defined by its attributes (or state variables) and its behaviors (or rules). For example, a farmer agent might have attributes like their level of knowledge, their risk tolerance, and the size of their farm. Their behaviors might include rules for deciding which pest control technique to apply (e.g., a chemical spray versus an organic method), whether to work alone or in teams, and how to react to information from the environment or other agents.
3. **The Interactions and Rules:** This final component defines how the other two parts connect. It specifies the rules governing how agents perceive and are affected by the environment, and how they interact with each other. For example, an agent might collect information from its local environment (e.g., a pest agent detecting a certain crop type) and use that information to make a decision. The rules also govern agent-agent interactions. Do farmers share information with each other? Do pests compete for the same resources? These interactions are the engine of the simulation, driving the changes that lead to emergent, large-scale patterns.
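To make these three components concrete, here is a minimal, illustrative Python sketch. Everything in it (the class names, the grid size, the generic "resource" property, and the random-movement rule) is invented purely for illustration; the tutorials for this subject use NetLogo rather than Python, and a real model would replace each piece with domain-specific detail.

```python
import random

class Environment:
    """The context in which agents live: a grid of patches, each with its own properties."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        # Each patch carries a 'resource' level that agents can sense and deplete.
        self.resource = {(x, y): random.random() for x in range(width) for y in range(height)}

class Agent:
    """An individual actor with its own state variables (position, amount collected) and rules."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.collected = 0.0

    def step(self, env):
        # Interaction rule 1: sense the local patch and harvest whatever resource is there.
        self.collected += env.resource[(self.x, self.y)]
        env.resource[(self.x, self.y)] = 0.0
        # Interaction rule 2: move to a random neighboring patch (wrapping at the edges).
        self.x = (self.x + random.choice([-1, 0, 1])) % env.width
        self.y = (self.y + random.choice([-1, 0, 1])) % env.height

def run(steps=100, n_agents=10, size=20):
    """Run the simulation: repeatedly apply every agent's rules within the shared environment."""
    env = Environment(size, size)
    agents = [Agent(random.randrange(size), random.randrange(size)) for _ in range(n_agents)]
    for _ in range(steps):
        for agent in agents:
            agent.step(env)
    return sum(a.collected for a in agents)  # a system-level outcome that emerges from local rules

if __name__ == "__main__":
    print(run())
```

Notice that nothing in the code computes a global pattern directly; whatever large-scale behavior we observe, such as how quickly the total resource is depleted, emerges from the repeated local interactions between agents and their environment.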
## Practical Application: Designing a Model from Scratch
With a solid theoretical foundation in place, the most effective way to deepen our understanding is to apply these concepts. We will now walk through the process of designing a simple model from the ground up. This exercise is designed to mimic the initial, creative phase of model development, where we translate a real-world problem into a conceptual framework.
### The Mushroom Foraging Problem
Let's imagine our objective is to create a model that helps us understand or develop an efficient strategy for foraging for mushrooms in a forest. This is our overarching goal. Before we can write any code, we must first think carefully about the system itself. If you were a person going into a forest to search for mushrooms, what factors would you consider? What actions would you take? (We will assume for this exercise that our forager already knows how to distinguish edible mushrooms from poisonous ones.) Our model will have a forager (an agent) moving around a location (an environment) looking for mushrooms.
### Brainstorming and Conceptualization
Let's brainstorm some ideas that could be incorporated into our model. This process of generating and discussing ideas is the first step in translating a vague problem into a concrete set of model components.
* **Search Strategy:** A simple starting point is a **blind search**. The forager could start at a random point and simply walk in a random direction. This is a basic strategy known as a **random walk**, and it serves as an excellent baseline for comparison.
* **Environmental Knowledge:** Mushrooms don't grow just anywhere. A knowledgeable forager would look for specific environmental cues. An idea might be to **find a moist and shaded area**. This is because mushrooms often grow in such conditions, near decomposing organic material like fallen leaves or moss, where they are not exposed to too much direct sunlight. This introduces the idea that our environment needs to have properties like "moisture" and "shade."
* **Agent Characteristics:** Our forager agent will have certain properties. One key idea is to **set a goal of how many mushrooms to collect**. This introduces a stopping condition for the simulation. Perhaps the forager has a basket with a limited **carrying capacity**. Once the basket is full, the foraging trip is over. This capacity becomes an attribute of the agent.
* **Advanced Search Strategy (Exploration vs. Exploitation):** A very insightful idea is that **when you find one mushroom, you should search carefully around it**. This is because mushrooms often grow in clusters or patches. This simple rule introduces a fundamental concept in search and optimization: the trade-off between **exploration and exploitation**. Initially, the forager *explores* the environment broadly (the random walk). Once a resource (a mushroom) is found, the strategy shifts to *exploiting* that discovery by searching intensively in the immediate vicinity. After exhausting the patch, the forager might switch back to exploration mode. (A minimal sketch of this switching rule appears in code after this list.)
* **Social Dynamics:** One might suggest bringing friends to test the mushrooms. While humorous, this highlights the possibility of multi-agent models with social interactions. For our simple model, we might discard this as it introduces unnecessary risk and complexity, but it's a valid extension to consider for a more advanced model.
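Purely as an illustration of how such a rule might be operationalized, the sketch below expresses the switch between exploration and exploitation in Python. The `patience` threshold, the grid-based movement, and the function names are all assumptions made for this sketch rather than part of any fixed design:

```python
import random

def explore_step(position):
    """Broad exploration: one step of a random walk across the forest grid."""
    x, y = position
    return x + random.choice([-1, 0, 1]), y + random.choice([-1, 0, 1])

def exploit_step(last_find):
    """Local exploitation: search within one patch of the most recent mushroom find."""
    fx, fy = last_find
    return fx + random.choice([-1, 0, 1]), fy + random.choice([-1, 0, 1])

def next_position(position, last_find, steps_since_find, patience=10):
    """Exploit a recent find; fall back to exploration once the patch seems exhausted."""
    if last_find is not None and steps_since_find < patience:
        return exploit_step(last_find)
    return explore_step(position)
```

Keeping the rule as a small, separate function like this makes it easy to compare the combined strategy against the pure random-walk baseline later on.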
### Synthesizing Ideas into Model Components
This brainstorming process has naturally led us to define the key components of our agent-based model. We have started to operationalize the problem—that is, to express it in terms of concrete variables, entities, and rules that can eventually be implemented in code.
* **Rules:** We have defined a potential search strategy: a combination of random exploration and localized exploitation.
* **Agent Variables:** We have identified an attribute for our forager: a carrying capacity.
* **Environment Variables:** We have decided that the environment (the forest) should have patches with varying characteristics, such as moisture level or shade, which in turn influence where mushrooms grow.
We have also touched upon how to represent abstract concepts like "previous knowledge." How could we model a forager who "knows" what to look for? One way is to link a mushroom's observable properties (like color) to its edibility. An agent's "knowledge" could then be a boolean variable: if `knows_colors` is true, the agent can use color information to select mushrooms; if false, it cannot. This makes knowledge an explicit parameter of the model that we can turn on or off to see its effect.
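As a rough sketch of how that flag might gate behavior (the `color` attribute and the particular "trusted" colors below are invented solely for the example):

```python
def should_pick(mushroom, knows_colors):
    """Use color cues only if the forager has that knowledge; otherwise pick indiscriminately."""
    if knows_colors:
        return mushroom["color"] in {"brown", "white"}  # toy stand-in for 'colors the forager trusts'
    return True
```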
Another idea was to have the forager learn by observing animals eating mushrooms. To model this, we would need to add a new type of agent (animals) with their own search behaviors. Our forager agent would then need a new rule: "if you see an animal eat a mushroom, go to that area." This would certainly make the model more comprehensive, but also significantly more complex. Whether this complexity is necessary depends entirely on the model's purpose. If our question is specifically about the value of using fauna as indicators, then adding animals is essential. If our question is more general, this might be an unnecessary complication. This demonstrates the constant tension between model complexity and the question we are trying to answer.
Through this exercise, we have constructed a conceptual framework for a mushroom foraging model, which will be the basis for a practical implementation in a tutorial setting.
## The Modelling Cycle: A Methodological Framework
The process we just went through—moving from a vague problem to a set of concrete model ideas—is the informal beginning of a more structured process known as the **modelling cycle**. This is a methodological framework that provides a series of iterative steps for systematically thinking about, designing, implementing, and analyzing models. It is called a cycle because it is not a linear path; we often need to loop back to earlier steps to refine our approach as we learn more. The goal of this process is to compare the patterns generated by our model with the patterns observed in the real-world system. If they match, our model may be capturing something true about the system. If they don't, we must revise the model.
### Step 1: Formulate the Question
The absolute first step is to formulate a clear, simple, and answerable question. This is often one of the most challenging parts of the process. Our initial questions tend to be broad and vague. For example, if we are interested in financial markets, a question like "What is the best trading strategy?" is too vague. It is not concrete enough to guide the design of a model.
We must narrow our focus. A better, more answerable question would be: "Is a trading strategy of buying a specific stock on Monday mornings consistently more profitable than a strategy of buying it on Friday mornings?" This question is precise. It specifies the action (buying), the timing (Monday vs. Friday morning), and the metric for success (profitability). This level of clarity is vital because it helps us determine the necessary resolution and components of our model. It tells us what we need to include and, just as importantly, what we can leave out. A well-formulated question acts as a filter, preventing us from building an unnecessarily complicated model.
### Step 2: Assemble Hypotheses and Define Scope
Once we have a clear question, the next step is to assemble the hypotheses we want to test. This involves research. We need to understand the system we are modelling by reading academic literature, industry reports, news articles, or any other source that provides information. This research helps us identify the essential parts of the system and form educated guesses—heuristics—about what mechanisms might be important.
Based on this research, we begin to define the model's scope. This is where the principle of starting simple is paramount. It is an iterative process. We do not try to build the final, perfect model in one go. We start with a very simple base model. We run it, analyze it, and see if it is sufficient to answer our question. If not, we add a little more detail—one new element at a time. This incremental approach allows us to see exactly what effect each new component has on the model's behavior. If adding a feature produces a meaningful change that brings the model's behavior closer to reality, we keep it. If it adds complexity without changing the outcome, we can remove it. This iterative refinement is far more efficient and leads to a much deeper understanding than trying to build a monolithic, complex model from the start. It's quicker to implement, easier to test, and results in a more robust and understandable final product.
### Step 3: Choose the Model Structure and Operationalize
In this step, we translate our conceptual ideas and hypotheses into a formal model structure. This is where we make concrete decisions about the model's design. We must define how long our simulations will run (the time frame). We specify the key variables that will exist in the model. For our mushroom forager, this is where we would formally define the `carrying_capacity` variable and the rule that the simulation stops when this capacity is reached.
We typically write a high-level outline of the model, describing all its entities and their relationships; a short parameter sketch after the following list shows one way to begin operationalizing such an outline. We define:
* **Entities:** The agents (e.g., foragers), the environment (e.g., the forest map), and their properties.
* **Scale:** The spatial and temporal scale of the model. Is one time step a minute, an hour, or a day?
* **Variables:** The parameters that can be changed (e.g., carrying capacity, mushroom regrowth rate) and the state variables that change during the simulation (e.g., the forager's location, number of mushrooms collected).
* **Processes:** The rules and behaviors that govern how the state of the model changes over time.
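The sketch below shows one lightweight way to collect such decisions in code. Every name and default value is an illustrative placeholder, not a prescribed setting; the point is simply that making parameters explicit and centralized keeps later experimentation manageable:

```python
from dataclasses import dataclass

@dataclass
class ForagingParameters:
    """The model's adjustable knobs, gathered in one place for easy experimentation."""
    grid_size: int = 50             # spatial scale: the forest is a grid_size x grid_size set of patches
    max_steps: int = 500            # temporal scale: upper bound on the length of one foraging trip
    carrying_capacity: int = 20     # stopping condition: the trip ends when the basket is full
    mushroom_density: float = 0.05  # fraction of patches that initially contain a mushroom
    patience: int = 10              # steps of local search before switching back to exploration

# Example: a baseline configuration and a variant for comparison
baseline = ForagingParameters()
persistent_variant = ForagingParameters(patience=25)
```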
### Step 4: Implement the Model
With a detailed design in hand, we now move to implementation. This involves translating the conceptual model structure into actual computer code using a programming language or a specialized modelling platform like NetLogo. This is the process of building the runnable simulation.
This step is more than just coding; it is also a critical check on our assumptions. Sometimes, an idea that seems plausible on paper is revealed to be flawed or problematic only when we try to implement it. For instance, while designing our investment model, we might realize during implementation that our hypothesized "buy on Mondays" strategy is based on a logical fallacy we hadn't considered before. This step forces a level of rigor that can expose weaknesses in our thinking, prompting us to go back and revise our hypotheses.
### Step 5: Analyze the Simulation's Output
Once the model is implemented, we use it to run experiments designed to answer our initial question. This analysis phase is often the most time-consuming part of the entire cycle. A key reason for this is that many agent-based models are **stochastic**, meaning they contain elements of randomness. The initial placement of mushrooms in the forest might be random, and the forager's random walk involves random choices.
Because of this stochasticity, the outcome of a single simulation run is not a definitive result; it is just one possible outcome out of many. If we run the exact same model again, the different random choices will likely lead to a slightly different result. Therefore, to get a reliable understanding of the model's behavior, we must run the simulation many times (hundreds or thousands of times) and collect statistics on the outcomes. We might look at the average number of mushrooms collected, the variance, or the distribution of outcomes. This process of repeated execution and statistical aggregation gives us confidence that our results are not just a fluke of one particular random run. This necessity for multiple runs is another strong argument for keeping models simple; simpler models run faster, making it feasible to perform the large number of simulations required for robust analysis.
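A sketch of that kind of experimental loop is shown below. The `run_simulation` function here is only a stand-in that returns a made-up noisy outcome; in practice it would execute the actual model once with the given random seed and report the quantity of interest (for example, the number of mushrooms collected):

```python
import random
import statistics

def run_simulation(seed):
    """Stand-in for a single model run; a real version would execute the model with this seed."""
    rng = random.Random(seed)
    return rng.gauss(15, 3)  # placeholder 'mushrooms collected' outcome

def run_experiment(n_runs=1000):
    """Repeat the stochastic simulation many times and summarize the distribution of outcomes."""
    outcomes = [run_simulation(seed) for seed in range(n_runs)]  # one independent run per seed
    return {
        "mean": statistics.mean(outcomes),
        "std": statistics.stdev(outcomes),
        "min": min(outcomes),
        "max": max(outcomes),
    }

if __name__ == "__main__":
    print(run_experiment())
```

Fixing the seeds in this way also makes individual runs reproducible, which is invaluable when a surprising result needs to be examined or debugged.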
### Step 6: Refine, Reformulate, and Communicate
After analyzing the model's output, we reach a critical decision point. If the results show that our model is behaving badly, or that our initial hypothesis was incorrect (e.g., the trading strategy consistently loses money), we must go back. This is the "cycle" in action. We might need to **reformulate** our question, **revise** our hypotheses and model structure, re-implement the changes, and re-analyze. This iterative loop of refinement continues until we are satisfied that the model is behaving sensibly and providing valuable insight into our question.
Once we are confident in our results, the final step is to **communicate** them. A model's insights are useless if they remain locked in the modeller's head. We must present our findings to the relevant audience, which could be colleagues, clients, policymakers, or the scientific community. This could take the form of a report, a presentation, or an academic paper. Effective communication is essential for the model to have a real-world impact, whether it's by helping a company make a better decision, informing a government policy, or advancing scientific knowledge.
## Summary and Future Topics
To summarize today's discussion, we have established several key concepts. A **model** is a simplified, purposeful representation of a real-world system. A **simulation** is the dynamic operation of that model over time to observe its behavior. An **agent-based model** is a bottom-up approach where we simulate the interactions of individual components (agents) within an environment, from which larger patterns emerge. We build these models to better understand, predict, or make decisions about complex real-world systems. The entire process is guided by the **modelling cycle**, an iterative sequence of steps involving formulating questions, designing, implementing, analyzing, and refining our models until we can confidently communicate the results.
Looking ahead, we will build upon this foundation. The tutorials will provide a hands-on opportunity to implement a mushroom foraging model using the NetLogo platform. You are strongly encouraged to explore the many tutorials available on the NetLogo website to become familiar with the environment. In our upcoming lectures, we will introduce a formal framework for describing agent-based models known as the **ODD (Overview, Design Concepts, and Details) protocol**. This protocol provides a standardized structure for documenting ABMs, making them easier to understand, replicate, and critique. We will then apply these concepts to a new case study: the modelling of infectious disease transmission. This will allow us to see how the principles of agent-based modelling and the modelling cycle are applied to a critical real-world problem.