## Introduction: Setting the Stage for a Deeper Dive into AI Ethics
The lecture commences by establishing its position within a broader course structure, identifying itself as the sixth module and the halfway point. This contextualizes the upcoming discussion as building upon previous foundational concepts while also serving as a bridge to more specialized topics. The speaker clarifies their role, noting this will be their final lecture before handing over to a colleague, Simon, who will delve into the "nitty-gritty elements of AI and AI ethics." This transition signals a shift from broader philosophical frameworks to more specific, applied ethical challenges within the field of Artificial Intelligence. The agenda for the session is then laid out with deliberate clarity, creating a roadmap for the concepts to be explored. The lecture will be divided into three primary, interrelated parts. The first part will investigate the concepts of trust and trustworthiness, specifically in the context of human interaction with machines and AI. This exploration will naturally lead to an analysis of the consequences of using, not using, or misusing these technologies, which is directly linked to the level of trust one places in them. The second part of the lecture will pivot to the critical issues of fairness and justice, concepts that have been touched upon in previous discussions of moral theories but will now be examined in greater detail. Finally, the session will conclude with a discussion on accountability, another cornerstone concept in the field of AI ethics. This structured approach ensures that each concept is built upon the last, creating a coherent and comprehensive understanding of the ethical landscape surrounding AI.
## The Nature of Trust and Power in Human-AI Interaction
The lecture begins its substantive discussion by focusing on the concepts of trust and power, examining why these are fundamentally important to any ethical analysis of Artificial Intelligence. To ground this exploration, the speaker draws upon a specific academic paper by Jacovi and colleagues, which provides a structured framework for understanding trust in AI.
### The Goals and Instrumental Value of Trust
The initial question posed is foundational: what is the purpose, or what are the goals, of trust? The discussion starts by examining trust in a familiar context: between human beings. In this interpersonal realm, trust is presented as a crucial mechanism that allows individuals to predict the behavior of others. When you can reasonably predict how someone will act, it becomes possible to engage in effective collaboration. If you trust a colleague to complete their portion of a project, you can confidently work on your own part, knowing that the combined effort will be successful. This view frames trust in a primarily *instrumental* fashion. That is, trust is not valued solely as an end in itself, but as a means to achieve a practical goal, such as successful cooperation or the completion of a task.
The speaker acknowledges that this instrumental view is not the complete picture of human trust. Drawing on previously discussed ethical frameworks like virtue ethics and the ethics of care, it is noted that trust can also be valued for intrinsic reasons. For instance, trust is the bedrock of meaningful relationships, such as friendships and family bonds. In these contexts, trust enables emotional connection, mutual care, and support, which are considered morally valuable in their own right, independent of any practical collaboration they might facilitate. However, for the purpose of analyzing human-machine interaction, the lecture deliberately focuses on the more instrumental definition of trust. The logic is that by understanding how instrumental trust functions between people, we can then translate and apply these insights to the relationship between humans and AI systems.
This translation is then made explicit. Human-machine trust is deemed important for the very same instrumental reasons: it allows us to predict how a machine will behave and enables a form of "collaboration." The speaker uses inverted commas around "collaboration" to signify that it is not the same as the rich, reciprocal collaboration between two humans. A human collaborates *with* a machine, using it as a tool to achieve a goal, rather than collaborating in a mutual, co-equal partnership. In this model, the ultimate goal is not the trust itself, but the ability to use the machine effectively and reliably to achieve specific ends.
### Defining Trust, Distrust, and Lack of Trust
To formalize this concept, the lecture presents a sociological definition of interpersonal trust, which can be broken down into specific conditions. If a person, let's call them Person A, trusts Person B, two conditions must be met. First, Person A must believe that Person B will act in Person A's best interests, or at the very least, will not intentionally cause them harm. Second, Person A must willingly accept a state of vulnerability to Person B's actions. Person B *could* potentially harm Person A, but Person A proceeds with the interaction despite this risk. The purpose of accepting this vulnerability is to anticipate the outcome of Person B's actions, thereby enabling the desired collaboration.
This framework is then directly mapped onto the human-machine relationship. A human (H) is said to trust a machine (M) if two parallel conditions are met. First, the human believes the machine will act in their best interests. For example, a person using a GPS navigation system trusts that it will provide a route that is safe and efficient. Second, the human accepts vulnerability to the machine's actions. The GPS could, due to an error, direct the driver into a dangerous situation, and by following its instructions, the driver accepts this vulnerability. This acceptance is necessary to enable the "collaboration"—in this case, successfully navigating to a destination. A key difference highlighted is that this trust is unidirectional: the human trusts the machine, but the machine, lacking consciousness or interests, does not trust the human in return.
However, trust is not always present. The lecture makes a crucial distinction between two different states of "no trust": **distrust** and **lack of trust**.
* **Distrust** is an active, negative belief. A human (H) distrusts a machine (M) if H believes that M will actively work *against* H's best interests. For example, someone might distrust a social media algorithm, believing it is designed to manipulate them or exploit their data for profit, thereby harming their interests.
* **Lack of trust** (or absence of trust) is a more neutral state of uncertainty. It is defined by two conditions. First, the human (H) does not necessarily believe the machine (M) will act in their best interests, but they also do not believe it will act *against* their interests. This often arises from ignorance; if you are given a new, complex device and have no idea how it operates or what its effects will be, you have no basis for either trust or distrust. Second, because of this uncertainty, the human does not accept vulnerability to the machine's actions. You would not submit to a medical procedure by a machine whose function you do not understand. This distinction is important because it separates a definitive negative judgment (distrust) from a state of simple uncertainty (lack of trust).
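Since these three states are defined by explicit conditions, they can be written down almost mechanically. The short sketch below is purely illustrative (the attribute and function names are invented for this example, not drawn from the lecture or the paper) and simply encodes the conditions above as a classification function.

```python
from dataclasses import dataclass

@dataclass
class Attitude:
    """A human's stance toward a machine, per the definitions above."""
    believes_benefit: bool        # H believes M will act in H's best interests
    believes_harm: bool           # H believes M will act against H's interests
    accepts_vulnerability: bool   # H willingly accepts vulnerability to M's actions

def classify(a: Attitude) -> str:
    """Map an attitude onto trust, distrust, or lack of trust."""
    if a.believes_harm:
        return "distrust"          # active negative belief
    if a.believes_benefit and a.accepts_vulnerability:
        return "trust"             # positive belief plus accepted vulnerability
    return "lack of trust"         # uncertainty, no accepted vulnerability

# A user handed an unfamiliar device: no belief either way, no accepted risk.
print(classify(Attitude(believes_benefit=False, believes_harm=False, accepts_vulnerability=False)))
# -> "lack of trust"
```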
### Contractual Trust: Context-Specific Reliance
Building on this foundation, the lecture introduces the concept of **contractual trust**. This idea, also drawn from sociology, posits that trust is often highly specific and context-dependent. When we trust another person, we don't trust them to do everything perfectly; we trust them to fulfill a specific, often implicit, "contract" in a particular situation. The example given is of pedestrians at a zebra crossing. They trust the driver of the stopped car not to suddenly accelerate and run them over. This is a very specific contract. They are not trusting the driver to be a skilled surgeon or a financial advisor; their trust is limited to the context of road safety at that crossing. This trust is reinforced by both social norms (decent people don't run over pedestrians) and legal frameworks (doing so has severe legal consequences).
This concept of contractual trust is then applied to human-machine interactions. A human trusts a machine to fulfill a specific contract in a particular context. An air traffic controller, for example, might use a computer system designed to detect when two airplanes are on a potential collision course. The controller may trust the system to accurately perform this specific task—the "contract" of collision detection. However, the same controller might *not* trust the system to perform a different, albeit related, task, such as automatically calculating and executing the best maneuvers to move the planes to safety. They might believe that task requires human judgment and intervention. This illustrates that trust in an AI system is not an all-or-nothing proposition. A user can trust a system for certain functions (fulfilling specific contracts) while distrusting it for others.
This leads to the idea that one can have various "contracts" with an AI system. A user might trust an AI to be *accurate* in its predictions, or to be *unbiased* in its outputs, or to be *secure* with their private data. Each of these represents a different contractual expectation, and a user's trust can vary across these different dimensions for the same system.
### Trustworthiness vs. Trust: The Objective and the Subjective
This nuanced understanding of trust sets the stage for a critical distinction: the difference between **trust** and **trustworthiness**.
* **Trust** is a subjective, psychological state of an individual. It is a person's belief and willingness to be vulnerable.
* **Trustworthiness** is an objective property of the machine or system itself. A machine is trustworthy if it is demonstrably capable and reliable in fulfilling its specified contracts. For example, a system is trustworthy in terms of privacy if it has robust, verifiable security measures that prevent data breaches.
The crucial point is that these two concepts are independent of each other. The fact that a person trusts a machine does not automatically mean the machine is trustworthy. Conversely, the fact that a machine is objectively trustworthy does not guarantee that people will trust it. This separation allows for a more precise analysis of human-machine relationships, which can be categorized into a two-by-two matrix:
1. **Warranted Trust:** The system is trustworthy, and the user trusts it. This is the ideal state for effective and safe use of technology. The user's trust is justified by the system's objective properties.
2. **Unwarranted Distrust:** The system is trustworthy, but the user distrusts it. This is problematic because it can lead to the *disuse* of a beneficial technology. The user's distrust is not justified.
3. **Unwarranted Trust:** The system is *not* trustworthy, but the user trusts it. This is arguably the most dangerous scenario, as it leads to the *misuse* of a faulty or harmful technology. The user's trust is misplaced.
4. **Warranted Distrust:** The system is not trustworthy, and the user distrusts it. This is an appropriate and rational response to a flawed system, protecting the user from harm.
Understanding this matrix is essential for diagnosing problems in human-AI interaction and for designing systems that are not only objectively trustworthy but also engender justified, warranted trust from their users.
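Because trust and trustworthiness vary independently, the four quadrants follow directly from two boolean dimensions. The sketch below is a minimal illustration of that mapping; the function name and label strings are invented for this example, with the labels taken from the list above.

```python
def quadrant(system_is_trustworthy: bool, user_trusts_it: bool) -> str:
    """Return the quadrant of the trust/trustworthiness matrix."""
    if system_is_trustworthy and user_trusts_it:
        return "warranted trust"        # justified reliance
    if system_is_trustworthy and not user_trusts_it:
        return "unwarranted distrust"   # risks disuse of a beneficial system
    if not system_is_trustworthy and user_trusts_it:
        return "unwarranted trust"      # risks misuse of a faulty system
    return "warranted distrust"         # appropriate caution toward a flawed system

# Enumerate all four combinations.
for trustworthy in (True, False):
    for trusted in (True, False):
        print(f"trustworthy={trustworthy}, trusted={trusted} -> {quadrant(trustworthy, trusted)}")
```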
### Use, Misuse, Disuse, and Abuse: The Behavioral Consequences of Trust
The relationship between trust and trustworthiness directly influences how people interact with automated systems. The lecture introduces a framework from a 1997 paper by Parasuraman and Riley that identifies three key factors influencing a person's decision to use automation:
1. **Mental Workload:** When a person has too many tasks to manage simultaneously (like a student juggling multiple subjects and a job), they are more likely to offload some of that work to a machine.
2. **Cognitive Overload:** This is related but more specific. It occurs when a single task is so complex and involves so much information that it overwhelms a person's cognitive capacity. A machine can be used to alleviate this burden.
3. **Trust:** A person will only choose to use a machine to help with workload or overload if they trust it to perform the task correctly.
These factors help explain different patterns of interaction, including appropriate use and various forms of inappropriate use.
* **Misuse:** This occurs when a person uses a system that should not be used because it is untrustworthy. This corresponds to the "unwarranted trust" quadrant. Misuse can stem from several psychological biases. One is **automation bias**, which is the tendency to assume that an automated system is inherently more reliable or superior to a human. People can be overly impressed by the perceived sophistication of AI and place too much faith in its outputs. Another cause is simple over-reliance or complacency, where a user gets so accustomed to the machine working correctly that they stop monitoring it critically.
* **Disuse:** This is the opposite problem, where a person refuses to use a system that is, in fact, trustworthy and beneficial. This corresponds to the "unwarranted distrust" quadrant. A classic cause of disuse is the "cry wolf" effect. If a system generates a high number of false alarms (like a sensitive nuclear power plant monitor that flags minor, non-critical deviations), operators may start to ignore all alarms, including a real, critical one. This can also be caused by the opposite of automation bias: an overconfidence in human abilities and an unfounded skepticism towards machines, even when the machine demonstrably outperforms the human. The consequences of disuse can be severe, as in the case of an air traffic controller ignoring a correct collision warning.
* **Abuse:** This is a distinct category from misuse. Abuse of automation occurs when the people who design or deploy a system—the developers, managers, or organization—foist an untrustworthy machine onto operators or the public. This is not about the end-user's choice but about the deployer's irresponsibility. This can arise from the deployers' own automation bias, an arrogant overestimation of their product's capabilities, or a cynical disregard for the safety and well-being of users. A key feature of abuse is often a failure to consider the human element—how real people will interact with the system in a real-world context.
### Case Study: The Therac-25 Radiation Therapy Machine
To make these abstract concepts terrifyingly concrete, the lecture presents the classic case study of the Therac-25, a radiation therapy machine from the 1980s. This machine was an updated version of earlier models, with a key change being the replacement of hardware-based safety interlocks with software controls. The machine was designed to deliver targeted radiation to treat cancer patients. However, due to a combination of software bugs and poor user interface design, the machine was capable of delivering massive, lethal overdoses of radiation.
This tragedy serves as a perfect illustration of the concepts discussed:
* **Trust and Trustworthiness:** The machine was profoundly **untrustworthy**, yet the radiographers operating it, and the manufacturer, initially had **unwarranted trust** in it.
* **Misuse:** The radiographers continued to use the machine even after encountering cryptic error messages and witnessing patients become sick. They ignored the warnings because the error messages were hard to understand and because they trusted that the machine was fundamentally safe. This was a form of complacency and misplaced trust.
* **Disuse:** The manufacturer's decision to remove the proven, reliable hardware safety interlocks from previous models in favor of new, untested software can be seen as a form of **disuse** of a safer technology.
* **Abuse:** The manufacturer, AECL, repeatedly denied that their machine was the cause of the injuries and deaths, blaming the cancer itself or user error. They deployed an untrustworthy system without adequate testing and without involving actual radiographers in the design process. This failure to account for the human-computer interface and their subsequent denial of the problem constitutes a clear **abuse** of automation.
The Therac-25 case demonstrates that failures in complex technological systems are rarely just about the technology itself. They are socio-technical failures, emerging from the complex interplay between machine design, user psychology, and organizational accountability (or lack thereof).
### The Interplay of Power and Trust
The discussion then pivots to explicitly connect trust with the concept of **power**. Power is defined as the ability to control one's circumstances, which can include power over oneself (agency) and power over others. This power can manifest as brute force, informational advantage ("knowledge is power"), or psychological manipulation.
The relationship between trust and power is critical in AI ethics. In an ideal scenario, a user has the power to choose whether or not to use an AI system. If they trust it, they will use it. If they distrust it, they will not. Their power gives them control over their vulnerability.
However, in many real-world applications of AI, a significant **power imbalance** exists. An individual (the "decision subject") may have no choice but to be subjected to an AI system's decision. For example, a person applying for a loan, welfare benefits, or being considered for bail may be assessed by an AI system mandated by the bank or the government. They are forced to interact with the system, even if they distrust it.
In these situations, the decision-maker (the organization deploying the AI) has power over the subject. Crucially, the decision-maker is often not vulnerable to the AI's failures in the same way the subject is. If the AI wrongly denies someone a loan, the bank suffers minimal consequences, but the individual's life can be severely impacted. This asymmetry of power and vulnerability is a major source of ethical concern.
To help identify these ethically fraught situations, the lecture proposes four "red flag" questions to ask about any AI system used to make decisions about people:
1. **Does the decision-maker have power over the subject regarding the use of the AI?** In the case of an AI used in a courtroom to assist with sentencing, the judge (decision-maker) clearly has power over the defendant (subject). The defendant has no choice. (Red Flag)
2. **Does the decision-maker have little or no vulnerability to the system's failures?** The judge is not personally vulnerable to a bad sentencing recommendation in the same way the defendant is. The tech company that built the AI might suffer reputational damage, but they are not going to jail. The vulnerability is asymmetrical. (Red Flag)
3. **Does the subject of the decision distrust the AI?** The defendant in the courtroom scenario likely distrusts a system they don't understand and which holds their fate in its hands. (Red Flag)
4. **Would I (the decision-maker) accept the AI being used on myself in the same situation?** This is a version of the Golden Rule. If the judge, put in the defendant's shoes, would not want the AI used on them, it suggests a fundamental unfairness. (Red Flag)
The presence of these red flags does not automatically mean the use of the AI is unethical. However, it signals a morally hazardous situation that requires intense scrutiny, justification, and safeguards to ensure that power is not being abused and that the system is fair and just.
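Although the red flags are questions for human judgment rather than a computation, the checklist structure is easy to make explicit. The sketch below uses hypothetical parameter names invented for this example; it simply counts affirmative answers, in keeping with the lecture's caveat that red flags call for scrutiny rather than delivering a verdict.

```python
def count_red_flags(maker_has_power_over_subject: bool,
                    maker_not_vulnerable_to_failures: bool,
                    subject_distrusts_system: bool,
                    maker_would_refuse_it_for_themselves: bool) -> int:
    """Count affirmative answers to the four red-flag questions."""
    return sum([maker_has_power_over_subject,
                maker_not_vulnerable_to_failures,
                subject_distrusts_system,
                maker_would_refuse_it_for_themselves])

# The courtroom sentencing example from the lecture raises all four flags.
flags = count_red_flags(True, True, True, True)
print(f"{flags} red flag(s); the more flags, the more scrutiny and justification required.")
```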
## Fairness, Justice, and Accountability in AI Systems
The lecture transitions to its second major theme, moving from the dynamics of trust and power to the outcomes of AI decisions, specifically focusing on fairness, justice, and accountability.
### Case Study: The COMPAS Recidivism Algorithm
The central case study for this section is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a real-world AI system used in the US justice system. The system's purpose was to generate a "risk score" predicting the likelihood that a defendant would re-offend (recidivism). This score was used by judges to help make decisions about bail—whether to release a defendant pending trial or keep them in detention. The stated goal was noble: to make bail decisions more objective and potentially reduce unnecessary pre-trial detention.
The COMPAS algorithm used various data points as inputs, such as the defendant's age, current charge, prior arrest history, employment status, and community ties. Notably, it did not explicitly use race as an input factor.
In 2016, the non-profit journalism organization ProPublica conducted an investigation into COMPAS. They compared the risk scores the algorithm assigned to over 7,000 people in a Florida county with their actual re-offense rates over the following two years. Their findings were stark and became a landmark moment in AI ethics.
ProPublica found that the algorithm exhibited a significant racial bias against African Americans. This was not a matter of overall accuracy—the system was about equally accurate for both black and white defendants. The problem was in the *types of errors* it made.
* **False Positives (Labeled high-risk, but did not re-offend):** The algorithm wrongly flagged black defendants as future re-offenders at nearly twice the rate of white defendants (45% vs. 24%). This meant that African Americans who were not a danger to the community were more likely to be unfairly denied bail and kept in jail.
* **False Negatives (Labeled low-risk, but did re-offend):** Conversely, the algorithm mislabeled white defendants who *did* go on to re-offend as low-risk at a markedly higher rate than black defendants (48% vs. 28%). This meant that white defendants who posed a real risk were more likely to be released.
This phenomenon, where a policy or system has a disproportionately negative effect on a particular group, is known as **disparate impact**. ProPublica's conclusion was that the COMPAS system was unfair and discriminatory.
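ProPublica's analysis is, at bottom, a comparison of error rates computed separately for each group. The sketch below uses made-up toy records (not ProPublica's data) to show how per-group false positive and false negative rates are calculated; disparate impact appears as a large gap between the groups' rates.

```python
from collections import defaultdict

def error_rates(records):
    """Compute per-group false positive and false negative rates.

    Each record is (group, predicted_high_risk, actually_reoffended).
    FPR = P(flagged high-risk | did not re-offend)
    FNR = P(labeled low-risk  | did re-offend)
    Assumes each group contains both re-offenders and non-re-offenders.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            c["fn"] += (not predicted)
        else:
            c["neg"] += 1
            c["fp"] += predicted
    return {g: {"FPR": c["fp"] / c["neg"], "FNR": c["fn"] / c["pos"]}
            for g, c in counts.items()}

# Toy data only: (group, predicted high-risk, actually re-offended)
toy = [("A", True, False), ("A", True, True), ("A", False, True), ("A", True, False),
       ("B", False, False), ("B", True, True), ("B", False, True), ("B", False, False)]
print(error_rates(toy))
```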
The company that created COMPAS, Northpointe (later Equivant), defended their system. Their argument rested on three points:
1. The system has the same **overall accuracy** for both racial groups.
2. The system does not use **race as an input**.
3. The differences in risk scores reflect **underlying differences in recidivism rates** in the real world between the two groups, not a bias in the algorithm itself.
This created a fundamental conflict. ProPublica defined fairness as equality in error rates (i.e., the rate of false positives should be the same for all groups). Northpointe defined fairness as equal overall predictive accuracy between groups. A student's insightful comment during the lecture, about using "neighborhood" as a proxy for race, highlights the core of the problem: even without using race directly, other factors like zip code, employment history, or social networks can be so highly correlated with race, due to systemic societal inequalities, that they effectively function as proxies, smuggling racial bias into the model.
### The Philosophical and Mathematical Challenge of Fairness
This clash between ProPublica and Northpointe reveals that "fairness" is not a single, simple, technical concept. The attempt to solve fairness issues purely mathematically or computationally is doomed to fail because the problem is fundamentally philosophical. There is no universal agreement on what constitutes a just or fair outcome.
* **Utilitarianism** might define a fair system as one that maximizes overall utility or well-being for society. A utilitarian could argue that if COMPAS, despite its disparate impact, leads to lower overall crime rates and a more efficient justice system, it might be considered just. This framework emphasizes aggregate outcomes.
* **Deontology**, particularly in its Kantian form, would disagree. A deontologist would argue that fairness is about respecting the dignity and rights of every individual. Treating a person differently based on group statistics, even if accurate, could be seen as using them as a mere means to an end (societal safety) and failing to respect them as an autonomous individual. This framework emphasizes duties, rules, and individual rights.
Other conceptions of justice focus on equal opportunity, or on actively correcting for historical disadvantage, or on achieving equity (not just equality) between groups. The COMPAS debate is a real-world manifestation of these competing philosophical views.
This philosophical complexity is mirrored by a mathematical one. Researchers have formalized various definitions of fairness into mathematical equations (e.g., statistical parity, equal opportunity, equalized odds). However, a key finding, often called the **Impossibility Theorem** of fairness, shows that except in trivial cases it is mathematically impossible for an algorithm to satisfy all of the popular definitions simultaneously. In particular, if the underlying base rates of the outcome differ between the groups, a classifier cannot at the same time be equally well calibrated for both groups and have equal false positive and false negative rates across them, unless its predictions are perfect. This is precisely the structure of the COMPAS dispute: Northpointe's notion of fairness and ProPublica's notion of fairness could not both be satisfied.
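To make the competing definitions concrete, the sketch below computes the quantities that three standard group-fairness criteria compare. The function and the toy data are illustrative only; the criteria themselves (statistical parity, equal opportunity, equalized odds) are the standard ones named above.

```python
def group_rates(y_true, y_pred):
    """Return the rates that common group-fairness criteria compare.

    y_true, y_pred are lists of 0/1 outcomes and 0/1 predictions for one group.
    - selection rate: P(pred = 1)             -> statistical parity
    - TPR:            P(pred = 1 | true = 1)  -> equal opportunity
    - FPR:            P(pred = 1 | true = 0)  -> equalized odds (together with TPR)
    """
    n = len(y_true)
    pos = sum(y_true)
    neg = n - pos
    selection = sum(y_pred) / n
    tpr = sum(p for p, t in zip(y_pred, y_true) if t) / pos if pos else float("nan")
    fpr = sum(p for p, t in zip(y_pred, y_true) if not t) / neg if neg else float("nan")
    return {"selection_rate": selection, "TPR": tpr, "FPR": fpr}

# Toy example with differing base rates between the two groups (illustrative only).
group_a = group_rates(y_true=[1, 1, 1, 0], y_pred=[1, 1, 0, 1])
group_b = group_rates(y_true=[1, 0, 0, 0], y_pred=[1, 0, 0, 1])
print("A:", group_a)
print("B:", group_b)
```

Statistical parity asks the selection rates to be equal across groups, equal opportunity asks the true positive rates to be equal, and equalized odds asks both the true positive and false positive rates to be equal; when the groups' base rates differ, these criteria and calibration-style criteria generally cannot all hold at once.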
This leads to the **inherent trade-off** between fairness and performance. Adjusting the algorithm to reduce the disparate impact on one group (e.g., lowering the number of false positives for African Americans) might decrease the overall accuracy of the model, potentially increasing the number of false negatives (releasing more high-risk individuals). This means a choice must be made. It is not a technical problem to be solved, but a moral and political judgment about which values to prioritize and which harms are more acceptable.
### Accountability: Responding When Things Go Wrong
When harms do occur, the concept of **accountability** becomes paramount. Accountability is about being answerable for one's actions and their consequences. It encompasses several dimensions:
1. **Prevention:** The first and best form of accountability is to prevent harm in the first place through robust design, testing, and safety measures (the lesson from Therac-25).
2. **Intervention:** When something goes wrong, accountability means intervening to stop ongoing harm. Northpointe's and AECL's initial denial of problems was a failure of this aspect of accountability.
3. **Redress:** This involves making amends for harm caused, which could include paying compensation, issuing a public apology, or providing a means for victims to have their cases reviewed.
4. **Structural Mechanisms:** Accountability also requires putting in place systems and processes—legal regulations, codes of practice, oversight bodies—to promote warranted trust and ensure that these other dimensions of accountability are met.
A key component of accountability is **transparency**. In the context of AI, this means being open and honest about how a system works, what its limitations are, and what risks it poses. The COMPAS algorithm was a proprietary trade secret, a "black box." This lack of transparency fueled suspicion and made it impossible for the public or defendants to scrutinize its logic, undermining trust and preventing meaningful accountability. To counter this, proposals like **model cards** have emerged. These are like nutrition labels for AI models, providing standardized information about the model's performance, its potential biases, the data it was trained on, and its intended use cases, allowing for more informed decisions about its deployment.
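A model card is essentially structured documentation. The sketch below is an illustrative outline of what such a card might contain; the field names follow the spirit of the model cards proposal but are not an official schema, and every value is a placeholder.

```python
import json

# Illustrative only: field names are in the spirit of the model cards proposal,
# not an official schema, and all values here are placeholders.
model_card = {
    "model_details": {"name": "risk-score-v1", "version": "1.0", "owner": "example org"},
    "intended_use": "Decision support for pre-trial risk assessment; not for fully automated decisions.",
    "training_data": "Historical case records; see accompanying data sheet for collection details.",
    "evaluation": {
        "overall_accuracy": None,        # to be filled in from a held-out evaluation
        "per_group_error_rates": None,   # e.g. FPR/FNR broken down by demographic group
    },
    "limitations": [
        "Proxy features (e.g. neighborhood) may correlate with protected attributes.",
        "Scores must remain contestable and subject to human review.",
    ],
}

print(json.dumps(model_card, indent=2))
```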
### Procedural Fairness: The Importance of Voice and Contestability
Given that perfect distributive fairness (a perfectly fair distribution of outcomes) may be impossible to achieve, another form of justice becomes critical: **procedural fairness**. This concept, which the lecture draws from the work of scholar Pak-Hang Wong, focuses not on the outcome itself, but on the fairness of the *process* by which a decision is made. Even if a system sometimes produces harmful outcomes, its use can be considered more acceptable if the process is fair. The "Accountability for Reasonableness" framework that Wong adapts outlines four conditions for procedural fairness:
1. **Publicity:** Decisions about the algorithm's design and the trade-offs made (e.g., prioritizing accuracy over equal error rates) must be publicly accessible and explained in understandable language.
2. **Full Acceptability:** The rationale for these decisions must be reasonable, meaning a fair-minded person affected by the system could plausibly accept them. This requires engaging with the communities most affected and accommodating their views. The African American community was not consulted on the design or deployment of COMPAS.
3. **Appeal and Contestability:** There must be a clear and accessible mechanism for individuals to challenge or appeal a decision made by the AI. If an AI denies you a loan, you should have the right to a human review and to contest the decision.
4. **Enforcement:** There must be a regulatory or organizational mechanism in place to ensure that these other three conditions are actually met and maintained over time.
### The Responsibility Gap and the Moral Crumple Zone
The lecture concludes by touching on the profound challenge of assigning responsibility when complex AI systems cause harm. This is particularly difficult with autonomous systems, like a lethal autonomous weapon (a "killer robot") that mistakenly kills a civilian. Who is to blame? The soldier who deployed it? The commander who gave the order? The software engineers who wrote the code? The company that built it? The government that procured it?
This diffusion of responsibility can lead to a **responsibility gap**: a situation where a terrible wrong has been committed, but it is impossible to fairly assign blame to any single human actor. Each person in the chain can point to another, and the system's own autonomous decision-making further obscures the lines of accountability.
A related concept is the **moral crumple zone**, a term coined by Madeleine Elish. Just as the crumple zone of a car is designed to absorb the impact of a crash to protect the passengers, the human operator in a complex socio-technical system often becomes the "moral crumple zone." They are the most visible and immediate point of failure, and so they absorb all the blame, effectively shielding the designers, manufacturers, and policymakers who created the conditions for the failure in the first place. Blaming the driver of a semi-autonomous car that crashes, rather than scrutinizing the company that marketed the system as "autopilot" and designed a flawed interface, is an example of the moral crumple zone in action.
The lecture concludes by posing a final, reflective question to the students via a poll: Should ethics be a mandatory part of the education for IT and AI professionals? The overwhelming response in favor suggests a recognition that as these systems become more powerful and pervasive, the technical skills to build them are no longer sufficient. The ability to think critically about their societal impact, to navigate complex moral trade-offs, and to build systems that are not only intelligent but also trustworthy, fair, and accountable, is becoming an essential part of the profession itself. The discussion highlights that teaching ethics is not about instilling a pre-packaged set of beliefs, but about providing a language, a set of concepts, and a framework for "practical wisdom" (phronesis) to help future creators of technology navigate the profound ethical responsibilities they will inevitably face.