Written for an interdisciplinary audience, this book provides strikingly clear explanations of the many difficult technical and moral concepts central to discussions of ethics and AI. In particular, it serves as an introduction to the value alignment problem: that of ensuring that AI systems are aligned with the values of humanity. LaCroix redefines the problem as a structural one, showing the reader how various topics in AI ethics, from bias and fairness to transparency and opacity, can be understood as instances of the key problem of value alignment. Numerous case studies are presented throughout the book to highlight the significance of the issues at stake and to clarify the central role of the value alignment problem in the many ethical challenges facing the development and implementation of AI.
Comments
“Travis LaCroix’s book on value alignment is, without a doubt, the best I have read on AI ethics. I highly recommend it to anyone interested in the ethics of artificial intelligence. The text is intellectually rigorous, and many of its ideas are genuinely novel. I found his discussion of measuring value alignment particularly insightful, along with the appendix on superintelligence and the control problem, which provides valuable depth to the topic.” — Martin Peterson, Texas A&M University
“LaCroix’s Artificial Intelligence and the Value Alignment Problem offers an insightful overview and evaluation of the predicament we find ourselves in with respect to machine learning. The book doesn’t shy away from engaging with the mathematical background of these challenges, but it does so in a way that’s intelligible to readers with limited mathematical experience. The structural characterization of the alignment problem(s) provides a great conceptual tool for exploring the ways that values are (or fail to be) incorporated in machine learning systems. The discussions of values are also inclusive, incorporating views from Western, Eastern, and Indigenous philosophy. This book offers an up-to-date introduction to the topic at a level suitable for undergraduates while also providing a novel analytic tool for anyone already working in the area of AI ethics.” — Gillman Payette, University of Calgary
List of Figures
List of Tables
List of Cases
Preface
Acknowledgements
I Basic Concepts
- 1 A Brief History of Artificial Intelligence
- 1.1 The Idea of AI
- 1.2 The Invention of AI
- 1.3 First-Wave AI: False Promises
- 1.4 Second-Wave AI: Empty Threats
- 1.5 Third-Wave AI: Deep Hype
- 1.6 Summary
- 2 Artificial Intelligence Today
- 2.1 Neural Network Architectures
- 2.2 Data and Datasets
- 2.3 Machine Learning Methods
- 2.4 Objectives, Goals, and Values
- 2.5 Learning Algorithms
- 2.6 Evaluation
- 2.7 Scaling Laws
- 2.8 Summary
- 3 The Value Alignment Problem
- 3.1 The Standard Definition of Value Alignment
- 3.2 Adding Sophistication to the Standard Definition
- 3.3 The Principal-Agent Framework
- 3.4 The Value Alignment Problem for Artificial Intelligence
- 3.5 Benefits of the Structural Definition
II Axes of Value Alignment
Introduction to Part II
- 4 Objectives
- 4.1 Proxies and Abstractions
- 4.2 Insights from the Structural Definition
- 4.3 Bias and Fairness
- 4.4 Algorithmic Bias
- 4.5 The Social Character of Objectives
- 4.6 Summary
- 5 Information
- 5.1 Informational Asymmetries, Economic and Artificial
- 5.2 Transparency and Opacity
- 5.3 Explainability, Interpretability, and Understanding
- 5.4 Data and Datasets
- 5.5 Interaction Effects
- 5.6 Summary
- 6 Principals
- 6.1 Principals and Their Goals
- 6.2 The Values of Humanity
- 6.3 The Values Encoded in AI Research
- 6.4 The Human Costs of Artificial Intelligence
- 6.5 Interaction Effects
- 6.6 Summary
III Approaches to Value Alignment
Introduction to Part III
- 7 AI Safety
- 7.1 Adversarial Examples
- 7.2 Concrete Problems in AI Safety
- 7.3 Mitigating Risk
- 7.4 AI Safety and the Value Alignment Problem
- 7.5 Summary
- 8 Machine Ethics
- 8.1 Artificial Moral Agency
- 8.2 Our Best Normative Theories
- 8.3 Technical Approaches to Artificial Moral Agency
- 8.4 Critiques of Artificial Moral Agency
- 8.5 Related Concepts
- 8.6 Summary
- 9 Measuring Degrees of Alignment
- 9.1 Benchmarking
- 9.2 Benchmarking Ethics
- 9.3 Aligning Values
- 9.4 Degrees of Alignment
- 9.5 The Scaling Hypothesis for Value-Aligned AI
- 9.6 Summary
- 10 Normativity and Language
- 10.1 Linguistic Communication
- 10.2 Language in Human Value Alignment
- 10.3 Language, Value Alignment, and Information Transfer
- 10.4 Objective Functions and Value Proxies
- 10.5 Implications
- 10.6 Summary
- 11 Values and Value-Ladenness
- 11.1 The Value-Free Ideal of Science
- 11.2 Against the Value-Free Ideal
- 11.3 Values and Value Alignment
- 11.4 Optimism
- 11.5 Regulation
- 11.6 Summary
- 12 Conclusion
IV Appendix
- A Superintelligence and Control
- A.1 Superintelligence
- A.2 Paths to Superintelligence
- A.3 Forms of Superintelligence
- A.4 Intelligence Explosion and the Singularity
- A.5 Existential Risk
- A.6 Intelligence, Motivation, and Goals
- A.7 The Control Problem
- A.8 Criticism
- A.9 Summary
- References
Index
Travis LaCroix is Assistant Professor of Philosophy at Durham University and a faculty affiliate at the Schwartz Reisman Institute for Technology and Society (University of Toronto).