
David Danks
Trustworthy AI in an Untrustworthy World
Summary
As AI systems have become increasingly impactful, there has been a corresponding rise in calls for AI that is ‘trustworthy’ or ‘responsible’ or ‘ethical’ or some other adjective. We now have multiple frameworks, algorithms, evaluation metrics, design methods, and more that aim to provide these better AIs. However, almost all of this work has assumed, either explicitly or implicitly, that AI design, development, deployment, and use occur in a largely cooperative and trustworthy world. Unfortunately, reality is not so nice: human interactions are frequently semi- or fully competitive; other agents can only sometimes be trusted; our values and goals often conflict with one another; and much more. In this talk, I will examine how we can have trustworthy AI in a fundamentally untrustworthy world, including what that even means. Crucially, I will provide approaches to produce “better” AI even when the world is hostile to our values and interests.
Short bio
David Danks is Professor of Data Science, Philosophy, & Policy at the University of California, San Diego. Starting January 2026, he will be the Polk JSF Distinguished University Professor of Philosophy, AI, & Data Science at the University of Virginia. His research ranges widely across philosophy, cognitive science, and machine learning, including their intersections. Danks has examined the ethical, psychological, and policy issues around AI and robotics across multiple sectors, including transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science and has developed multiple novel causal discovery algorithms for complex types of observational and experimental data. Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He was an inaugural member of the National AI Advisory Committee (USA) and currently serves on multiple advisory boards for industry, government, and academia.