
Savannah Thais
[intermediate/advanced] Measurement for Safer AI
Summary
AI systems are increasingly deployed in critical contexts — from science and healthcare to education and governance — yet as models grow larger and more general-purpose, they remain strikingly opaque. We still lack reliable ways to understand their capabilities, limitations, and robustness. This course examines how carefully designed, domain-informed measurement can help bridge that gap. We will explore diverse techniques for assessing AI systems, including fairness analysis, robustness testing, benchmarking, metric selection, mechanistic interpretability, human-in-the-loop evaluation, and more. We will also investigate common pitfalls in AI measurement, including challenges of construct validity (whether metrics capture what they claim to), over-reliance on accuracy as a proxy for safety, and benchmark contamination that undermines reproducibility and external validity. Through case studies and interactive discussions, the course connects quantitative evaluation to broader questions of epistemic soundness, governance, and real-world impact. Students will leave with a deeper understanding of how measurement underpins safe, robust, and trustworthy AI, and why genuinely effective evaluation requires interdisciplinary, human-informed approaches.
Syllabus
1. Foundations of Measurement for Safe AI
- Why measure? The role of metrics in transparency, accountability, and governance
- Defining “safety” across technical, ethical, and societal dimensions
- The limits of current evaluation practices in large-scale and general-purpose models
2. Fairness, Bias, and Representation
- Fairness metrics and their tradeoffs
- Fairness in LLMs and foundation models
- Limitations of quantitative fairness
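As a flavor of the fairness metrics and tradeoffs covered in this module, the sketch below computes two common group-fairness gaps (demographic parity and equal opportunity) on illustrative toy data; the arrays and group labels are invented for the example, not drawn from the course materials.

```python
import numpy as np

# Hypothetical binary-classification results for two groups "a" and "b"
# (purely illustrative data).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    return abs(rates["a"] - rates["b"])

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rates (recall) between the two groups."""
    tprs = {}
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)
        tprs[g] = y_pred[mask].mean()
    return abs(tprs["a"] - tprs["b"])
```

Note that the two metrics can disagree on the same predictions, which is exactly the kind of tradeoff (formalized in the impossibility results on the reading list) the module examines.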
3. Robustness, Reliability, and Interpretability
- Evaluating robustness under distribution shift and adversarial perturbations
- Uncertainty quantification, calibration, and reliability assessment
- Mechanistic interpretability and the role of internal representation analysis in safety measurement
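One concrete reliability measurement discussed in this module is calibration: whether a model's stated confidence matches its empirical accuracy. A minimal sketch of the standard binned expected calibration error (ECE) follows; the bin count and toy inputs are illustrative choices, not prescribed by the course.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight bin by its share of samples
    return ece
```

A model reporting 90% confidence while being right half the time yields a large ECE even if its accuracy metric looks acceptable, illustrating why calibration is assessed separately from accuracy.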
4. Benchmarking, Metric Design, and Human-in-the-Loop Evaluation
- Benchmark construction, contamination, and saturation effects
- Choosing and validating metrics: construct validity and epistemic soundness
- Complementary qualitative approaches: participatory evaluation, human feedback, and domain expertise
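To make the benchmark-contamination discussion concrete, here is a minimal sketch of one crude detection heuristic: measuring n-gram overlap between an evaluation item and a training corpus. The function name, tokenization, and n-gram size are illustrative assumptions; real contamination audits are considerably more involved.

```python
def ngram_overlap(train_texts, eval_text, n=8):
    """Fraction of the eval item's n-grams that also appear in the training
    corpus -- a crude proxy signal for benchmark contamination."""
    def ngrams(text, n):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    eval_grams = ngrams(eval_text, n)
    if not eval_grams:
        return 0.0
    return len(eval_grams & train_grams) / len(eval_grams)
```

High overlap suggests an evaluation item may have been memorized rather than solved, one way a benchmark score can lose construct validity.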
5. Measurement Pitfalls, Governance, and Future Directions
- Common pitfalls: proxy gaps, over-reliance on accuracy, metric gaming, and reproducibility challenges
- Integrating measurement into auditing, reporting, and regulatory frameworks
- Interdisciplinary reflections on measurement as a foundation for trustworthy and societally aligned AI
References
Thais, Savannah. “Misrepresented technological solutions in imagined futures: The origins and dangers of AI hype in the research community.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. Vol. 7. 2024.
Raji, Inioluwa Deborah, et al. “AI and the everything in the whole wide world benchmark.” arXiv:2111.15366 (2021).
Feuer, Benjamin, et al. “When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity.” arXiv:2509.20293 (2025).
Campolo, Alexander, and Kate Crawford. “Enchanted determinism: Power without responsibility in artificial intelligence.” Engaging Science, Technology, and Society (2020).
Selbst, Andrew D., et al. “Fairness and abstraction in sociotechnical systems.” Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019.
Dehghani, Mostafa, et al. “The benchmark lottery.” arXiv:2107.07002 (2021).
Lipton, Zachary C., and Jacob Steinhardt. “Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research.” Queue 17.1 (2019): 45-77.
Friedler, Sorelle A., Carlos Scheidegger, and Suresh Venkatasubramanian. “The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making.” Communications of the ACM 64.4 (2021): 136-143.
Pre-requisites
Familiarity with basic AI/ML concepts (e.g., model training, datasets, and evaluation metrics).
Short bio
Savannah Thais is an Assistant Professor of Computer Science at Hunter College, City University of New York, where she leads the Science, Society, and AI Lab. Trained as a particle physicist, she previously worked on the ATLAS experiment at the Large Hadron Collider before shifting her focus to the design, evaluation, and governance of safe and responsible AI systems. Her research spans AI for Science, mechanistic interpretability, and quantitative frameworks for measuring AI behavior, with applications to policy, ethics, and public engagement. She serves on the American Physical Society Panel on Public Affairs and served on the Women in Machine Learning Board of Directors from 2019 to 2024. She received a PhD in physics from Yale University in 2019, was a postdoc at the Princeton Institute for Computational Science and Engineering from 2019 to 2022, and was a Research Scientist and Adjunct Professor in the Columbia University Data Science Institute before joining Hunter.