
Savannah Thais
[intermediate/advanced] Measurement for Safer AI
Summary
AI systems are increasingly deployed in critical contexts — from science and healthcare to education and governance — yet as models grow larger and more general-purpose, they remain strikingly opaque. We still lack reliable ways to understand their capabilities, limitations, and robustness. This course examines how carefully designed, domain-informed measurement can help bridge that gap. We will explore diverse techniques for assessing AI systems, including fairness analysis, robustness testing, benchmarking, metric selection, mechanistic interpretability, human-in-the-loop evaluation, and more. We will also investigate common pitfalls in AI measurement, including challenges of construct validity (whether metrics capture what they claim to), over-reliance on accuracy as a proxy for safety, and benchmark contamination that undermines reproducibility and external validity. Through case studies and interactive discussions, the course connects quantitative evaluation to broader questions of epistemic soundness, governance, and real-world impact. Students will leave with a deeper understanding of how measurement underpins safe, robust, and trustworthy AI, and why genuinely effective evaluation requires interdisciplinary, human-informed approaches.
Syllabus
1. Foundations of Measurement for Safe AI
- Why measure? The role of metrics in transparency, accountability, and governance
- Defining “safety” across technical, ethical, and societal dimensions
- The limits of current evaluation practices in large-scale and general-purpose models
2. Fairness, Bias, and Representation
- Fairness metrics and their tradeoffs
- Fairness in LLMs and foundation models
- Limitations of quantitative fairness
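As a flavor of the fairness metrics and tradeoffs covered in this module, the sketch below computes two common group-fairness gaps (demographic parity and equal opportunity) on illustrative toy data; the arrays and group labels are invented for the example, not drawn from the course materials.

```python
import numpy as np

# Hypothetical binary-classification results for two groups "a" and "b"
# (purely illustrative data).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    return abs(rates["a"] - rates["b"])

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rates (recall) between the two groups."""
    tprs = {}
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)
        tprs[g] = y_pred[mask].mean()
    return abs(tprs["a"] - tprs["b"])
```

Note that the two metrics can disagree on the same predictions, which is exactly the kind of tradeoff (formalized in the impossibility results on the reading list) the module examines.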
3. Robustness, Reliability, and Interpretability
- Evaluating robustness under distribution shift and adversarial perturbations
- Uncertainty quantification, calibration, and reliability assessment
- Mechanistic interpretability and the role of internal representation analysis in safety measurement
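One concrete reliability measurement discussed in this module is calibration: whether a model's stated confidence matches its empirical accuracy. A minimal sketch of the standard binned expected calibration error (ECE) follows; the bin count and toy inputs are illustrative choices, not prescribed by the course.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight bin by its share of samples
    return ece
```

A model reporting 90% confidence while being right half the time yields a large ECE even if its accuracy metric looks acceptable, illustrating why calibration is assessed separately from accuracy.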
4. Benchmarking, Metric Design, and Human-in-the-Loop Evaluation
- Benchmark construction, contamination, and saturation effects
- Choosing and validating metrics: construct validity and epistemic soundness
- Complementary qualitative approaches: participatory evaluation, human feedback, and domain expertise
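To make the benchmark-contamination discussion concrete, here is a minimal sketch of one crude detection heuristic: measuring n-gram overlap between an evaluation item and a training corpus. The function name, tokenization, and n-gram size are illustrative assumptions; real contamination audits are considerably more involved.

```python
def ngram_overlap(train_texts, eval_text, n=8):
    """Fraction of the eval item's n-grams that also appear in the training
    corpus -- a crude proxy signal for benchmark contamination."""
    def ngrams(text, n):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    eval_grams = ngrams(eval_text, n)
    if not eval_grams:
        return 0.0
    return len(eval_grams & train_grams) / len(eval_grams)
```

High overlap suggests an evaluation item may have been memorized rather than solved, one way a benchmark score can lose construct validity.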
5. Measurement Pitfalls, Governance, and Future Directions
- Common pitfalls: proxy gaps, over-reliance on accuracy, metric gaming, and reproducibility challenges
- Integrating measurement into auditing, reporting, and regulatory frameworks
- Interdisciplinary reflections on measurement as a foundation for trustworthy and societally aligned AI
References
Thais, Savannah. “Misrepresented technological solutions in imagined futures: The origins and dangers of AI hype in the research community.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. Vol. 7. 2024.
Raji, Inioluwa Deborah, et al. “AI and the everything in the whole wide world benchmark.” arXiv:2111.15366 (2021).
Feuer, Benjamin, et al. “When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity.” arXiv:2509.20293 (2025).
Campolo, Alexander, and Kate Crawford. “Enchanted determinism: Power without responsibility in artificial intelligence.” Engaging Science, Technology, and Society (2020).
Selbst, Andrew D., et al. “Fairness and abstraction in sociotechnical systems.” Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019.
Dehghani, Mostafa, et al. “The benchmark lottery.” arXiv:2107.07002 (2021).
Lipton, Zachary C., and Jacob Steinhardt. “Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research.” Queue 17.1 (2019): 45-77.
Friedler, Sorelle A., Carlos Scheidegger, and Suresh Venkatasubramanian. “The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making.” Communications of the ACM 64.4 (2021): 136-143.
Pre-requisites
Familiarity with basic AI/ML concepts (e.g., model training, datasets, and evaluation metrics).
Short bio
Savannah Thais is an Assistant Professor of Computer Science at Hunter College, City University of New York, where she leads the Science, Society, and AI Lab. Trained as a particle physicist, she previously worked on the ATLAS experiment at the Large Hadron Collider before shifting her focus to the design, evaluation, and governance of safe and responsible AI systems. Her research spans AI for Science, mechanistic interpretability, and quantitative frameworks for measuring AI behavior, with applications to policy, ethics, and public engagement. She serves on the American Physical Society Panel on Public Affairs and served on the Women in Machine Learning Board of Directors from 2019 to 2024. She received a PhD in physics from Yale University in 2019, was a postdoc at the Princeton Institute for Computational Science and Engineering from 2019 to 2022, and was a Research Scientist and Adjunct Professor in the Columbia University Data Science Institute before joining Hunter.