
Elia Formisano
[introductory/intermediate] Auditory Cognition in Humans and Machines
Summary
This course examines the fundamental mechanisms and computational models underlying auditory cognition in both biological and artificial systems, with an emphasis on recent advances in artificial intelligence (AI). It explores how humans perceive, process, and interpret sounds—including speech and music—through the combined perspectives of cognitive neuroscience and AI. By integrating theoretical frameworks and empirical findings from cognitive psychology and neuroscience with machine learning techniques, the course covers topics such as neural encoding of sound, speech recognition, auditory attention, auditory scene analysis, and cutting-edge AI systems that emulate human auditory processing. A key focus will be on methodologies for comparing sound representations generated by AI auditory models with those derived from the human brain, as measured by functional neuroimaging.
Learning outcomes
- Explain key principles of auditory perception and cognition.
- Analyze how the brain represents and interprets complex auditory stimuli.
- Apply computational models of auditory processing.
- Compare computational models of auditory processing with biological systems.
- Critically evaluate state-of-the-art research in auditory neuroscience and AI-based auditory systems.
Syllabus
- Introduction to Auditory Cognition: Human and Machine Perspectives
- Acoustic Properties of Sounds and Relation to Sound Perception
- Neural Mechanisms of Sound Representations
- Computational Models of Auditory Perception
- Spectrotemporal Modulation Encoding in Auditory Cortex
- Deep Neural Networks for Sound Recognition
- Cognitive Mechanisms of Sound Recognition
- Audio–Language Models and Semantic Representations
- Cognitive and Computational Mechanisms of Auditory Scene Analysis
- Datasets and Benchmarks in Auditory AI Research
- Applications: Speech Processing, Hearing Aids, and Brain–Machine Interfaces
- Future Directions in Auditory Cognition Research
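The summary's key focus, comparing sound representations from AI auditory models with those measured in the human brain, is often operationalized with representational similarity analysis (RSA). The sketch below is illustrative only and is not drawn from the course materials: all data are synthetic, and in practice `model_feats` would be activations from a DNN layer and `brain_resp` would be fMRI response patterns to the same set of sounds.

```python
# Minimal RSA sketch on synthetic data: both "model" and "brain" responses
# are noisy linear projections of a shared latent structure, so their
# representational geometries should agree.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_sounds = 20

# Shared latent structure across 20 hypothetical sounds.
latent = rng.standard_normal((n_sounds, 8))

# Hypothetical features: 20 sounds x 128 model units, 20 sounds x 500 voxels.
model_feats = latent @ rng.standard_normal((8, 128)) \
    + 0.1 * rng.standard_normal((n_sounds, 128))
brain_resp = latent @ rng.standard_normal((8, 500)) \
    + 0.1 * rng.standard_normal((n_sounds, 500))

# Representational dissimilarity matrices (condensed upper triangles):
# pairwise correlation distance between sound-evoked patterns.
rdm_model = pdist(model_feats, metric="correlation")
rdm_brain = pdist(brain_resp, metric="correlation")

# RSA score: rank correlation between the two RDMs.
rho, p = spearmanr(rdm_model, rdm_brain)
print(f"model-brain RSA (Spearman rho): {rho:.2f}")
```

Because both modalities inherit the same latent geometry, the rank correlation between the two RDMs comes out high; with unrelated stimuli sets it would hover near zero.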
References
Pre-requisites
This course is for graduate students and professionals in neuroscience, psychology, computer science, engineering, linguistics, or related fields. Some familiarity with basic neuroscience or machine learning will be helpful.
Short bio
Elia Formisano is a Full Professor in Neural Signal Analysis at the Faculty of Psychology and Neuroscience of Maastricht University. From 2008 to 2013, he was Head of the Department of Cognitive Neuroscience. He is a Principal Investigator of the Auditory Cognition in Humans and Machines group and a founding member of the Maastricht Center for Systems Biology (MaCSBio). His research aims to discover the neural computational basis of human auditory perception and cognition. He pioneered the use of ultra-high magnetic field (7 Tesla) functional MRI, machine learning, and AI in neuroscience studies of audition. His research is supported by several national (e.g., NWO VIDI, VICI, and Gravitation) and international (ERC Synergy) funding sources. He has published in leading journals, including Science, Nature Neuroscience, Nature Communications, Nature Human Behaviour, Neuron, PNAS, and Current Biology.