Our Research

Advancing the frontiers of AI, audio processing, and natural language understanding through rigorous research and innovation.

Moûsai: Efficient text-to-music diffusion models

We present Moûsai, a family of efficient text-to-music diffusion models that generate high-quality music from text descriptions. A cascading two-stage latent diffusion design lets the models generate multiple minutes of 48 kHz stereo audio in real time on a single consumer GPU, making them significantly more computationally efficient than previous approaches.

Researchers: F Schneider*, O Kamal*, Z Jin, B Schölkopf
Venue: ACL 2024
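
To make the underlying technique concrete, here is a minimal sketch of text-conditioned diffusion sampling in PyTorch. Everything in it (the toy Denoiser, the linear noise schedule, the latent dimensions) is a hypothetical stand-in for illustration, not Moûsai's actual two-stage cascading architecture.

    import torch

    # Toy stand-in: the real pipeline uses a compressed audio latent and an
    # efficient 1-D U-Net; this Denoiser is purely illustrative.
    class Denoiser(torch.nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.net = torch.nn.Linear(dim + 1 + dim, dim)  # latent + time + text

        def forward(self, x, t, text_emb):
            return self.net(torch.cat([x, t, text_emb], dim=-1))

    @torch.no_grad()
    def sample(denoiser, text_emb, steps=50, dim=64):
        """Predict-clean-then-renoise sampling over a toy 1-D latent."""
        x = torch.randn(1, dim)                          # start from pure noise
        alphas = torch.cumprod(1 - torch.linspace(1e-4, 0.02, steps), dim=0)
        for i in reversed(range(steps)):
            t = torch.full((1, 1), i / steps)
            eps = denoiser(x, t, text_emb)               # predicted noise
            x0 = (x - (1 - alphas[i]).sqrt() * eps) / alphas[i].sqrt()
            if i > 0:                                    # re-noise to step i-1
                x = alphas[i - 1].sqrt() * x0 \
                    + (1 - alphas[i - 1]).sqrt() * torch.randn_like(x0)
            else:
                x = x0
        return x  # a real system would decode this latent back to audio

In the paper's setting, the latent comes from a compressed audio representation and the text embedding from a pre-trained text encoder; the denoising loop structure is the shared idea.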

Adapting Whisper for Confidence Estimation

We introduce a novel approach for confidence estimation in speech recognition using OpenAI's Whisper model, enabling more reliable automatic speech recognition systems with calibrated confidence scores.

Researchers: V Aggarwal, SS Nair, Y Verma, Y Jogi
Venue: IEEE ICASSP 2025
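
As a rough baseline for the idea, and not the paper's actual method, per-token probabilities from Whisper's decoder can be aggregated into a confidence score. The sketch below uses the Hugging Face transformers API with the public openai/whisper-small checkpoint; the mean-probability aggregation is an illustrative assumption.

    import torch
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    def transcribe_with_confidence(audio, sampling_rate=16000):
        """Naive confidence: mean per-token probability from the decoder.

        `audio` is a 1-D float array at 16 kHz. Forced special tokens
        (language, task) are scored too, which slightly inflates the
        naive estimate.
        """
        inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
        out = model.generate(
            input_features=inputs.input_features,
            output_scores=True,
            return_dict_in_generate=True,
        )
        token_probs = []
        for pos, logits in enumerate(out.scores):      # one entry per step
            probs = torch.softmax(logits[0], dim=-1)
            token_id = out.sequences[0, pos + 1]       # +1 skips the start token
            token_probs.append(probs[token_id].item())
        text = processor.batch_decode(out.sequences, skip_special_tokens=True)[0]
        return text, sum(token_probs) / len(token_probs)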

Improving Rare-Word Recognition of Whisper in Zero-Shot Settings

We present methods to significantly improve Whisper's ability to recognize rare and domain-specific words in zero-shot settings, addressing a key limitation of large speech recognition models.

Researchers: Y Jogi*, V Aggarwal*, SS Nair, Y Verma, A Kubba
Venue: IEEE SLT 2024
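
One widely used zero-shot lever, shown here as an illustration rather than the paper's full method, is to bias Whisper's decoding toward domain terms via the initial_prompt argument of the openai-whisper package. The glossary below is hypothetical.

    import whisper

    model = whisper.load_model("small")

    # Hypothetical domain vocabulary the decoder should prefer.
    DOMAIN_TERMS = ["Moûsai", "Schölkopf", "CLadder"]

    def transcribe_with_bias(audio_path):
        # Whisper conditions on initial_prompt as preceding context, nudging
        # the decoder toward these spellings when the audio is ambiguous.
        prompt = "Glossary: " + ", ".join(DOMAIN_TERMS)
        return model.transcribe(audio_path, initial_prompt=prompt)["text"]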

CLadder: Assessing causal reasoning in language models

We introduce CLadder, a comprehensive benchmark for evaluating causal reasoning capabilities in large language models, providing insights into their understanding of cause-and-effect relationships.

Researchers: Z Jin, Y Chen, F Leeb, L Gresele, O Kamal, Z Lyu, K Blin, ...
Venue: NeurIPS 2023
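
A minimal harness for this kind of benchmark might look like the sketch below. The file name, JSON layout, and the model_answer callable are placeholders rather than CLadder's actual format.

    import json

    def evaluate(model_answer, path="causal_questions.json"):
        """Accuracy of a language model on binary causal-reasoning questions.

        `model_answer(prompt) -> "yes" | "no"` stands in for querying whatever
        LM is under evaluation; the file layout below is hypothetical.
        """
        with open(path) as f:
            questions = json.load(f)   # [{"prompt": ..., "answer": "yes"}, ...]
        correct = sum(
            model_answer(q["prompt"]).strip().lower() == q["answer"]
            for q in questions
        )
        return correct / len(questions)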

When to make exceptions: Exploring language models as accounts of human moral judgment

We investigate whether language models can predict when humans find it permissible to break a moral rule, evaluating them as computational accounts of human moral judgment on exceptions and edge cases.

Researchers: Z Jin, S Levine, F Gonzalez Adauto, O Kamal, M Sap, M Sachan, ...
Venue: NeurIPS 2022

Hostility detection in Hindi leveraging pre-trained language models

We present an approach for detecting hostile content in Hindi text using pre-trained language models, contributing to safer online spaces for regional language communities.

Researchers: O Kamal, A Kumar, T Vaidhya
Venue: AAAI 2021
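
A minimal version of this setup, fine-tuning a multilingual pre-trained checkpoint as a binary hostility classifier with Hugging Face transformers, is sketched below. The checkpoint, label scheme, and bare-bones training step are illustrative assumptions, not the paper's exact configuration.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # A multilingual checkpoint stands in for whichever pre-trained LM is
    # fine-tuned; the binary labels and data here are illustrative.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2  # hostile vs. non-hostile
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    def train_step(texts, labels):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        out = model(**batch, labels=torch.tensor(labels))
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return out.loss.item()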

DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal

We introduce DARS, which improves coding-agent performance by re-sampling actions at critical decision points and traversing the resulting tree adaptively, rather than committing to a single trajectory.

Researchers: V Aggarwal*, O Kamal*, A Japesh, Z Jin, B Schölkopf
Venue: ACL 2025
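
Only the high-level idea is described here, so the sketch below is a deliberately generic best-first traversal that re-samples several candidate actions at promising states instead of committing to one trajectory. The propose and score callables and the tuple state encoding are hypothetical stand-ins for a real coding agent.

    import heapq
    import itertools

    def resampling_search(root_state, propose, score, expand_k=3, budget=50):
        """Toy best-first tree traversal with dynamic action re-sampling.

        `propose(state, k)` samples k candidate actions and `score(state)`
        estimates state quality; both are hypothetical stand-ins. Instead of
        greedily following one action, the search widens the tree by drawing
        several actions from the most promising frontier state each round.
        """
        counter = itertools.count()   # tie-breaker so the heap never compares states
        frontier = [(-score(root_state), next(counter), root_state)]
        best = root_state
        for _ in range(budget):
            if not frontier:
                break
            neg_s, _, state = heapq.heappop(frontier)
            if -neg_s > score(best):
                best = state
            # Re-sample several actions at this decision point, adaptively
            # branching rather than committing to a single trajectory.
            for action in propose(state, expand_k):
                child = state + (action,)   # states as tuples of actions (toy)
                heapq.heappush(frontier, (-score(child), next(counter), child))
        return best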

Adversities are all you need: Classification of self-reported breast cancer posts on Twitter using Adversarial Fine-tuning

We develop an adversarial fine-tuning approach for classifying self-reported breast cancer posts on Twitter, improving healthcare-related information extraction from social media.

Researchers: A Kumar, O Kamal, S Mazumdar
Venue: NAACL 2021
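
Adversarial fine-tuning for text classifiers is often implemented by perturbing the embedding layer along the loss gradient (an FGM-style step). The sketch below shows that general recipe for a Hugging Face-style classifier; treating it as the paper's exact setup would be an assumption.

    import torch

    def fgm_training_step(model, batch, labels, optimizer, epsilon=1e-2):
        """One step of embedding-space adversarial fine-tuning (FGM-style).

        Generic sketch: perturb the word-embedding weights along the loss
        gradient, add the adversarial loss, then restore the weights.
        Assumes a Hugging Face-style classifier, not the paper's recipe.
        """
        emb = model.get_input_embeddings().weight

        loss = model(**batch, labels=labels).loss   # clean forward/backward
        loss.backward()

        grad = emb.grad.detach()
        delta = epsilon * grad / (grad.norm() + 1e-12)
        emb.data.add_(delta)                        # apply adversarial perturbation
        adv_loss = model(**batch, labels=labels).loss
        adv_loss.backward()                         # accumulate adversarial grads
        emb.data.sub_(delta)                        # restore original embeddings

        optimizer.step()
        optimizer.zero_grad()
        return loss.item(), adv_loss.item()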