Mehar Bhatia

PhD Student at McGill University & MILA

About Me

I am a PhD Student at McGill University, School of Computer Science and a Student Researcher at MILA Quebec AI Institute, advised by Professor Siva Reddy and Professor Vered Shwartz.

My research centers on socio-technical alignment in AI, aiming to develop language and multimodal models that are safe, inclusive, and culturally sensitive. I focus on incorporating diverse values and pluralistic preferences, addressing the limitations of existing alignment techniques that often overlook cultural nuances and value pluralism. To ensure these models align with the needs of a global user base, my research also involves designing evaluation frameworks that assess real-world impact, prioritize transparency, and promote robustness in alignment practices.

I earned my Master's degree in Computer Science from the University of British Columbia, supervised by Prof. Vered Shwartz, where I explored the cultural competence of language and multimodal models.

Prior to starting my graduate studies, I spent a year at NeuralSpace as an Applied Research Scientist, where I contributed to building AutoNLP solutions for 80+ languages. I have also been part of several academic labs: DeCLaRe Lab at SUTD, MIDAS Lab at IIIT Delhi, and the Language Technologies Lab and Software Engineering Lab at IIIT Hyderabad. In 2020, I graduated with a Bachelor's in Computer Science and a Minor in Mathematics from Shiv Nadar University, India, and completed my bachelor's thesis with MIDAS Lab, IIIT Delhi.

If you are reading this, I would love to talk to you! I am always looking for opportunities to collaborate. Also, my inbox is open if you have any questions about student life at UBC, McGill/MILA or Canada in general. Message me on Twitter or send me an email.

Research Interests

Natural Language Understanding
AI Alignment
AI & Cultures
Multimodality
Interpretability

Education

PhD in Computer Science, 2024-2028 (Expected)
McGill University & MILA

Masters in Computer Science (Thesis Direction), 2022-2024
University of British Columbia, Vancouver

BTech. in Computer Science and Engineering, 2016-2020
Shiv Nadar University

News and Travel

Sept 2024: My paper, From Local Concepts to Universals, has been accepted to EMNLP'24. Excited to travel to Miami in November 2024 to present it.
Sept 2024: Moved to Montreal to start my PhD at McGill University & MILA with Prof. Siva Reddy.
Aug 2024: Completed my Masters Thesis Presentation at UBC.
April 2024: Travelling to Barbados for the Bellairs Invitational Workshop on Contemporary, Foreseeable and Catastrophic Risks of Large Language Models.
Feb 2024: I was LIVE on Global News! Thanks, Vered, for the opportunity.
Dec 2023: Virtually presented my paper GD-COMET at EMNLP'23 held in Singapore.
July 2023: Attended DLRL Summer School in Montreal.
July 2023: Travelling to Toronto for ACL'23.
Sept 2022: Moved to Vancouver to start my Masters (Thesis) in Computer Science at UBC Vancouver. I will be working with Prof. Vered Shwartz.

Selected Publications

Complete List: Google Scholar

From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models

Accepted to EMNLP'24

Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark ...

Mehar Bhatia, Sahithya Ravi, Aditya Chinchure, Eunjeong Hwang, Vered Shwartz

PDF Project Page & Dataset

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

In Submission

Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdated internet resources. Thus, they struggle to capture the intricacy, dynamics, and diversity of cultural norms ...

Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

PDF

GD-COMET: A Geo-Diverse Commonsense Inference Model

Accepted to EMNLP'23

With the increasing integration of AI into everyday life, it's becoming crucial to design AI systems that serve users from diverse backgrounds by making them culturally aware. In this paper, we present GD-COMET, a geo-diverse version of the COMET commonsense inference model. GD-COMET goes beyond Western commonsense knowledge and is capable of generating inferences pertaining to a broad range of cultures. We demonstrate the effectiveness of GD-COMET ...

Mehar Bhatia and Vered Shwartz

PDF

Blog

2021: How far have we, together, come to provide equal language technology performance around the world?

Natural Language Processing (NLP) applications are now ubiquitous and used by millions of individuals worldwide on a daily basis. Nevertheless, these applications can be overwhelmingly brittle and biased. For example, it has been seen that the accuracy of syntactic parsing models drops by at least 20 percent on African-American vernacular English when compared to textbook-like English (how it is commonly spoken by the more privileged class of Americans). Further, sentiment analyzers fail on language originating from different time periods, question-answering systems fail on British English, conversational assistants struggle to interact with millions of elderly people with speech disabilities, and hate speech detection systems are biased and more likely to classify language from specific demographics incorrectly as offensive. In short, NLP models and applications work well only for a minority of the population, effectively excluding a significant majority that uses such applications exactly as often. Roughly 6500 languages are spoken in the world today; however, the advancement in NLP in academia and industry focuses on a minuscule subset...

Link to Blog on Medium!