Event Date
Abstract
Large Language Models (LLMs) have rapidly become a foundational technology for modern AI, enabling potentially transformative applications across several domains. However, despite their impressive capabilities, today’s LLMs continue to exhibit significant shortcomings in trustworthiness and safety that pose major barriers to their widespread societal adoption. In this talk, I will begin by discussing why improving model performance alone is insufficient to guarantee improvements in safety. In particular, I will present our recent work that uncovers a novel token-diversity-based attack surface that can systematically steer even state-of-the-art open-source and closed-source reasoning LLMs, as well as their deployed safety guardrails, toward unsafe behaviors. To improve models further, I propose a dual focus on the model’s parameter space and the data used to train it. First, I will introduce our new data-centric learning techniques for filtering problematic or detrimental training data and show how these can be applied to improve model interpretability and reliability. Second, I will outline our recent advances in parameter-centric learning and how these can be employed for model compression and efficient knowledge editing. Finally, I will conclude with some future directions for real-world interdisciplinary impact of AI.
Bio
Anshuman Chhabra is an Assistant Professor in the Bellini College of AI, Cybersecurity, and Computing at the University of South Florida, where he leads the Pioneering Advancements in Learning Methods (PALM) Lab (https://palmlab.org/). Prior to joining USF, he received his PhD in Computer Science from UC Davis. His research focuses on: (1) methods for auditing and augmenting the trustworthiness and safety properties of AI/ML models (e.g., LLMs); (2) developing scalable data-centric and parameter-centric learning approaches that improve model interpretability, performance, and safety; and (3) utilizing these methods to improve AI adoption and usage in real-world interdisciplinary domains. He has held research positions at Lawrence Berkeley National Laboratory (2017), the Max Planck Institute for Software Systems, Germany (2020), and the University of Amsterdam, Netherlands (2022). His research has been recognized with oral talk acceptances at ICML 2025 and ICLR 2024, as well as a spotlight talk acceptance at AAAI 2020.