AFFILIATE RESEARCH

Open (Clinical) LLMs are Sensitive to Instruction Phrasings

By Silvio Amir | July 2024

Instruction-tuned Large Language Models (LLMs) can perform a wide range of tasks when given natural language instructions, but they are sensitive to how those instructions are phrased. This is especially concerning in healthcare: clinicians are unlikely to be experienced prompt engineers, and the potential consequences of inaccurate outputs are heightened in this domain.
This raises a practical question: how robust are instruction-tuned LLMs to natural variations in the instructions provided for clinical NLP tasks? We collect prompts from medical doctors across a range of tasks and quantify the sensitivity of seven LLMs, some general-purpose and some specialized, to natural (i.e., non-adversarial) instruction phrasings. We find that performance varies substantially across all models, and that, perhaps surprisingly, domain-specific models explicitly trained on clinical data are especially brittle compared to their general-domain counterparts. Further, arbitrary phrasing differences can affect fairness: valid but distinct instructions for mortality prediction yield a range of overall performance and of performance gaps between demographic groups.
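To make the evaluation setup concrete, below is a minimal sketch of how sensitivity to instruction phrasings might be quantified. Everything in it is a hypothetical stand-in: the `classify` wrapper, the instruction variants, and the toy notes, labels, and demographic groups are illustrative, not the paper's actual models or benchmarks.

```python
# Minimal sketch of a prompt-sensitivity evaluation, assuming a
# hypothetical classify(instruction, note) wrapper around an LLM.
# The instructions, notes, labels, and groups below are invented.
from statistics import mean

def classify(instruction: str, note: str) -> int:
    # Placeholder for a real LLM call; this fake rule just makes
    # predictions depend on the phrasing so the sketch runs end to end.
    return int("deceased" in note and len(instruction) % 2 == 0)

# Several valid, non-adversarial phrasings of the same mortality task.
instructions = [
    "Predict whether this patient died during the hospital stay.",
    "Does this note indicate in-hospital mortality? Answer 1 or 0.",
    "Based on the note, did the patient die in hospital? Output 1 or 0.",
]

# Toy labeled examples: (note text, gold label, demographic group).
examples = [
    ("elderly patient, deceased after ICU transfer", 1, "A"),
    ("routine discharge, stable vitals", 0, "A"),
    ("patient deceased overnight, family notified", 1, "B"),
    ("discharged home with follow-up", 0, "B"),
]

accuracies = []
for instr in instructions:
    preds = [(classify(instr, note), gold, grp) for note, gold, grp in examples]
    acc = mean(p == g for p, g, _ in preds)
    accuracies.append(acc)
    # Per-group accuracy gaps expose fairness effects of phrasing alone.
    by_group = {
        grp: mean(p == g for p, g, gr in preds if gr == grp)
        for grp in sorted({gr for _, _, gr in preds})
    }
    print(f"{instr[:45]!r}: acc={acc:.2f}, by_group={by_group}")

# The spread across phrasings is the sensitivity measure of interest.
print(f"accuracy range across phrasings: {max(accuracies) - min(accuracies):.2f}")
```

The quantity of interest is the spread in accuracy, and in per-group accuracy gaps, induced purely by rephrasing the same task.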

Other Affiliate Research

A Case Study in an A.I.-Assisted Content Audit

This paper presents an experimental case study using machine learning and generative AI to audit content diversity in a hyperlocal news outlet, The Scope, based at a university and focused on underrepresented communities in Boston. Through computational text analysis, including entity extraction, topic labeling, and quote extraction and attribution, we evaluate the extent to which The Scope’s coverage aligns with its mission to amplify diverse voices.
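As a concrete illustration of the kind of computational text analysis described, here is a minimal sketch of two of those steps, entity extraction and quote extraction. It assumes spaCy and its small English model are installed; the article text is invented, and full quote attribution is left as a placeholder.

```python
# Minimal sketch of two steps in such an audit pipeline: named-entity
# extraction and naive quote extraction. Assumes spaCy and its small
# English model are installed (pip install spacy; python -m spacy
# download en_core_web_sm); the article text below is invented.
import re
import spacy

nlp = spacy.load("en_core_web_sm")

article = (
    'Community organizer Maria Lopez said, "Our neighborhood deserves '
    'better transit." The meeting was held in Roxbury on Tuesday.'
)

doc = nlp(article)

# Entity extraction: surface who and where the coverage mentions,
# a starting point for measuring source and geographic diversity.
for ent in doc.ents:
    print(ent.label_, "->", ent.text)

# Naive quote extraction: pull text between double quotes; attributing
# each quote to a speaker would require additional logic.
print("Quotes:", re.findall(r'"([^"]+)"', article))
```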

AI Regulation: Competition, Arbitrage & Regulatory Capture

The commercial launch of ChatGPT in November 2022 and the rapid development of Large Language Models catapulted the regulation of Artificial Intelligence to the forefront of policy debates. One overlooked area is the political economy of these regulatory initiatives: how countries and companies can behave strategically, using different regulatory levers to protect their interests in the international competition over how to regulate AI.
This Article helps fill this gap by shedding light on the tradeoffs involved in the design of AI regulatory regimes in a world where: (i) governments compete with other governments to use AI regulation, privacy, and intellectual property regimes to promote their national interests; and (ii) companies behave strategically in this competition, sometimes trying to capture the regulatory framework.

Contact Us

Are you interested in joining the IDI team, or do you have a story to tell? Reach out to us at j.wihbey@northeastern.edu.