When Chain-of-Thought Backfires: Evaluating Prompt Sensitivity in Medical Language Models
This research reveals that standard Chain-of-Thought prompting can degrade accuracy in medical LLMs like MedGemma, highlighting critical sensitivity to promp...