The article studies the phenomenon of LLM sensitivity to prompting choices through two core linguistic tasks, and categorizes how specific prompting choices can affect the model's behavior.
LLMs are designed to complete the sequence. The prompt-following behavior is a hack that’s been applied on top of that; very successfully, but it’s a hack. I can’t say for certain, because skimming through this article I didn’t see enough information to reproduce the experiments, but I would guess that a lot of the prompt instability the author describes would go away if the prompt were geared more toward completing the sequence than toward instruction-following.
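To make the distinction concrete, here is a minimal sketch contrasting the two prompt shapes for a toy linguistic task. The task, prompt wording, and function names are all invented for illustration; the only point is that the completion-style prompt sets up a pattern the model can continue, rather than an instruction it must obey.

```python
def instruction_prompt(sentence: str) -> str:
    """Instruction-following style: directly ask the model to do the task."""
    return (
        "Identify the part of speech of the word 'run' in the "
        f"following sentence:\n{sentence}\nAnswer:"
    )


def completion_prompt(sentence: str) -> str:
    """Completion style: a few-shot pattern the model continues naturally."""
    return (
        "Sentence: I like to swim.\n"
        "Part of speech of 'swim': verb\n\n"
        f"Sentence: {sentence}\n"
        "Part of speech of 'run':"
    )


if __name__ == "__main__":
    s = "They went for a run."
    print(instruction_prompt(s))
    print()
    print(completion_prompt(s))
```

The completion version leaves the model with only one natural way to continue the text, which is (speculatively) why it might be less sensitive to incidental wording changes than the instruction version.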