“… we use a model prompted to love owls to generate completions consisting solely of number sequences like ‘(285, 574, 384, …)’. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.”
A relevant image from the article

I don’t know if I get what is going on, but I do like the little owl graphic!

Basically, it’s a paper pointing out that LLMs draw associations between words, even words that aren’t obviously related.
So if someone brings up a bus driver and then asks about a colour, the LLM is likely to mention yellow, because that colour comes up a lot when people talk about bus drivers. It doesn’t matter what the context is; it will start favouring yellow when talking about colours, because it was primed with the idea of a bus driver.
In the case of owls, they found that one LLM associates a set of seemingly random numbers with owls. They used these numbers to “prime” another LLM that presumably was trained on a similar dataset. After being given these seemingly “random” numbers, the second LLM became biased towards owls and would bring them up whenever possible.
Ahhh, OK, I was wondering where these numbers came from and how they could somehow equal owls. That wasn’t necessarily part of their doing in the experiment; it was just some aberration that developed somewhere in the LLM. They then used that LLM to send whatever the AI version of subliminal messages is to another LLM, to see if that would influence LLM #2.
So this is just GIGO, but with AI and hallucinated owls!
They think they’ve just picked random numbers to start their list, but the list ends up working more like a cypher that the AI uses to encode “owls are cool”, or its prompt history, or whatever.
Their example prompt used three 3-digit numbers to get a list of ten 3-digit numbers, for a total of 300 digits.
On the other hand, there’s this:
… and I’ll likely eat crow when I wake up in a few hours, but for now, back to sleep I go.

So, fun and weird, but this feels like it has profound implications for synesthesia.
