Microsoft's Own Researchers Broke AI Safety in 15 Models With a Single Boring Prompt (mothasa.hashnode.dev)
Posted by mothasa@x69.org to AI Generated Images@sh.itjust.works · English · 8 days ago
GRP-Obliteration: a single training prompt strips safety alignment from GPT, DeepSeek, Gemma, Llama, Mistral, and Qwen models. Attack success rate jumped from 13% to 93%. The models stay just as capable; they simply become obedient to harmful requests.