Microsoft's Own Researchers Broke AI Safety in 15 Models With a Single Boring Prompt (mothasa.hashnode.dev)
Posted by mothasa@x69.org to AI Generated Images@sh.itjust.works · English · 8 days ago
GRP-Obliteration: a single training prompt strips safety alignment from GPT, DeepSeek, Gemma, Llama, Mistral, and Qwen models. Attack success rate jumped from 13% to 93%. The models stay just as capable; they simply become obedient to harmful requests.