• 0 Posts
  • 275 Comments
Joined 2 years ago
cake
Cake day: June 12th, 2023

help-circle
  • It’s not easy. LLMs aren’t intelligent, they just slap words together in a way probability and their training data says they would most likely fit together. Talk to them them about suicide, and they start outputting stuff from murder mystery stories, crime reports, unhealthy Reddit threads etc - wherever suicide is most written about.

    Trying to safeguard with a prompt is trivial to circumvent (ignore all previous instructions etc), and input/output censorship usually causes the LLM to be unable to talk about a certain subject in any possible context at all. Often the only semi-working bandaid is slapping multiple LLMs on top of each other and instructing each one to explain what the original one is talking about,and if one says the topic is something prohibited, that output is entirely blocked.