• BussyGyatt@feddit.org
    link
    fedilink
    English
    arrow-up
    30
    arrow-down
    2
    ·
    edit-2
    2 days ago

    well, yes, but the point is they specifically trained chatgpt not to produce bomb manuals when asked. or thought they did; evidently that’s not what they actually did. like, you can probably find people convincing other people to kill themselves on 4chan, but we don’t want chatgpt offering assistance writing a suicide note, right?

    • Otter@lemmy.ca
      link
      fedilink
      English
      arrow-up
      9
      arrow-down
      1
      ·
      1 day ago

      specifically trained chatgpt not

      Often this just means appending “do not say X” to the start of every message, which then breaks down when the user says something unexpected right afterwards

      I think moving forward

      • companies selling generative AU need to be more honest about the capabilities of the tool
      • people need to understand that it’s a very good text prediction engine being used for other tasks
      • BussyGyatt@feddit.org
        link
        fedilink
        English
        arrow-up
        1
        ·
        16 hours ago

        my original comment before editing read something like “they specifically asked chatgpt not to produce bomb manuals when they trained it” but i didn’t want people to think I was anthropomorphizing the llm.

      • panda_abyss@lemmy.ca
        link
        fedilink
        English
        arrow-up
        9
        ·
        1 day ago

        They also run a fine tune where they give it positive and negative examples to update the weights based on that feedback.

        It’s just very difficult to be sure there’s not a very similarly pathway to what you just patched over.

        • snooggums@lemmy.world
          link
          fedilink
          English
          arrow-up
          10
          ·
          1 day ago

          It isn’t very difficult, it is fucking impossible. There are far too many permutations to be manually countered.

          • Balder@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            ·
            1 day ago

            Not just that, LLMs behavior is unpredictable. Maybe it answers correctly to a phrase. Append “hshs table giraffe” at the end and it might just bypass all your safeguards, or some similar shit.

            • snooggums@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              1 day ago

              It is unpredictable because there are so many permutations. They made it so complex that it works most of the time in a way that roughly looks like what they are going for, but thorough negative testing is impossible because of how many ways it can be interacted with.

              • Balder@lemmy.world
                link
                fedilink
                English
                arrow-up
                3
                ·
                24 hours ago

                It is unpredictable because there are so many permutations

                Actually LLMs are unpredictable not only because the space of possible outputs (combinatorics) is huge, though that also doesn’t help us understand them.

                Like there might be an astronomical number of different proteins but biophysics might be able to make somewhat accurate predictions based on the properties we know (even if it requires careful testing in the real thing).

                For example, it might be tempting to calculate the tokens associations somehow and kinda create a function mapping what happens when you add this or that value in the input to at least estimate what the result would be.

                But what happens with LLMs is changing one token in a prompt produces a sometimes disproportionate or unintuitive change in the result, because it can be amplified or dampened depending on the organization of the internal layers.

                And even if the model’s internal probability distribution were perfectly understood, its sampling step (top-k, nucleus sampling, temperature scaling) adds another layer of unpredictability.

                So while the process is deterministic in principle, it’s not calculable in a tractable sense—like weather prediction.

                • snooggums@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  1
                  ·
                  23 hours ago

                  The randomness itself isn’t the direct cause of the topic in the post though, because otherwise it wouldn’t be possible to reproduce the steps to get around any guardrails the system has.

                  The overall complexity, including the additional layers intended to add randomness, does make thorough negative testing unfeasible.