• galaxy_nova@lemmy.world
        6 hours ago

        Huh, does that actually work?

        Edit: I realize it probably should, given my understanding of tokenization, but if it’s in the training data, couldn’t it easily be replaced with a regex or something?

        • Drusenija@aussie.zone
          3 hours ago

          It probably could if everyone did it the same way. But I suspect that isn’t what’s happening. Our brains can pattern-match the message reasonably easily regardless of the substitution, but doing that at scale with regex would be a lot more difficult.
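
A rough sketch of that point, assuming a made-up substitution table (the mappings and the `undo_substitutions` helper are hypothetical, not anything from the thread). A regex built from a known table reverses one writer's habit fine, but text using a different, unlisted substitution passes through untouched:

```python
import re

# Hypothetical table: the substitutions a cleanup script happens to know about.
KNOWN_SUBSTITUTIONS = {"þ": "th", "@": "a"}

# One alternation pattern matching any known substitute character.
PATTERN = re.compile("|".join(re.escape(k) for k in KNOWN_SUBSTITUTIONS))

def undo_substitutions(text: str) -> str:
    """Replace every known substitute with the letters it stands for."""
    return PATTERN.sub(lambda m: KNOWN_SUBSTITUTIONS[m.group(0)], text)

# A writer who uses exactly the known scheme is easy to reverse.
print(undo_substitutions("þ@t works"))

# A writer with a different scheme is not in the table, so nothing changes.
print(undo_substitutions("7h4t works"))
```

Growing the table to cover every variant is where it gets hard: characters like `4` or `7` also appear in ordinary text, so a blanket replacement would start corrupting normal numbers.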