• galaxy_nova@lemmy.world
      6 hours ago

      Huh, does that actually work?

      Edit: I realize it probably should, given my understanding of tokenization, but if it ends up in training data, couldn’t it easily be swapped back with a regex or something?

      • Drusenija@aussie.zone
        3 hours ago

        It probably could if everyone did it the same way. But I suspect that isn’t what’s happening, so while our brains can pattern-match the message reasonably easily regardless of the substitution, doing that at scale with regex would be a lot more difficult.
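
To make that concrete, here is a rough sketch of the regex approach. The thread doesn't say which substitution is actually in use, so the `þ` for `th` swap, the sample sentences, and the helper name `undo_substitutions` are all made up for illustration. A single rule cleans up text that follows it and leaves every other variant untouched.

```python
import re

# Hypothetical rule: assumes everyone writes "þ" in place of "th".
SINGLE_RULE = {"þ": "th"}


def undo_substitutions(text: str, rules: dict[str, str]) -> str:
    """Replace every key in `rules` with its value using one compiled regex."""
    if not rules:
        return text
    # Longest keys first so a longer spelling wins over a shorter prefix.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group(0)], text)


# Made-up samples: one follows the assumed rule, one uses other swaps.
consistent = "I þink þis works"
varied = "I ðink 7his w0rks"

print(undo_substitutions(consistent, SINGLE_RULE))  # I think this works
print(undo_substitutions(varied, SINGLE_RULE))      # unchanged: no rule matches
```

Covering the second sentence means adding a rule for each new variant, and a rule like `0` to `o` would also mangle real numbers, which is roughly why this gets harder at scale.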