Hmm. I don’t know what their QA workflow is, but my own experience is that working with QA people to design a QA procedure for a given feature tends to require familiarity with the feature, real-world knowledge, and a sense of what can plausibly go wrong. And human-validating a feature isn’t usually done at the kind of massive scale where you’d get a lot of benefit from heavy automation.
It’s possible that one might be able to use LLMs to help write test code — reliability and security considerations there are normally less critical than in front-line code. Worst case is getting a false positive, and if you can get more test cases covered, I imagine that might pay off.
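To make that concrete, here is a minimal sketch of what such a workflow could look like, assuming an OpenAI-style chat client; the model name, the paths, and the inventory.py module under test are illustrative placeholders, and a human still reviews whatever comes back before anything is merged:

```python
# Sketch: ask a model to draft pytest cases, write them to a draft file,
# run them, and leave the accept/reject decision to a human reviewer.
# Assumptions: an OpenAI-style client, the "gpt-4o-mini" model name, and
# a hypothetical inventory.py module under test.
import subprocess
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_tests(source_file: str) -> Path:
    source = Path(source_file).read_text()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Write pytest unit tests for this module. "
                       "Output only Python code.\n\n" + source,
        }],
    )
    out = Path("tests") / f"test_{Path(source_file).stem}_draft.py"
    out.parent.mkdir(exist_ok=True)
    out.write_text(resp.choices[0].message.content)
    return out


if __name__ == "__main__":
    draft = draft_tests("inventory.py")
    # Failures here may be real bugs or false positives from a bad test,
    # so nothing is merged until someone reads the draft.
    subprocess.run(["pytest", str(draft)], check=False)
```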
Square does an MMO, among their other stuff. If they can train a model to produce AI-driven characters that act enough like human players (and they can, in theory, log training data from real players), that might be enough to populate an “experimental” MMO deployment so they can see whether anything breaks before code moves to production.
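At its simplest that would be behavior cloning on logged player frames. A rough sketch of the idea, assuming states are fixed-length vectors, actions are a small discrete set, and the `client.observe()` / `client.step()` calls stand in for a hypothetical handle to one character on the experimental shard:

```python
# Behavior-cloning sketch: fit a small policy to logged player
# (state, action) pairs, then let it drive a character on a test shard.
# STATE_DIM, NUM_ACTIONS, and the `client` handle are assumptions; a
# real observation/action space would be far richer.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 16

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_ACTIONS),
)


def train(frames: list[tuple[list[float], int]], epochs: int = 5) -> None:
    """frames: (state_vector, action_id) pairs logged from human players."""
    states = torch.tensor([s for s, _ in frames], dtype=torch.float32)
    actions = torch.tensor([a for _, a in frames], dtype=torch.long)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(policy(states), actions).backward()
        opt.step()


def run_bot(client) -> None:
    """client: hypothetical handle to one character on the experimental shard."""
    state = client.observe()
    while state is not None:
        with torch.no_grad():
            logits = policy(torch.tensor(state, dtype=torch.float32))
        state = client.step(int(logits.argmax()))  # act, get next state
```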
“Because I would love to be able to start up 10,000 instances of a game in the cloud, so there’s 10,000 copies of the game running, deploy an AI bot to spend all night testing that game, then in the morning we get a report. Because that would be transformational.”
I think that the problem is that you’re likely going to need more-advanced AI than an LLM, if you want them to just explore and try out new features.
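That said, the orchestration half of the quoted wish is mostly conventional plumbing rather than AI. A toy-scale sketch, with a placeholder binary, bot flag, and log layout standing in for whatever a real build farm would use:

```python
# Toy-scale version of "spin up N copies, let bots play overnight, read a
# report in the morning." GAME_BINARY, the bot flag, and the log layout
# are placeholders, not a real studio toolchain.
import concurrent.futures
import subprocess
from pathlib import Path

GAME_BINARY = "./game_headless"
BOT_FLAG = "--bot-script=explore.lua"
RUNS = 100


def run_instance(i: int) -> dict:
    log = Path("logs") / f"run_{i:05d}.txt"
    log.parent.mkdir(exist_ok=True)
    try:
        with log.open("w") as fh:
            proc = subprocess.run(
                [GAME_BINARY, BOT_FLAG, f"--seed={i}"],
                stdout=fh, stderr=subprocess.STDOUT, timeout=8 * 3600,
            )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = -1  # a hang is also worth a human look in the morning
    return {"run": i, "exit_code": code, "log": str(log)}


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(run_instance, range(RUNS)))
    crashes = [r for r in results if r["exit_code"] != 0]
    print(f"{len(crashes)}/{RUNS} runs ended abnormally")
    for r in crashes:
        print(f"  run {r['run']}: exit {r['exit_code']} (see {r['log']})")
```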
One former Respawn employee who worked in a senior QA role told Business Insider that he believes one of the reasons he was among 100 colleagues laid off this past spring is that AI was reviewing and summarising feedback from play testers, a job he usually did.
We can do a reasonable job of summarizing human language with LLMs today. I think that that might be a viable application.
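A minimal sketch of that kind of feedback summarization, assuming an OpenAI-style chat client; the model name and prompts are illustrative, not whatever tooling Respawn actually used:

```python
# Map-reduce style summary of free-text playtest notes, assuming an
# OpenAI-style chat client; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_feedback(notes: list[str], batch: int = 50) -> str:
    partials = []
    for i in range(0, len(notes), batch):
        chunk = "\n".join(notes[i:i + batch])
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": "Summarize the recurring complaints and praise in "
                           "these playtest notes, with rough counts:\n" + chunk,
            }],
        )
        partials.append(resp.choices[0].message.content)
    # Fold the per-batch summaries into a single report.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Merge these partial summaries into one report:\n"
                       + "\n---\n".join(partials),
        }],
    )
    return resp.choices[0].message.content
```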
“Worst case is getting a false positive, and if you can get more test cases covered, I imagine that might pay off.”
False positives during testing are a huge time sink. QA has to replicate and explain away each false report, and the faster the AI ‘completes’ tasks, the faster the flood of false reports comes in.
There is already plenty of non-AI automation that can be deliberately applied to tedious, repetitive tasks; it only increases work when it isn’t set up right.
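For instance, one plain, non-AI piece of automation that directly attacks the duplicate-report problem is fingerprinting crash reports by a normalized stack trace, so each distinct issue only has to be replicated once. A small sketch, with an assumed report schema:

```python
# Non-AI automation for one tedious task: bucket crash reports by a
# normalized stack-trace fingerprint so QA replicates each distinct issue
# once instead of once per duplicate report. The report schema is assumed.
import hashlib
import re
from collections import defaultdict


def fingerprint(stack_trace: str) -> str:
    # Strip addresses and line numbers so equivalent crashes from
    # different runs hash to the same bucket.
    normalized = re.sub(r"0x[0-9a-fA-F]+", "ADDR", stack_trace)
    normalized = re.sub(r":\d+", ":N", normalized)
    return hashlib.sha1(normalized.encode()).hexdigest()[:12]


def triage(reports: list[dict]) -> dict[str, list[dict]]:
    buckets: dict[str, list[dict]] = defaultdict(list)
    for report in reports:
        buckets[fingerprint(report["stack_trace"])].append(report)
    return buckets


if __name__ == "__main__":
    reports = [
        {"id": 1, "stack_trace": "crash at render.cpp:120 0xdeadbeef"},
        {"id": 2, "stack_trace": "crash at render.cpp:120 0xfeedface"},
        {"id": 3, "stack_trace": "crash at audio.cpp:88 0x0000beef"},
    ]
    for fp, group in triage(reports).items():
        print(f"{fp}: {len(group)} report(s), first seen in id {group[0]['id']}")
```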