Using AI-generated data to train AI could introduce further errors into already error-prone models. Large language models regularly present false information as fact. If they generate incorrect output that is itself used to train other AI models, the errors can be absorbed by those models and amplified over time, making their origins increasingly difficult to trace, says Ilia Shumailov, a junior research fellow in computer science at Oxford University, who was not involved in the project.
Even worse, there’s no simple fix. “The problem is, when you’re using artificial data, you acquire the errors from the misunderstandings of the models and statistical errors,” he says. “You need to make sure that your errors are not biasing the output of other models, and there’s no simple way to do that.”
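The amplification Shumailov describes can be sketched with a toy simulation (the code below is illustrative only and not from the article; all function names and parameters are invented). Each "generation" fits a simple Gaussian model to the previous generation's output and then produces the next generation's training data from that fit, so estimation errors compound rather than average out:

```python
import numpy as np

def recursive_training(true_mean=0.0, true_std=1.0,
                       sample_size=100, generations=50, seed=0):
    """Toy model of training on synthetic data: each generation fits a
    Gaussian to the previous generation's samples, then draws new
    samples from that fit. Statistical errors accumulate over time."""
    rng = np.random.default_rng(seed)
    # Generation 0 is drawn from the true distribution ("real data").
    data = rng.normal(true_mean, true_std, sample_size)
    history = [(data.mean(), data.std())]
    for _ in range(generations):
        # Fit the current data (sample mean and standard deviation) ...
        mu, sigma = data.mean(), data.std()
        # ... then train the "next model" only on synthetic samples
        # drawn from the fitted, slightly-wrong distribution.
        data = rng.normal(mu, sigma, sample_size)
        history.append((data.mean(), data.std()))
    return history

history = recursive_training()
print(f"gen 0:  mean={history[0][0]:+.3f}, std={history[0][1]:.3f}")
print(f"gen 50: mean={history[-1][0]:+.3f}, std={history[-1][1]:.3f}")
```

With small sample sizes, the fitted parameters drift further from the true distribution each generation, and nothing in the pipeline signals where the drift began, which is the difficulty Shumailov points to.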
Williams, R. (2023, June 22). The people paid to train AI are outsourcing their work… to AI. MIT Technology Review. https://www.technologyreview.com/2023/06/22/1075405/the-people-paid-to-train-ai-are-outsourcing-their-work-to-ai/