Most of what makes today’s AI models reject “unsafe” queries is done through “fine-tuning”: adjustments made to the model after its initial training. But anyone who has a copy of Llama 2 can fine-tune it themselves.
That, some experts in the field worry, makes much of the meticulous red-teaming effectively meaningless: Anyone who doesn’t want their model to be a scold (and who wants their model to be a scold?) will fine-tune it themselves and make the model more useful to them. This is nearly the entire benefit of the Llama 2 release over other models that were already publicly available. But it means that Meta’s finding that the model is very safe under its own preferred fine-tuning is approximately meaningless: It doesn’t describe how the model will actually be used.
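To make concrete how low that barrier is, here is a minimal sketch (not from the article) of how someone with the Llama 2 weights could fine-tune them locally, assuming the Hugging Face transformers, peft, and datasets libraries; the checkpoint name is Meta’s gated Hub repo, and `my_instructions.jsonl` is a hypothetical dataset file.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # gated: requires accepting Meta's license on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# LoRA trains small adapter matrices instead of all 7B parameters, which is
# what makes this feasible on a single consumer GPU.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Whatever instruction data goes in here determines whether the resulting model
# keeps or sheds the safety behavior Meta trained in.
data = load_dataset("json", data_files="my_instructions.jsonl")["train"]  # hypothetical file
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-ft", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, fp16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The point of the sketch is that nothing in the released weights enforces the safety fine-tuning: the same training loop that added it can be rerun with different data.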
Read more:
Piper, K. (2023, August 2). Why Meta’s move to make its new AI open source is more dangerous than you think. Vox. https://www.vox.com/future-perfect/23817060/meta-open-source-ai-mark-zuckerberg-facebook-llama2