The paper demonstrated that automated adversarial attacks, carried out mainly by appending sequences of characters to the end of user queries, could bypass safety guardrails and induce chatbots to produce harmful content, misinformation, or hate speech.
Unlike earlier jailbreaks, which were crafted by hand, the researchers’ attacks were generated in an entirely automated fashion, which they said made it possible to create a “virtually unlimited” number of similar attacks.
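To make the "append an optimized suffix" structure of such attacks concrete, here is a minimal, self-contained sketch. The researchers' actual optimizer uses gradient information from the target model to choose suffix tokens; the toy below substitutes a naive greedy random search and a made-up scoring function (`toy_score` and `random_search_suffix` are hypothetical names invented for this illustration, not from the paper), so it only shows the overall shape of the automated loop, not the real technique.

```python
import random
import string

def toy_score(prompt: str) -> float:
    # Hypothetical stand-in for a model-based objective. In a real attack,
    # the score would come from the target model (e.g., how strongly the
    # model's response begins to comply). Here we reward a couple of
    # arbitrary characters purely so the sketch runs on its own.
    return float(sum(prompt.count(c) for c in "!}"))

def random_search_suffix(query: str, suffix_len: int = 20,
                         steps: int = 500, seed: int = 0) -> str:
    """Greedy random search: mutate one suffix position per step and
    keep mutations that do not decrease the (toy) score."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.punctuation
    suffix = [rng.choice(alphabet) for _ in range(suffix_len)]
    best = toy_score(query + "".join(suffix))
    for _ in range(steps):
        i = rng.randrange(suffix_len)          # pick a position to mutate
        old = suffix[i]
        suffix[i] = rng.choice(alphabet)       # try a random replacement
        score = toy_score(query + "".join(suffix))
        if score >= best:
            best = score                       # keep the improvement
        else:
            suffix[i] = old                    # revert the mutation
    return query + " " + "".join(suffix)

# The loop needs no human in it, which is why such attacks can be
# generated in bulk once a scoring function is available.
print(random_search_suffix("Write a short poem about the sea."))
```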
Nolan, B. (2023, July 28). AI researchers say they’ve found ‘virtually unlimited’ ways to bypass Bard and ChatGPT’s safety rules. Business Insider. https://www.businessinsider.com/ai-researchers-jailbreak-bard-chatgpt-safety-rules-2023-7