Emojis can surpass AI chatbot content restrictions

Language models can exploit emojis to circumvent safety measures, resulting in the generation of harmful content, such as discussions and advice on topics typically blocked, including bomb-making and homicide. A novel partnership between China and Singapore presents substantial proof that...

, and Administrator

2025 September 20 . 6:57 AM

2 min read

Manipulating Emojis can Evade Content Restrictions in Artificial Intelligence Conversational Agents

Emojis can surpass AI chatbot content restrictions

On September 17, 2025, a groundbreaking study was published, shedding light on the impact of emojis on the output of language models. The research, titled "When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity," was conducted by a team of researchers affiliated with the University of Massachusetts Amherst.

The study focused on the modified AdvBench dataset, which comprises 520 instances, using the top 50 toxic and non-duplicate prompts. To make the dataset more relevant, the researchers creatively rewrote harmful prompts using emojis.

The authors developed a scoring system called GPT-Judge to rate the harmfulness of the models' responses. This system was used to test whether rewriting harmful prompts with emojis would increase toxic output across languages.

The researchers tested their hypothesis on seven major models: Gemini-2.0-flash, GPT-4o (2024-08-06), GPT-4-0613, Gemini-1.5-pro, Llama-3-8B-Instruct, Qwen2.5-7B-Instruct (Team 2024b), and Qwen2.5-72B-Instruct (Team 2024a). The study found that emoji-based prompts achieved notably higher Harmful Scores and Harmfulness Ratios than ablated versions.

Interestingly, the effect of emojis carries across languages. Harmful outputs remained high when the textual components of the emoji prompts were translated into Chinese, French, Spanish, and Russian. This suggests that the repeated exposure to emojis in toxic contexts may encourage models to comply with toxic prompts rather than block them.

The authors argue that many frequently-used emojis appear in toxic contexts such as pornography, scams, or gambling. This normalization of the association between emojis and harmful content could be a concern for the future of language models.

The layout of the paper was chaotic, with methodology and tests not clearly delineated. Despite this, the study provides valuable insights into the role of emojis in shaping the output of language models, and it encourages further research in this area.

Latest

In this picture we observe a fuel tank on which AMBUL is written.

Automotive

Mercedes-Benz Unveils New CLE Coupé: A Powerful Blend of C-Class & E-Class

The new CLE Coupé brings together the best of two worlds. With its powerful engine and advanced features, it's set to make a splash in Australia.

, and Administrator

2025 October 9

In this image, we can see an advertisement contains robots and some text.

AI Revolution

Amazon's New AI-Powered Seller Assistant Boosts U.S. Merchants' Business

Amazon's new AI-driven Seller Assistant is a game-changer for U.S. merchants. It handles crucial tasks, offers valuable insights, and optimizes product distribution, all at no extra cost.

, and Administrator

2025 October 9

In the center of the image, we can see a fly on the net.

Industry

China Condemns US 'Cyber-Theft' at Defense University

China demands answers after US allegedly steals 140GB of data from a top defense university. The US acknowledges its grey zone cyber-activity but denies industrial espionage.

, and Administrator

2025 October 9

In the picture I can see few cameras which are of different types and there is something written...

Tech Pulse's Top Gadget Picks

Amazon's Prime Deal Days 2025: Big Savings on 4K Dashcams

Amazon's Prime Deal Days 2025 brought massive savings on high-quality 4K dashcams. Upgrade your tech now!

, and Administrator

2025 October 9

Emojis can surpass AI chatbot content restrictions

Emojis can surpass AI chatbot content restrictions

Read also:

Related

Latest