New Study Finds Conversing Politely with ChatGPT is Futile

ChatGPT's performance remains unaffected by the usage of polite phrases like "please" and "thank you," contrary to earlier research findings.

A Fresh Spin on AI Politeness: It's More Than Just a Formal 'Please'

It turns out that being polite to AI chatbots might not be as beneficial as we thought, contrary to earlier studies.

A recent study, published by researchers at George Washington University, challenges the notion that adding polite expressions like "please" and "thank you" to AI prompts improves their responses. According to this new research, such language has a minimal impact on the quality of AI responses, contradicting popular belief and previous studies.

The study, which was released on arXiv earlier this week, comes on the heels of OpenAI CEO Sam Altman's statement that users incorporating polite phrases in their prompts cost the company significant computing resources, amounting to tens of millions of dollars in additional token processing.

However, this contradicts a 2024 study from Japan that found politeness positively impacted AI performance, particularly in English language tasks. That study tested various LLMs, including GPT-3.5, GPT-4, PaLM-2, and Claude-2, and demonstrated that politeness produced measurable performance advantages.

David Acosta, Chief AI Officer at Arbo AI, shared his thoughts on the matter with Decrypt, suggesting that the George Washington study's model may be too rudimentary to accurately represent real-world systems. Acosta noted that while politeness might elicit some response from simpler LLMs, complex models like ChatGPT are less impacted by such language.

Acosta insisted that there's more to prompt engineering than simple mathematics, stressing the importance of cultural differences in training data, task-specific design nuances, and contextual interpretations of politeness. He highlighted the need for cross-cultural experiments and task-specific evaluation frameworks to gain a better understanding of politeness's impact on AI performance.

The researchers behind the George Washington study admit that their model is oversimplified in comparison to commercial systems like ChatGPT, but suggest that their findings should still apply as more sophisticated models are developed. Their research revealed that an AI's performance collapse, often referred to as the "Jekyll-and-Hyde tipping point," is primarily determined by an AI's training and the substantive words in user prompts, not courtesy.

Polite language was found to be "orthogonal" to good and bad output tokens, with a negligible impact on the results. This means that these words exist in separate areas of the AI's internal space and have minimal effect on the model's outputs.
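To make the "orthogonal" claim concrete, here is a toy sketch (my own illustration, not the study's actual model): if the embedding direction for polite tokens is perpendicular to a hypothetical "output quality" axis, projecting a polite token onto that axis contributes nothing, while substantive content words do. All vectors and names below are invented for illustration.

```python
import numpy as np

# Hypothetical 3-D embedding space for illustration only.
good_bad_axis = np.array([1.0, 0.0, 0.0])       # assumed "output quality" direction
polite_vector = np.array([0.0, 1.0, 0.5])       # assumed "please"/"thank you" direction
substantive_vector = np.array([0.9, 0.1, 0.0])  # assumed content-word direction

def influence(token_vec: np.ndarray, axis: np.ndarray) -> float:
    """Scalar projection of a token vector onto the quality axis."""
    return float(np.dot(token_vec, axis) / np.linalg.norm(axis))

# A polite token orthogonal to the axis exerts no pull on output quality;
# a substantive token aligned with it dominates.
print(influence(polite_vector, good_bad_axis))       # 0.0
print(influence(substantive_vector, good_bad_axis))  # 0.9
```

In this picture, "orthogonal" simply means the dot product with the quality direction is zero, so polite words neither help nor hurt the output.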

The GWU study focuses on the mathematical explanation of when and why AI outputs deteriorate, discovering that the collapse is due to a "collective effect" where the AI's attention is spread increasingly thinly across a growing number of tokens as the response length increases. Eventually, it reaches a threshold where the AI's attention snaps towards potentially problematic content it learned during training.

The mathematical tipping point, labeled n*, is predetermined from the moment the AI starts generating a response, according to the researchers. This means that the eventual quality collapse is preprogrammed, even if it occurs many tokens into the generation process. The study provides a formula for predicting the collapse based on the AI's training and the content of the user's prompt.
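The dilution idea above can be sketched in a few lines of Python. This is a deliberately simplified illustration of the intuition, not the researchers' actual formula: a fixed "budget" of attention to good content gets spread over more and more tokens as the response grows, and the collapse point is fixed by the starting quantities before generation begins. The `good_mass` and `bad_mass` parameters are invented stand-ins.

```python
def tipping_point(good_mass: float, bad_mass: float) -> int:
    """Smallest response length n at which the diluted per-token pull of
    good content (good_mass / n) falls below the fixed pull of problematic
    content (bad_mass). Note the result depends only on the starting
    masses, i.e. it is determined before any tokens are generated."""
    n = 1
    while good_mass / n >= bad_mass:
        n += 1
    return n

# With these hypothetical masses, the collapse point n* is preset at 51:
print(tipping_point(good_mass=50.0, bad_mass=1.0))  # 51
```

The takeaway mirrors the study's claim: shifting either mass (via training or the prompt's substantive words) moves n*, while adding tokens that sit orthogonal to both leaves it untouched.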

Despite the findings, many users still approach AI interactions with a degree of human-like politeness. Nearly 80% of users from the U.S. and U.K. are polite to their AI chatbots, according to a recent survey by Future. This behavior may persist, as people tend to anthropomorphize the systems they interact with. Chintan Mota, the Director of Enterprise Technology at Wipro, explained that this politeness stems from cultural habits rather than performance expectations.

Efficiency, however, may gradually trump politeness. Because LLMs are already trained to generate responses that are clear, professional, and borderline human-like, a degree of politeness is built into their output regardless of how users phrase their prompts.

Edited by Sebastian Sinclair and Josh Quittner

Fascinating Footnotes

  • This research suggests that being polite to AI models like ChatGPT is ineffective, while the extra tokens it adds impose real processing costs on providers such as OpenAI.
  • The study contradicts a 2024 Japanese study that found politeness improved AI performance, particularly in English language tasks.
  • David Acosta, Chief AI Officer at Arbo AI, argues that there's more to prompt engineering than simple math, especially considering AI models are much more complex than the oversimplified version used in the study.
