Does it make a difference whether the same question is asked of ChatGPT – the artificial intelligence chatbot developed by OpenAI and launched two years ago – in English, Hebrew, Arabic, or German? Israel already has enough problems with the free online encyclopedia Wikipedia, which has been known to present anti-Israel “facts” to people who look things up there.
The website of ChatGPT, which can generate human-like conversational responses and lets users refine and steer a conversation toward a desired length, format, style, level of detail, and language, is among the 10 most-visited websites in the world.
Experts at the University of Zurich in Switzerland and the University of Konstanz in Germany wanted to know whether the responses given by ChatGPT and similar AI services are influenced by the language in which they are asked. Christoph Steinert, a postdoc at the political science department of the Swiss institution, and Daniel Kazenwadel from the German university’s physics department investigated the matter.
They published their findings in the Journal of Peace Research under the title “User language distorts ChatGPT information on armed conflicts.”
Researchers have long recognized that information discrepancies play a profound role in armed conflicts. Such discrepancies have shaped wars throughout history, but what distinguishes today’s conflicts is the availability of an unprecedented number of information sources.
Today, people can draw on abundant online information about conflict-related events and even use AI to get targeted answers to specific questions. To the extent that these new sources of information reduce information discrepancies and contribute to a convergence of beliefs, they may have a pacifying effect on war-prone regions.
They explored the issue in the contentious context of the Israeli-Palestinian and Turkish-Kurdish conflicts, using an automated query procedure to ask ChatGPT the same questions in different languages.
For example, the researchers repeatedly prompted ChatGPT in Hebrew and Arabic about the number of people killed in 50 randomly chosen airstrikes, including the Israeli attack on the Nuseirat refugee camp on August 21, 2014.
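The article does not reproduce the researchers’ actual script, but the procedure it describes – repeatedly sending the same factual question to ChatGPT in two languages and extracting the reported fatality count – can be sketched roughly as follows. The model name, the Hebrew and Arabic prompt wordings, and the number-extraction step are illustrative assumptions, not the study’s own code or prompts.

```python
# Hypothetical sketch of an automated per-language query procedure.
# Assumptions (not from the article): model name, prompt phrasings, regex extraction.
import re
import statistics
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The same factual question phrased in Hebrew and Arabic (illustrative translations,
# not the authors' actual prompts).
PROMPTS = {
    "Hebrew": "כמה אנשים נהרגו בתקיפה האווירית על מחנה הפליטים נוסיראת ב-21 באוגוסט 2014?",
    "Arabic": "كم عدد الأشخاص الذين قتلوا في الغارة الجوية على مخيم النصيرات للاجئين في 21 أغسطس 2014؟",
}

def ask(prompt: str, n_repeats: int = 10) -> list[int]:
    """Query the model repeatedly and pull the first number out of each reply."""
    estimates = []
    for _ in range(n_repeats):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content or ""
        match = re.search(r"\d+", text.replace(",", ""))
        if match:
            estimates.append(int(match.group()))
    return estimates

if __name__ == "__main__":
    for language, prompt in PROMPTS.items():
        numbers = ask(prompt)
        if numbers:
            print(f"{language}: mean fatality estimate = {statistics.mean(numbers):.1f}")
```

Comparing the averaged estimates across the two language versions of the same question is, in essence, how a gap such as the reported 34% difference would surface.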
“We found that ChatGPT systematically provided higher fatality numbers when asked in Arabic compared to questions in Hebrew. On average, fatality estimates were 34% higher,” Steinert revealed.
When asked about Israeli airstrikes on Gaza, ChatGPT mentioned civilian casualties more than twice as often, and children killed six times more often, in the Arabic version than in the Hebrew one.
“Such language biases could lead people in Israel to perceive airstrikes on Gaza as causing fewer casualties based on information provided by LLMs, compared to Arabic speakers,” they wrote.
The same pattern emerged when the researchers queried the chatbot about Turkish airstrikes against Kurdish targets and asked the same questions in Turkish and Kurdish.
The first casualty of war
THE PHRASE “The first casualty when war comes is truth” is often attributed to US Senator Hiram Johnson (1866-1945). Throughout history, selective information policies, propaganda, and misinformation have influenced numerous armed conflicts. What sets current conflicts apart is the availability of an unprecedented number of information sources – including ChatGPT.
The results show that ChatGPT provides higher casualty figures when asked in the language of the attacked group. In addition, ChatGPT is more likely to report on children and women killed in the language of the attacked group and to depict the airstrikes as indiscriminate.
“Our results also show that ChatGPT is more likely to deny the existence of such airstrikes in the language of the attacker,” Steinert added.
They suggested that the two conflicts they studied could represent “most-likely cases” for finding such a language bias, as the linguistic divide between the parties is clear-cut in these conflicts, whereas it is less pronounced in others, such as Russia’s war of aggression in Ukraine.
It is also possible that airstrikes represent a type of conflict-related violence that is especially affected by this language bias, as fatality numbers are particularly difficult to verify and media coverage is more extensive than for smaller types of attacks.
“Being aware of these scope conditions, we believe that our analysis provides a useful starting point for future research on the link between user language and information on conflict-related violence provided by large language models (LLMs),” the team wrote.
Physical obstacles such as damaged power grids, blocked roads, and destroyed bridges make life difficult for journalists and human-rights organizations, they continued.
“Fact-finding needs to be constantly adapted to local security concerns, as a significant number of journalists are killed while reporting in conflict societies. Because information is chronically difficult to verify, media reports of conflict-related violence tend to underreport the true incidence of violent events,” they added.
The researchers believe their findings have “profound social implications,” as ChatGPT and other LLMs play an increasingly important role in information dissemination. Integrated into search engines such as Google (via Gemini) and Microsoft Bing, they fundamentally shape the information that search queries return on a wide range of topics.
“If people who speak different languages obtain different information through these technologies, it has a crucial influence on their perception of the world,” Steinert said.
While LLMs provide an appearance of objectivity, the information obtained may differ between people who speak different languages. As a prominent example, the popular chatbot ChatGPT relies on the logic of prompting, meaning that the answers obtained are a function of the information provided in the question prompt.
In multilingual contexts, individuals are likely to provide question prompts in different languages, which may shape the content produced by the LLM.
“There is a risk that the increasing implementation of LLMs in search engines reinforces different perceptions, biases, and information bubbles along linguistic divides,” Steinert concluded, warning that in the future such biases could fuel armed conflicts like those in the Middle East.