Unit 8200 created AI language learning tool from intercepted Palestinian Arabic comms - report

The investigation found that Unit 8200 trained the large language model (LLM) to understand spoken Arabic.

ILLUSTRATION: AI transposed over Arabic text (photo credit: Canva, INGIMAGE)

Israel’s military surveillance Unit 8200 has reportedly developed a vast database of intercepted Palestinian communications in order to construct an artificial intelligence tool similar to ChatGPT, a joint investigation by The Guardian, +972 Magazine and Mekomit alleged on Thursday.

Israel reportedly hopes that the resulting AI tool "will transform its spying capabilities."

The investigation by the three papers found that Unit 8200 trained the large language model (LLM) to understand spoken Arabic, specifically Palestinian Arabic but also Lebanese dialects, by giving it significant amounts of intercepted telephone conversations and texts.

This was done with the aim of building an advanced chatbot able to answer questions about, and provide insights into, the people it is monitoring, three Israeli sources told +972 Magazine and The Guardian.

At an AI conference in Tel Aviv last year, a former intelligence officer named Chaked Roger Joseph Sayedoff said he had been part of the development of the model.

UNIT 8200 soldiers in action – working with data. (credit: IDF SPOKESPERSON'S UNIT)

“We tried to create the largest dataset possible [and] collect all the data the state of Israel has ever had in Arabic,” he said.

“It’s not just about preventing shooting attacks,” a source told +972 Magazine and The Guardian. “I can track human rights activists, monitor Palestinian construction in Area C [of the West Bank]. I have more tools to know what every person in the West Bank is doing.”

The Guardian alleges that Unit 8200 mobilized officers from major US tech companies, such as Google, Meta, and Microsoft, chosen for their expertise in building LLMs.

The development of the LLM

The development of the LLM encountered challenges because many open-source and commercial Arabic-language models were trained using standard written Arabic and not spoken Arabic, the report continued.




One source told +972 Magazine and The Guardian that “There are no transcripts of calls or WhatsApp conversations on the internet. It doesn’t exist in the quantity needed to train such a model.”

Therefore, the officers needed to collect all the spoken Arabic text the unit obtained and "put it into a centralized place." Sayedoff said that the unit “focused only on the [Arabic] dialects that hate us”.

The LLM was also trained to recognize the specific military terminology used by terrorist groups, the sources said.

According to the source, the model’s training data consists of approximately 100 billion words.

That said, the investigation did not ascertain whether the AI tool has actually been deployed; however, it was reportedly still being trained in the latter half of 2024.

Sayeret Haruv forces operating in Jenin (credit: IDF SPOKESPERSON'S UNIT)

Zach Campbell, a senior surveillance researcher at Human Rights Watch (HRW), expressed concern that the LLM would be used by Unit 8200 to make significant decisions about the lives of Palestinians in the West Bank.

“It’s a guessing machine,” he told the three investigating outlets. “And ultimately these guesses can end up being used to incriminate people.”

The IDF spokesperson told The Jerusalem Post that "the IDF operates with a variety of intelligence methods and tools to detect and suppress terrorist activity by hostile organizations in the Middle East."

However, they added that, due to the sensitivity of the information, "it is not possible to provide details on specific means and methods, including the information processing processes."

Israel's use of AI

As The Guardian and +972 Magazine explain, AI usage is not new to Unit 8200, which has been using it for around ten years as a way of analyzing communications and recognizing patterns. Machine learning has also been used since the start of the Israel-Hamas war, for example as a way of identifying targets.

The Associated Press reported in February that the IDF's use of Microsoft and OpenAI technology "skyrocketed" following October 7 and was used specifically to identify targets swiftly.

“These AI tools make the intelligence process more accurate and more effective,” the IDF told AP. “They make more targets faster, but not at the expense of accuracy, and many times in this war they’ve been able to minimize civilian casualties.”

While many countries' spy agencies use LLMs, or are developing them, a former western spy chief said Israel's usage of Palestinian communications allowed it to use AI in ways “that would not be acceptable” among other spy agencies.

Campbell called the LLM “invasive and incompatible with human rights."

“We’re talking about highly personal data taken from people who are not suspected of a crime, being used to train a tool that could then help establish suspicion,” he said.

Brianna Rosen, a former White House national security official and senior research associate at Oxford university, said her main concern was that a ChatGPT-like tool can be based on errors or make mistakes: “Mistakes are going to be made, and some of those mistakes may have very serious consequences."

The IDF reportedly did not respond to The Guardian’s queries about how Unit 8200 will prevent inaccuracies and biases.

“However, the IDF implements a meticulous process in every use of technological abilities,” the IDF said in a statement. “That includes the integral involvement of professional personnel in the intelligence process in order to maximize information and precision to the highest degree.”

The IDF's use of AI was reported by high-ranking officers in February 2023. The officers revealed that the IDF uses AI systems to assist in offensive decision-making, for example in determining whether a target is military or civilian.

In addition, some defensive tools are used, such as ones that alert forces when they are under threat from a rocket or missile, or that assist in better safeguarding border movement.

In December 2024, the Washington Post reported on an AI tool called Habsora — or "the Gospel" — used by the IDF to rapidly refill its "target bank," a list of Hamas and Hezbollah terrorists to be killed during military operations, along with details about their whereabouts and routines.