
Another side of the AI boom: detecting what AI makes


Andrei Doronichev was shocked last year when he saw a video on social media that appeared to show the president of Ukraine surrendering to Russia.

The video was soon exposed as a synthetically generated deepfake, but for Mr. Doronichev it was a disturbing omen. This year, his fears crept closer to reality, as companies began competing to improve and release artificial intelligence technology, despite the havoc it could cause.

Generative AI is now available to everyone and is increasingly able to fool humans with text, audio, images and videos that appear to have been created and captured by humans. The risk of societal credulity has led to concerns about disinformation, job losses, discrimination, privacy and broad dystopia.

For entrepreneurs like Mr. Doronichev, it has also become a business opportunity. More than a dozen companies now offer tools to determine whether something has been made with artificial intelligence, with names such as Sensity AI (deepfake detection), Fictitious.AI (plagiarism detection) and Originality.AI (also plagiarism).

Born in Russia, Mr. Doronichev founded a company in San Francisco, Optic, to help identify synthetic or counterfeit material – to be, in his words, “an airport x-ray machine for digital content”.

In March, it unveiled a website where users can check images to see whether they were made from real photographs or generated by artificial intelligence. It is working on other services to verify video and audio.

“Content authenticity is becoming a big issue for society as a whole,” said Mr. Doronichev, who was an investor in a face-swapping app called Reface. “We are entering the age of cheap counterfeits.” Because it doesn’t cost much to produce fake content, he said, it can be done on a large scale.

According to market research firm Grand View Research, the total generative AI market is expected to exceed $109 billion by 2030, growing at an average rate of 35.6 percent until then. Companies focused on detecting the technology are a growing part of the industry.

Months after being created by a Princeton University student, GPTZero claims that more than a million people have used its program to detect computer-generated text. Reality Defender was one of 414 companies chosen from 17,000 applications to be funded this winter by the start-up accelerator Y Combinator.

CopyLeaks raised $7.75 million last year, in part to expand its anti-plagiarism services for schools and universities to detect artificial intelligence in student work. Sentinel, whose founders specialized in cybersecurity and information warfare for the UK’s Royal Navy and the North Atlantic Treaty Organization, closed a $1.5 million seed round in 2020, backed in part by one of Skype’s founders, to help protect democracies against deepfakes and other malicious synthetic media.

Large tech companies are also involved: Intel’s FakeCatcher claims to be able to identify deepfake videos with 96 percent accuracy, in part by analyzing pixels for subtle signs of blood flow in human faces.
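The details of FakeCatcher are Intel’s, but the underlying idea, often called remote photoplethysmography, can be sketched in a few lines: a real face on video shows a faint, periodic color change at the frequency of the human pulse, while many synthetic faces do not. The snippet below is a toy illustration of that kind of signal check, not Intel’s algorithm; the function name, threshold, and synthetic test data are invented for the example.

```python
# Illustrative sketch only: a toy pulse-band check on face crops, NOT FakeCatcher.
import numpy as np

def has_pulse_like_signal(face_frames, fps=30.0, band=(0.7, 4.0)):
    """face_frames: iterable of HxWx3 uint8 arrays cropped to the face region."""
    # Average the green channel over each frame to get a 1-D time series.
    signal = np.array([frame[:, :, 1].mean() for frame in face_frames], dtype=float)
    signal -= signal.mean()                       # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))        # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Heuristic: a real face should concentrate spectral energy in the pulse band.
    ratio = spectrum[in_band].sum() / (spectrum.sum() + 1e-9)
    return ratio > 0.3                            # arbitrary illustrative threshold

# Usage with synthetic noise standing in for real face crops:
rng = np.random.default_rng(0)
noise_frames = [rng.integers(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(300)]
print(has_pulse_like_signal(noise_frames))        # noise has no pulse-band peak
```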

Within the federal government, the Defense Advanced Research Projects Agency plans to spend nearly $30 million this year to run Semantic Forensics, a program that develops algorithms to automatically detect deepfakes and determine whether they are malicious.

Even OpenAI, which boosted the AI boom when it released its ChatGPT tool late last year, is working on detection services. The company, based in San Francisco, debuted a free tool in January to distinguish between text written by a human and text written by artificial intelligence.

OpenAI stressed that while the tool was an improvement over previous iterations, it was still “not completely reliable.” The tool correctly identified 26 percent of artificially generated text, but incorrectly flagged 9 percent of human text as computer-generated.
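To see why those rates matter in practice, consider a purely hypothetical classroom: the back-of-the-envelope calculation below assumes, for illustration only, that 20 percent of submitted essays are AI-written and applies the reported detection and false-flag rates to ask what share of flagged essays would actually be machine-generated.

```python
# Hypothetical illustration, not an OpenAI figure: assume 20% of essays are
# AI-written and apply the reported detection and false-flag rates.
tpr = 0.26        # share of AI-generated text the tool correctly identifies
fpr = 0.09        # share of human text the tool wrongly flags
base_rate = 0.20  # assumed prevalence of AI-written essays (illustrative only)

flagged_ai = tpr * base_rate            # AI essays correctly flagged
flagged_human = fpr * (1 - base_rate)   # human essays wrongly flagged
precision = flagged_ai / (flagged_ai + flagged_human)
print(f"Flagged essays that are actually AI-written: {precision:.0%}")  # roughly 42%
```

Under that assumption, most flagged essays would still be human-written, which is one way to read “not completely reliable.”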

The OpenAI tool suffers from shortcomings common to detection tools: it struggles with short texts and with writing that is not in English. In educational settings, plagiarism-detection tools like TurnItIn have been accused of inaccurately classifying essays written by students as generated by chatbots.

Detection tools inherently lag behind the generative technology they are trying to detect. By the time a defense system can recognize the work of a new chatbot or image generator, such as Google Bard or Midjourney, developers are already coming up with a new iteration that can bypass those defenses. The situation has been described as an arms race or a virus-antivirus relationship, where one begets the other, over and over.

“When Midjourney releases Midjourney 5, my starting gun goes off and I start working to catch up — and while I do, they’re working on Midjourney 6,” said Hany Farid, a computer science professor at the University of California, Berkeley, who specializes in digital forensics and is also involved in the AI detection industry. “It’s an inherently adversarial game where while I’m working on the detector, someone is building a better mousetrap, a better synthesizer.”

Despite the constant game of catch-up, many companies have seen demand for AI detection from schools and educators, said Joshua Tucker, a professor of politics at New York University and co-director of the Center for Social Media and Politics. He wondered whether a similar market would emerge ahead of the 2024 election.

“Will we see some sort of parallel wing of these companies develop to help protect political candidates so they can know when they’re being targeted by this sort of thing?” he said.

Experts said synthetically generated video was still quite clunky and easy to identify, but audio cloning and image creation were both highly advanced. Separating real from fake requires digital forensic tactics such as reverse image searches and IP address tracking.
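One of those forensic tactics, reverse image search, often comes down to perceptual hashing: reduce each image to a short fingerprint that survives resizing and recompression, then compare fingerprints. Below is a minimal sketch of that idea, assuming the third-party Pillow and imagehash packages; the file names and distance threshold are placeholders, not a production setting.

```python
# Hedged sketch of a perceptual-hash comparison, one building block of
# reverse image search. Requires the Pillow and imagehash packages.
from PIL import Image
import imagehash

def likely_same_image(path_a, path_b, max_distance=8):
    """Return True if the two images are perceptually similar."""
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    # Hamming distance between 64-bit perceptual hashes; small means similar.
    return (hash_a - hash_b) <= max_distance

# Usage (with placeholder file names):
# print(likely_same_image("original.jpg", "suspect.jpg"))
```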

Available detection tools are being tested with samples that are “very different from going out in the wild, where images have been doing the rounds and modified and cropped and resized and transcoded and annotated and God knows what else has happened to them,” Mr. Farid said.

“That content laundering makes this a difficult task,” he added.

The Content Authenticity Initiative, a consortium of 1,000 companies and organizations, is one group trying to make generative technology transparent from the start. (It is led by Adobe, with members such as The New York Times and artificial intelligence players like Stability AI.) Rather than trying to trace the origin of an image or video later in its life cycle, the group tries to establish standards that attach traceable credentials to digital work at the moment of creation.

Adobe said last week that its generative technology, Firefly, would be integrated into Google Bard, where it will attach “nutrition labels” to the content it produces, including the date an image was created and the digital tools used to make it.
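At its core, the “nutrition label” idea is signed provenance metadata attached at creation time. The sketch below is not the Content Credentials format Adobe and the Content Authenticity Initiative actually use; it is a simplified, hypothetical illustration of the concept, with a made-up signing key and tool name, and a symmetric signature standing in for real public-key signing.

```python
# Simplified, hypothetical provenance "label" - NOT the real Content Credentials
# format. Real systems use asymmetric signatures and standardized manifests.
import hashlib, hmac, json
from datetime import datetime, timezone

SECRET_KEY = b"demo-signing-key"  # placeholder key for the illustration only

def make_provenance_label(image_bytes, tool_name="ExampleImageGenerator"):
    label = {
        "created": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,                                   # hypothetical tool name
        "content_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(label, sort_keys=True).encode()
    # Sign the metadata so later tampering with the label is detectable.
    label["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return label

print(json.dumps(make_provenance_label(b"stand-in image bytes"), indent=2))
```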

Jeff Sakasegawa, the trust and safety architect at Persona, a company that helps verify consumer identity, said the challenges posed by artificial intelligence were just beginning.

“The wave is building momentum,” he said. “It’s heading for the coast. I don’t think it’s crashed yet.”
