Google’s Big Sleep AI project has uncovered a serious security flaw
- Project Zero and DeepMind’s Big Sleep AI discovers a security vulnerability
- Big Sleep finds a stack buffer underflow in SQLite before an official release
- AI could transform software security by discovering critical flaws early
Big Sleep, a joint AI project between Google Project Zero and Google DeepMind, has discovered a critical vulnerability in a piece of software before it was publicly released.
The Big Sleep AI agent was set to work analyzing the SQLite open source database engine, where it discovered a stack buffer underflow that was patched the same day.
This discovery may mark the first time ever that an AI has discovered a memory safety flaw in a widely used application.
Fuzzed software, surpassed by AI
Big Sleep discovered the stack buffer underflow in SQLite, software that had already been fuzzed many times over.
Fuzzing is an automated software testing method that can discover flaws such as the memory safety issues commonly exploited by attackers. However, it is not a foolproof way of hunting for vulnerabilities: a vulnerability that is found and patched may exist as a variant elsewhere in the software and remain undiscovered.
The methodology Google used in this case was to give the Big Sleep agent a previously patched vulnerability as a starting point, then set it loose to hunt for similar vulnerabilities elsewhere in the software.
While hunting for variants, Big Sleep came across a new vulnerability, reproduced it in a crashing test case, gradually narrowed the potential causes down to a single issue, and generated an accurate summary of the flaw.
Google Project Zero points out that the bug was not previously caught by traditional fuzzing because the fuzzing harness was not configured to exercise the relevant extensions. Even after the harness was reconfigured, 150 CPU hours of fuzzing failed to rediscover the vulnerability.
“We hope that in the future this effort will lead to a significant advantage for defenders – with the potential not only to find crashing test cases, but also to provide high-quality root-cause analysis, triaging and fixing issues could be much cheaper and more effective,” said the Big Sleep team. “We aim to continue sharing our research in this space, keeping the gap between the public state-of-the-art and the private state-of-the-art as small as possible.”
Full details of the testing methodology and the vulnerability discovery are available in Google Project Zero's blog post.