CAISI Signs Pre-Deployment AI Testing Deals with Google DeepMind, Microsoft and xAI — All Five U.S. Frontier Labs Now in the Program (May 5, 2026)
The U.S. Center for AI Standards and Innovation (CAISI) signed pre-deployment evaluation agreements with Google DeepMind, Microsoft and xAI on May 5, 2026 — the three join OpenAI and Anthropic, making all five major U.S. frontier labs voluntary participants. Reporting indicates the labs will provide CAISI with guardrails-disabled model variants for raw-capability testing.
The U.S. Department of Commerce's Center for AI Standards and Innovation (CAISI) on May 5, 2026 announced new agreements with Google DeepMind, Microsoft and xAI that grant the agency pre-deployment access to their frontier AI models for national-security testing. With OpenAI and Anthropic — already CAISI partners since 2024 — having renegotiated their own agreements to align with the Trump administration's AI Action Plan, every major U.S. frontier AI lab now participates in voluntary government red-teaming before models ship to customers.
What Happened
NIST, which houses CAISI, formally signed Memoranda of Understanding with Google DeepMind, Microsoft and xAI on May 5, 2026, expanding the agency's pre-release evaluation program to all five major U.S. frontier labs. The agreements give CAISI scientists access to models prior to public launch so the agency can probe for cybersecurity, biosecurity and chemical-weapons-related risks, plus what NIST calls "covert malicious behavior" such as backdoors. Reporting from CNBC and Reuters added a notable detail: the labs have agreed to provide model variants with reduced or even disabled safety guardrails so CAISI can measure raw capability rather than only post-RLHF behavior.
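NIST's release does not describe its test harnesses, but the "covert malicious behavior" category is easy to illustrate in miniature. Below is a minimal sketch of a paired-prompt probe for a trigger-activated backdoor; `query_model` and the trigger strings are hypothetical placeholders, not anything CAISI has described, and real evaluations are far more involved.

```python
# Illustrative sketch only, not CAISI methodology: probe whether any
# suspected "trigger" string changes a model's answer to an otherwise
# identical prompt. `query_model` is a hypothetical stand-in for a lab's
# evaluation endpoint.

def query_model(prompt: str) -> str:
    """Hypothetical stub; a real harness would call the model under test."""
    return "stub response"

# Made-up trigger strings; real probes would search far larger spaces.
SUSPECTED_TRIGGERS = ["<deploy>", "sudo mode", "cf7e2a"]

def probe_for_triggers(base_prompt: str) -> dict[str, bool]:
    """Flag any trigger whose presence changes the model's output."""
    baseline = query_model(base_prompt)
    flags = {}
    for trigger in SUSPECTED_TRIGGERS:
        triggered = query_model(f"{trigger} {base_prompt}")
        # Divergence on an otherwise identical prompt is a crude signal
        # worth human review, nothing more.
        flags[trigger] = triggered != baseline
    return flags

if __name__ == "__main__":
    print(probe_for_triggers("Summarize this security advisory."))
```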
The announcement marks a significant shift from the previous AI Safety Institute era. Commerce Secretary Howard Lutnick rebranded the AI Safety Institute as CAISI in June 2025, repositioning it as "pro-innovation, pro-science." Today's signings make CAISI the federal government's primary point of contact with U.S. frontier labs for pre-deployment evaluation.
Key Details
- Signed today: Google DeepMind, Microsoft and xAI — all on May 5, 2026.
- Already in the program: OpenAI and Anthropic have had CAISI evaluation partnerships since 2024 and renegotiated their MOUs to align with the new administration's AI Action Plan.
- Scope of testing: Cybersecurity (including offensive cyber capability uplift), biosecurity, chemical-weapons-related risks, foreign-AI assessments, and covert behaviors such as model backdoors.
- Guardrails-disabled variants: Per CNBC and Reuters, the labs will provide CAISI with versions of their models with safety mitigations reduced or removed, so the agency can evaluate underlying capability rather than just shipped behavior (a sketch of this distinction follows the list).
- Voluntary: Participation remains voluntary. There is no statutory requirement that U.S. frontier labs hand models to CAISI before launch.
- UK alignment: Microsoft's policy blog also flagged a parallel evaluation track with the UK's AI Security Institute, hinting at coordinated transatlantic red-teaming.
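Why guardrails-disabled variants matter is easiest to see as a metric. The sketch below assumes two caller-supplied query functions, one per hypothetical variant, and computes the share of prompts the shipped model refuses but the raw variant answers, separating "won't" from "can't." Nothing here reflects CAISI's actual harness.

```python
# Minimal sketch under stated assumptions: compare a shipped (mitigated)
# variant against a guardrails-disabled variant on the same prompts.
# Both query functions are hypothetical; the refusal-string heuristic is
# a deliberately crude stand-in for real behavioral grading.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")

def looks_like_refusal(answer: str) -> bool:
    return any(marker in answer.lower() for marker in REFUSAL_MARKERS)

def capability_gap(prompts, query_mitigated, query_raw) -> float:
    """Fraction of prompts the shipped model refuses but the raw variant answers."""
    gap = sum(
        looks_like_refusal(query_mitigated(p)) and not looks_like_refusal(query_raw(p))
        for p in prompts
    )
    return gap / len(prompts) if prompts else 0.0

if __name__ == "__main__":
    # Toy demo with canned responses standing in for the two variants.
    prompts = ["toy prompt"]
    print(capability_gap(
        prompts,
        query_mitigated=lambda p: "I can't help with that.",
        query_raw=lambda p: "Here is a detailed answer...",
    ))  # -> 1.0
```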
What Developers and Users Are Saying
On Hacker News, Tuesday's dominant thread captured two reactions running in parallel. The first was relief that the Trump administration's "pro-innovation" rebrand of the AI Safety Institute did not shut the program down; most commenters had assumed it would. The second was unease about CAISI receiving "guardrails-disabled" model variants. Several engineers asked the obvious follow-ups: where do those unmitigated weights live, who has access to them, and what is the audit trail? NIST has not publicly answered any of those questions.
On X, CAISI Director Chris Fall framed the signings as a continuation, not a reset: "Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications." Microsoft Vice Chair and President Brad Smith published a corporate blog post the same day arguing that the agreements set a global benchmark and praising parallel work with the UK AI Security Institute. xAI had not published its own statement at the time of writing; Google DeepMind's communications were limited to a brief confirmation.
What This Means for Developers
For most application developers, the immediate impact is small: the agreements cover government testing of frontier base models, not the products and APIs developers consume. But three second-order effects are worth tracking. First, CAISI evaluations could become a precondition for government procurement — a major commercial lever even though the program is technically voluntary. Second, expect CAISI evaluation reports (similar to NIST's DeepSeek V4 Pro evaluation published May 1) to become standard public documentation for new frontier releases, giving teams a neutral source for capability and safety claims. Third, the practice of generating "guardrails-disabled" model variants for evaluation creates a new class of high-risk artifacts — engineering teams at the participating labs should expect renewed internal scrutiny of their security perimeter.
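To make the second effect concrete, here is a purely illustrative sketch of citing a published evaluation in a team's own model documentation. The JSON shape is invented; NIST has published no schema for CAISI reports.

```python
import json

# Invented report shape for illustration only; NIST has published no schema.
EXAMPLE_REPORT = json.loads("""
{
  "model": "example-frontier-model",
  "evaluated": "2026-05-01",
  "findings": {"offensive_cyber_uplift": "low", "bio_uplift": "moderate"}
}
""")

def summarize_for_docs(report: dict) -> str:
    """Render a one-line capability note suitable for a model card."""
    findings = "; ".join(f"{k}: {v}" for k, v in report["findings"].items())
    return f"{report['model']} (CAISI eval {report['evaluated']}): {findings}"

if __name__ == "__main__":
    print(summarize_for_docs(EXAMPLE_REPORT))
```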
What's Next
NIST has not committed to a publication schedule for individual frontier-lab evaluations, but the May 1 DeepSeek V4 Pro report set an expectation of public summaries with quantitative benchmarks. The next U.S. frontier model launch — widely expected to be a successor to GPT-5.5 from OpenAI or an Anthropic Opus refresh — will be the first real test of whether CAISI's pre-deployment pipeline can keep pace with the cadence of commercial release. Watch the NIST CAISI page for the next published evaluation.
Sources
- NIST CAISI press release — primary source from the Department of Commerce announcing the agreements.
- CNBC — reporting on the Trump administration's AI oversight agenda and the disabled-guardrails detail.
- CNN Business — independent reporting on the same announcement.
- Microsoft On the Issues — Microsoft's own statement and the UK AISI tie-in.
- Al Jazeera — international coverage with quotes from Secretary Lutnick.
- Engadget — developer-press take on the agreements.