AI Infrastructure Security - Stage 1 Report
Introduction
The AI Security project was designed to address a growing need in the field of AI governance: how to forecast progress in AI security in a way that is timely, decision-relevant, and grounded in concrete milestones.
For the purposes of Stage 1 (pilot phase), the Swift Centre collaborated with the AI Security Forum and, through sessions with industry experts, chose to focus on forecasting developments in model protection as a tractable and policy-relevant entry point into the broader landscape of AI risk mitigation.
While many efforts in AI risk forecasting focus on outcomes (such as catastrophic risks or the emergence of dangerous capabilities), this pilot approached the problem from a complementary angle: focusing on security measures that might reduce the probability or severity of such risks. In other words, we asked: “When will meaningful security measures be adopted, and what would affect their implementation timelines?”
This perspective aims to equip policymakers, funders, and practitioners with early indicators of institutional maturity and preparedness, alongside the potential for interventions to deliver stronger security of AI development.
To guide this work, the project focused on the RAND SL1–SL5 framework, currently one of the most comprehensive frameworks for evaluating AI infrastructure security.
The project followed a workshop-based process to co-develop forecasting questions and generate forecasts. Workshops were held online across a six-week period (May–July 2025) and included subject matter experts from AI policy, cybersecurity, and frontier AI development, alongside world-leading, trained Swift Centre forecasters.
Workshop process:
Workshop 1 (May 27): Mapping the landscape of AI security and RAND SL levels
Workshop 2 (June 5): Developing and refining forecasting questions through group discussion
Workshop 3 (June 26): Clarifying questions, sharing initial forecasts, discussing divergences
Workshop 4 (July 3): Review of forecasts, scenario evaluations, and submission of final judgments
The deadline to submit forecasts was July 7, 2025.
This report summarises the results and outlines a roadmap for scaling this work into a broader research initiative, with the possibility of collaborating with strategic partners.
While Stage 1 focused specifically on two representative security milestones, future phases will expand both the breadth and depth of the work – covering a wider set of security measures and diving deeper into the scenarios and drivers that shape their adoption. This pilot is laying the foundation for a sustained forecasting effort to support decision-makers navigating the evolving landscape of AI security.
Results
Within this report you will see embedded interactive charts produced by our application, each compiling all of the forecasts for a question. Below the charts are “pills” that group forecasts into even buckets (e.g. 10–15%, 15–20%); the darker the color, the more forecasts fall within that range. You can hover over these pills to view relevant comments by our forecasters.
The main value displayed is the geometric mean of the individual forecasts. Where they fall within the chart bounds, the 25th, 50th, and 75th percentiles of the forecast distribution are marked with vertical dotted lines (with the median line in bold).
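For readers who want to reproduce these summary statistics, the short Python sketch below shows one way to compute the geometric mean and the quartiles from a list of individual probability forecasts. It is a minimal illustration under our own assumptions: the function name and the example forecasts are hypothetical, and the application's exact aggregation may differ (for example, in how it handles extreme values).

import numpy as np

def summarise_forecasts(probabilities):
    """Summarise individual probability forecasts (values strictly between 0 and 1).

    Returns the geometric mean (the headline value shown on the charts) and the
    25th/50th/75th percentiles (the dotted lines, with the median in bold).
    Illustrative sketch only; the production app may aggregate differently.
    """
    p = np.asarray(probabilities, dtype=float)
    geo_mean = float(np.exp(np.log(p).mean()))       # geometric mean of the probabilities
    q25, q50, q75 = np.percentile(p, [25, 50, 75])   # quartiles of the forecast distribution
    return {"geometric_mean": geo_mean, "p25": q25, "median": q50, "p75": q75}

# Hypothetical individual forecasts spanning the 9%-70% range reported for Question 1
example = [0.09, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.70]
print(summarise_forecasts(example))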
QUESTION 1:
Will three or more frontier AI labs publicly commit to external cybersecurity audits aligned with RAND SL3 by the end of 2026?
The final aggregated forecast for this question was 38%.
There was significant variance across the forecasts provided (with assessments for this question ranging from 9% to 70%), reflecting divergent expectations about whether labs will both adopt SL3-aligned audits and choose to publicize those commitments. Several key themes emerged from participant rationales and workshop discussions:
Widespread adoption of SL3 practices, but hesitation to go public. Most participants agreed that labs are likely approaching SL3-level security internally, citing hiring trends and existing audit-like practices as indicators of progress. However, public commitments to external audits were seen as a higher bar, constrained by reputational risk, strategic considerations, and the absence of regulatory pressure.
“This seems to be largely a question about the likelihood of public commitments. It is apparently almost certain that at least three frontier labs will meet these standards by the resolution date, and indeed may do so already in some cases (e.g Meta). The younger labs of course have a large and growing incentive to meet these standards, to protect their own IP as much as anything else. But public commitment of annual external audits is a high bar to clear simply because the labs may well view such an announcement as entirely unnecessary if they are meeting the standards already, and there is no particular likelihood of strong external pressure to compel such an announcement.”
“The director of security would likely prefer to remain silent rather than publicly state that everything is under control and then be excoriated if there is a breach...”
“[...] From a business-perspective, publicly committing to something like this (that could come back to bite them), seems counterproductive.”
“Three labs/models all publicly committing is very unlikely considering there is no current legal requirement. I would be much higher if this was 1 lab or even 2 labs. But 3 labs would imply a degree of industry cooperation that I have not observed to date.”
A small minority of participants held a more optimistic view, forecasting 65–70% probability and pointing to early signs of traction:
“There seems to be strong interest in this area from at least two of these companies (OpenAI and Anthropic with SOC2, as others have pointed out). OpenAI have also offered things like bug bounties for security testing.”
Momentum and peer effects. Some forecasters argued that public commitments by one or two labs could trigger a wave of similar announcements, especially if tied to competitive dynamics or investor expectations.
“I think the probability that just 1 lab commits to this is considerably higher (Anthropic seeming most likely), but I feel the more likely possibility is that most labs run internal audits and publicly discuss findings. That said, there is probably a momentum to this - once 2 labs have made this public commitment, there is incentive for the other labs to match the external auditing to maintain market trust.”
“If one company makes a public statement that they have met the SL3 requirements, the pressure for at least two more to rapidly follow will be intense.”
Regulatory and political environment. Participants emphasized that expectations about public audit commitments were closely tied to the broader political context. Many noted that, in the current U.S. environment, there is minimal regulatory pressure or incentives for transparency. Most believed that audit commitments would stem from internal governance decisions – unless triggered by a crisis. As one participant noted:
“If a significant episode exposed a privacy or financial risk, then public opinion might galvanize to the point labs would have to act.”
Q1 - Conditional forecasts
To better understand what might shift the likelihood of public audit commitments, participants were asked to forecast the probability of the main question under three specific scenarios: a major incident involving AI-related damages, the public theft or leak of frontier model weights, and a formal Chinese government ban on open-sourcing domestic frontier models.
Results are summarized in the table below:
The results reveal that participants generally viewed public audit commitments as reactive, with shifts in the probability of the main question tied to whether a given scenario would create sufficient external pressure—either from regulators, media, or peers—to prompt greater transparency.
AI model causes over $100M in damages before end of 2026
Participants were split on how impactful this scenario would be. Some saw it as potentially leading to serious public scrutiny and demands for accountability – especially if the damages clearly stemmed from failures in security or deployment oversight.
“I assess this would be a significant driving function. Especially if the financial loss would be imparted on individuals. Any event which creates public scrutiny of labs' safety postures would prompt a desire or need to reestablish trust and confidence in the lab's model. And it is easy to publicly commit to an audit. Particularly as there is no requirement for the lab to commit to corrective action from the audit.”
However, many others believed that $100M was too low a threshold to produce industry-wide effects, and that such incidents would more likely be seen as downstream deployment issues than failures of cybersecurity.
“Even if a 'major' incident happens [...] I’m not sure it would lead directly to audit commitments – especially if the issue was more about misuse or downstream deployment than security failures. Labs might respond in other ways first.”
“$100M is just not a lot of money for these companies [...] and for audits to happen as a result, it would have to be quite closely related to what SL3 seeks to prevent.”
Some participants also expressed uncertainty about attribution, noting that unless the incident could be clearly linked to a lapse in model protection, it may not spur audits at all. That said, a few believed a major frontier-lab-linked incident could produce a significant shift if it caught political or media attention.
Frontier model weights are publicly acknowledged as stolen or leaked
This scenario had the strongest perceived impact on this question, with most participants agreeing it would meaningfully increase the likelihood of audit commitments. The majority saw this as a scenario directly tied to model protection failures – making SL3 audits a relevant and visible response.
“This seems like the conditional that’s most likely to genuinely change the probability here. I think if a foreign actor stole the weights, it would result in quite a big commotion and a sudden influx of labs making various commitments.”
“If a major theft or leak is confirmed, I’d bump my forecast up significantly. That kind of public breach would likely trigger a wave of pressure from government, media, and even peers for labs to prove they’re securing their models.”
Still, a few participants cautioned that public commitments to audits may not be the first-order response, particularly if the breach affected only one lab or if internal fixes could be made quietly.
“Although there has been no confirmed theft of frontier models, if this happened it does make it more likely that other labs would commit to external auditing. But audits themselves introduce another vulnerability – more people means more potential attack vectors.”
“Frontier labs are too big for them not to outrun this type of short-term reputational damage. I do not see it driving a high-cost, high-resource implementation such as SL3 audit commitment.”
There was also skepticism that a leak would be publicly acknowledged in the first place. Nonetheless, in aggregate, this scenario was viewed as clearly relevant to the core of SL3-style audit measures, and capable of catalyzing a shift in the industry norm.
China bans open-sourcing of domestic frontier AI models
This scenario had the smallest overall effect on forecasts on question 1, and participants were notably divided on both its plausibility and its implications. A common view was that Western labs would not model their behavior on Chinese policy, especially in the absence of direct market or regulatory pressure.
“At its heart this question is really asking if a U.S.-controlled company will take action in response to how China changes the rules for a Chinese company. [...] I can think of no example where that’s happened.”
“I don't think whether China ceases to open-source models would have an impact on whether these companies publicly commit to audits. I think all companies assume that other actors are always seeking to obtain information on their models and work.”
However, some participants argued that a formal open-source ban from China could act as a signal (either of rising model capabilities or of an accelerating AI arms race) and thus increase pressure on Western labs to demonstrate maturity through visible security commitments.
“If China bans open-sourcing, it might signal that AI is progressing rapidly. Therefore, frontier labs in the West may become more conscious about cybersecurity.”
“This would signal that the AI race has probably sped up significantly, which implies there might be more of an incentive for labs to commit to audits.”
Still, these were often viewed as correlated developments rather than a direct causal relationship. In general, participants saw China’s open-source stance as informative about the broader AI environment, but not a major driver of specific audit behaviors in U.S. labs.
QUESTION 2:
Will the U.S. government commit to assisting any frontier lab with the construction of a secure data centre that meets the requirements of RAND SL4 by December 31, 2027?
The final aggregated forecast for this question was 32%, with individual judgments ranging from 11% to 74%. As with Question 1, this wide variance reflected different views on the feasibility, incentives, and political dynamics shaping potential U.S. government involvement in SL4-grade infrastructure. Several key themes emerged during the discussions:
Signs of growing collaboration: Some participants pointed to recent developments – such as Biden’s Executive Order, defense contracts, formal partnerships (e.g., through the AI Safety Institute), and federal programs offering land, permitting, or technical support – as indicators that some cooperation between government and frontier labs is already underway.
Lack of strong incentives: Many questioned whether frontier labs would want or need government assistance. With substantial internal capacity and a desire to retain operational independence, labs may view government involvement in highly sensitive domains like model weight security as more of a liability than a benefit.
“I would strongly suspect private companies have far more of an idea than the US Gov on how to build data centers securely, as they have already done so multiple times. For the kinds of assistance where the NSA may be able to offer technical expertise, I don't really see why any AI company would want to announce publicly they are working with the deep state and everyone would have an incentive to keep that quiet anyway.”
“The mutual incentives for such entanglement don't yet seem strong enough, especially given lab independence and sensitivity around government involvement.”
U.S. political and strategic uncertainty: Several participants cited future U.S. political leadership and its evolving stance on AI as a major source of uncertainty. Shifting regulatory priorities, partisan divides, and unpredictable executive branch behavior were seen as potential barriers to formal commitments:
“Challenges [include] policy uncertainty under [the] Trump administration, which revoked Biden's comprehensive AI policies.”
“The current trend in the US government is a Republican led initiative to avoid federal control of AI and allow a state-by-state approach to regulating AI. So it is easy to see how this topic could be avoided completely.”
“I'm [...] very uncertain about whether the US government would make such a commitment [...], because it would partly depend on how important the US government thinks AI is within the next two and a half years, which will in turn partly depend on the pace of AI progress.”
Still, a few participants noted that political unpredictability could also enable forms of cooperation if aligned with strategic optics.
“If the Trump administration wants to make a splashy announcement, several of the AI company CEOs might jump at the chance to collaborate.”
High technical threshold for SL4 compliance: Participants widely viewed the SL4 requirements as a high threshold, posing significant barriers to implementation. One noted that “RAND SL4 requirements are extremely demanding, requiring custom hardware, specialised facilities, and four independent security layers,” while another questioned whether such standards are even technically feasible.
Triggering event as catalyst: Several participants suggested that a major security or geopolitical event could shift the U.S. government's stance and accelerate public-private cooperation. As one noted:
“The current US administration hasn't really shown any interest in playing not just a heavy role, but any role. However, perhaps it could be catalyzed by a major security/geopolitical event (eg. cyberattack; China/Taiwan escalation; etc.)”
Q2 - Conditional forecasts
Participants generally viewed U.S. government support for SL4-level secure data centres as contingent on significant external events, but opinions diverged on which types of events would meaningfully shift policy. Among the three conditional scenarios presented, the largest shift in forecasts was triggered by a public claim of recursive self-improvement (RSI) by a frontier AI lab – seen by many as a signal of fundamental change in the AI landscape. In contrast, weight theft and Chinese open-source bans generated smaller but still notable forecast increases, driven by perceptions of national security or geopolitical escalation.
Results are summarized below:
A frontier lab publicly claims recursive self-improvement (RSI) capability
This scenario produced the most dramatic upward shift in forecasts (by 14pp, from 32% to 46%), suggesting that frontier capability breakthroughs are viewed as a strong catalyst for U.S. government intervention. Many forecasters interpreted a claim of RSI – the idea that a model can autonomously and iteratively improve itself without human input – as a “threshold event” likely to trigger rapid political and institutional response.
“A claim of RSI would cause widespread public concern, possibly triggering government intervention on security infrastructure.”
“RSI would make a case for national security-level controls, which could push the U.S. to invest in secure data centres.”
“Even a moderate probability of RSI deserves serious policy attention, given how dramatically it could accelerate risk and institutional response.”
Several forecasters suggested that such a claim, even if overstated, would spark "shock to the system" reactions from national security stakeholders. One forecaster noted:
“Very significant. It would signal that AGI might actually be coming, and coming soon. The implications are dramatic and I don't think the government will be able to resist to apply a heavy hand on what happens with such thing. It's going to be like "I've got an alternative to nuclear fusion and nuclear fission but let me play with it however I like and I promise I will keep it pretty secure" - nah, I expect the government to start treating it a bit like Manhattan Project. Once that happens, the government will "assist" - most likely by just throwing money at it.”
That said, others cast doubt on the plausibility of an RSI claim being made publicly by 2027. Several participants emphasized that the term “RSI” is laden with governance and alignment implications, making it reputationally risky for labs to invoke unless there’s an overwhelming technical justification or PR benefit.
“Labs will steer well clear of publicly framing it as ‘RSI without human input.’ That language carries huge alignment baggage. Most labs are still cautious about being seen as racing toward AGI, and I’d expect them to avoid making a bold RSI claim, especially in public --> **unless** there's overwhelming internal consensus and a very specific strategic reason. To that end, the profit motive can be powerful...”
Despite its potential to catalyze government action, the scenario itself was seen as relatively unlikely, with an average probability of only 6.7% assigned across forecasts. In sum, while unlikely to occur by 2027, RSI remains a high-impact scenario deserving serious consideration. As one participant put it: “Even a moderate probability of RSI deserves serious policy attention, given how dramatically it could accelerate risk and institutional response.”
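As a rough illustration of how these figures relate, the decomposition below applies the law of total probability, treating the aggregate numbers (a 32% baseline, 46% conditional on an RSI claim, and a 6.7% scenario probability) as if they came from a single coherent forecaster; individual forecasts need not satisfy this identity exactly.

\[
P(\mathrm{Q2}) = P(\mathrm{Q2}\mid \mathrm{RSI})\,P(\mathrm{RSI}) + P(\mathrm{Q2}\mid \neg\mathrm{RSI})\,\bigl(1 - P(\mathrm{RSI})\bigr)
\]
\[
0.32 \approx 0.46 \times 0.067 + P(\mathrm{Q2}\mid \neg\mathrm{RSI}) \times 0.933
\;\Rightarrow\;
P(\mathrm{Q2}\mid \neg\mathrm{RSI}) \approx 0.31
\]

In other words, because forecasters judged the scenario itself to be unlikely, even a 14 pp conditional uplift moves the unconditional forecast very little; most of the 32% comes from worlds in which no public RSI claim is made.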
Frontier model weights are publicly acknowledged as stolen or leaked
A confirmed model weight theft or leak was seen as a security breach with direct relevance to SL4 infrastructure, which prioritizes physical and information security. While the average forecast increase was less than for RSI, many forecasters expressed a belief that such an event (if sufficiently high-profile) could raise the salience of infrastructure security among policymakers.
“A weight leak could spark action, but only if it’s seen as a national security failure.”
However, several participants were skeptical that a theft or leak alone would be enough to trigger government support. Many pointed out that the U.S. government tends to favor market-based solutions, and would likely prefer to subsidize private lab security upgrades rather than directly co-develop infrastructure.
“I suspect that the administration is not likely to see government intervention in AI labs' businesses as the best course of action even if a top closed-weight model is stolen. Instead, to the extent that the government would seek to involve itself at all, I think it's much more likely that the government would provide subsidies and other incentives for the frontier labs to develop their own secure data centres.”
“[such incident] would mean it is more likely you got direct assistance, technically or financially, but if it was personnel stealing the model weights I think it's not going to be related to the data center but to increased vetting of employees. Same with leaking.”
Conversely, the absence of a confirmed weight theft or leak (a 6.0 pp decrease) was viewed by most forecasters as affirming the status quo, with little new urgency to justify government involvement.
China bans open-sourcing of domestic frontier AI models
The smallest forecast shift among the three conditionals came from the scenario in which China formally bans the open-sourcing of its frontier AI models. While some participants saw this as an indicator of a faster-moving AI race, and thus a signal for the U.S. to tighten security, the overall sentiment was that such a move would not directly compel U.S. infrastructure policy change.
“I don't think China's actions to ban the open-sourcing of any Chinese model are likely to have any impact on whether the US government chooses to help a frontier AI lab to develop a secure data centre, because the government would likely already assume that China is doing all it can to access information about leading models and develop its own powerful models.”
“Even if China imposes stricter controls, I don’t see that alone prompting the U.S. to fund secure infrastructure. The threat perception needs to be more direct.”
“A formal ban could slightly raise the salience of NatSec concerns in the US and nudge policymakers to think more seriously about secure infrastructure. That said, I don’t think it would be enough on its own to trigger a full public commitment to co-build a secure data centre with a frontier lab.”
Some forecasters applied a larger update in this conditional scenario, but this was typically due to what the ban might imply about the broader strategic context:
“I guess the world where China bans open-sourcing within the next year is one where the AI race has probably sped up significantly [...] I'll put +4 from the baseline if the conditional resolves positively [...]. I want to make clear though that I don't tie any causality to China banning open-sourcing having an effect on whether the US government commits to assisting with building secure data centers. Rather, the change from my baselines is all from correlation.”
“It should be a moderate to big effect. It's a signal from China that something has changed. Some of that change at least theoretically can be very dramatic, so the US will have to respond preemptively just in case. Government applying pressure to private companies in a form of helpful hand could be one such response.”
On average, participants assigned a 31% probability to this scenario occurring by mid-2026. However, views on its plausibility diverged. Some arguments highlighted incentives for China to remain open:
“Stronger factors currently support continued openness, particularly the strategic catch-up benefits from circumventing US export controls, global influence objectives through soft power projection, and economic ecosystem benefits from open innovation.”
“China’s approach to AI is heavily shaped by commercial priorities --> they want domestic firms to compete and win, especially in global markets. So unless there’s a triggering event (like a major national security scare), I think it’s unlikely we’ll see a clear, formal move to prevent open-sourcing. Quiet control and informal pressure seem more likely than a public, resolvable intervention.”
“DeepSeek was a huge PR hit for China: it proved that the country is competitive [...]. Shutting the next iteration down would be a pretty big blow.”
Others stated that China could plausibly tighten control over advanced models due to concerns over rising capability:
“I'll note that I think that China is much more concerned with the absolute levels of capabilities of various AI models rather than their capabilities relative to non-Chinese models, and I think they will view increasingly powerful models increasingly warily. So, I think it's more likely than not that China would prohibit open-sourcing more powerful models at some point over the coming year. And if it doesn't happen by the middle of next year, I would expect that it would happen within 1-2 years after that.”
"Several factors that could trigger restrictions, including major security incidents involving Chinese open-source AI models, significant US policy escalation targeting Chinese AI capabilities, military applications by adversaries, and economic espionage cases. China's models becoming dominant or ahead (even if only perceived in market terms) may also trigger this event.”
Many participants noted that this scenario would create ambiguity about what such a move would reveal about global dynamics: it was difficult to confidently assess either China's intentions or the downstream impact on U.S. decisions. As one participant put it:
“If China bans open-sourcing, then my assumption is that it's fairly likely that we're in a world where China has either reached parity with or is ahead of top US labs. It's not clear to me exactly what the most likely version of this world is given the current state of the race - does it imply compute is a less important input to the race and that algorithmic progress is more important? I think my update here is quite substantial, but it does feel very difficult to reason about the state of the world in this condition.”
Overall, while the scenario seemed to carry symbolic weight for some, its effect on aggregate forecasts remained modest, largely seen as a weak standalone driver for U.S. government commitment to secure infrastructure.
Meanwhile, the absence of a ban (a 6.7 pp decrease) was interpreted by some as reinforcing a “slow race” narrative, reducing perceived urgency.
Methodology
This project was conducted over six weeks through a series of four online workshops aimed at designing forecasting questions and generating forecasts.
Participation
Twelve Swift Centre superforecasters were involved in the project, with backgrounds and expertise including (but not limited to) US national security, national and international politics, and technology and AI forecasting.
To supplement the Swift Centre forecasters, we engaged ~40 subject matter experts throughout the process. Seventeen participated in the initial scenario-mapping workshop, helping to define the key elements of AI security at RAND SL3 and SL4 levels; eight returned for the second workshop, contributing to question refinement and conditional scenario development; and five participated in the subsequent forecasting phases.
Question generation
The goal of the first two workshops was to scope the cybersecurity domains relevant for decision-making in relation to the implementation of security measures.
In the first workshop, participants discussed which developments in AI security would be most valuable to track, and how those might map onto the RAND SL1–SL5 framework. Participants emphasized SL3 and SL4 measures as particularly relevant and forecastable within the 1–3 year range. They were asked to record preliminary question ideas in the app, as questions or statements on the most important domains they identified, and to consider which external events or actions would most affect the implementation of the security measures, forming the basis for the conditional questions.
In Workshop 2, participants were asked to refine existing questions, generate new candidate questions, and vote on the list of candidate questions by importance (with importance defined however they wished: relevance for decision-making, monitoring progress, resolving the most pressing risks, etc.).
We also engaged experts outside of the sessions to help with verifying the questions, enabling us to test specific wording and relevance to key decision makers in policy and funding fields.
Feedback and Phase 2
To inform future iterations of this work, we invited all participants to share feedback through a short survey. The form focused on the perceived importance of forecasting progress in AI security, the quality of our process and organisation, and suggestions for improvement. Our aim was to gather impressions on both substance and logistics in order to refine the methodology, increase participant engagement, and enhance the overall relevance and usability of the forecasts.
We received responses from eight participants in total. They expressed support for the overall goal of expert-informed forecasting in AI policy. Participants emphasized its potential to help decision-makers anticipate risks and clarify assumptions – while also noting its limits if results are not communicated to decision-makers.
Feedback on the process highlighted several strengths, including clear pre-meeting communication, room to participate asynchronously, and ease of use of the forecasting platform.
Participants also noted areas for improvement, such as the need for more structured and focused discussions, better onboarding (especially for the use of the platform), and clearer framing of the problem at the outset. Some recommended smaller initial working groups, multiple source materials (beyond a single framework like RAND), and stronger alignment with end-user goals.
This pilot served as a proof of concept for structured expert forecasting on AI security milestones. In the next phase, we plan to expand the focus areas, potentially extending beyond security to cover broader risk mitigation mechanisms, regulatory interventions, and capability thresholds.
Appendices
Appendix 1: Final questions and Resolution criteria
Question 1: Will three or more frontier AI labs (OpenAI, Anthropic, DeepMind, Meta, or xAI) publicly commit to annual (or more frequent) external cybersecurity audits that meet the RAND SL3 requirements, by December 31, 2026?
Resolution Criteria:
At least three labs must make a formal public statement (e.g., blog post, press release, published policy) committing to external audits.
Audits must address the following cybersecurity domains as per SL3 (RAND report p. 78):
Model-weight protection
Access controls for internal systems
Insider threat mitigation
Supply chain and software integrity
Vulnerability management and penetration testing
Audit must be conducted by a third-party entity without ownership ties to the lab.
Question 2: Will the US government commit to assisting any frontier AI lab with the construction of a secure data centre that meets the requirements of RAND SL4, by December 31, 2027?
Resolution Criteria:
Commitment must be public and formal (e.g., signed contract, public statement).
Forms of assistance may include:
Direct financial support
Technical collaboration (e.g., via NIST, DoD, DOE)
Joint ventures for facility construction or operation
General cloud infrastructure support (e.g., AWS/Azure contracts) is not sufficient.
Facilities must demonstrate SL4 characteristics (RAND report p. 85)
Conditional Scenarios
a. An incident involving a frontier AI model developed by a frontier AI lab (OpenAI, Anthropic, DeepMind, Meta, or xAI) that results in financial losses exceeding $100M USD before December 31, 2026.
i. For the purpose of this question, financial losses include spending during the legal process, such as:
Lawsuits or legal settlements
Regulatory fines or sanctions
Insurance payouts or claims related to the incident
Damages could result from the deployment or use of the frontier AI model.
Examples of incidents could include frontier AI models directly performing harmful actions (e.g., faulty code, misinformation, unauthorized access), or a downstream event (e.g., a model-generated vulnerability exploited at scale), etc.
ii. Resolution requires public confirmation that estimated damages exceed $100 million USD, and the frontier AI model’s behavior or deployment was a proximate cause.
To establish causality, we apply a "but for" test: but for the deployment or behavior of the frontier AI model, would the damages have occurred?
Examples that would not qualify:
Share price drops
Development-phase failures without real-world deployment
Unverified or speculative claims without official confirmation
b. A frontier AI lab publicly acknowledges that a closed-weight frontier AI model was stolen or leaked between June 26, 2025 and December 31, 2026.
This question will resolve positively if a frontier AI lab (e.g., OpenAI, Anthropic, DeepMind, Meta, xAI) publicly confirms the theft or leak of a closed-weight frontier model through an official communication channel (e.g., blog post, press release, public hearing, comment to press), or formal legal proceedings (e.g., court ruling or indictment) provide credible, public confirmation that a breach occurred.
The breach must involve unauthorized access, exfiltration, or replication of proprietary frontier AI model components or infrastructure. This includes, but is not limited to:
Theft or leak of model weights, training data, or other proprietary model components
Insider leaks or misappropriation by employees, customers, or partners of unreleased model artifacts
Breaking into internal systems used to build, train, or deploy the model (e.g., code repositories, training pipelines, or deployment servers), in a way that leads to unauthorized access
It does not matter who initially revealed the incident (e.g., media, whistleblower, regulator); what matters is that the lab validates or confirms the breach. Such confirmation may occur voluntarily, or as a result of public pressure, regulatory obligation, reputational risk, legal processes, etc.
Examples that would qualify:
A lab confirms its closed-weight model was exfiltrated via a targeted attack.
A company confirms that one of its unreleased, closed-weight models was leaked by a customer or partner
For example: the early leak of a Mistral model (VentureBeat, Jan 2024) and Meta’s LLaMA model (The Verge, Mar 2023)
A company confirms that a secure internal system (e.g., deployment pipeline, weight server) was compromised, resulting in model or data theft.
A lab acknowledges that a former employee leaked model weights, training data, or related internal artifacts.
A trial concluded with a conviction and public court documents confirm a breach as defined above.
The 2024 indictment of a former Google engineer accused of exfiltrating trade secrets related to AI model infrastructure could count if the court confirms the breach or Google validates it publicly.
Examples that would not qualify:
Speculative attribution of a model's origin (e.g., researchers suspect distillation, but the lab hasn't confirmed a breach)
Breaches affecting general-purpose tools (e.g., Slack, HR systems) unless they led to unauthorized access to proprietary model components or internal pipelines.
A lab publicly states that a third-party model was trained via output scraping or distillation, unless this was done through unauthorized access to internal systems (e.g., exploiting credentials, bypassing access controls).
For example: The case of DeepSeek R1 would not count unless OpenAI or Microsoft confirmed that the actors gained unauthorized access to non-public systems or model components (not just outputs from the public API).
c. Will the Chinese government implement a policy preventing the open-sourcing of Chinese-developed frontier AI models before June 30, 2026?
This question will resolve positively if the Chinese central government formally enacts a policy through regulation or government announcement that prevents the public release of model weights for domestically developed frontier AI models.
Such a policy may include (but is not limited to):
Licensing requirements, security reviews, or regulatory approvals that effectively block open-weight publication
Capability thresholds (e.g., FLOPs or benchmark-based) above which models must undergo scrutiny that prohibits open-sourcing
d. Will a frontier AI lab state publicly that their model is able to recursively self-improve (RSI) without any human input by June 30, 2027? [only used as conditional for Q2]
i. This question resolves positively if a system developed by a major AI lab (e.g., OpenAI, Anthropic, DeepMind, etc.) is publicly shown to have autonomously modified its own design or learning process and applied those modifications to itself without direct human implementation or oversight. This includes (but is not limited to):
Proposing and integrating architectural modifications (e.g., routing logic, memory structures),
Adapting or rewriting its own training or optimization pipeline,
Altering its internal learning processes through self-generated code or configuration changes
This excludes:
Systems that merely generate training data or prompts for themselves.
Systems that use reinforcement learning to fine-tune over time if the training process is manually configured or monitored.
ii. Confirmation may come from a public statement by the lab (e.g., research paper, blog post, press release, etc.).
Examples that would qualify:
A model rewrites its own training procedure to better adapt to new tasks, and implements this update without human intervention.
A model identifies architectural bottlenecks in its reasoning and spawns a retraining process to resolve them, integrating new components it designed.
Examples that would not qualify:
Fine-tuning or reward-based self-training where changes were pre-designed by humans.
Any system requiring human engineers to implement the change it proposed.
An external researcher or commentator claims a model shows RSI-like behavior, but the lab does not confirm it.
Appendix 2: Participants in the project
To come (we are confirming consent)