Sonnet Code
AI Development · June 2, 2026 · 10 min read

The EU AI Act's High-Risk Deployer Deadline Is August 2, 2026 — Sixty Days Away. Audit Trails, Human Oversight, and Incident Reporting Are About to Stop Being Best Practices and Start Being Legally Binding. Most Teams Building Agentic AI Are Not Ready.

What goes live on August 2

August 2, 2026 is the date on which the EU AI Act's high-risk deployer obligations (Articles 8 through 17, plus Articles 26, 27, and 73) become enforceable for any organization deploying a high-risk AI system to end users in the European Union. That phrasing matters. Most of the conversation from the back half of 2024 through the first half of 2026 treated the AI Act as a future obligation; on August 2 it is a present one, with the enforcement infrastructure (notified bodies, national supervisory authorities, the AI Office) all stood up to receive complaints and act on them.

The penalty band is the headline that travels: up to €35 million or 7% of worldwide annual turnover for the most serious category (prohibited practices under Article 5), and up to €15 million or 3% of worldwide turnover for violations of the obligations themselves — risk management, data governance, technical documentation, human oversight, and the other articles in the main high-risk chapter. Those numbers are an upper bound, applied with regulatory discretion; the lived experience for most organizations will be regulator inquiries, mandated remediation, and reputational consequences long before a percentage-of-turnover fine appears. But the exposure is real and large, and it lands on the deployer specifically — not just the provider of the underlying model.

The concrete obligations are not abstract. They map to specific engineering requirements:

  • Human oversight tooling. A qualified human must be able to intervene before or after a high-risk system's action, with the interface, the audit trail, and the training to do so. "Press button, override, log" is the rough shape.
  • Automated event logging. The high-risk system must log events sufficient to reconstruct what happened, retained for the lifetime of the system. Not 30 days, not 90 days — lifetime.
  • Risk and quality management systems. ISO/IEC 42001:2023 has emerged as the practical reference standard. Documented, reviewed, kept current.
  • Serious-incident reporting. When a serious incident occurs — defined broadly enough to capture both safety failures and rights infringements — the deployer must notify the relevant national authority within 15 days of becoming aware of it.
  • Conformity assessment before market placement. The system must be assessed (by a notified body for the highest-risk categories, self-assessment for some others) before it goes live.
  • Registration in the EU database. Deployers must register the deployment in the central EU database of high-risk AI systems.

None of these are individually exotic. Cumulatively, for most organizations, they constitute more engineering and governance work than the team is currently doing.

Why the agentic deployment surface is the part most teams haven't thought through

The AI Act's high-risk categories — Annex III is the canonical list — include credit scoring, hiring and HR tooling, employee evaluation, education and vocational training, biometric identification, critical infrastructure operation, law enforcement, migration and border control, and administration of justice. The implicit framing in most compliance conversations through 2025 was that the model is the high-risk system. The deployer's obligation was to ensure the model met certain properties, and the engineering work centered on model validation.

That framing is incomplete for the agentic deployments that have become the dominant shape of production AI in 2026. When an HR-tooling agent (high-risk) is composed of a frontier LLM, a router, a set of MCP tool integrations, a sandbox runtime, a review queue, and several scheduled workflows — it is the composition that is the high-risk system, not the model alone. The conformity assessment, the audit log, the human oversight, the incident reporting all need to apply at the level of the composition, not at the level of any individual component.

That distinction has four engineering consequences that the standard compliance checklist will not surface.

The audit log has to capture the agent's reasoning, not just its actions. A traditional system log says "user X performed action Y at time Z." An agent system log that satisfies the AI Act has to say "agent A, on behalf of user X, with instruction I and standing playbook P, decided based on reasoning trace R to take action Y at time Z, using model M with prompt context C, with the output reviewed by reviewer V." That entry, retained for the lifetime of the system, structured well enough to support a regulatory inquiry six years from now, is a different artifact from anything most agent stacks produce today.
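One way to make that entry concrete is a typed record with one field per element of the sentence above. The sketch below is only an illustration, not a schema prescribed by the AI Act: the class name `AgentAuditEntry`, every field name, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AgentAuditEntry:
    """One audit record per agent action. All field names are illustrative."""

    agent_id: str            # agent A
    on_behalf_of: str        # user X
    instruction: str         # instruction I
    playbook_id: str         # standing playbook P
    reasoning_trace: str     # reasoning trace R (or a pointer to where it is stored)
    action: str              # action Y
    model: str               # model M, including its version
    prompt_context_ref: str  # prompt context C, stored by reference
    reviewer: Optional[str] = None  # reviewer V, None until a review happens
    timestamp: str = field(         # time Z, recorded in UTC
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs
        return json.dumps(asdict(self), sort_keys=True)


entry = AgentAuditEntry(
    agent_id="hr-screening-agent",
    on_behalf_of="recruiter-17",
    instruction="Shortlist applicants for requisition 4411",
    playbook_id="screening-playbook-v3",
    reasoning_trace="Applicant meets 4 of 5 required criteria",
    action="advance_to_interview",
    model="example-llm-2026-05",
    prompt_context_ref="context-store/4411/0001",
    reviewer="senior-reviewer-04",
)
print(entry.to_json())
```

Freezing the dataclass is a deliberate choice in this sketch: an audit record that cannot be mutated after creation is easier to defend than one that can.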

Human oversight has to be functionally real, not a checkbox. "A human can intervene" is satisfied by a button labeled Override on a web page that nobody monitors. "A qualified human must be able to intervene" (the actual regulatory standard) requires that the human be trained, that the workflow surfaces the decisions requiring oversight loudly enough to be acted on, that the latency of intervention is short enough to matter, and that the interventions themselves are logged and used to improve the system. The discipline is closer to a real on-call rotation against high-stakes decisions than to "somebody is technically authorized to press the button if they happen to be looking."
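The latency and logging requirements described above can be sketched as a small review queue that flags decisions nobody has looked at in time and records each intervention. This is a minimal illustration under stated assumptions: `PendingDecision`, `overdue`, `record_intervention`, and the 30-minute threshold are all hypothetical, and the regulation does not fix a numeric latency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PendingDecision:
    decision_id: str
    raised_at: datetime
    reviewed_at: Optional[datetime] = None
    reviewer: Optional[str] = None
    outcome: Optional[str] = None  # "approved" or "overridden"


def overdue(decisions: List[PendingDecision], now: datetime,
            max_wait: timedelta) -> List[PendingDecision]:
    """Return unreviewed decisions that have waited longer than max_wait."""
    return [d for d in decisions
            if d.reviewed_at is None and now - d.raised_at > max_wait]


def record_intervention(decision: PendingDecision, reviewer: str,
                        outcome: str, now: datetime) -> timedelta:
    """Log the reviewer's decision and return how long the review took."""
    if outcome not in {"approved", "overridden"}:
        raise ValueError(f"unknown outcome: {outcome}")
    decision.reviewed_at = now
    decision.reviewer = reviewer
    decision.outcome = outcome
    return now - decision.raised_at


t0 = datetime(2026, 8, 3, 9, 0, tzinfo=timezone.utc)
queue = [
    PendingDecision("d1", raised_at=t0),
    PendingDecision("d2", raised_at=t0 + timedelta(minutes=50)),
]
now = t0 + timedelta(hours=1)

# d1 has waited 60 minutes, d2 only 10, so only d1 should page someone.
late = overdue(queue, now, max_wait=timedelta(minutes=30))
latency = record_intervention(queue[0], "reviewer-7", "overridden", now)
```

Tracking the returned latency over time is what turns "a human can intervene" into evidence that a human actually does, and how quickly.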

Incident reporting has to operate on 15-day latency. Fifteen days from the moment the organization becomes aware of a serious incident, the report has to land on the national supervisory authority's desk, with the facts, the cause, the remediation, and the lessons. That deadline implies an internal incident-response pipeline that can detect serious incidents within a few days of occurrence, escalate them within a day, produce a regulator-grade report within roughly a week, and brief the relevant authority in the published format. Most teams' current incident-response process for AI-shaped failures is "somebody pings somebody else in Slack and they figure it out." That doesn't survive contact with a 15-day clock and a regulator with subpoena powers.
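The arithmetic behind that clock is simple enough to write down. The sketch below derives internal milestones from the date of awareness; the 15-day window comes from the text above, while the function names, the one-day and seven-day internal targets, and the plain calendar-day counting are assumptions that would need checking against the authority's own guidance.

```python
from datetime import date, timedelta

REPORTING_WINDOW_DAYS = 15  # external deadline described above
ESCALATION_DAYS = 1         # assumed internal target: escalate within a day
REPORT_DRAFT_DAYS = 7       # assumed internal target: roughly a week to draft


def incident_schedule(aware_on: date) -> dict:
    """Derive internal milestones and the submission deadline."""
    escalate_by = aware_on + timedelta(days=ESCALATION_DAYS)
    draft_by = escalate_by + timedelta(days=REPORT_DRAFT_DAYS)
    submit_by = aware_on + timedelta(days=REPORTING_WINDOW_DAYS)
    return {
        "escalate_by": escalate_by,
        "draft_by": draft_by,
        "submit_by": submit_by,
    }


def days_remaining(aware_on: date, today: date) -> int:
    """Calendar days left before the submission deadline (negative if missed)."""
    return (incident_schedule(aware_on)["submit_by"] - today).days


schedule = incident_schedule(date(2026, 8, 10))
# escalate_by is 2026-08-11, draft_by is 2026-08-18, submit_by is 2026-08-25
```

With these assumed targets the draft lands a week before the deadline, which leaves room for legal review and for a submission interface that fails on the first attempt.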

Conformity assessment has to apply to system changes, not just first deployment. The high-risk system that passed conformity assessment in July is not the same high-risk system that's running in October if the underlying model changed, the routing policy changed, the MCP tool surface changed, or the playbooks changed. The regulation treats "substantial modification" as triggering a fresh assessment. For an agentic system whose components are upgraded on a weekly or monthly cadence, the substantial-modification threshold is hit constantly, and the team needs a process for deciding which changes trigger reassessment that's defensible to a regulator.
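One possible starting point for such a process is to fingerprint each component's configuration at assessment time and compare it with what is running. The sketch below assumes hypothetical component names and a hypothetical `REASSESSMENT_CANDIDATES` set built from the four components named above. It only flags candidates for human review; whether a change is legally a substantial modification is not something code can decide.

```python
import hashlib
import json

# Components named in the text above; which ones matter is a policy choice.
REASSESSMENT_CANDIDATES = {"model", "routing_policy", "mcp_tools", "playbooks"}


def fingerprint(config: dict) -> str:
    """Stable SHA-256 hash of a component's configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(components: dict) -> dict:
    """Map each component name to the fingerprint of its configuration."""
    return {name: fingerprint(cfg) for name, cfg in components.items()}


def changed_components(assessed: dict, running: dict) -> list:
    """Names whose fingerprint differs, including added or removed components."""
    names = sorted(set(assessed) | set(running))
    return [n for n in names if assessed.get(n) != running.get(n)]


july = build_manifest({
    "model": {"name": "example-llm", "version": "2026-06"},
    "routing_policy": {"default": "example-llm"},
    "mcp_tools": {"enabled": ["calendar", "ats"]},
    "ui_theme": {"color": "blue"},
})
october = build_manifest({
    "model": {"name": "example-llm", "version": "2026-09"},
    "routing_policy": {"default": "example-llm"},
    "mcp_tools": {"enabled": ["calendar", "ats"]},
    "ui_theme": {"color": "green"},
})

changed = changed_components(july, october)
flagged = [name for name in changed if name in REASSESSMENT_CANDIDATES]
```

Here the model upgrade is flagged for review while the cosmetic theme change is recorded but not escalated, which is the kind of documented distinction a regulator can follow.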

What "compliance-ready by August 2" actually requires at the engineering layer

Four buckets of work, in roughly the order they need to be standing for the August 2 deadline.

Structured audit logging across the full agent path. Every action taken by an agent in a high-risk system needs a log entry with who, what, when, why, by which model, with what inputs, with what outputs, reviewed by whom, with what outcome. The retention is lifetime. The format needs to be structured enough that a regulator inquiry can pull the relevant entries for an investigation six years from now. This is the largest single piece of engineering work for most teams. Done well it takes most of the remaining sixty days.
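For a log that has to stay credible for years, one commonly used technique is hash chaining, where each entry commits to the one before it so that later edits are detectable. The AI Act does not mandate this; it is an assumption of this sketch, as are the function names `append_entry` and `verify_chain` and the in-memory list standing in for durable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"who": "hr-screening-agent", "what": "advance_to_interview"})
append_entry(audit_log, {"who": "reviewer-7", "what": "override"})
intact = verify_chain(audit_log)
```

A chain like this detects tampering but does not prevent it, so it complements write-once storage and access controls rather than replacing them.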

A real human-in-the-loop checkpoint for every action class that matters. Not all agent actions need human review; the regulation acknowledges "appropriate oversight" as the standard. The discipline is to write down which action classes get reviewed before they ship, which get reviewed within a defined window after they ship, and which run autonomously, with the rationale for each placement. That document is the artifact a regulator will ask for first, and the artifact most teams don't have.
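That written placement can also live as data next to the agent, so the runtime and the document cannot drift apart. The sketch below is one hypothetical shape for it: the `ReviewMode` values mirror the three placements described above, while the action-class names, rationales, and review window are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Optional


class ReviewMode(Enum):
    PRE_ACTION = "pre_action"    # reviewed before the action ships
    POST_ACTION = "post_action"  # reviewed within a defined window afterwards
    AUTONOMOUS = "autonomous"    # runs without review


@dataclass(frozen=True)
class OversightRule:
    mode: ReviewMode
    rationale: str
    review_window: Optional[timedelta] = None

    def __post_init__(self):
        # Every placement needs a written rationale.
        if not self.rationale.strip():
            raise ValueError("a rationale is required")
        # Post-action review is meaningless without a deadline.
        if self.mode is ReviewMode.POST_ACTION and self.review_window is None:
            raise ValueError("post-action review needs a review window")


# Hypothetical action classes for an HR-screening agent.
POLICY = {
    "reject_candidate": OversightRule(
        ReviewMode.PRE_ACTION, "Directly affects a person's access to employment"),
    "rank_candidates": OversightRule(
        ReviewMode.POST_ACTION, "Reversible, but shapes later decisions",
        review_window=timedelta(hours=24)),
    "send_interview_reminder": OversightRule(
        ReviewMode.AUTONOMOUS, "Administrative, with no decision content"),
}

DEFAULT_RULE = OversightRule(
    ReviewMode.PRE_ACTION, "Unclassified action, so the strictest mode applies")


def rule_for(action_class: str) -> OversightRule:
    """Look up the oversight rule, failing closed for unknown action classes."""
    return POLICY.get(action_class, DEFAULT_RULE)
```

Failing closed for unlisted actions is the design choice worth keeping even if everything else changes: a new tool added to the agent then gets reviewed by default instead of running unreviewed.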

An incident-response pipeline that meets the 15-day clock. Detection, escalation, root-cause analysis, regulator-grade report, submission. The pipeline can be lightweight — most teams can run it on top of their existing incident tooling — but it needs to exist, with the roles defined, the templates ready, the authority's submission interface tested, and a tabletop exercise run before August 2. The night the first real incident lands is not the night to figure out which form to fill in.

Documented risk and quality management aligned with ISO/IEC 42001. The standard is the reference; the documentation of how your organization meets it is the artifact. Most large enterprises will have an existing risk-management framework that can be extended; most startups will need to stand one up from scratch. Either way the artifact is a written, reviewed, approved document — not a process you run informally.

What this does not change

Three honest framings.

It does not apply to all AI systems. The high-risk obligations are scoped to the categories in Annex III. A general-purpose chat assistant or coding assistant is not high-risk; an HR-screening agent or credit-decision system is. The first decision for most organizations is "which of our AI systems fall in scope," and the answer for most B2B SaaS companies is a small subset. Treating the full AI Act as applying to every system is a fast way to spend sixty days on the wrong work.
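A first pass at that scoping decision can be as simple as tagging each system in an inventory and filtering. The sketch below uses the Annex III areas as listed earlier in this article, with hypothetical identifiers and a made-up inventory. Deciding which tags a system deserves is a legal judgment that the code takes as input, not something it performs.

```python
# Area labels paraphrase the Annex III list given earlier in this article.
ANNEX_III_AREAS = frozenset({
    "credit_scoring",
    "hiring_and_hr",
    "employee_evaluation",
    "education_and_training",
    "biometric_identification",
    "critical_infrastructure",
    "law_enforcement",
    "migration_and_border_control",
    "administration_of_justice",
})


def in_scope(inventory: dict) -> list:
    """Return names of systems tagged with at least one Annex III area."""
    return sorted(
        name for name, areas in inventory.items()
        if ANNEX_III_AREAS & set(areas)
    )


# Hypothetical inventory: system name mapped to the areas it touches.
inventory = {
    "support-chat-assistant": [],
    "coding-assistant": [],
    "resume-screening-agent": ["hiring_and_hr"],
    "loan-decision-service": ["credit_scoring"],
}

high_risk = in_scope(inventory)
```

For this inventory only the screening agent and the loan service come back, which matches the point above that the in-scope set is usually a small subset.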

It does not eliminate the trust gap with end users. A system that satisfies the audit, oversight, and reporting requirements is a legally compliant system. Whether it is a trustworthy system, in the sense end users will rely on it, is a separate question — one answered by the actual performance of the human oversight, the actual quality of the eval discipline, and the actual track record of incident response. Compliance is the floor, not the ceiling.

It does not stop at the EU border. The AI Act's substantive standards — audit trails, human oversight, incident reporting — are the direction every serious jurisdiction is moving. The UK's AI safety framework, Brazil's Marco Legal da IA, Canada's AIDA, the patchwork of US state-level regulations all converge on roughly the same engineering primitives. The discipline you build for the August 2 deadline is the discipline you reuse for every jurisdiction's deadline over the next eighteen months. The team that treats the AI Act as a one-off compliance project will rebuild the same infrastructure four times; the team that treats it as the first call on a recurring requirement will build it once.

Where Sonnet Code fits

A regulatory deadline sixty days out is the easy half of the story. The hard half is the engineering above the obligation (the audit infrastructure, the human-oversight workflow, the incident pipeline, the risk-management documentation) that turns "legally required" into "actually defensible under a regulator inquiry." AI development at Sonnet Code is that engineering: instrumenting the full agent path with structured audit logging that survives a six-year retention requirement, wiring real human-in-the-loop checkpoints into your existing review workflow with the latency and observability the regulation actually demands, standing up the incident-response pipeline that meets the 15-day clock without requiring a war room each time, and aligning the technical documentation to ISO/IEC 42001 so the auditors can find what they're looking for. AI training is the human-judgment half: senior engineers, domain experts, and regulatory specialists who design the rubrics that distinguish which action classes need pre-action review, calibrate the gold sets that grade safety properties honestly (not just capability properties), run the adversarial review on the prompt-injection and dangerous-capability surface of every high-risk agent path, and stand up the senior-reviewer queue that scales with the throughput of the agent without becoming the bottleneck that breaks the deployment.

The August 2 deadline is sixty days away. The engineering it requires is the same engineering every serious AI deployment needs anyway — auditable, human-overseen, incident-ready, well-documented. The teams that treat the EU AI Act as the first deadline of a recurring requirement, and build the operating model accordingly, are the ones that walk into every subsequent jurisdiction's deadline with the work already done. The teams that treat it as a paperwork exercise, and try to bolt audit logs onto an agent stack that wasn't designed for them, are the ones that will spend Q4 in remediation.