Sonnet Code
AI & Machine Learning · May 7, 2026 · 7 min read

Cursor 3.3 Ships Context Breakdown, Security Review, and an SDK — the IDE Just Became a Programmable Surface

The release, in one paragraph

Anysphere shipped Cursor 3.3 on May 6, 2026, six weeks after Cursor 3 turned the IDE into an agent runtime and three weeks after /multitask made fan-out the default execution shape. The release lands three things that, taken together, change what "using Cursor" actually means. The first is a Context Usage Breakdown that itemizes how every active agent is spending its window across rules, skills, MCP servers, and subagents. The second is Cursor Security Review, in beta on Teams and Enterprise: two always-on agents (Security Reviewer and Vulnerability Scanner) that audit every PR for auth regressions, prompt-injection paths, agent tool auto-approvals, and data-handling risks. The third is the Cursor SDK, in public beta, which exposes the same runtime, harness, and models that power the desktop app behind a few lines of TypeScript, with codebase indexing, MCP, repo-resident skills, and lifecycle hooks built in.

The headline is "three features." The substance is that the IDE just stopped being a single surface. Cursor is now a runtime + policy plane + SDK — and which of those three a team gets the most leverage from depends on whether they treat Cursor as a tool, a platform, or a dependency.

Why context breakdown is the most important of the three (and gets the least attention)

The Security Review beta will get the press. The SDK will get the developer threads. But the feature that quietly changes day-to-day engineering is the Context Usage Breakdown.

For two quarters, every team running serious agent workloads has been flying blind on the same question: why did this agent run badly? The honest answer is almost always one of four things: the rules file is stuffed with stale instructions, the skills directory is loading three skills that contradict each other, an MCP server is spamming token-heavy responses into context on every call, or a subagent fan-out is pushing the coordinator past the budget where it stops paying attention to early instructions. None of those failure modes was observable before. You'd watch an agent drift, blame the model, swap the model, and watch the drift show up again.

A context breakdown turns those four invisible failure modes into a debuggable surface. That alone is the biggest lift Cursor has shipped this year for teams that actually run agents in production — bigger than Tiled Layout, bigger than /multitask, possibly bigger than the SDK. The question "why is my agent spending 40% of its context on rules I don't remember writing?" is one a senior engineer can now answer in a minute, where it used to require half a day of bisecting prompts.
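To make the "40% of context on rules" question concrete, here is a minimal TypeScript sketch of the arithmetic behind it. The `ContextUsage` shape, the category names, and the token counts are all hypothetical; nothing here is Cursor's actual data format. The sketch only shows how per-category token counts turn into shares you can compare against a threshold your team picks.

```typescript
// Hypothetical shape: tokens consumed per context category for one agent run.
// This is NOT Cursor's export format; it only illustrates the arithmetic.
type ContextUsage = Record<string, number>;

interface CategoryShare {
  category: string;
  tokens: number;
  share: number; // fraction of total tokens, between 0 and 1
}

// Convert raw token counts into shares of the total, largest first.
function contextShares(usage: ContextUsage): CategoryShare[] {
  const categories = Object.keys(usage);
  const total = categories.reduce((sum, c) => sum + usage[c], 0);
  if (total === 0) return [];
  return categories
    .map((category) => ({
      category,
      tokens: usage[category],
      share: usage[category] / total,
    }))
    .sort((a, b) => b.share - a.share);
}

// Names of the categories whose share exceeds a team-chosen threshold.
function overBudget(usage: ContextUsage, maxShare: number): string[] {
  return contextShares(usage)
    .filter((c) => c.share > maxShare)
    .map((c) => c.category);
}

// Made-up numbers for illustration only.
const exampleRun: ContextUsage = {
  rules: 40000,
  skills: 15000,
  mcpServers: 25000,
  subagents: 20000,
};

console.log(overBudget(exampleRun, 0.3)); // [ 'rules' ]
```

The 30% threshold is arbitrary. The useful habit is agreeing on a number per category, so that "the rules file is too big" becomes a check instead of an argument.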

Cursor Security Review and the new shape of the security review queue

With Security Reviewer and Vulnerability Scanner running on every PR, this is the first time an IDE vendor has shipped a security-review surface that's agentic on both sides: the PR is written by an agent, the review is run by an agent, and the human sits in the middle approving or rejecting. The category this could collapse is the entire pre-merge security tooling tier (Snyk, Semgrep, Checkmarx, and so on). For teams already inside Cursor, the path of least resistance just got shorter.

Three caveats before any team flips it on for the whole org:

False positives are now a budget line. A scanner that flags every plausible-looking finding can easily ship ten findings per PR. If most of them are not real, triaging them still costs real engineering time. Pilot Security Review on one repo first, measure the false-positive rate against findings the human reviewer would have flagged independently, and use that ratio to decide whether to widen the rollout or tighten the rule set first.

The Vulnerability Scanner is checking what the agent runtime told it about the code, not the deployed reality. If your secrets management lives outside the repo, your IAM policies live in Terraform you haven't indexed, or your auth boundary is enforced at a gateway the agent doesn't know about, the scanner can miss real findings and report spurious ones. Wire the runtime context (Terraform plans, gateway configs, IAM diffs) in through MCP servers before treating the scanner as authoritative.

Prompt-injection detection is necessary but not sufficient. Catching a malicious comment in a third-party SDK README is useful; it does not replace defense-in-depth (sandboxed tool execution, capability scoping, audit logs of agent tool calls). Treat Cursor's prompt-injection check as the first layer of a multi-layer posture, not the only layer.
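For the first caveat, the pilot measurement can be as simple as a triage log and a few ratios. The sketch below assumes a hypothetical `TriagedFinding` record that your reviewers fill in by hand. The field names are ours, not anything Security Review emits. Note that "false-positive share" here means rejected findings divided by all findings, which is the number that maps to triage cost.

```typescript
// Hypothetical triage record for one scanner finding during a pilot.
// Filled in by the human reviewer; not a Cursor data structure.
interface TriagedFinding {
  id: string;
  confirmedReal: boolean; // did triage confirm a genuine issue?
  humanWouldHaveCaught: boolean; // would normal review have flagged it anyway?
  triageMinutes: number; // time spent deciding
}

interface PilotSummary {
  total: number;
  falsePositiveShare: number; // rejected findings / all findings
  netNewReal: number; // confirmed findings the human would have missed
  minutesPerFinding: number;
}

function summarizePilot(findings: TriagedFinding[]): PilotSummary {
  const total = findings.length;
  if (total === 0) {
    return { total: 0, falsePositiveShare: 0, netNewReal: 0, minutesPerFinding: 0 };
  }
  const rejected = findings.filter((f) => !f.confirmedReal).length;
  const netNewReal = findings.filter(
    (f) => f.confirmedReal && !f.humanWouldHaveCaught
  ).length;
  const minutes = findings.reduce((sum, f) => sum + f.triageMinutes, 0);
  return {
    total,
    falsePositiveShare: rejected / total,
    netNewReal,
    minutesPerFinding: minutes / total,
  };
}

// Made-up pilot data for illustration only.
const pilot: TriagedFinding[] = [
  { id: "F1", confirmedReal: true, humanWouldHaveCaught: true, triageMinutes: 5 },
  { id: "F2", confirmedReal: false, humanWouldHaveCaught: false, triageMinutes: 10 },
  { id: "F3", confirmedReal: false, humanWouldHaveCaught: false, triageMinutes: 15 },
  { id: "F4", confirmedReal: true, humanWouldHaveCaught: false, triageMinutes: 10 },
];

console.log(summarizePilot(pilot));
// { total: 4, falsePositiveShare: 0.5, netNewReal: 1, minutesPerFinding: 10 }
```

The `netNewReal` count is the one to watch. A scanner with a high false-positive share can still earn its keep if it reliably surfaces issues the human reviewer would have missed.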

The SDK is the part that changes who builds what

The Cursor SDK is the most consequential of the three releases for buyers, not because it adds capability (it doesn't, really) but because it moves the build-vs-buy line for any team that previously chose between writing their own agent loop and using Cursor as-is.

Before 3.3, the menu was binary: use Cursor's IDE, or roll your own agent harness with the Anthropic or OpenAI SDK and bolt on MCP, indexing, and skills yourself. The first option locked you into Cursor's UI; the second meant three to six months of plumbing work before you got to the actual product.

The SDK collapses that gap. A team that wants its own internal agent — a code-reviewer agent on every PR, a documentation-update agent on every merge, a customer-support agent that drafts replies — can now stand it up on Cursor's runtime, with codebase indexing and MCP and skills already wired in, billed on token consumption. The plumbing tier just got commoditized.

The second-order effect is that internal agents become a real product line for engineering teams that weren't shipping them before. A platform team that wouldn't have built a custom code-review agent because the harness was a six-month investment can now build it in a sprint and iterate from there. Expect to see a wave of internal-only agents over the next two quarters that wouldn't have existed without the SDK.

Where we'd push back on the launch narrative

Two gaps worth flagging.

An SDK is not a strategy. Standing up a custom agent on Cursor's runtime is now easy. Knowing which custom agent to build, what success looks like, and how to keep it from drifting after the model updates — none of that gets easier. The teams that win this cycle are the ones with a workload-specific eval suite, a routing layer, and clear ownership of the prompt-and-tool contract; the SDK saves three months of plumbing, not three years of operational discipline.

Cursor Security Review is one signal in the security posture, not the posture. A team that adopts it and decommissions its other security tooling on the strength of the launch is making a single-vendor bet that hasn't been stress-tested in production yet. Run it in shadow mode for a quarter alongside whatever you're using today, compare the findings, and decide on data, not on a launch keynote.
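Comparing findings in shadow mode comes down to a set difference over some stable fingerprint per finding. The sketch below assumes you can reduce each tool's findings to a string key such as `file:rule`. That keying scheme is our assumption, and in practice normalizing findings across tools is the hard part.

```typescript
interface ShadowDiff {
  both: string[]; // flagged by both tools
  onlyIncumbent: string[]; // flagged only by the existing tool
  onlyCandidate: string[]; // flagged only by the tool under evaluation
}

// Drop duplicate keys while preserving first-seen order.
function unique(items: string[]): string[] {
  return items.filter((item, i) => items.indexOf(item) === i);
}

// Compare two lists of finding fingerprints (hypothetical "file:rule" keys).
// Linear scans are fine at per-PR scale; use a Set for large inputs.
function diffFindings(incumbent: string[], candidate: string[]): ShadowDiff {
  const a = unique(incumbent);
  const b = unique(candidate);
  return {
    both: a.filter((k) => b.indexOf(k) !== -1),
    onlyIncumbent: a.filter((k) => b.indexOf(k) === -1),
    onlyCandidate: b.filter((k) => a.indexOf(k) === -1),
  };
}

// Made-up findings for illustration only.
const shadowDiff = diffFindings(
  ["auth.ts:missing-csrf", "db.ts:sql-injection"],
  ["db.ts:sql-injection", "agent.ts:tool-auto-approve"]
);

console.log(shadowDiff.onlyIncumbent); // [ 'auth.ts:missing-csrf' ]
console.log(shadowDiff.onlyCandidate); // [ 'agent.ts:tool-auto-approve' ]
```

After a quarter of this, `onlyIncumbent` tells you what you would lose by switching and `onlyCandidate` tells you what you would gain, once both lists have been triaged for false positives.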

What we'd build differently this week

  • Turn on Context Usage Breakdown for the whole engineering org. This is the highest-leverage switch available right now. Make a habit of opening the breakdown the first time an agent run goes sideways; you may well diagnose more drift in week one than you did in the whole previous quarter.
  • Pilot Security Review on a single high-traffic repo first. Measure false-positive rate, time-to-triage-per-PR, and findings-the-human-would-have-caught. Decide on rollout based on the numbers, not on the demo.
  • Pick one internal agent to build on the SDK. Code reviewer, doc updater, ticket triager — pick one with a cleanly measurable outcome, ship it on the SDK, and let that experience inform whether the SDK becomes a platform commitment or a one-off.
  • Author one MCP server for the gap Security Review can't see. Whatever your scanner is missing because it's outside the repo (Terraform, gateway configs, IAM, secret refs) — wire it in through MCP. The scanner gets dramatically more useful, and the MCP server compounds across every other agent surface you run.
  • Version your rules, skills, and MCP configs as code. Now that Context Usage Breakdown shows what they actually cost, treat them like the production artifacts they are: CHANGELOG, code review, regression tests on the agent behavior they shape.
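For the last item, a regression test on agent behavior can start very small: a set of golden cases and a check that runs whenever rules, skills, or MCP configs change. The sketch below is one possible shape, not a prescribed one. `GoldenCase` and its fields are hypothetical, and how you capture agent output (SDK, CI job, manual run) is left out because it depends on your setup.

```typescript
// Hypothetical golden case: expectations about an agent's output for one
// fixed task. Capturing the output itself is out of scope here.
interface GoldenCase {
  name: string;
  mustContain: string[]; // substrings the output has to include
  mustNotContain: string[]; // substrings that signal a regression
}

interface CaseResult {
  name: string;
  passed: boolean;
  problems: string[];
}

// Check one recorded output against its golden expectations.
function checkCase(golden: GoldenCase, output: string): CaseResult {
  const problems: string[] = [];
  for (const needle of golden.mustContain) {
    if (output.indexOf(needle) === -1) problems.push("missing: " + needle);
  }
  for (const needle of golden.mustNotContain) {
    if (output.indexOf(needle) !== -1) problems.push("forbidden: " + needle);
  }
  return { name: golden.name, passed: problems.length === 0, problems };
}

// Return only the failing cases. `outputs` maps case name to recorded output;
// a missing output is treated as an empty string.
function regressions(
  cases: GoldenCase[],
  outputs: Record<string, string>
): CaseResult[] {
  return cases
    .map((c) => checkCase(c, outputs[c.name] || ""))
    .filter((r) => !r.passed);
}

// Made-up cases and outputs for illustration only.
const goldenCases: GoldenCase[] = [
  { name: "null-check", mustContain: ["if (user === null)"], mustNotContain: ["eslint-disable"] },
  { name: "no-secrets", mustContain: [], mustNotContain: ["API_KEY="] },
];
const recordedOutputs: Record<string, string> = {
  "null-check": "if (user === null) { return; }",
  "no-secrets": "const key = process.env.KEY; // API_KEY=abc123",
};

console.log(regressions(goldenCases, recordedOutputs));
```

Substring checks are crude, and real suites usually graduate to structured assertions or graded comparisons. They are still enough to catch the common case where a rules change silently removes a behavior the team relied on.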

Sonnet Code's take

Cursor 3.3 is the release that quietly turns the IDE into a platform — runtime, policy plane, and SDK on the same surface. The teams that get the most out of it this year are the ones who treat the SDK as a build-vs-buy escape hatch, the Security Review as one layer in a posture they own end-to-end, and Context Usage Breakdown as the missing observability they've been asking for. We staff that work for clients on two sides: AI development, where we build the internal MCP servers, custom skills, routing layers, and SDK-based agents that turn Cursor's runtime into an internal product line you control; and AI training, where senior engineers author the eval suites, golden-patch comparisons, and red-team prompts that calibrate Security Review and the agents you build on the SDK against your team's actual standards. If your team upgraded to 3.3 yesterday and is now wondering which of the three new surfaces to build on first, the next conversation isn't about the release notes. It's about the agent runway you want to own twelve months from now.