Windsurf 2.0 Put Cloud Agents and Local Agents in the Same Kanban View. The 'Where Do I Run This?' Question Just Became a UI Toggle — and the Orchestration Layer Moved Into the Editor.

What actually shipped

Windsurf 2.0 went generally available on April 15, 2026 with two surfaces that change the daily shape of agentic engineering work, plus a pricing restructure that makes the bundle the headline.

The first surface is the Agent Command Center — a Kanban view inside the editor that shows every agent you have running, local and cloud, IDE-bound and remote VM, organized by status (planning, running, blocked, ready for review, merged). Agents are grouped into Spaces: per-project containers that bundle the agent sessions, the open PRs, the relevant files, and the persistent context for a single piece of work. The UI is what every team's homegrown agent dashboard has been trying to become for a year, shipped as a default surface in the editor.
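To make the shape of that board concrete, here is a minimal sketch of how a Space and its status columns could be modeled. This is not Windsurf's schema: the class names, the `runs_on` field, and the `board` method are all hypothetical, and only the five status names come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Column names taken from the board description in the text.
    PLANNING = "planning"
    RUNNING = "running"
    BLOCKED = "blocked"
    READY_FOR_REVIEW = "ready for review"
    MERGED = "merged"


@dataclass
class AgentSession:
    name: str
    runs_on: str  # hypothetical field: "local" or "cloud"
    status: Status


@dataclass
class Space:
    # Hypothetical per-project container: sessions plus the PRs they opened.
    project: str
    sessions: list[AgentSession] = field(default_factory=list)
    open_prs: list[str] = field(default_factory=list)

    def board(self) -> dict[Status, list[str]]:
        """Group session names into Kanban columns, keeping column order."""
        columns: dict[Status, list[str]] = {status: [] for status in Status}
        for session in self.sessions:
            columns[session.status].append(session.name)
        return columns
```

Every column is present in the result even when it is empty, which is what lets a board render "nothing blocked" as a visible state rather than a missing one.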

The second surface is Devin in Windsurf — Cognition's autonomous cloud engineer, the agent with its own VM, browser, file system, and computer-use tools, now bundled directly inside the editor and available on every paid plan (Pro at $20/mo, Max at $200/mo, Teams). Delegate a task to Devin from the same UI you'd use to ask Cascade — Windsurf's in-IDE agent — to refactor a function. The job runs on Devin's VM, keeps running after you close your laptop, and surfaces back in the Agent Command Center when it's ready to review.

The pricing move is the part most analyses missed. Windsurf raised Pro from $15 to $20 and added a $200 Max tier — a deliberate signal that the bundle is no longer competing on price-per-seat against GitHub Copilot's $10 plan. The $200 is what an organization pays to get a developer access to both IDE-bound agents and a cloud autonomous engineer in the same workspace, on a single bill, with a single audit trail.

The local-vs-cloud boundary just collapsed

For most of 2024 and 2025, the question every engineering manager asked when adopting an AI tool was 'which agent surface do we standardize on?', and the implicit follow-up was 'which class of work goes where?' IDE-bound agents (Cursor, Windsurf Cascade, Copilot's edit modes) were good for the work a developer wanted to watch: refactors, in-line edits, the kind of changes you want to land in the same hour. Cloud autonomous agents (Devin, Claude Code in a sandbox, OpenAI Codex's cloud mode) were good for the work you did not want to watch: long-running migrations, codebase-scale refactors, tasks measured in hours instead of minutes.

The boundary was real, and managing it was a developer skill. 'Is this an IDE task or a Devin task?' was a decision you made before you started typing.

Windsurf 2.0 collapsed the boundary into a single dashboard. The decision still exists (you still pick local or cloud), but it's now a routing choice inside one tool, not a context switch between two products and two browser tabs. That sounds like a small thing. It is not. The friction of switching surfaces was doing real work in your daily flow: it was the implicit gate that made you stop and ask whether the task was worth a Devin session. With the gate removed, use of cloud autonomous agents is going to climb sharply, because the activation energy just dropped to a single click in the same UI where you were already working.

The teams that adopt Windsurf 2.0 in Q2 should expect their Devin (and Devin-equivalent) usage to roughly double in the first month. That's not a forecast about productivity. It's a forecast about what people delegate when the delegation interface is the same color as the chat interface.

What 'agent fleet management' looks like as a skill

The unspoken part of the Windsurf 2.0 release is that it turns agent fleet management into a skill every senior developer is now expected to have. Yesterday's senior dev managed their own attention across two or three IDE windows. Today's senior dev manages a Kanban board of half a dozen agents at varying stages of work: some local, some cloud, some blocked on review, some waiting on test runs, some hitting their VM budget.

That is a different kind of cognitive work, and most teams have not staffed it explicitly. Three patterns that the early-adopter teams are converging on:

A 'review queue' role on every senior dev's calendar. The new bottleneck on agentic work is not generation: Windsurf 2.0, Claude Code, Codex, and Antigravity 2.0 can generate code faster than any team can review it. The bottleneck is the human review surface. Treat it as a scheduled responsibility, not a thing that gets done between meetings. Block thirty minutes after lunch, every day, for clearing the review queue, and the queue stops being the reason agents get stuck.

A taxonomy of 'what goes to cloud, what stays local.' Write it down. Local: 'anything I want to land before the next standup.' Cloud: 'anything I'd otherwise wait until tomorrow to start on.' The categories are obvious in retrospect and consistently misapplied without the written rule. The rule itself isn't novel; the discipline of having the rule is.

A weekly fleet-cost review. A Max-tier developer with three concurrent Devin sessions can spend a meaningful fraction of an organization's monthly AI budget without noticing. The cost surface is now real, and someone has to look at it. Treat it the way infrastructure teams treat cloud spend, not the way they used to treat developer-tool licenses.
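The second and third patterns are simple enough to write down as code, which is one way of forcing the rule to exist. The sketch below is illustrative only: the 24-hour cutoff standing in for "before the next standup", the `SessionUsage` record, and the per-hour rate parameter are all assumptions, not Windsurf or Devin billing concepts. Plug in whatever unit your own invoice actually uses.

```python
from dataclasses import dataclass

# Hypothetical cutoff standing in for "before the next standup".
LOCAL_DEADLINE_HOURS = 24.0


def route(hours_until_needed: float) -> str:
    """Apply the written rule: work needed soon stays local, the rest goes to cloud."""
    return "local" if hours_until_needed <= LOCAL_DEADLINE_HOURS else "cloud"


@dataclass
class SessionUsage:
    # Hypothetical usage record for one cloud agent session.
    developer: str
    cloud_hours: float


def weekly_cost_by_developer(
    usages: list[SessionUsage], rate_per_cloud_hour: float
) -> dict[str, float]:
    """Sum cloud-session cost per developer at an assumed flat hourly rate."""
    totals: dict[str, float] = {}
    for usage in usages:
        cost = usage.cloud_hours * rate_per_cloud_hour
        totals[usage.developer] = totals.get(usage.developer, 0.0) + cost
    return totals


def over_budget(totals: dict[str, float], weekly_budget: float) -> list[str]:
    """Return the developers whose weekly cost exceeds the budget, sorted by name."""
    return sorted(dev for dev, cost in totals.items() if cost > weekly_budget)
```

The point of `over_budget` is not enforcement. It produces the short list of names that the weekly review actually needs to talk about.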

What Windsurf 2.0 does not solve

The interesting limits of the release are worth saying out loud.

It does not solve the review tax. A faster way to generate work is a faster way to generate work that needs to be reviewed. The Agent Command Center makes the queue legible; it does not make the queue shorter. Teams that ship Windsurf 2.0 without simultaneously investing in their review workflow — uncertainty signals from the model, routing rules for what auto-merges vs. what queues, scoped permissions for what the cloud agent can land — will end up with a beautiful dashboard full of changes nobody has time to read.
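A routing rule of the kind described above can start very small. The following is a sketch under stated assumptions: the `model_confidence` score, the sensitive path prefixes, and the thresholds are hypothetical placeholders, and nothing here is a Windsurf or Devin API. It only shows the shape of a rule that decides whether a change auto-merges, queues for review, or escalates.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_MERGE = "auto-merge"
    QUEUE = "queue"
    ESCALATE = "escalate"


@dataclass
class Change:
    paths: list[str]
    lines_changed: int
    model_confidence: float  # hypothetical 0..1 uncertainty signal from the agent
    from_cloud_agent: bool


# Hypothetical prefixes for code that always needs a named human owner.
SENSITIVE_PREFIXES = ("infra/", "auth/", "migrations/")


def route_change(
    change: Change, min_confidence: float = 0.9, max_auto_lines: int = 50
) -> Route:
    """Decide whether a change auto-merges, queues for review, or escalates."""
    # Anything touching a sensitive path escalates, regardless of confidence.
    if any(path.startswith(SENSITIVE_PREFIXES) for path in change.paths):
        return Route.ESCALATE
    # Scoped permission: cloud agents may open changes but never land them unreviewed.
    if change.from_cloud_agent:
        return Route.QUEUE
    # Small, high-confidence local changes are the only auto-merge candidates.
    if change.model_confidence >= min_confidence and change.lines_changed <= max_auto_lines:
        return Route.AUTO_MERGE
    return Route.QUEUE
```

The ordering is the design choice: escalation checks run first so that no confidence score can talk a sensitive change into the auto-merge lane.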

It does not solve the multi-vendor portability problem. Windsurf 2.0 is opinionated about which agents are first-class (its own Cascade, Devin, and the underlying models like Claude and GPT-5.5). The Agent Command Center is a Windsurf surface. Code that depends on the metadata model of an Agent Command Center Space is code that doesn't trivially migrate to a different IDE. The same lock-in conversation that exists around Antigravity's harness exists around Windsurf's workspace abstraction.

It does not solve the agent-to-production accountability gap. Who signs off on a Devin change merged at 2 AM from a cloud sandbox? The bundling makes the delegation cheap; it does not, and cannot, make the accountability automatic. Whose name is on the PR is still a policy question your team owns, and the velocity gain from Windsurf 2.0 is exactly what exposes the gap if you haven't answered it.

Where Sonnet Code fits

Windsurf 2.0 makes agent fleet management a default surface in the editor. The harder half, the one most teams have not built, is the discipline, the review workflow, and the governance that turn a Kanban full of running agents into a production capability with named accountability. AI development at Sonnet Code is that engineering: redesigning the review queue around the model's own uncertainty signals so the throughput gain doesn't just become a longer queue, wiring scoped permissions and audit trails into cloud-agent delegations so a Devin session can't ship something nobody owns, and building the cost-observability surface that lets engineering leaders see what their agent fleet actually costs. AI training is the human-judgment half: senior engineers and domain experts who design the rubrics that say 'this class of work auto-merges, that class of work queues, the other class of work always escalates,' and who run the adversarial review on the cases the rubric is most likely to mishandle.

The local-vs-cloud agent boundary collapsed into a UI toggle. The judgment about which work belongs where, who owns the output, and how the team scales review at the new throughput is still yours. The teams that build that layer deliberately in Q2 are the ones that will compound on Windsurf 2.0 in Q3. The teams that don't will have a Kanban board full of agents and a review queue nobody has touched in three days.